101
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, Tetko IV. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges. J Chem Inf Model 2024; 64:42-56. [PMID: 38116926 PMCID: PMC10778086 DOI: 10.1021/acs.jcim.3c01524] [Received: 09/21/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.
Affiliation(s)
- Varvara Voinarovska
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- TUM Graduate School, Faculty of Chemistry, Technical University of Munich, 85748 Garching, Germany
- Mikhail Kabeshov
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Dmytro Dudenko
- Enamine Ltd., 78 Chervonotkatska str., 02094 Kyiv, Ukraine
- Samuel Genheden
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Igor V. Tetko
- Molecular Targets and Therapeutics Center, Helmholtz Munich − Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, 85764 Neuherberg, Germany
102
Gu Y, Wang Y, Zhu K, Li W, Liu G, Tang Y. DBPP-Predictor: a novel strategy for prediction of chemical drug-likeness based on property profiles. J Cheminform 2024; 16:4. [PMID: 38183072 PMCID: PMC10771006 DOI: 10.1186/s13321-024-00800-9] [Received: 07/27/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024]
Abstract
Evaluation of chemical drug-likeness is essential for the discovery of high-quality drug candidates while avoiding unwarranted biological and clinical trial costs. A high-quality drug candidate should have promising drug-like properties, including pharmacological activity and suitable physicochemical and ADMET properties. Hence, in silico prediction of chemical drug-likeness has been proposed, although it remains a challenging task. Although several prediction models have been developed to assess chemical drug-likeness, they have drawbacks such as sample dependence and poor interpretability. In this study, we developed a novel strategy, named DBPP-Predictor, to predict chemical drug-likeness based on a property profile representation that integrates physicochemical and ADMET properties. The results demonstrated that DBPP-Predictor exhibited considerable generalization capability, with AUC (area under the curve) values from 0.817 to 0.913 on external validation sets. In the application feasibility analysis, DBPP-Predictor not only showed consistent and reasonable scoring performance across different data sets but was also able to guide structural optimization. Moreover, it offers a new perspective on drug-likeness assessment, showing no significant linear correlation with existing methods. We also developed free standalone software that lets users predict drug-likeness and visualize property profiles for their compounds of interest. In summary, DBPP-Predictor provides a valuable tool for the prediction of chemical drug-likeness, helping to identify appropriate drug candidates for further development.
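The external-validation AUC values quoted above summarize how well a model ranks drug-like compounds above non-drug-like ones. As a minimal illustration only (not the DBPP-Predictor code), ROC AUC equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative, which can be computed directly from pairwise comparisons; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney pairwise statistic.

    labels: iterable of 0/1 class labels (1 = positive, e.g. drug-like)
    scores: model scores, higher meaning "more likely positive"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Normalize by the number of positive-negative pairs.
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 3 of 4 pairs ranked correctly
```

The quadratic pairwise loop keeps the definition explicit; for large validation sets a rank-based formulation (or a library routine) is the usual choice.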
Affiliation(s)
- Yaxin Gu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Keyun Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
103
Arora S, Satija S, Mittal A, Solanki S, Mohanty SK, Srivastava V, Sengupta D, Rout D, Arul Murugan N, Borkar RM, Ahuja G. Unlocking The Mysteries of DNA Adducts with Artificial Intelligence. Chembiochem 2024; 25:e202300577. [PMID: 37874183 DOI: 10.1002/cbic.202300577] [Received: 08/16/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 10/25/2023]
Abstract
The cellular genome is considered a dynamic blueprint of the cell, since it encodes genetic information that is temporally altered by various endogenous and exogenous insults. Largely, the extent of genomic dynamicity is controlled by the trade-off between DNA repair processes and the genotoxic potential of the causative agent (genotoxins or potential carcinogens). A subset of genotoxins form DNA adducts by covalently binding to cellular DNA, triggering structural or functional changes that lead to significant alterations in cellular processes via genetic (e.g., mutations) or non-genetic (e.g., epigenetic) routes. Identification, quantification, and characterization of DNA adducts are indispensable for understanding them comprehensively and could expedite ongoing efforts to predict carcinogenicity and modes of action. In this review, we elaborate on the use of Artificial Intelligence (AI)-based modeling in adduct biology and present multiple computational strategies for decoding DNA adducts. The proposed AI-based strategies encompass predictive modeling of adduct formation via metabolic activation, identification of novel adducts, prediction of biochemical routes for adduct formation, prediction of adduct half-lives within biological ecosystems, and methods to predict the link between adduct chemistry and its location within genomic DNA. In summary, we discuss some prospective AI-based approaches in DNA adduct biology.
Affiliation(s)
- Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Shiva Satija
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Saveena Solanki
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Vaibhav Srivastava
- Division of Glycoscience, Department of Chemistry CBH School, Royal Institute of Technology (KTH) AlbaNova University Center, 10691, Stockholm, Sweden
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Diptiranjan Rout
- Department of Transfusion Medicine National Cancer Institute, AIIMS, New Delhi, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110608, India
- Natarajan Arul Murugan
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Roshan M Borkar
- Department of Pharmaceutical Analysis, National Institute of Pharmaceutical Education and Research (NIPER)-Guwahati, Sila Katamur Halugurisuk P.O.: Changsari, Dist, Guwahati, Assam, 781101, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
104
Wang Y, Yu X, Gu Y, Li W, Zhu K, Chen L, Tang Y, Liu G. XGraphCDS: An explainable deep learning model for predicting drug sensitivity from gene pathways and chemical structures. Comput Biol Med 2024; 168:107746. [PMID: 38039896 DOI: 10.1016/j.compbiomed.2023.107746] [Received: 08/12/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
Cancer is a highly complex disease characterized by genetic and phenotypic heterogeneity among individuals. In the era of precision medicine, understanding the genetic basis of these individual differences is crucial for developing new drugs and achieving personalized treatment. Despite the increasing abundance of cancer genomics data, predicting the relationship between cancer samples and drug sensitivity remains challenging. In this study, we developed an explainable graph neural network framework for predicting cancer drug sensitivity (XGraphCDS) based on comparative learning by integrating cancer gene expression information and drug chemical structure knowledge. Specifically, XGraphCDS consists of a unified heterogeneous network and multiple sub-networks, with molecular graphs representing drugs and gene enrichment scores representing cell lines. Experimental results showed that XGraphCDS consistently outperformed most state-of-the-art baselines (R2 = 0.863, AUC = 0.858). We also constructed a separate in vivo prediction model by using transfer learning strategies with in vitro experimental data and achieved good predictive power (AUC = 0.808). Simultaneously, our framework is interpretable, providing insights into resistance mechanisms alongside accurate predictions. The excellent performance of XGraphCDS highlights its immense potential in aiding the development of selective anti-tumor drugs and personalized dosing strategies in the field of precision medicine.
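XGraphCDS is reported with R2 = 0.863 for its regression task. For reference, the standard coefficient of determination is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations of the targets from their mean. The sketch below computes that standard definition; `r_squared` is a hypothetical helper, and the paper's exact evaluation protocol (splits, averaging) is not described in this abstract.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

A model that always predicts the target mean scores 0, and perfect predictions score 1; values can be negative for models worse than the mean.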
Affiliation(s)
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Keyun Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Long Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
105
Almeida RL, Maltarollo VG, Coelho FGF. Overcoming class imbalance in drug discovery problems: Graph neural networks and balancing approaches. J Mol Graph Model 2024; 126:108627. [PMID: 37801808 DOI: 10.1016/j.jmgm.2023.108627] [Received: 06/27/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 10/08/2023]
Abstract
This research investigates the application of Graph Neural Networks (GNNs) to make drug development more cost-effective, addressing its limitations of cost and time. Class imbalance within classification datasets, such as the disparity between the numbers of active and inactive compounds, gives rise to difficulties that can be addressed through strategies such as oversampling, undersampling, and modification of the loss function. Three distinct datasets are compared using three different GNN architectures. This benchmarking research can steer future investigations and enhance the efficacy of GNNs in drug discovery and design. Three hundred models were trained for each combination of architecture and dataset using hyperparameter tuning techniques and evaluated with a range of metrics. Notably, the oversampling technique gave the best results in eight experiments, showcasing its potential. While balancing techniques improve models trained on imbalanced datasets, their efficacy depends on dataset specifics and problem type. Although oversampling aids molecular graph datasets, more research is needed to optimize its use and to explore other solutions to class imbalance.
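Oversampling, the balancing strategy highlighted above, can be as simple as duplicating randomly chosen minority-class examples until each class matches the largest one. The sketch below shows plain random oversampling over arbitrary sample objects (for example SMILES strings or graph objects); `random_oversample` and its `seed` parameter are hypothetical, and the abstract does not say which oversampling variant the authors used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Random oversampling: duplicate minority-class samples (with replacement)
    until every class has as many samples as the largest class.
    The original samples are kept, in order, at the front of the output."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(samples), list(labels)
    for cls, members in by_class.items():
        for _ in range(target - len(members)):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y


# Toy example: three inactive (0) compounds and one active (1) compound.
xs, ys = random_oversample(["CCO", "CCN", "CCC", "c1ccccc1O"], [0, 0, 0, 1], seed=1)
print(ys)
```

Oversampling should be applied only to the training split; duplicating examples before splitting leaks copies of test compounds into training.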
Affiliation(s)
- Rafael Lopes Almeida
- Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil
- Vinícius Gonçalves Maltarollo
- Department of Pharmaceutical Products - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil
- Frederico Gualberto Ferreira Coelho
- Department of Electronical Engineering - Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627, Belo Horizonte, 31270-901, MG, Brazil
106
Qiu W, Liang Q, Yu L, Xiao X, Qiu W, Lin W. LSTM-SAGDTA: Predicting Drug-target Binding Affinity with an Attention Graph Neural Network and LSTM Approach. Curr Pharm Des 2024; 30:468-476. [PMID: 38323613 PMCID: PMC11071654 DOI: 10.2174/0113816128282837240130102817] [Received: 10/18/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/08/2024]
Abstract
INTRODUCTION Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. Drug development requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS In this study, a novel model, LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity, was developed. We employed SeqVec to characterize proteins and used graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION Moreover, LSTM-SAGDTA achieved higher accuracy than current state-of-the-art methods while requiring less training time. The experimental results suggest that this method is a high-precision solution for drug-target affinity (DTA) prediction.
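The abstract credits self-attentive graph pooling for part of LSTM-SAGDTA's accuracy. The dependency-free sketch below illustrates the general idea of self-attention graph pooling: score each node with a graph-convolution-like layer, keep the top-ranked fraction of nodes, gate their features by the scores, and keep the induced subgraph. It is not the LSTM-SAGDTA implementation; the mean-aggregation scoring layer, the fixed `weights` vector (learned in a real model), the `ratio` parameter, and the `sag_pool` name are all assumptions made for illustration.

```python
import math


def sag_pool(adj, feats, weights, ratio=0.5):
    """Simplified self-attention graph pooling.

    adj:     n x n symmetric 0/1 adjacency matrix (list of lists)
    feats:   n x d node feature matrix (list of lists)
    weights: length-d projection vector (hypothetical; learned in practice)
    Returns (pooled adjacency, gated pooled features, kept node indices).
    """
    n, d = len(adj), len(weights)
    if any(len(row) != d for row in feats):
        raise ValueError("feature width must match len(weights)")
    # 1. Attention score per node: mean of its own and its neighbours'
    #    features (a stand-in for one GCN layer), projected and squashed.
    scores = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or j == i]
        agg = [sum(feats[j][k] for j in neigh) / len(neigh) for k in range(d)]
        scores.append(math.tanh(sum(a * w for a, w in zip(agg, weights))))
    # 2. Keep the ceil(ratio * n) highest-scoring nodes, in original order.
    k = max(1, math.ceil(ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    # 3. Gate kept features by their scores; take the induced subgraph.
    pooled_feats = [[x * scores[i] for x in feats[i]] for i in keep]
    pooled_adj = [[adj[i][j] for j in keep] for i in keep]
    return pooled_adj, pooled_feats, keep


# Toy path graph 0-1-2-3 with one scalar feature per atom-like node.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0], [2.0], [3.0], [4.0]]
print(sag_pool(adj, feats, [1.0], ratio=0.5)[2])
```

Gating by the scores is what lets gradients reach the scoring layer in a trained model, since the top-k selection itself is not differentiable.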
Affiliation(s)
- Wenjing Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Qianle Liang
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
107
Liu T, Cao Z, Huang Y, Wan Y, Wu J, Hsieh CY, Hou T, Kang Y. SynCluster: Reaction Type Clustering and Recommendation Framework for Synthesis Planning. JACS Au 2023; 3:3446-3461. [PMID: 38155655 PMCID: PMC10751778 DOI: 10.1021/jacsau.3c00607] [Received: 10/07/2023] [Revised: 11/07/2023] [Accepted: 11/08/2023] [Indexed: 12/30/2023]
Abstract
AI-assisted synthesis planning has emerged as a valuable tool in accelerating synthetic chemistry for the discovery of new drugs and materials. The template-free approach, which shows superior generalization capability, is seen as the mainstream direction in this field. However, it remains unclear whether such an end-to-end approach can achieve problem-solving performance on par with experienced chemists without fully revealing insights into the chemical mechanisms involved. Moreover, there is a lack of unified and chemically inspired frameworks for improving multitask reaction predictions in this area. In this study, we address these challenges by investigating the impact of fine-grained reaction-type labels on multiple downstream tasks and propose a novel framework named SynCluster. This framework incorporates unsupervised clustering cues into the baseline models and identifies plausible chemical subspaces, which are compatible with multitask extensions and can serve as model-independent indicators that effectively enhance the performance of multiple downstream tasks. In retrosynthesis prediction, SynCluster achieves significant improvements of 4.1% and 11.0% in top-1 and top-10 prediction accuracy, respectively, compared to the baseline Molecular Transformer, and a notable enhancement of 13.9% in top-10 accuracy when combined with Retroformer. By incorporating simplified molecular-input line-entry system (SMILES) augmentation, our framework achieves higher top-10 accuracy than state-of-the-art sequence-based retrosynthesis models and improves on the baseline in the diversity and validity of reactants. SynCluster also achieves 94.9% top-10 accuracy in forward synthesis prediction and 51.5% top-10 Maxfrag accuracy in reagent prediction. Overall, SynCluster provides a fresh perspective that adds chemical interpretability and reinforces domain knowledge in synthesis design.
It offers a promising solution for improving the accuracy and efficiency of AI-assisted synthesis planning and bridges the gap between template-free approaches and the problem-solving abilities of experienced chemists.
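The top-1 and top-10 figures above count a prediction as correct when the recorded answer appears among a model's first k ranked outputs. The sketch below computes that metric, assuming predictions are already canonicalized SMILES strings compared by exact match (real benchmarks typically canonicalize with a cheminformatics toolkit first). `top_k_accuracy` is a hypothetical helper, the strings are toy inputs, and the Maxfrag variant used for reagent prediction is not implemented here.

```python
def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of queries whose correct answer is among the first k predictions.

    ranked_predictions: one list of candidate strings per query, best first
    ground_truth:       the recorded answer for each query
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("need one non-empty prediction list per ground-truth entry")
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, ground_truth) if truth in preds[:k]
    )
    return hits / len(ground_truth)


# Toy retrosynthesis-style example: ranked reactant sets for three products.
preds = [["CCO.CC(=O)O", "CCO"], ["c1ccccc1", "C1CCCCC1"], ["CN", "CO"]]
truth = ["CCO.CC(=O)O", "C1CCCCC1", "CC"]
for k in (1, 2):
    print(k, top_k_accuracy(preds, truth, k))
```

Because exact string matching is used, two valid SMILES for the same molecule would count as a miss unless both sides are canonicalized beforehand.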
Affiliation(s)
- Tiantao Liu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zheng Cao
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Yuansheng Huang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yue Wan
- Tencent Quantum Laboratory, Shenzhen 518057, Guangdong, China
- Jian Wu
- Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
108
Bu Y, Traore MDM, Zhang L, Wang L, Liu Z, Hu H, Wang M, Li C, Sun D. A gastrointestinal locally activating Janus kinase inhibitor to treat ulcerative colitis. J Biol Chem 2023; 299:105467. [PMID: 37979913 PMCID: PMC10755797 DOI: 10.1016/j.jbc.2023.105467] [Received: 08/07/2023] [Revised: 10/11/2023] [Accepted: 11/06/2023] [Indexed: 11/20/2023]
Abstract
In this study, we integrated machine learning (ML), structure-tissue selectivity-activity-relationship (STAR), and wet lab synthesis/testing to design a gastrointestinal (GI) locally activating JAK inhibitor for ulcerative colitis treatment. The JAK inhibitor achieves site-specific efficacy through high local GI tissue selectivity while minimizing the requirement for JAK isoform specificity to reduce systemic toxicity. We used the ML model CoGT to classify whether the designed compounds were inhibitors or noninhibitors. Then we used the regression ML model MTATFP to predict the IC50 values of the predicted JAK inhibitors against related JAK isoforms. The ML model predicted MMT3-72, which was retained in the GI tract, to be a weak JAK1 inhibitor, while MMT3-72-M2, which accumulated only in GI tissues, was predicted to be an inhibitor of JAK1/2 and TYK2. ML docking methods were applied to simulate their docking poses in JAK isoforms. Application of these ML models enabled us to limit our synthetic efforts to MMT3-72 and MMT3-72-M2 for subsequent wet lab testing. The kinase assay confirmed that MMT3-72 weakly inhibited JAK1 and that MMT3-72-M2 inhibited JAK1/2 and TYK2. We found that MMT3-72 accumulated in the GI lumen but not in GI tissue or plasma, whereas the released MMT3-72-M2 accumulated in colon tissue with minimal plasma exposure. MMT3-72 achieved superior efficacy and reduced p-STAT3 in DSS-induced colitis. Overall, integrating ML, the structure-tissue selectivity-activity-relationship system, and wet lab synthesis/testing could minimize the effort required to optimize a JAK inhibitor for treating colitis. This site-specific inhibitor reduces systemic toxicity by minimizing the need for JAK isoform specificity.
Affiliation(s)
- Yingzi Bu
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA; Michigan Institute for Computational Discovery & Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Mohamed Dit Mady Traore
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Luchen Zhang
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Lu Wang
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Zhongwei Liu
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Hongxiang Hu
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Meilin Wang
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Chengyi Li
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
- Duxin Sun
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, North Campus Research Complex, Ann Arbor, Michigan, USA
109
Matevosyan M, Harutyunyan V, Abelyan N, Khachatryan H, Tirosyan I, Gabrielyan Y, Sahakyan V, Gevorgyan S, Arakelov V, Arakelov G, Zakaryan H. Design of new chemical entities targeting both native and H275Y mutant influenza A virus by deep reinforcement learning. J Biomol Struct Dyn 2023; 41:10798-10812. [PMID: 36541127 DOI: 10.1080/07391102.2022.2158936] [Received: 08/02/2022] [Accepted: 12/10/2022] [Indexed: 12/24/2022]
Abstract
Influenza virus remains a major public health challenge due to its high morbidity and mortality and its seasonal surges. Although antiviral drugs against the influenza virus are widely used as a first-line defense, the virus undergoes rapid genetic changes, resulting in the emergence of drug-resistant strains. Thus, new antiviral drugs that can overcome resistant strains are of significant importance. Herein, we used a deep reinforcement learning (RL) algorithm to design new chemical entities (NCEs) that are able to bind to the native and H275Y mutant (oseltamivir-resistant) neuraminidases (NAs) of influenza A virus with better binding energy than oseltamivir. We generated more than 66,211 NCEs, which were prioritized based on filtering rules, structural alerts, and synthetic accessibility. Then, 18 NCEs with better MM/PBSA scores than oseltamivir were further analyzed in 100 ns molecular dynamics (MD) simulations. The MD experiments showed that 8 NCEs formed very stable complexes with the binding pockets of both native and H275Y mutant NAs of H1N1. Furthermore, most NCEs demonstrated much better binding affinity than oseltamivir to group 2 (N2, N3, and N9) and influenza B virus NAs. Although all 8 NCEs have non-sialic acid-like structures, they showed a binding mode similar to that of oseltamivir, indicating that it is possible to find new scaffolds with better binding and antiviral properties than sialic acid-like inhibitors. In conclusion, we have designed potential antiviral candidates for further synthesis and testing against wild-type and mutant influenza viruses. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Vahram Arakelov
- Denovo Sciences Inc, Yerevan, Armenia
- Institute of Molecular Biology of National Academy of Sciences, Yerevan, Armenia
- Grigor Arakelov
- Denovo Sciences Inc, Yerevan, Armenia
- Institute of Molecular Biology of National Academy of Sciences, Yerevan, Armenia
- Hovakim Zakaryan
- Denovo Sciences Inc, Yerevan, Armenia
- Institute of Molecular Biology of National Academy of Sciences, Yerevan, Armenia
110
Zhang X, Li Y, Wang J, Xu G, Gu Y. A Multi-perspective Model for Protein-Ligand-Binding Affinity Prediction. Interdiscip Sci 2023; 15:696-709. [PMID: 37815680 DOI: 10.1007/s12539-023-00582-y] [Received: 10/13/2022] [Revised: 07/09/2023] [Accepted: 07/13/2023] [Indexed: 10/11/2023]
Abstract
Gathering information from multi-perspective graphs is essential for many applications, especially protein-ligand-binding affinity prediction. Most traditional approaches obtained such information individually, with low interpretability. In this paper, we harness the rich information from multi-perspective graphs with a general model, which abstractly represents protein-ligand complexes with better interpretability while achieving excellent predictive performance. In addition, we specifically analyze the protein-ligand-binding affinity problem, taking into account the heterogeneity of proteins and ligands. Experimental evaluations on public datasets demonstrate the effectiveness of our data representation strategy, which fuses information from different perspectives. All code is available at https://github.com/Jthy-af/HaPPy.
Affiliation(s)
- Xianfeng Zhang
- School of Computer and Electronic Information, Nanjing Normal University, Nanjing, 210023, China
- Yafei Li
- School of Chemistry and Materials Science, Nanjing Normal University, Nanjing, 210023, China
- Jinlan Wang
- School of Physics, Southeast University, Nanjing, 211189, China
- Guandong Xu
- School of Computer Science, University of Technology Sydney, Sydney, NSW 2008, Australia
- Yanhui Gu
- School of Computer and Electronic Information, Nanjing Normal University, Nanjing, 210023, China
111
Paykan Heyrati M, Ghorbanali Z, Akbari M, Pishgahi G, Zare-Mirakabad F. BioAct-Het: A Heterogeneous Siamese Neural Network for Bioactivity Prediction Using Novel Bioactivity Representation. ACS Omega 2023; 8:44757-44772. [PMID: 38046344 PMCID: PMC10688196 DOI: 10.1021/acsomega.3c05778] [Received: 08/07/2023] [Revised: 10/13/2023] [Accepted: 10/24/2023] [Indexed: 12/05/2023]
Abstract
Drug failure during experimental procedures due to low bioactivity presents a significant challenge. To mitigate this risk and enhance compound bioactivity, predicting bioactivity classes during lead optimization is essential. Existing studies on structure-activity relationships have highlighted the connection between the chemical structures of compounds and their bioactivity. However, these studies often overlook the intricate relationship between drugs and bioactivity, which involves multiple factors beyond chemical structure alone. To address this issue, we propose the BioAct-Het model, which employs a heterogeneous siamese neural network to model the complex relationship between drugs and bioactivity classes by bringing them into a unified latent space. In particular, we introduce a novel representation for bioactivity classes, called Bio-Prof, and enhance the original bioactivity data sets to tackle data scarcity. These approaches enabled our model to outperform previous ones. BioAct-Het is evaluated with three distinct strategies: association-based, bioactivity class-based, and compound-based. The association-based strategy uses supervised classification, while the bioactivity class-based strategy adopts a retrospective study design. The compound-based strategy, in turn, resembles meta-learning. Furthermore, the model's effectiveness in addressing real-world problems is analyzed through a case study on the application of vancomycin and oseltamivir for COVID-19 treatment, as well as molnupiravir's potential efficacy in treating COVID-19 patients. The data and code underlying this article are available at https://github.com/CBRC-lab/BioAct-Het. The data sets were derived from sources in the public domain.
Affiliation(s)
- Mehdi Paykan Heyrati
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
- Zahra Ghorbanali
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
- Mohammad Akbari
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
- Ghasem Pishgahi
- Students’ Scientific Research Center (SSRC), Tehran University of Medical Sciences, Tehran 1416753955, Iran
- Fatemeh Zare-Mirakabad
- Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
112
Lee J, Yang H, Park C, Park SH, Jang E, Kwack H, Lee CH, Song CI, Choi YC, Han S, Lee H. Attention-based solubility prediction of polysulfide and electrolyte analysis for lithium-sulfur batteries. Sci Rep 2023; 13:20784. [PMID: 38012171 PMCID: PMC10682475 DOI: 10.1038/s41598-023-47154-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023]
Abstract
During the continuous charge and discharge process in lithium-sulfur batteries, one of the next-generation batteries, polysulfides are generated in the battery's electrolyte and affect its power and capacity. The amount of polysulfides in the electrolyte could be estimated from the change in the Gibbs free energy of the electrolyte (ΔG) in the presence of polysulfide. However, obtaining ΔG for the diverse mixtures of components in the electrolyte is a complex and expensive task that has become a bottleneck in electrolyte optimization. In this work, we present a machine-learning approach for predicting ΔG of electrolytes. The proposed architecture utilizes (1) an attention-based model (Attentive FP), a contrastive learning model (MolCLR), or Morgan fingerprints to represent chemical components, and (2) transformers to account for the interactions between chemicals in the electrolyte. This architecture was not only capable of predicting electrolyte properties, including those of chemicals not used during training, but also provided insights into chemical interactions within electrolytes. It revealed that interactions with other chemicals relate to the logP and molecular weight of the chemicals.
Affiliation(s)
- Jaewan Lee
- LG AI Research, ISC, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Hongjun Yang
- LG AI Research, ISC, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Changyoung Park
- LG AI Research, ISC, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Seong-Hyo Park
- LG Energy Solution, LTD., LG Science Park E5, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Eunji Jang
- LG Energy Solution, LTD., LG Science Park E5, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Hobeom Kwack
- LG Energy Solution, LTD., LG Science Park E5, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Chang Hoon Lee
- LG Energy Solution, LTD., LG Science Park E5, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Chang-Ik Song
- LG Energy Solution, LTD., LG Science Park E5, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Young Cheol Choi
- LG Energy Solution, LTD., LG Science Park E5, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Sehui Han
- LG AI Research, ISC, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
- Honglak Lee
- LG AI Research, ISC, 30, Magokjungang 10-ro, Gangseo-gu, Seoul, 07796, Republic of Korea
113
Kang Q, Fang P, Zhang S, Qiu H, Lan Z. Deep graph convolutional network for small-molecule retention time prediction. J Chromatogr A 2023; 1711:464439. [PMID: 37865024 DOI: 10.1016/j.chroma.2023.464439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/23/2023]
Abstract
The retention time (RT) is a crucial source of data for liquid chromatography-mass spectrometry (LC-MS). A model that can accurately predict the RT of each molecule would make it possible to filter out candidates with similar spectra but different RTs in LC-MS-based molecule identification. Recent research shows that graph neural networks (GNNs) outperform traditional machine learning algorithms in RT prediction. However, all of these models use relatively shallow GNNs. This study is the first to investigate how depth affects GNN performance on RT prediction. The results demonstrate that a notable improvement can be achieved by increasing the depth of GNNs to 16 layers through the adoption of residual connections. We also find that the graph convolutional network (GCN) model benefits from edge information. The developed deep graph convolutional network, DeepGCN-RT, significantly outperforms the previous state-of-the-art method and achieves the lowest mean absolute percentage error (MAPE) of 3.3% and the lowest mean absolute error (MAE) of 26.55 s on the SMRT test set. We also fine-tune DeepGCN-RT on seven datasets with various chromatographic conditions. The mean MAE across the seven datasets decreases by 30% compared with the previous state-of-the-art method. On the RIKEN-PlaSMA dataset, we also test the effectiveness of DeepGCN-RT in assisting molecular structure identification. By reducing the number of candidate structures by 30%, DeepGCN-RT improves top-1 accuracy by about 11%.
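The two error metrics quoted for DeepGCN-RT (MAE in seconds and MAPE in percent) are standard regression metrics. As a minimal sketch, not the authors' evaluation code, they can be computed from observed and predicted retention times as below; the retention-time values are hypothetical, and MAPE assumes every observed RT is strictly positive.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the inputs (e.g. seconds)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; assumes every y_true > 0."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t <= 0 for t in y_true):
        raise ValueError("retention times must be positive")
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed vs. predicted retention times (seconds).
observed = [600.0, 750.0, 900.0]
predicted = [620.0, 735.0, 870.0]
print(f"MAE = {mae(observed, predicted):.2f} s")    # MAE = 21.67 s
print(f"MAPE = {mape(observed, predicted):.2f} %")  # MAPE = 2.89 %
```

MAE is reported in absolute time units, whereas MAPE normalizes each error by the observed RT, so the two metrics weight early- and late-eluting compounds differently.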
Affiliation(s)
- Qiyue Kang
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Pengfei Fang
- School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
- Shuai Zhang
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Huachuan Qiu
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Zhenzhong Lan
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
114
Ren Q, Qu N, Sun J, Zhou J, Liu J, Ni L, Tong X, Zhang Z, Kong X, Wen Y, Wang Y, Wang D, Luo X, Zhang S, Zheng M, Li X. KinomeMETA: meta-learning enhanced kinome-wide polypharmacology profiling. Brief Bioinform 2023; 25:bbad461. [PMID: 38113075 PMCID: PMC10729787 DOI: 10.1093/bib/bbad461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 11/08/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023]
Abstract
Kinase inhibitors are crucial in cancer treatment, but drug resistance and side effects hinder the development of effective drugs. To address these challenges, it is essential to analyze the polypharmacology of kinase inhibitors and identify compounds with high selectivity profiles. This study presents KinomeMETA, a framework for profiling the activity of small-molecule kinase inhibitors across a panel of 661 kinases. By training a meta-learner based on a graph neural network and fine-tuning it to create kinase-specific learners, KinomeMETA outperforms benchmark multi-task models and other kinase profiling models. It provides higher accuracy for understudied kinases with limited known data and broader coverage of kinase types, including important mutant kinases. Case studies on the discovery of new scaffold inhibitors for membrane-associated tyrosine- and threonine-specific cdc2-inhibitory kinase and selective inhibitors for fibroblast growth factor receptors demonstrate the role of KinomeMETA in virtual screening and kinome-wide activity profiling. Overall, KinomeMETA has the potential to accelerate kinase drug discovery by more effectively exploring the kinase polypharmacology landscape.
Affiliation(s)
- Qun Ren
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Jingjing Sun
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Jingyi Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lingang Laboratory, Shanghai 200031, China
- Jin Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Lin Ni
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Zimei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Yiming Wen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Dingyan Wang
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, Hangzhou 330106, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
115
Yan J, Ye Z, Yang Z, Lu C, Zhang S, Liu Q, Qiu J. Multi-task bioassay pre-training for protein-ligand binding affinity prediction. Brief Bioinform 2023; 25:bbad451. [PMID: 38084920 PMCID: PMC10783875 DOI: 10.1093/bib/bbad451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/27/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023]
Abstract
Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have made remarkable progress in predicting binding affinity by taking the three-dimensional (3D) structures of protein-ligand complexes as input. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although a vast amount of affinity data is available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (e.g. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; and (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training that treats the prediction of different affinity labels as different tasks and classifies relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from the new ChEMBL-Dock dataset despite its varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is freely available at: https://huggingface.co/spaces/jiaxianustc/mbp.
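The abstract notes that MBP sidesteps incompatible affinity labels partly by classifying relative rankings only between samples from the same bioassay. The sketch below illustrates that pairing idea in isolation; the record layout (assay ID, ligand ID, affinity value), the use of p-scaled values where higher means stronger binding, and skipping ties are all assumptions, and this is not MBP's actual data pipeline.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Measurement(NamedTuple):
    assay_id: str    # hypothetical bioassay identifier
    ligand_id: str   # hypothetical ligand identifier
    affinity: float  # assumed p-scaled (e.g. pIC50); comparable only within one assay


def within_assay_pairs(records: List[Measurement]) -> List[Tuple[str, str, int]]:
    """Return (ligand_a, ligand_b, label) pairs; label is 1 if a's value is higher.

    Pairs are formed only inside the same assay, so values measured under
    different protocols or label types are never compared directly.
    Ties are skipped because they carry no ranking signal.
    """
    by_assay: Dict[str, List[Measurement]] = defaultdict(list)
    for rec in records:
        by_assay[rec.assay_id].append(rec)

    pairs = []
    for assay_records in by_assay.values():
        for a, b in combinations(assay_records, 2):
            if a.affinity == b.affinity:
                continue
            pairs.append((a.ligand_id, b.ligand_id, int(a.affinity > b.affinity)))
    return pairs


data = [
    Measurement("assay1", "L1", 7.2),
    Measurement("assay1", "L2", 6.1),
    Measurement("assay2", "L3", 8.0),  # never paired with assay1 ligands
    Measurement("assay2", "L4", 8.5),
]
print(within_assay_pairs(data))
# [('L1', 'L2', 1), ('L3', 'L4', 0)]
```

Because each label only says which of two same-assay measurements is larger, systematic offsets between assays (different conditions or label types) cancel out of the training signal.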
Affiliation(s)
- Jiaxian Yan
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Zhaofeng Ye
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Ziyi Yang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Chengqiang Lu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Qi Liu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Jiezhong Qiu
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
116
Wang Z, Feng Z, Li Y, Li B, Wang Y, Sha C, He M, Li X. BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation. Brief Bioinform 2023; 25:bbad400. [PMID: 38033291 PMCID: PMC10783874 DOI: 10.1093/bib/bbad400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/02/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
Although substantial efforts have been made using graph neural networks (GNNs) for artificial intelligence (AI)-driven drug discovery, effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets achieve better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which are time-consuming, computationally expensive, and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary, and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structural and semantic information of molecules, thus improving molecular representations. BatmanNet achieves state-of-the-art results on 13 benchmark datasets for multiple drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, demonstrating its strong potential for molecular representation learning.
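In BatmanNet, the pre-training input is a molecular graph with some nodes and edges hidden, which the two autoencoder branches then reconstruct. A minimal sketch of the masking step alone follows, assuming a graph given as an atom count plus an undirected edge list and a hypothetical mask ratio; the paper's actual ratio, masking details, and transformer branches are not shown here.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def mask_graph(num_nodes: int, edges: List[Edge], ratio: float, seed: int = 0):
    """Randomly hide a fraction of nodes and, independently, a fraction of edges.

    Returns (masked_nodes, visible_nodes, masked_edges, visible_edges). The masked
    sets are the reconstruction targets for a node branch and an edge branch.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    rng = random.Random(seed)  # local RNG so results are reproducible
    n_mask_nodes = int(round(ratio * num_nodes))
    n_mask_edges = int(round(ratio * len(edges)))

    masked_nodes: Set[int] = set(rng.sample(range(num_nodes), n_mask_nodes))
    visible_nodes = [i for i in range(num_nodes) if i not in masked_nodes]

    masked_idx = set(rng.sample(range(len(edges)), n_mask_edges))
    masked_edges = [e for i, e in enumerate(edges) if i in masked_idx]
    visible_edges = [e for i, e in enumerate(edges) if i not in masked_idx]
    return sorted(masked_nodes), visible_nodes, masked_edges, visible_edges


# Heavy-atom graph of ethanol: C0-C1-O2 (hypothetical mask ratio of 0.5).
masked_n, visible_n, masked_e, visible_e = mask_graph(3, [(0, 1), (1, 2)], ratio=0.5, seed=42)
print(masked_n, visible_n, masked_e, visible_e)
```

Masking nodes and edges separately mirrors the abstract's description of one branch recovering missing nodes and the other recovering missing edges; which specific atoms are hidden depends on the seed.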
Affiliation(s)
- Zhen Wang
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, Hunan, China
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Zheng Feng
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, 32611, FL, USA
- Yanjun Li
- Department of Medicinal Chemistry, College of Pharmacy, University of Florida, Gainesville, 32610, FL, USA
- Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, 32610, FL, USA
- Bowen Li
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Yongrui Wang
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Chulin Sha
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Min He
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, Hunan, China
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Xiaolin Li
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- ElasticMind Inc, Hangzhou, 310018, Zhejiang, China
117
Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS APPLIED MATERIALS & INTERFACES 2023; 15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
Establishing structure-property relationships with machine learning (ML) models is extremely valuable for accelerating the molecular design of polymers. However, existing ML models for polymers suffer from scarce training data and limited variation in the graph structures of molecules. In addition, few studies have explored the interpretability of ML models to infer latent knowledge in polymer science that could inspire ML-assisted molecular design. In this contribution, we integrate graph convolutional neural networks (GCNs) with a data augmentation strategy to predict the glass transition temperature Tg of polymers. It is demonstrated that the data-augmented GCN model outperforms conventional models and achieves higher accuracy in predicting Tg despite a small amount of training data. Furthermore, taking advantage of molecular graph representations, the data-augmented GCN model can infer the importance of atoms or substructures to Tg, which generally agrees with experimental findings in polymer science. The knowledge inferred by the GCN model is used to guide the design of functional polymers with specific Tg. The data-augmented GCN model offers clear advantages in establishing structure-property relationships and also provides an efficient way to accelerate the rational design of polymer molecules.
Affiliation(s)
- Junyang Hu
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zean Li
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
118
Fan F, Wu G, Yang Y, Liu F, Qian Y, Yu Q, Ren H, Geng J. A Graph Neural Network Model with a Transparent Decision-Making Process Defines the Applicability Domain for Environmental Estrogen Screening. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:18236-18245. [PMID: 37749748 DOI: 10.1021/acs.est.3c04571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2023]
Abstract
The application of deep learning (DL) models for screening environmental estrogens (EEs) for the sound management of chemicals has garnered significant attention. However, the currently available DL model for screening EEs lacks both a transparent decision-making process and effective applicability domain (AD) characterization, making the reliability of its prediction results uncertain and limiting its practical applications. To address this issue, a graph neural network (GNN) model was developed to screen EEs, achieving accuracy rates of 88.9% and 92.5% on the internal and external test sets, respectively. The decision-making process of the GNN model was explored through the network-like similarity graphs (NSGs) based on the model features (FT). We discovered that the accuracy of the predictions is dependent on the feature distribution of compounds in NSGs. An AD characterization method called ADFT was proposed, which excludes predictions falling outside of the model's prediction range, leading to a 15% improvement in the F1 score of the GNN model. The GNN model with the AD method may serve as an efficient tool for screening EEs, identifying 800 potential EEs in the Inventory of Existing Chemical Substances of China. Additionally, this study offers new insights into comprehending the decision-making process of DL models.
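The ADFT criterion itself is derived from the network-like similarity graphs described in the paper and is not reproduced here. As a generic stand-in for the same idea (withholding predictions for compounds whose model features fall outside the region covered by the training data), the sketch below treats a query as in-domain only if its mean cosine similarity to its k most similar training compounds reaches a threshold. The use of cosine similarity, k, the threshold, and the feature vectors are all assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def in_domain(query: Sequence[float], train: List[Sequence[float]],
              k: int = 3, threshold: float = 0.8) -> bool:
    """True if the mean similarity to the k most similar training vectors >= threshold."""
    sims = sorted((cosine(query, t) for t in train), reverse=True)[:k]
    return bool(sims) and sum(sims) / len(sims) >= threshold


# Hypothetical learned feature vectors (e.g. a GNN's penultimate-layer output).
train_features = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.8, 0.1]]
print(in_domain([1.0, 0.05], train_features))  # True: close to training data
print(in_domain([0.0, 1.0], train_features))   # False: far from training data
```

Predictions for out-of-domain queries would simply be withheld (or flagged), which trades coverage for reliability in the way the abstract describes for ADFT.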
Affiliation(s)
- Fan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Gang Wu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yining Yang
- School of Life Sciences, Tsinghua University, Beijing 100084, China
- Fu Liu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yuli Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Qingmiao Yu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
- Hongqiang Ren
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Jinju Geng
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
119
Cao H, Peng J, Zhou Z, Yang Z, Wang L, Sun Y, Wang Y, Liang Y. Investigation of the Binding Fraction of PFAS in Human Plasma and Underlying Mechanisms Based on Machine Learning and Molecular Dynamics Simulation. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17762-17773. [PMID: 36282672 DOI: 10.1021/acs.est.2c04400] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 06/16/2023]
Abstract
More than 7000 per- and polyfluorinated alkyl substances (PFAS) have been documented in the U.S. Environmental Protection Agency's CompTox Chemicals database. These PFAS can be used in a broad range of industrial and consumer applications but may pose environmental and health risks. However, little is known about the bioaccumulation of emerging PFAS, which is needed to assess their chemical safety. This study draws on a large, high-quality data set of fluorochemicals from related environmental and pharmaceutical chemical databases to develop machine learning (ML) models that classify the unbound fraction of compounds in plasma. A comprehensive evaluation of the ML models shows that the best blending model yields an accuracy of 0.901 on the test set. The predictions suggest that most PFAS (∼92%) have a high binding fraction in plasma. Introducing alkaline amino groups is likely to reduce the binding affinities of PFAS for plasma proteins. Molecular dynamics simulations indicate a clear distinction between PFAS with high and low binding fractions. These computational workflows can be used to predict the bioaccumulation of emerging PFAS and can also guide the molecular design of PFAS to prevent the release of highly bioaccumulative compounds into the environment.
Affiliation(s)
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Jianhua Peng
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zhen Zhou
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yawei Wang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
120
Li H, Zhang R, Min Y, Ma D, Zhao D, Zeng J. A knowledge-guided pre-training framework for improving molecular representation learning. Nat Commun 2023; 14:7568. [PMID: 37989998 PMCID: PMC10663446 DOI: 10.1038/s41467-023-43214-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 11/03/2023] [Indexed: 11/23/2023]
Abstract
Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework to alleviate the aforementioned issues and provide generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy, to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.
Affiliation(s)
- Han Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Ruotian Zhang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Yaosen Min
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Dacheng Ma
- Research Center for Biological Computation, Zhejiang Province, Zhejiang Laboratory, 311100, Hangzhou, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- School of Engineering, Westlake University, Zhejiang Province, 310030, Hangzhou, China
121
Shen C, Luo J, Xia K. Molecular geometric deep learning. Cell Rep Methods 2023; 3:100621. [PMID: 37875121 PMCID: PMC10694498 DOI: 10.1016/j.crmeth.2023.100621] [Received: 01/19/2023] [Revised: 06/16/2023] [Accepted: 09/28/2023] [Indexed: 10/26/2023]
Abstract
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs to represent molecular topology at the atomic level, and they entirely ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning model for predicting molecular properties that comprehensively considers both the covalent and the non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL achieves better performance than state-of-the-art (SOTA) methods. Extensive tests demonstrate the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China
- Kelin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
122
Wu W, Qian J, Liang C, Yang J, Ge G, Zhou Q, Guan X. GeoDILI: A Robust and Interpretable Model for Drug-Induced Liver Injury Prediction Using Graph Neural Network-Based Molecular Geometric Representation. Chem Res Toxicol 2023; 36:1717-1730. [PMID: 37839069 DOI: 10.1021/acs.chemrestox.3c00199] [Indexed: 10/17/2023]
Abstract
Drug-induced liver injury (DILI) is a significant cause of drug failure and withdrawal due to liver damage. Accurate prediction of hepatotoxic compounds is crucial for safe drug development. Several DILI prediction models have been published, but they are built on different data sets, making it difficult to compare model performance. Moreover, most existing models are based on molecular fingerprints or descriptors, neglecting molecular geometric properties and lacking interpretability. To address these limitations, we developed GeoDILI, an interpretable graph neural network that uses a molecular geometric representation. First, we utilized a geometry-based pretrained molecular representation and optimized it on the DILI data set to improve predictive performance. Second, we leveraged gradient information to obtain high-precision atomic-level weights and deduce the dominant substructure. We benchmarked GeoDILI against recently published DILI prediction models, as well as popular GNN models and fingerprint-based machine learning models using the same data set, showing superior predictive performance of our proposed model. We applied the interpretable method to the DILI data set and derived seven precise, mechanistically elucidated structural alerts. Overall, GeoDILI provides a promising approach for accurate and interpretable DILI prediction with potential applications in drug discovery and safety assessment. The data and source code are available at GitHub repository (https://github.com/CSU-QJY/GeoDILI).
Affiliation(s)
- Wenxuan Wu
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Jiayu Qian
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
- Changjie Liang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Jingya Yang
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
- Guangbo Ge
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Qingping Zhou
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
- Xiaoqing Guan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
123
Lecca P, Lecca M. Graph embedding and geometric deep learning relevance to network biology and structural chemistry. Front Artif Intell 2023; 6:1256352. [PMID: 38035201 PMCID: PMC10687447 DOI: 10.3389/frai.2023.1256352] [Received: 07/10/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023]
Abstract
Graphs have been used to model complex relationships among data in the biological sciences since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks. In these networks, recent artificial intelligence techniques that are usually employed in other types of networks (e.g., social, citation, and trademark networks) aim to implement various data mining tasks, including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable. This is because biological networks of realistic size are large systems characterised by a high density of interactions, often with non-linear dynamics and a non-Euclidean latent geometry. Graph embedding is currently emerging as a new learning paradigm. It shifts the task of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space. Many graph mining and learning tasks can then be performed more easily with efficient, non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.
Affiliation(s)
- Paola Lecca
- Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
- Michela Lecca
- Fondazione Bruno Kessler, Digital Industry Center, Technologies of Vision, Trento, Italy
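The embed-then-classify paradigm summarized in the Lecca and Lecca abstract can be illustrated with a minimal sketch. It assumes a spectral (Laplacian eigenmap) embedding. To keep NumPy as the only dependency, it uses a ridge-regularized least-squares linear classifier in place of the linear support vector machine the abstract mentions. The toy two-clique graph and all function names are hypothetical and are not taken from the review.

```python
import numpy as np


def laplacian_eigenmap(adj: np.ndarray, dim: int) -> np.ndarray:
    """Embed each node using the `dim` smallest non-trivial eigenvectors of L = D - A."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    _, vecs = np.linalg.eigh(laplacian)
    # Column 0 is the constant eigenvector (eigenvalue 0 for a connected graph); skip it
    return vecs[:, 1:dim + 1]


def with_bias(x: np.ndarray) -> np.ndarray:
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_classifier(x: np.ndarray, y: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Ridge least-squares fit to labels in {-1, +1}; a stand-in for a linear SVM."""
    xb = with_bias(x)
    return np.linalg.solve(xb.T @ xb + ridge * np.eye(xb.shape[1]), xb.T @ y)


def predict(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Classify by the sign of the linear score."""
    return np.sign(with_bias(x) @ w)


def two_cliques(n: int = 5) -> np.ndarray:
    """Adjacency matrix of two n-node cliques joined by a single bridge edge."""
    adj = np.zeros((2 * n, 2 * n))
    for block in (range(n), range(n, 2 * n)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i, j] = 1.0
    adj[n - 1, n] = adj[n, n - 1] = 1.0
    return adj


adj = two_cliques()
embedding = laplacian_eigenmap(adj, dim=2)   # one 2-D vector per node
labels = np.array([-1.0] * 5 + [1.0] * 5)    # which clique each node belongs to
w = fit_linear_classifier(embedding, labels)
print(predict(w, embedding))
```

All the graph-specific work happens in the embedding step. After that, classification is a single linear solve, which is the division of labour the abstract describes.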
124
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several interconnected stages, and multiple iterations are usually required to bring a new drug to market. Computational approaches have increasingly become indispensable for reducing the time and cost of researching and developing new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design. These include a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking and virtual screening, and state-of-the-art deep learning models that predict calculated and experimental molecular properties from molecular mechanics optimized geometries. We also discuss remaining challenges and promising directions for further development, and use a retrospective example of the FDA-approved kinase inhibitor erlotinib to demonstrate these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
- Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
125
Gong S, Yan K, Xie T, Shao-Horn Y, Gomez-Bombarelli R, Ji S, Grossman JC. Examining graph neural networks for crystal structures: Limitations and opportunities for capturing periodicity. Sci Adv 2023; 9:eadi3245. [PMID: 37948518 PMCID: PMC10637739 DOI: 10.1126/sciadv.adi3245] [Received: 04/18/2023] [Accepted: 10/13/2023] [Indexed: 11/12/2023]
Abstract
Graph neural networks (GNNs) have recently been used to learn the representations of crystal structures through an end-to-end data-driven approach. However, a systematic top-down approach to evaluate and understand the limitations of GNNs in accurately capturing crystal structures has yet to be established. In this study, we introduce an approach using human-designed descriptors as a compendium of human knowledge to investigate the extent to which GNNs can comprehend crystal structures. Our findings reveal that current state-of-the-art GNNs fall short in accurately capturing the periodicity of crystal structures. We analyze this failure by exploring three aspects: local expressive power, long-range information processing, and readout function. To address these identified limitations, we propose a straightforward and general solution: the hybridization of descriptors with GNNs, which directly supplements the missing information to GNNs. The hybridization enhances the predictive accuracy of GNNs for specific material properties, most notably phonon internal energy and heat capacity, which heavily rely on the periodicity of materials.
Affiliation(s)
- Sheng Gong
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Keqiang Yan
- Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA
- Tian Xie
- Microsoft Research, Cambridge CB1 2FB, UK
- Yang Shao-Horn
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Rafael Gomez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Shuiwang Ji
- Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA
- Jeffrey C. Grossman
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
126
Wang H, Liu W, Chen J, Wang Z. Applicability Domains Based on Molecular Graph Contrastive Learning Enable Graph Attention Network Models to Accurately Predict 15 Environmental End Points. Environ Sci Technol 2023; 57:16906-16917. [PMID: 37897806 DOI: 10.1021/acs.est.3c03860] [Indexed: 10/30/2023]
Abstract
In silico models for predicting physicochemical properties and environmental fate parameters are necessary for the sound management of chemicals. This study employed graph attention network (GAT) algorithms to construct such models for 15 end points. The results showed that the GAT models outperformed the previous state-of-the-art models, and their performance was not influenced by the presence or absence of compounds with certain structures. Molecular similarity density (ρs) was found to be a key metric characterizing data set modelability, in addition to the proportion of compounds at activity cliffs. By introducing molecular graph (MG) contrastive learning, MG-based ρs and molecular inconsistency in activities (IA) were calculated and employed to characterize the structure-activity landscape (SAL)-based applicability domain ADSAL{ρs, IA}. The GAT models coupled with ADSAL{ρs, IA} significantly improved the prediction coefficient of determination (R²) on all the end points, by an average of 14.4%, and enabled every end point to reach R² > 0.9, which could hardly be achieved previously. The models were employed to screen persistent, mobile, and/or bioaccumulative chemicals from inventories consisting of about 10⁶ chemicals. Given the current state-of-the-art model performance and coverage of the various environmental end points, the constructed models with ADSAL{ρs, IA} may serve as benchmarks for future efforts to improve modeling efficacy.
Affiliation(s)
- Haobo Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Wenjia Liu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
127
Zhang W, Hu F, Li W, Yin P. Does protein pretrained language model facilitate the prediction of protein-ligand interaction? Methods 2023; 219:8-15. [PMID: 37690736 DOI: 10.1016/j.ymeth.2023.08.016] [Received: 07/01/2023] [Revised: 08/22/2023] [Accepted: 08/29/2023] [Indexed: 09/12/2023]
Abstract
Protein-ligand interaction (PLI) is a critical step in drug discovery. Recently, protein pretrained language models (PLMs) have shown exceptional performance across a wide range of protein-related tasks. However, there is significant heterogeneity between PLM and PLI tasks, which introduces a degree of uncertainty. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). The models with pre-training consistently achieve improved performance and reduced time cost, demonstrating that pre-training enhances both the accuracy and the efficiency of PLI prediction. By quantitatively assessing transferability, we identify the optimal PLM for each PLI task without the need for costly transfer experiments. Additionally, we examine how PLMs affect the distribution of the feature space, highlighting the improved discriminability after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. Code and data are freely available at https://github.com/brian-zZZ/PLM-PLI.
Affiliation(s)
- Weihong Zhang
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Fan Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Wang Li
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Peng Yin
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
128
Bernardi A, Bennett WFD, He S, Jones D, Kirshner D, Bennion BJ, Carpenter TS. Advances in Computational Approaches for Estimating Passive Permeability in Drug Discovery. Membranes (Basel) 2023; 13:851. [PMID: 37999336 PMCID: PMC10673305 DOI: 10.3390/membranes13110851] [Received: 09/20/2023] [Revised: 10/19/2023] [Accepted: 10/21/2023] [Indexed: 11/25/2023]
Abstract
Passive permeation of cellular membranes is a key feature of many therapeutics. The relevance of passive permeability spans all biological systems as they all employ biomembranes for compartmentalization. A variety of computational techniques are currently utilized and under active development to facilitate the characterization of passive permeability. These methods include lipophilicity relations, molecular dynamics simulations, and machine learning, which vary in accuracy, complexity, and computational cost. This review briefly introduces the underlying theories, such as the prominent inhomogeneous solubility diffusion model, and covers a number of recent applications. Various machine-learning applications, which have demonstrated good potential for high-volume, data-driven permeability predictions, are also discussed. Due to the confluence of novel computational methods and next-generation exascale computers, we anticipate an exciting future for computationally driven permeability predictions.
Affiliation(s)
- Timothy S. Carpenter
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA; (A.B.); (W.F.D.B.); (S.H.); (D.J.); (D.K.); (B.J.B.)
129
Tran TTV, Tayara H, Chong KT. Recent Studies of Artificial Intelligence on In Silico Drug Absorption. J Chem Inf Model 2023; 63:6198-6211. [PMID: 37819031 DOI: 10.1021/acs.jcim.3c00960] [Indexed: 10/13/2023]
Abstract
Absorption is an important area of research in pharmacochemistry and drug development, because a drug has to be absorbed before it can have any effect. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors that affect absorption. Many drugs in development fail because of poor absorption. In recent years, continuous research efforts have brought many successes and much promise in predicting drug absorption properties, especially in silico, which significantly reduces the time and cost of screening out undesirable drug candidates. In this report, we provide an overview of recent in silico studies, especially from 2019 to the present, that use artificial intelligence to predict absorption properties. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also outline the challenges and future development directions of absorption prediction. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
- Vietnam National University, Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
130
Stewart M, Ohno PE, McKinney K, Martin ST. Prediction of the Response of a Photoionization Detector to a Complex Gaseous Mixture of Volatile Organic Compounds Produced by α-Pinene Oxidation. ACS Earth Space Chem 2023; 7:1956-1970. [PMID: 37876663 PMCID: PMC10592314 DOI: 10.1021/acsearthspacechem.3c00054] [Received: 04/07/2023] [Revised: 09/19/2023] [Accepted: 09/19/2023] [Indexed: 10/26/2023]
Abstract
Photoionization detectors (PIDs) are lightweight and respond in real time to the concentrations of volatile organic compounds (VOCs), making them suitable for environmental measurements on many platforms. However, the nonselective sensing mechanism of PIDs makes data interpretation challenging, particularly when the detectors are exposed to the complex VOC mixtures prevalent in the Earth's atmosphere. Herein, two approaches to this challenge are investigated. In the first, quantum-chemistry calculations are used to estimate photoionization cross sections and ionization potentials of individual species. In the second, machine learning models are trained on these calculated values, as well as empirical PID response factors, and then used for prediction. For both approaches, the resulting information for individual species is used to model the overall PID response to a complex VOC mixture. In addition, laboratory experiments in the Harvard Environmental Chamber are carried out to measure the PID response to the complex molecular mixture produced by α-pinene oxidation under various conditions. The observations show that the measured PID response is 15% to 30% smaller than the PID response modeled from quantum-chemistry calculations of the photoionization cross section for the photo-oxidation experiments, and 15% to 20% smaller for the ozonolysis experiments. By comparison, in all experiments the measured PID response falls within the 95% confidence interval of the response modeled by machine learning from the empirical response factors. Taken together, the results of this study demonstrate the application of machine learning to augment the performance of a nonselective chemical sensor. The approach can be generalized to other reactive species, oxidants, and reaction mechanisms, thus enhancing the utility and interpretability of PID measurements for studying atmospheric VOCs.
Affiliation(s)
- Matthew P. Stewart
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
- Paul E. Ohno
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
- Karena McKinney
- Department of Chemistry, Colby College, Waterville, Maine 04901, United States
- Scot T. Martin
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
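The Stewart et al. abstract describes combining per-species information into a prediction for the whole mixture. That step can be sketched under a simplifying assumption of ours, not the authors': the detector signal is linearly additive. Each species then contributes its concentration multiplied by its response factor, taken here as the signal per unit concentration relative to the calibration gas. The function name, species names and numbers below are hypothetical placeholders, not values or code from the study.

```python
from typing import Mapping


def mixture_response(
    concentrations: Mapping[str, float],
    response_factors: Mapping[str, float],
) -> float:
    """Predicted PID signal for a mixture, ASSUMING linear additivity across species.

    concentrations: species -> gas-phase concentration (e.g. ppb)
    response_factors: species -> PID response per unit concentration relative
        to the calibrant, so the result is in calibrant-equivalent units
    """
    missing = set(concentrations) - set(response_factors)
    if missing:
        raise KeyError(f"no response factor for: {sorted(missing)}")
    return sum(c * response_factors[s] for s, c in concentrations.items())


# Hypothetical two-species mixture (placeholder numbers)
conc = {"alpha-pinene": 10.0, "pinonaldehyde": 5.0}
rf = {"alpha-pinene": 0.5, "pinonaldehyde": 0.25}
print(mixture_response(conc, rf))  # 10 * 0.5 + 5 * 0.25 = 6.25
```

In practice the per-species response factors would come from measurements or from a trained model. If the species interact or the detector behaves nonlinearly, the additive assumption in this sketch no longer holds.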
131
Deng J, Yang Z, Wang H, Ojima I, Samaras D, Wang F. A systematic study of key elements underlying molecular property prediction. Nat Commun 2023; 14:6395. [PMID: 37833262 PMCID: PMC10575948 DOI: 10.1038/s41467-023-41948-6] [Received: 10/24/2022] [Accepted: 09/18/2023] [Indexed: 10/15/2023]
Abstract
Artificial intelligence (AI) has been widely applied in drug discovery, with molecular property prediction as a major task. Despite booming techniques in molecular representation learning, the key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate predictive power in the low-data and high-data regimes, we also assemble a series of descriptor datasets of varying sizes to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4200 models on SMILES sequences and 8400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction on most datasets. In addition, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore potential reasons why representation learning models can fail and show that dataset size is essential for representation learning models to excel.
Affiliation(s)
- Jianyuan Deng
- Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
- Zhibo Yang
- Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
- Hehe Wang
- Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
- Iwao Ojima
- Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
- Dimitris Samaras
- Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
- Fusheng Wang
- Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
- Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
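Several abstracts in this list single out activity or reactivity cliffs: Wang et al., Deng et al., and Liu, Moroz and Isayev. A minimal sketch of how such pairs are commonly flagged may therefore help. Two compounds form a cliff when they are structurally very similar yet differ strongly in the measured end point. The sketch makes three assumptions: fingerprints are given as sets of on-bit indices, similarity is Tanimoto, and the thresholds are illustrative (0.9 similarity and a gap of 1.0 activity units). The thresholds, function names and toy data are hypothetical and do not reproduce the criteria used in any of the cited studies.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def find_activity_cliffs(
    fingerprints: Sequence[FrozenSet[int]],
    activities: Sequence[float],
    sim_threshold: float = 0.9,   # hypothetical "structurally similar" cutoff
    gap_threshold: float = 1.0,   # hypothetical "large activity difference" cutoff
) -> List[Tuple[int, int, float, float]]:
    """Return (i, j, similarity, activity gap) for every pair that qualifies as a cliff."""
    cliffs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        gap = abs(activities[i] - activities[j])
        if sim >= sim_threshold and gap >= gap_threshold:
            cliffs.append((i, j, sim, gap))
    return cliffs


# Toy data: compounds 0 and 1 share 10 of 11 bits but differ by 2.5 activity units;
# compound 2 is structurally unrelated to both
fps = [frozenset(range(10)), frozenset(range(11)), frozenset(range(20, 30))]
acts = [5.0, 7.5, 5.1]
cliffs = find_activity_cliffs(fps, acts)
print(cliffs)
```

The all-pairs loop is quadratic in the number of compounds. That is fine for illustration, but large datasets would typically need a similarity search index instead.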
132
Shilpa S, Kashyap G, Sunoj RB. Recent Applications of Machine Learning in Molecular Property and Chemical Reaction Outcome Predictions. J Phys Chem A 2023; 127:8253-8271. [PMID: 37769193 DOI: 10.1021/acs.jpca.3c04779] [Indexed: 09/30/2023]
Abstract
Burgeoning developments in machine learning (ML) and its rapidly growing adaptations in chemistry are noteworthy. Motivated by the successful deployments of ML in the realm of molecular property prediction (MPP) and chemical reaction prediction (CRP), herein we highlight some of its most recent applications in predictive chemistry. We present a nonmathematical and concise overview of the progression of ML implementations, ranging from an ensemble-based random forest model to advanced graph neural network algorithms. Similarly, we describe the prospects of various feature engineering and feature learning approaches that work in conjunction with ML models. Highly accurate predictions reported in MPP tasks (e.g., lipophilicity, solubility, distribution coefficient), using methods such as D-MPNN, MolCLR, SMILES-BERT, and MolBERT, offer promising avenues in molecular design and drug discovery. Whereas MPP pertains to a single molecule, ML applications in chemical reactions present a different level of challenge, primarily arising from the simultaneous involvement of multiple molecules and their diverse roles in a reaction setting. The reported RMSEs in MPP tasks range from 0.287 to 2.20, while those for yield prediction are well over 4.9 at the lower end and exceed 10.0 in several examples. Our Review concludes with a set of persistent challenges in dealing with reaction data sets and an overall optimistic outlook on the benefits of ML-driven workflows for various MPP as well as CRP tasks.
Affiliation(s)
- Shilpa Shilpa
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Gargee Kashyap
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
133
Liu Z, Moroz YS, Isayev O. The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions. Chem Sci 2023; 14:10835-10846. [PMID: 37829036 PMCID: PMC10566507 DOI: 10.1039/d3sc03902a] [Received: 07/27/2023] [Accepted: 09/12/2023] [Indexed: 10/14/2023]
Abstract
Accurate prediction of reaction yield is the holy grail of computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41,239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks of feature and model performance. Despite excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020, obtained with the stacking technique. Error analysis revealed that reactivity cliffs and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to reactivity changes caused by subtle structural variations, as well as robust to the uncertainty associated with yield measurements.
Affiliation(s)
- Zhen Liu
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh PA 15213 USA
- Yurii S Moroz
- Enamine Ltd Kyïv 02660 Ukraine
- Chemspace LLC Kyïv 02094 Ukraine
- Taras Shevchenko National University of Kyïv Kyïv 01601 Ukraine
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh PA 15213 USA

134
Liu J, Lei X, Ji C, Pan Y. Fragment-pair based drug molecule solubility prediction through attention mechanism. Front Pharmacol 2023; 14:1255181. [PMID: 37881183 PMCID: PMC10595153 DOI: 10.3389/fphar.2023.1255181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 09/26/2023] [Indexed: 10/27/2023]
Abstract
The purpose of drug discovery is to identify new drugs, and the solubility of drug molecules is an important physicochemical property in medicinal chemistry that plays a crucial role in drug discovery. In solubility prediction, high-precision computational methods can significantly reduce the experimental costs and time associated with drug development. Artificial intelligence technologies have therefore been widely used for solubility prediction. This study used an attention mechanism in a deep learning model to take atomic-level features of the molecules into account, and used gated recurrent neural networks to aggregate vectors between layers. It also used molecular fragmentation to divide each complete molecule into pairs of fragments, extracted features from each fragment pair, and finally fused these features to predict the solubility of drug molecules. We compared our method against five existing models using two evaluation metrics, demonstrating that it achieves better performance and greater robustness.
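Attention over atom-level features is often implemented as an attention-weighted readout. The following is a generic sketch of that idea, not the authors' architecture: the query vector is a hypothetical stand-in for learned parameters, and the GRU aggregation and fragment-pair stages are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(atom_feats, query):
    """Attention-weighted pooling of per-atom vectors into one molecule vector.

    score_i = query . h_i,  alpha = softmax(scores),  readout = sum_i alpha_i * h_i.
    `query` stands in for a learned parameter vector (hypothetical values here).
    """
    scores = [sum(q * h for q, h in zip(query, feats)) for feats in atom_feats]
    alphas = softmax(scores)
    dim = len(atom_feats[0])
    pooled = [sum(a * feats[d] for a, feats in zip(alphas, atom_feats)) for d in range(dim)]
    return pooled, alphas


# Three atoms with 2-dimensional features; the query favours the first feature.
pooled, alphas = attention_readout([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], query=[2.0, 0.0])
```

The weights `alphas` are what attention-based models typically visualize to indicate which atoms drive a prediction.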
Affiliation(s)
- Jianping Liu
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Chunyan Ji
- Computer Science Department, BNU-HKBU United International College, Zhuhai, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Shenzhen, China

135
Lehner MT, Katzberger P, Maeder N, Schiebroek CC, Teetz J, Landrum GA, Riniker S. DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment. J Chem Inf Model 2023; 63:6014-6028. [PMID: 37738206 PMCID: PMC10565818 DOI: 10.1021/acs.jcim.3c00800] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Indexed: 09/24/2023]
Abstract
We present a robust and computationally efficient approach for assigning partial charges to atoms in molecules. The method is based on a hierarchical tree constructed from attention values extracted from a graph neural network (GNN), which was trained to predict atomic partial charges from accurate quantum-mechanical (QM) calculations. The resulting dynamic attention-based substructure hierarchy (DASH) approach provides fast assignment of partial charges with the same accuracy as the GNN itself, is software-independent, and can easily be integrated into existing parametrization pipelines, as shown for the Open Force Field (OpenFF). The implementation of the DASH workflow, the final DASH tree, and the training set are available as open source/open data from public repositories.
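DASH assigns a charge by matching an atom's environment against a precomputed tree instead of running the GNN. A minimal sketch of such a lookup, assuming each tree level refines the environment by one more label (in DASH the ordering is derived from GNN attention) and that an unmatched environment falls back to the deepest matching node; the node labels, charges and fallback rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    charge: float                                  # mean partial charge of atoms at this node
    children: dict = field(default_factory=dict)   # environment label -> Node


def assign_charge(tree, path):
    """Descend the hierarchy along `path` (coarse-to-fine environment labels) and
    return the charge of the deepest node that matches."""
    node = tree.get(path[0])
    if node is None:
        raise KeyError(f"no first-level entry for {path[0]!r}")
    for label in path[1:]:
        child = node.children.get(label)
        if child is None:
            break          # unseen environment: fall back to the most specific match
        node = child
    return node.charge


# Hypothetical tree; labels and charges are made up for illustration.
tree = {
    "O": Node(-0.50, {"bonded_to:C": Node(-0.55, {"double_bond": Node(-0.60)})}),
    "H": Node(0.10),
}
```

A lookup costs one dictionary access per level, which is why a tree of this kind can be much faster than GNN inference.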
Affiliation(s)
- Niels Maeder
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Carl C.G. Schiebroek
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Jakob Teetz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Gregory A. Landrum
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland

136
Riedl M, Mukherjee S, Gauthier M. Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma. Mol Pharm 2023; 20:4984-4993. [PMID: 37656906 DOI: 10.1021/acs.molpharmaceut.3c00129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Chemical-specific parameters are either measured in vitro or estimated using quantitative structure-activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers (BERT), which allowed us to circumvent the need to calculate these chemical descriptors. In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings based on the relationships between the chemical tokens in SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a large dataset of commercially available chemicals. The fine-tuning task then adjusted the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability to reuse the pre-trained model for multiple fine-tuning tasks, reducing the computational burden of developing separate models for different endpoints. We used our framework to develop a predictive model for the fraction unbound in human plasma (fu,p). This approach is flexible, requires minimal domain expertise, and can be generalized to other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity.
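The masked-token pre-training step can be illustrated with a regex SMILES tokenizer of the kind common in chemical language models, plus simple random masking. This is a sketch under assumptions: the regex, the `[MASK]` symbol and the 15% rate are conventional choices rather than details from the paper, and full BERT masking also leaves some selected tokens unchanged or swaps them for random tokens.

```python
import random
import re

# Regex SMILES tokenizer in the style common to chemical language models
# (assumption: the paper's tokenizer and vocabulary may differ).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def mask_tokens(tokens, rate=0.15, seed=0, mask="[MASK]"):
    """Replace a random subset of tokens with `mask`.

    Returns (masked_tokens, labels); labels hold the original token at masked
    positions and None elsewhere, i.e. the targets of the pre-training task.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(mask)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
masked, labels = mask_tokens(tokens)
```

Multi-character atoms such as `Cl` and bracket atoms such as `[NH3+]` stay single tokens, so the model predicts whole chemical units.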

137
Andrews-Morger A, Reutlinger M, Parrott N, Olivares-Morales A. A Machine Learning Framework to Improve Rat Clearance Predictions and Inform Physiologically Based Pharmacokinetic Modeling. Mol Pharm 2023; 20:5052-5065. [PMID: 37713584 DOI: 10.1021/acs.molpharmaceut.3c00374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/17/2023]
Abstract
During drug discovery and development, achieving appropriate pharmacokinetics is key to establishing the efficacy and safety of new drugs. Physiologically based pharmacokinetic (PBPK) models integrating in vitro-to-in vivo extrapolation have become an essential in silico tool to achieve this goal. In this context, the most important and probably most challenging pharmacokinetic parameter to estimate is the clearance. Recent work on high-throughput PBPK modeling during drug discovery has shown that a good estimate of the unbound intrinsic clearance (CLint,u) is the key factor for useful PBPK application. In this work, three different machine learning-based strategies were explored to predict the rat CLint,u as the input into PBPK. To this end, in vivo and in vitro data were collected for a total of 2639 proprietary compounds. The strategies were compared to the standard in vitro bottom-up approach. Using the well-stirred liver model to back-calculate in vivo CLint,u from in vivo rat clearance and then training a machine learning model on this CLint,u led to more accurate clearance predictions (absolute average fold error (AAFE) 3.1 in temporal cross-validation) than the bottom-up approach (AAFE 3.6-16, depending on the scaling method), and has the advantage that no experimental in vitro data are needed. Building a machine learning model on the bias between the back-calculated in vivo CLint,u and the bottom-up scaled in vitro CLint,u also performed well. For example, with unbound hepatocyte scaling, adding the bias prediction reduced the AAFE in temporal cross-validation from 16 (bottom-up alone) to 2.9. Similarly, the Pearson r² on a log scale improved from 0.1 to 0.29. Although this strategy still requires in vitro measurement of CLint,u, using unbound scaling for the bottom-up approach circumvents the need to correct fu,inc with fu,p data.
Although each ML model was built on all data points available for its approach, a comparison across all approaches could only be performed on a subset, because ca. 75% of the molecules had missing or unquantifiable measurements of the fraction unbound in plasma or of the in vitro unbound intrinsic clearance, or were excluded because of the blood-flow limitation assumed by the well-stirred model. Advantageously, predicting CLint,u as the input into PBPK allows existing workflows to be reused and can improve the prediction of in vivo clearance and other PK parameters.
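The well-stirred model relates hepatic clearance CL_h to hepatic blood flow Q_h, the plasma unbound fraction fu,p and CLint,u as CL_h = Q_h · fu,p · CLint,u / (Q_h + fu,p · CLint,u). Solving for CLint,u gives the back-calculation, which is only defined when CL_h < Q_h; this is why blood-flow-limited compounds drop out. A minimal sketch, assuming a blood-to-plasma ratio of 1 and that observed clearance is entirely hepatic (simplifications not stated in the abstract), together with the AAFE metric:

```python
import math


def well_stirred_clearance(q_h, fu_p, clint_u):
    """Hepatic clearance predicted by the well-stirred model (blood:plasma ratio assumed 1)."""
    return q_h * fu_p * clint_u / (q_h + fu_p * clint_u)


def back_calculate_clint_u(q_h, fu_p, cl_h):
    """Invert the well-stirred model: CLint,u = Q_h * CL_h / (fu,p * (Q_h - CL_h)).

    Undefined when CL_h >= Q_h, so blood-flow-limited compounds must be excluded.
    """
    if not 0 < cl_h < q_h:
        raise ValueError("clearance must be positive and below hepatic blood flow")
    return q_h * cl_h / (fu_p * (q_h - cl_h))


def aafe(predicted, observed):
    """Absolute average fold error: 10 ** mean(|log10(predicted / observed)|)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


# Illustrative numbers in mL/min/kg (hypothetical, not from the study).
clint = back_calculate_clint_u(q_h=70.0, fu_p=0.1, cl_h=35.0)  # about 700
```

An AAFE of 2 means predictions are, on average, off by a factor of two in either direction; a perfect model scores 1.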
Affiliation(s)
- Andrea Andrews-Morger
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Michael Reutlinger
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Neil Parrott
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Andrés Olivares-Morales
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland

138
Yu J, Li Z, Chen G, Kong X, Hu J, Wang D, Cao D, Li Y, Huo R, Wang G, Liu X, Jiang H, Li X, Luo X, Zheng M. Computing the relative binding affinity of ligands based on a pairwise binding comparison network. Nat Comput Sci 2023; 3:860-872. [PMID: 38177766 PMCID: PMC10766524 DOI: 10.1038/s43588-023-00529-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 09/05/2023] [Indexed: 01/06/2024]
Abstract
Structure-based lead optimization is an open challenge in drug discovery, which is still largely driven by hypotheses and depends on the experience of medicinal chemists. Here we propose a pairwise binding comparison network (PBCNet) based on a physics-informed graph attention mechanism, specifically tailored for ranking the relative binding affinity among congeneric ligands. Benchmarking on two held-out sets (provided by Schrödinger and Merck) containing over 460 ligands and 16 targets, PBCNet demonstrated substantial advantages in terms of both prediction accuracy and computational efficiency. Equipped with a fine-tuning operation, the performance of PBCNet reaches that of Schrödinger's FEP+, which is much more computationally intensive and requires substantial expert intervention. A further simulation-based experiment showed that active learning-optimized PBCNet may accelerate lead optimization campaigns by 473%. Finally, for the convenience of users, a web service for PBCNet is established to facilitate complex relative binding affinity prediction through an easy-to-operate graphical interface.
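A pairwise model predicts affinity differences between congeneric ligands rather than absolute values. One common way to turn such outputs into absolute estimates and a ranking is to anchor each prediction on reference ligands with measured affinities and average the results. The sketch below assumes this scheme for illustration; the ligand names and kcal/mol values are hypothetical, and PBCNet's own aggregation may differ.

```python
from statistics import mean


def infer_affinity(pairwise):
    """Estimate a query ligand's affinity from (reference_affinity, predicted_delta) pairs,
    where predicted_delta = affinity(query) - affinity(reference) from a pairwise model.
    Each reference yields one estimate; the estimates are averaged."""
    if not pairwise:
        raise ValueError("need at least one reference ligand")
    return mean(ref + delta for ref, delta in pairwise)


def rank_ligands(estimates):
    """Order ligand names from strongest to weakest binder (more negative ΔG binds tighter)."""
    return sorted(estimates, key=estimates.get)


# Hypothetical values in kcal/mol.
estimate = infer_affinity([(-8.0, 0.5), (-7.5, 0.0)])
```

Averaging over several references tends to cancel part of the error of any single pairwise prediction.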
Affiliation(s)
- Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Zhaojun Li
- College of Computer and Information Engineering, Dezhou University, Dezhou City, China
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City, China
- Geng Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Jie Hu
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Duanhua Cao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Yanbei Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Ruifeng Huo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Gang Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Xiaohong Liu
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City, China
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing, Jiangsu, China

139
Zhong Y, Zheng H, Chen X, Zhao Y, Gao T, Dong H, Luo H, Weng Z. DDI-GCN: Drug-drug interaction prediction via explainable graph convolutional networks. Artif Intell Med 2023; 144:102640. [PMID: 37783544 DOI: 10.1016/j.artmed.2023.102640] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/27/2022] [Revised: 03/21/2023] [Accepted: 08/20/2023] [Indexed: 10/04/2023]
Abstract
Drug-drug interactions (DDIs) may lead to unexpected side effects, which is a growing concern in both academia and industry. Many DDIs have been reported, but the underlying mechanisms are not well understood. Predicting and understanding DDIs can help researchers improve drug safety and protect patient health. Here, we introduce DDI-GCN, a method that uses graph convolutional networks (GCNs) to predict DDIs from chemical structures. We demonstrate that this method achieves state-of-the-art prediction performance on an independent hold-out set. It can also visualize structural features associated with DDIs, which can help elucidate the underlying mechanisms. To make the method easily accessible, we developed a web server for DDI-GCN, which is freely available at http://wengzq-lab.cn/ddi/.
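The core operation of a GCN is the normalized propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The sketch below implements one such layer on a molecular graph, then pools atom vectors and scores a drug pair with a sigmoid of the dot product; the pooling and scoring head are hypothetical stand-ins, not DDI-GCN's published design.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency matrix; feats (H): n x f; weight (W): f x k.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]  # add self-loops
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    agg = [[sum(norm[i][m] * feats[m][c] for m in range(n)) for c in range(len(feats[0]))]
           for i in range(n)]
    out = [[sum(agg[i][c] * weight[c][k] for c in range(len(weight))) for k in range(len(weight[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(node_vecs):
    """Average node vectors into a single graph (drug) embedding."""
    return [sum(col) / len(node_vecs) for col in zip(*node_vecs)]


def interaction_score(emb_a, emb_b):
    """Hypothetical DDI head: sigmoid of the dot product of two drug embeddings."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(emb_a, emb_b))))
```

Stacking several such layers lets each atom's vector reflect progressively larger neighborhoods, which is what makes structural attributions of a prediction possible.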
Affiliation(s)
- Yi Zhong
- The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Houbing Zheng
- Department of Plastic Surgery, the First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Xiaoming Chen
- The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Yu Zhao
- The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Tingfang Gao
- College of Biological Science and Engineering, Fuzhou University, Fujian Province, China
- Huiqun Dong
- College of Biological Science and Engineering, Fuzhou University, Fujian Province, China
- Heng Luo
- The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- MetaNovas Biotech Inc., Foster City, CA, USA
- Zuquan Weng
- College of Biological Science and Engineering, Fuzhou University, Fujian Province, China
- The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Department of Plastic Surgery, the First Affiliated Hospital of Fujian Medical University, Fuzhou, China

140
Axelrod S, Shakhnovich E, Gómez-Bombarelli R. Mapping the Space of Photoswitchable Ligands and Photodruggable Proteins with Computational Modeling. J Chem Inf Model 2023; 63:5794-5802. [PMID: 37671878 DOI: 10.1021/acs.jcim.3c00484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Light-activated drugs are a promising way to localize biological activity and minimize side effects. However, their development is complicated by the numerous photophysical and biological properties that must be simultaneously optimized. To accelerate the design of photoactive drugs, we describe a procedure that combines ligand-protein docking with chemical property prediction based on machine learning (ML). We apply this procedure to 58 proteins and 9000 photo-drug candidates based on azobenzene cis-trans isomerism. We find that most proteins display a preference for trans isomers over cis and that the binding affinities of nominally active/inactive pairs are in fact highly correlated. These findings have significant value for photopharmacology research, and reinforce the need for virtual screening to identify compounds with rare desirable properties. Further, we combine our procedure with quantum chemical validation to identify promising candidates for the photoactive inhibition of PARP1, an enzyme that is over-expressed in cancer cells. The top compounds are predicted to have long-lived active forms, differential bioactivity, and absorption in the near-infrared therapeutic window.
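The findings on isomer preference and correlated affinities correspond to simple statistics over paired docking scores. A sketch, assuming scores where more negative means tighter binding; the data are invented for illustration and the paper's scoring function and analysis may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def isomer_summary(pairs):
    """pairs: (trans_score, cis_score) docking scores per compound, more negative = tighter.

    Returns the fraction of compounds whose trans isomer scores better, and the
    Pearson correlation between trans and cis scores across compounds.
    """
    trans = [t for t, _ in pairs]
    cis = [c for _, c in pairs]
    frac_trans = sum(t < c for t, c in pairs) / len(pairs)
    return frac_trans, pearson(trans, cis)


# Hypothetical docking scores (kcal/mol) for four cis/trans pairs.
frac, r = isomer_summary([(-9.0, -8.5), (-7.0, -6.8), (-8.0, -8.2), (-6.0, -5.5)])
```

A high correlation means a pair's "active" and "inactive" forms tend to bind similarly, which is the difficulty the abstract points to.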
Affiliation(s)
- Simon Axelrod
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Eugene Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Rafael Gómez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

141
Xu C, Liu R, Huang S, Li W, Li Z, Luo HB. 3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation. Brief Bioinform 2023; 24:bbad327. [PMID: 37756591 DOI: 10.1093/bib/bbad327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/19/2023] [Accepted: 08/30/2023] [Indexed: 09/29/2023]
Abstract
In drug discovery, one of the key problems is how to improve biological activity and ADMET properties starting from a specific structure, a process also called structural optimization. Starting from a given scaffold, using deep generative models to generate molecules with desired drug-like properties would provide a powerful tool to accelerate structural optimization. However, existing generative models still struggle to extract molecular features efficiently in 3D space and to generate drug-like 3D molecules. Moreover, most existing ADMET prediction models predict different properties with a single model, which can reduce prediction accuracy on some datasets. To effectively generate molecules from a specific scaffold and provide a basis for structural optimization, we present 3D-SMGE (3-Dimensional Scaffold-based Molecular Generation and Evaluation), a framework consisting of molecular generation and ADMET property prediction. For molecular generation, we propose 3D-SMG, a novel deep generative model for the end-to-end design of 3D molecules. In the 3D-SMG model, we designed the cross-aggregated continuous-filter convolution (ca-cfconv), which achieves efficient, low-cost 3D spatial feature extraction while ensuring invariance to rotations of the atomic coordinates. 3D-SMG was shown to generate valid, unique and novel molecules with high drug-likeness. In addition, the proposed data-adaptive multi-model ADMET prediction method outperformed or matched the best evaluation metrics on 24 of 27 ADMET benchmark datasets. 3D-SMGE is anticipated to become a powerful tool for hit-to-lead structural optimization and to accelerate the drug discovery process.
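ca-cfconv builds on the continuous-filter convolution introduced in SchNet, whose filters depend only on interatomic distances. The sketch below implements the plain SchNet-style operation, not the paper's cross-aggregated variant, and shows why distance-only filters make the output unchanged under rotation; the Gaussian basis, `gamma` and the linear filter map are illustrative assumptions.

```python
import math


def rbf_expand(d, centers, gamma=10.0):
    """Expand an interatomic distance into Gaussian radial basis features."""
    return [math.exp(-gamma * (d - mu) ** 2) for mu in centers]


def cfconv(positions, feats, filter_weights, centers):
    """SchNet-style continuous-filter convolution:
    x_i' = sum over j != i of x_j * W(|r_i - r_j|), elementwise per feature channel.

    filter_weights has one row per feature channel and one column per RBF center;
    it is a single linear map here, whereas real models use a small neural network.
    """
    n_feats = len(feats[0])
    out = []
    for i, r_i in enumerate(positions):
        acc = [0.0] * n_feats
        for j, r_j in enumerate(positions):
            if i == j:
                continue
            basis = rbf_expand(math.dist(r_i, r_j), centers)
            w = [sum(wk * bk for wk, bk in zip(row, basis)) for row in filter_weights]
            for c in range(n_feats):
                acc[c] += feats[j][c] * w[c]
        out.append(acc)
    return out


def rotate_z(p, theta):
    """Rotate a 3D point about the z axis."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c, z)
```

Because `cfconv` sees coordinates only through `math.dist`, rotating every atom by the same angle leaves the output unchanged up to floating-point error.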
Affiliation(s)
- Chao Xu
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Runduo Liu
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Shuheng Huang
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Wenchao Li
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Zhe Li
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Hai-Bin Luo
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China

142
Li B, Lin M, Chen T, Wang L. FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction. Brief Bioinform 2023; 24:bbad398. [PMID: 37930026 DOI: 10.1093/bib/bbad398] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/30/2023] [Revised: 09/25/2023] [Accepted: 10/14/2023] [Indexed: 11/07/2023]
Abstract
Artificial intelligence-based molecular property prediction plays a key role in molecular design, for example of bioactive molecules and functional materials. In this study, we propose a self-supervised pretraining deep learning (DL) framework, called functional group bidirectional encoder representations from transformers (FG-BERT), pretrained on ~1.45 million unlabeled drug-like molecules to learn meaningful molecular representations from functional groups (FGs). The pretrained FG-BERT framework can be fine-tuned to predict molecular properties. Compared with state-of-the-art (SOTA) machine learning and DL methods, FG-BERT shows high performance in predicting molecular properties in tasks involving physical chemistry, biophysics and physiology across 44 benchmark datasets. In addition, FG-BERT uses attention mechanisms to focus on the FG features that are critical to the target properties, thereby providing excellent interpretability for downstream training tasks. Collectively, FG-BERT does not require any hand-crafted features as input and has excellent interpretability, providing an out-of-the-box framework for developing SOTA models for a variety of molecule (especially drug) discovery tasks.
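Learning from functional groups suggests a masked-prediction objective that hides whole groups rather than single atoms. A minimal sketch, assuming atom-to-group assignments are computed elsewhere (for example by substructure matching); the group labels are hypothetical, and whether FG-BERT masks exactly this way is an assumption.

```python
import random


def mask_functional_groups(atom_tokens, atom_groups, n_groups=1, seed=0, mask="[MASK]"):
    """Mask every atom of randomly chosen functional groups at once.

    atom_tokens: per-atom labels; atom_groups: per-atom group id (None = in no group).
    Returns the masked token list and the set of masked group ids.
    """
    groups = sorted({g for g in atom_groups if g is not None})
    rng = random.Random(seed)
    chosen = set(rng.sample(groups, min(n_groups, len(groups))))
    masked = [mask if g in chosen else tok for tok, g in zip(atom_tokens, atom_groups)]
    return masked, chosen


# Acetic-acid-like toy: the methyl carbon is outside any group, the carboxyl is one group.
masked, chosen = mask_functional_groups(["C", "C", "O", "O"], [None, "COOH", "COOH", "COOH"])
```

Masking a group as a unit prevents the model from trivially recovering a hidden atom from its unmasked neighbours in the same group.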
Affiliation(s)
- Biaoshun Li
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Mujie Lin
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Tiegen Chen
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Room 109, Building C, SSIP Healthcare and Medicine Demonstration Zone, Zhongshan Tsuihang New District, Zhongshan, Guangdong, 528400, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China

143
Liu X, Yang H, Ai C, Ding Y, Guo F, Tang J. MVML-MPI: Multi-View Multi-Label Learning for Metabolic Pathway Inference. Brief Bioinform 2023; 24:bbad393. [PMID: 37930024 DOI: 10.1093/bib/bbad393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/20/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023]
Abstract
Developing robust and effective strategies for synthesizing new compounds, drug targeting and constructing genome-scale metabolic models (GEMs) requires a deep understanding of the underlying biological processes. A critical step toward this goal is accurately identifying the categories of pathways in which a compound participates. However, current machine learning-based methods often overlook the multifaceted nature of compounds, resulting in inaccurate pathway predictions. We therefore present a novel framework for Multi-View Multi-Label Learning for Metabolic Pathway Inference, named MVML-MPI. First, MVML-MPI learns distinct compound representations in parallel with corresponding compound encoders to fully extract features. Subsequently, we propose an attention-based fusion module that combines these multi-view representations so that they complement each other. As a result, MVML-MPI accurately represents and effectively captures the complex relationship between compounds and metabolic pathways, distinguishing itself from current machine learning-based methods. In experiments on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways dataset, MVML-MPI outperformed state-of-the-art methods, demonstrating its superiority and its potential for metabolic pathway design, which can aid in optimizing drug-like compounds and facilitating the development of GEMs. The code and data underlying this article are freely available at https://github.com/guofei-tju/MVML-MPI. Contact: jtang@cse.sc.edu, guofei@csu.edu.com or wuxi_dyj@csj.uestc.edu.cn.
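Pathway inference is multi-label: a compound can belong to several pathway categories at once, so each category receives its own sigmoid probability instead of a single softmax. The sketch below shows that decision rule and Jaccard overlap as one possible evaluation metric; the category names, logits and 0.5 threshold are illustrative, and the multi-view encoders and attention fusion are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_pathways(logits, categories, threshold=0.5):
    """Multi-label decision: each category gets an independent sigmoid probability,
    so a compound may be assigned to several categories, or to none."""
    return {cat for cat, z in zip(categories, logits) if sigmoid(z) >= threshold}


def jaccard(predicted, true):
    """Per-compound set overlap |P ∩ T| / |P ∪ T|, one common multi-label metric."""
    if not predicted and not true:
        return 1.0
    return len(predicted & true) / len(predicted | true)


categories = ["Carbohydrate metabolism", "Lipid metabolism", "Amino acid metabolism"]
pred = predict_pathways([2.0, -1.0, 0.0], categories)
```

With a softmax, raising one category's probability would necessarily lower the others; independent sigmoids avoid that coupling.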
Affiliation(s)
- Xiaoyi Liu
- Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Hongpeng Yang
- Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Chengwei Ai
- Computer Science and Engineering, Central South University, Changsha 410083, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Fei Guo
- Computer Science and Engineering, Central South University, Changsha 410083, China
- Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan 518055, China

144
Ru Z, Wu Y, Shao J, Yin J, Qian L, Miao X. A dual-modal graph learning framework for identifying interaction events among chemical and biotech drugs. Brief Bioinform 2023; 24:bbad271. [PMID: 37507113 DOI: 10.1093/bib/bbad271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 06/18/2023] [Accepted: 07/06/2023] [Indexed: 07/30/2023]
Abstract
Drug-drug interaction (DDI) identification is essential to clinical medicine and drug discovery. The two categories of drugs (i.e. chemical drugs and biotech drugs) differ remarkably in molecular properties, mechanisms of action, etc. Biotech drugs are relative newcomers but are highly promising in modern medicine owing to their higher specificity and fewer side effects. However, existing DDI prediction methods consider only small-molecule chemical drugs, not large-molecule biotech drugs. Here, we build a large-scale dual-modal graph database named CB-DB and design a graph-based framework named CB-TIP to reason about event-aware DDIs for both chemical and biotech drugs. CB-DB comprehensively integrates various interaction events and two heterogeneous kinds of molecular structures. It incorporates endogenous proteins, based on the fact that most drugs exert their effects by interacting with endogenous proteins. In the molecular-structure modality, drugs and endogenous proteins are two heterogeneous kinds of graphs, while in the interaction modality, they are nodes connected by events (i.e. edges of different relationship types). CB-TIP employs graph representation learning to generate drug representations from either modality and then contrastively mixes them to predict, in an end-to-end manner, how likely an event is to occur when one drug meets another. Experiments demonstrate CB-TIP's superior DDI prediction performance and its promising potential for uncovering novel DDIs.
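The abstract does not specify the contrastive objective; InfoNCE is a standard choice for aligning two views of the same entity, here a drug's structure-modality and interaction-modality embeddings. A sketch under that assumption, using a one-directional loss with cosine similarity; the temperature value is illustrative.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view_a, view_b, temperature=0.1):
    """One-directional InfoNCE loss.

    Row i of view_a and row i of view_b embed the same drug in two modalities
    (positive pair); every other row of view_b serves as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]   # -log softmax probability of the positive
    return total / n
```

Minimizing this loss pulls each drug's two views together while pushing apart views of different drugs, so either modality can inform the other.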
Affiliation(s)
- Zhongying Ru
  - Center for Data Science, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
  - Polytechnic Institute, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
- Yangyang Wu
  - Center for Data Science, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
- Jinning Shao
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
- Jianwei Yin
  - Center for Data Science, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
  - College of Computer Science, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
- Linghui Qian
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
- Xiaoye Miao
  - Center for Data Science, Zhejiang University, 866 Yuhangtang Rd, 310058, Hangzhou, P.R. China
145
Xie A, Zhang Z, Guan J, Zhou S. Self-supervised learning with chemistry-aware fragmentation for effective molecular property prediction. Brief Bioinform 2023; 24:bbad296. [PMID: 37598424 DOI: 10.1093/bib/bbad296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/04/2023] [Accepted: 07/25/2023] [Indexed: 08/22/2023]
Abstract
Molecular property prediction (MPP) is a crucial and fundamental task for AI-aided drug discovery (AIDD). Recent studies have shown that self-supervised learning (SSL) holds great promise for producing molecular representations that cope with the widely recognized data-scarcity problem in AIDD. Because specific substructures of molecules play important roles in determining molecular properties, molecular representations learned by deep learning models are expected to attach more importance to such substructures, implicitly or explicitly, to achieve better predictive performance. However, few SSL pre-trained models for MPP in the literature have focused on such substructures. To address this gap, this paper presents Chemistry-Aware Fragmentation for Effective MPP (CAFE-MPP for short) under the self-supervised contrastive learning framework. First, a novel fragment-based molecular graph (FMG) is designed to represent the topological relationships between the chemistry-aware substructures that constitute a molecule. Then, with well-designed hard negative pairs, a is pre-trained at the fragment level by contrastive learning to extract representations for the nodes in FMGs. Finally, a Graphormer model is used to produce molecular representations for MPP from the fragment embeddings. Experiments on 11 benchmark datasets show that CAFE-MPP achieves state-of-the-art performance on 7 of the 11 datasets and the second-best performance on 3 datasets, compared with six strong self-supervised methods. Further investigations also demonstrate that CAFE-MPP learns to embed molecules into representations that implicitly contain information about fragments highly correlated with molecular properties, and that it can alleviate the over-smoothing problem of graph neural networks.
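The abstract names contrastive learning with hard negative pairs but not the exact objective. As a minimal sketch under that caveat, the widely used InfoNCE loss for a single anchor is written out below in plain Python; it is not the CAFE-MPP implementation, and the function names, the temperature of 0.1 and the 2D embeddings are hypothetical. A hard negative, whose embedding lies close to the anchor, raises the loss far more than an easy one, which is why well-chosen hard negatives give a stronger training signal.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted, for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# Hypothetical 2D embeddings standing in for learned fragment representations.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [-1.0, 0.0]  # points away from the anchor
hard_negative = [0.8, 0.3]   # looks similar to the anchor

loss_easy = info_nce(anchor, positive, [easy_negative])
loss_hard = info_nce(anchor, positive, [hard_negative])
print(loss_easy, loss_hard)
```

In a real pre-training loop the similarities would be computed between learned embeddings in mini-batches; the scalar version only makes the arithmetic visible.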
Affiliation(s)
- Ailin Xie
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China
- Ziqiao Zhang
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, 201804 Shanghai, China
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China
146
Gao J, Shen Z, Xie Y, Lu J, Lu Y, Chen S, Bian Q, Guo Y, Shen L, Wu J, Zhou B, Hou T, He Q, Che J, Dong X. TransFoxMol: predicting molecular property with focused attention. Brief Bioinform 2023; 24:bbad306. [PMID: 37605947 DOI: 10.1093/bib/bbad306] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/07/2023] [Revised: 07/17/2023] [Accepted: 08/04/2023] [Indexed: 08/23/2023]
Abstract
Predicting the biological properties of molecules is crucial in computer-aided drug development, yet it is often hindered by data scarcity and imbalance in many practical applications. Existing approaches rely on self-supervised learning or 3D data and use an increasing number of parameters to improve performance. These approaches may not take full advantage of established chemical knowledge and could inadvertently introduce noise into the model. In this study, we introduce TransFoxMol, a more elegant transformer-based framework with focused attention for molecular representation, to improve how artificial intelligence (AI) models understand molecular structure-property relationships. TransFoxMol incorporates a multi-scale 2D molecular environment into a graph neural network + Transformer module and uses prior chemical maps to obtain a more focused attention landscape than existing approaches. Experimental results show that TransFoxMol achieves state-of-the-art performance on MoleculeNet benchmarks and surpasses baselines that use self-supervised learning or geometry-enhanced strategies on small-scale datasets. Subsequent analyses indicate that TransFoxMol's predictions are highly interpretable and that its use of chemical knowledge enables AI to perceive molecules in a simple but rational way, which improves performance.
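The abstract says TransFoxMol uses prior chemical maps to focus attention, without giving the mechanism. One common way to impose such a prior, assumed here purely for illustration, is to mask attention between atoms that are more than a chosen number of bonds apart on the 2D molecular graph. In the sketch below, `topological_distances`, `focused_attention`, the `max_hops` cutoff and the butane example are all hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs shortest-path distances (in bonds), computed by BFS from every atom."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    dist = [[math.inf] * n_atoms for _ in range(n_atoms)]
    for src in range(n_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def focused_attention(scores, dist, max_hops):
    """Row-wise softmax of raw attention scores, masking atom pairs more than max_hops bonds apart."""
    weights = []
    for i, row in enumerate(scores):
        masked = [s if dist[i][j] <= max_hops else -math.inf for j, s in enumerate(row)]
        m = max(masked)  # finite: every atom can attend to itself (distance 0)
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0 for masked pairs
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical example: heavy-atom chain of butane, C0-C1-C2-C3, with uniform raw scores.
dist = topological_distances(4, [(0, 1), (1, 2), (2, 3)])
weights = focused_attention([[0.0] * 4 for _ in range(4)], dist, max_hops=1)
print(weights[0])  # [0.5, 0.5, 0.0, 0.0]
```

Masking keeps each atom's attention on its local chemical environment; a learned model would normally combine such a prior with learned scores rather than uniform ones.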
Affiliation(s)
- Jian Gao
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Zheyuan Shen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yufeng Xie
  - School of Software Technology, Zhejiang University, Hangzhou, China
- Jialiang Lu
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Lu
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Sikang Chen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Qingyu Bian
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yue Guo
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
- Liteng Shen
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jian Wu
  - School of Software Technology, Zhejiang University, Hangzhou, China
- Binbin Zhou
  - Department of Computer Science and Computing, Zhejiang University City College, Hangzhou, China
- Tingjun Hou
  - State Key Lab of CAD&CG, College of Pharmaceutical Sciences, Zhejiang University, Zhejiang, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
- Qiaojun He
  - Institute of Pharmacology & Toxicology, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, PR China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
  - Centre for Drug Safety Evaluation and Research of ZJU, Hangzhou, 310058, PR China
  - Cancer Center of Zhejiang University, Hangzhou, China
- Jinxin Che
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Xiaowu Dong
  - Hangzhou Institute of Innovative Medicine, Institute of Drug Discovery and Design, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, China
  - Cancer Center of Zhejiang University, Hangzhou, China
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
147
Dong L, Shi S, Qu X, Luo D, Wang B. Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph. Phys Chem Chem Phys 2023; 25:24110-24120. [PMID: 37655493 DOI: 10.1039/d3cp03651k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Accurate prediction of protein-ligand binding affinity is pivotal for drug design and discovery. Here, we propose a novel deep fusion graph neural network framework named FGNN to learn protein-ligand interactions from the 3D structures of protein-ligand complexes. Compared with 1D sequences for proteins or 2D graphs for ligands, the 3D graph of a protein-ligand complex enables more accurate representation of protein-ligand interactions. Benchmark studies show that the FGNN fusion models predict binding affinity more accurately than any individual algorithm. The advantages of the fusion strategies are demonstrated in terms of data expressiveness, learning efficiency and model interpretability. The fusion models perform satisfactorily on diverse data sets, demonstrating their generalization ability. Given their good performance in both binding affinity prediction and virtual screening, the fusion models are expected to find practical use in drug screening and design. Our work highlights the potential of fusion graph neural network algorithms for solving complex prediction problems in computational biology and chemistry. The fusion graph neural networks (FGNN) model is freely available at https://github.com/LinaDongXMU/FGNN.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Shuai Shi
  - Department of Algorithm, TuringQ Co., Ltd., Shanghai, 200240, China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Ding Luo
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, 361005, China
148
Cremer J, Medrano Sandonas L, Tkatchenko A, Clevert DA, De Fabritiis G. Equivariant Graph Neural Networks for Toxicity Prediction. Chem Res Toxicol 2023; 36. [PMID: 37690056 PMCID: PMC10583285 DOI: 10.1021/acs.chemrestox.3c00032] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Indexed: 09/12/2023]
Abstract
Predictive modeling of toxicity is a crucial step in the drug discovery pipeline. It can help filter out molecules with a high probability of failing in the early stages of de novo drug design. Thus, several machine learning (ML) models have been developed to predict the toxicity of molecules by combining classical ML techniques or deep neural networks with well-known molecular representations such as fingerprints or 2D graphs. However, a more natural and accurate representation of molecules is expected to be one defined in physical 3D space, as in ab initio methods. Recent studies successfully used equivariant graph neural networks (EGNNs) for representation learning based on 3D structures to predict quantum-mechanical properties of molecules. Inspired by this, we investigated the performance of EGNNs in building reliable ML models for toxicity prediction, using the equivariant transformer (ET) model in TorchMD-NET. Eleven toxicity data sets taken from MoleculeNet, TDCommons, and ToxBenchmark were used to evaluate the capability of ET for toxicity prediction. Our results show that ET adequately learns 3D representations of molecules that correlate with toxicity, achieving good accuracy on most data sets, comparable to state-of-the-art models. We also tested whether a physicochemical property, namely the total energy of a molecule, could inform toxicity prediction as a physical prior; however, our results suggest that the two properties are not related. We also provide an attention weight analysis to help understand toxicity prediction in 3D space and thus increase the explainability of the ML model. In summary, our findings offer promising insights into using 3D geometry information via EGNNs and provide a straightforward way to integrate molecular conformers into ML-based pipelines for predicting and investigating toxicity in physical space. We expect that in the future, especially for larger, more diverse data sets, EGNNs will be an essential tool in this domain.
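Equivariant architectures such as the ET in TorchMD-NET are designed so that scalar predictions (a toxicity label, for instance) do not change when a conformer is rotated, while vector quantities rotate together with the input. The short check below illustrates both properties with plain coordinates; it does not call TorchMD-NET, and the triatomic geometry, the rotation about the z axis and the helper names are assumptions made for illustration only.

```python
import math


def rotate_z(coords, angle):
    """Rotate 3D points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def pairwise_distances(coords):
    """Interatomic distance matrix: a rotation-invariant (scalar) description of a conformer."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def centroid(coords):
    """Geometric centre: a vector quantity that should rotate with the molecule."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


# Hypothetical, roughly water-shaped triatomic (coordinates in angstroms).
mol = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
rotated = rotate_z(mol, 0.7)

# Invariance: every interatomic distance is unchanged by the rotation.
d0, d1 = pairwise_distances(mol), pairwise_distances(rotated)
invariant = max(abs(a - b) for r0, r1 in zip(d0, d1) for a, b in zip(r0, r1)) < 1e-9

# Equivariance: rotating then taking the centroid equals taking the centroid then rotating.
c_rot = centroid(rotated)
rot_c = rotate_z([centroid(mol)], 0.7)[0]
equivariant = all(abs(a - b) < 1e-9 for a, b in zip(c_rot, rot_c))
print(invariant, equivariant)  # True True
```

An EGNN builds these guarantees into every layer, so a toxicity prediction from a conformer does not depend on how that conformer happens to be oriented in space.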
Affiliation(s)
- Julian Cremer
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
  - Machine Learning Research, Pfizer Worldwide Research, Development and Medical, Linkstr. 10, 10785 Berlin, Germany
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Djork-Arné Clevert
  - Machine Learning Research, Pfizer Worldwide Research, Development and Medical, Linkstr. 10, 10785 Berlin, Germany
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
  - ICREA, Passeig Lluis Companys 23, 08010 Barcelona, Spain
149
Song Y, Chang S, Tian J, Pan W, Feng L, Ji H. A Comprehensive Comparative Analysis of Deep Learning Based Feature Representations for Molecular Taste Prediction. Foods 2023; 12:3386. [PMID: 37761095 PMCID: PMC10529232 DOI: 10.3390/foods12183386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023]
Abstract
Taste determination in small molecules is critical in food chemistry, but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
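The abstract does not say how the consensus models combine their members. Soft voting, assumed here only as an illustration, averages the class probabilities predicted by, for example, a fingerprint-based model and a GNN, then picks the most probable class. The taste classes, the probabilities and the `consensus_predict` name below are made up and are not the authors' code.

```python
def consensus_predict(model_outputs):
    """Soft-voting consensus over several models.

    model_outputs[m][i] is model m's class-probability vector for molecule i.
    Returns, per molecule, the averaged probabilities and the index of the top class.
    """
    n_models = len(model_outputs)
    results = []
    for per_molecule in zip(*model_outputs):
        n_classes = len(per_molecule[0])
        mean = [sum(p[c] for p in per_molecule) / n_models for c in range(n_classes)]
        results.append((mean, max(range(n_classes), key=mean.__getitem__)))
    return results


# Hypothetical taste classes and made-up probabilities for two molecules.
classes = ["sweet", "bitter", "umami"]
fingerprint_model = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
gnn_model = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

results = consensus_predict([fingerprint_model, gnn_model])
for mean, top in results:
    print(classes[top], [round(p, 3) for p in mean])
```

For the second molecule the two members disagree (bitter versus umami), and averaging settles on umami because the GNN is more confident; this is the kind of complementarity a consensus model can exploit.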
Affiliation(s)
- Yu Song
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Sihao Chang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jing Tian
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Weihua Pan
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Lu Feng
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China
- Hongchao Ji
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
150
Wang Y, Xiong J, Xiao F, Zhang W, Cheng K, Rao J, Niu B, Tong X, Qu N, Zhang R, Wang D, Chen K, Li X, Zheng M. LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP. J Cheminform 2023; 15:76. [PMID: 37670374 PMCID: PMC10478446 DOI: 10.1186/s13321-023-00754-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/25/2023] [Indexed: 09/07/2023]
Abstract
Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling makes satisfactory generalization difficult to achieve. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. First, RTlogD is pre-trained on a chromatographic retention time (RT) dataset, since RT is influenced by lipophilicity. Second, microscopic pKa values are incorporated as atomic features, providing information about ionizable sites and ionization capacity. Third, logP is integrated as an auxiliary task within a multitask learning framework. Ablation studies and a detailed analysis demonstrate the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, RTlogD outperformed commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of the model, which holds promise for real-world applications in drug discovery and design.
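Why pKa and logP carry information about logD7.4 can be made concrete with the textbook relation for a monoprotic compound, which assumes that only the neutral species partitions into n-octanol (partitioning of ions and ion pairs is ignored). This classical approximation is not the RTlogD model, and the compounds and values in the example are hypothetical.

```python
import math


def logd_monoprotic(logp, pka, ph=7.4, acid=True):
    """logD of a monoprotic compound, assuming only the neutral species enters n-octanol.

    acid=True:  logD = logP - log10(1 + 10**(pH - pKa))
    acid=False: logD = logP - log10(1 + 10**(pKa - pH))   (base)
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical compounds, all with logP = 3.0, evaluated at pH 7.4.
acid_like = logd_monoprotic(3.0, pka=4.4)              # mostly ionized acid
base_like = logd_monoprotic(3.0, pka=9.4, acid=False)  # mostly protonated base
neutral_like = logd_monoprotic(3.0, pka=12.0)          # essentially un-ionized
print(round(acid_like, 2), round(base_like, 2), round(neutral_like, 2))
```

The further the pKa sits on the ionizing side of pH 7.4, the further logD falls below logP (here by roughly 3 and 2 log units), which is why ionization information helps a logD model; real compounds with several ionizable sites need the full microspecies treatment.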
Affiliation(s)
- Yitian Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jiacheng Xiong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Fu Xiao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Wei Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaiyang Cheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Jingxin Rao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Buying Niu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiaochu Tong
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ning Qu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Runze Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
  - Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China