1
Chen L, Jiang J, Dou B, Feng H, Liu J, Zhu Y, Zhang B, Zhou T, Wei GW. Machine learning study of the extended drug-target interaction network informed by pain related voltage-gated sodium channels. Pain 2024; 165:908-921. [PMID: 37851391 PMCID: PMC11021136 DOI: 10.1097/j.pain.0000000000003089] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Accepted: 09/09/2023] [Indexed: 10/19/2023]
Abstract
Pain is a significant global health issue, and the current treatment options for pain management have limitations in terms of effectiveness, side effects, and potential for addiction. There is a pressing need for improved pain treatments and the development of new drugs. Voltage-gated sodium channels, particularly Nav1.3, Nav1.7, Nav1.8, and Nav1.9, play a crucial role in neuronal excitability and are predominantly expressed in the peripheral nervous system. Targeting these channels may provide a means to treat pain while minimizing central and cardiac adverse effects. In this study, we construct protein-protein interaction (PPI) networks based on pain-related sodium channels and develop a corresponding drug-target interaction network to identify potential lead compounds for pain management. To ensure reliable machine learning predictions, we carefully select 111 inhibitor data sets from a pool of more than 1000 targets in the PPI network. We employ 3 distinct machine learning algorithms combined with advanced natural language processing (NLP)-based embeddings, specifically pretrained transformer and autoencoder representations. Through a systematic screening process, we evaluate the side effects and repurposing potential of more than 150,000 drug candidates targeting Nav1.7 and Nav1.8 sodium channels. In addition, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of these candidates to identify leads with near-optimal characteristics. Our strategy provides an innovative platform for the pharmacological development of pain treatments, offering the potential for improved efficacy and reduced side effects.
Affiliation(s)
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, MI, United States
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, P. R. China
- Hongsong Feng
  - Department of Mathematics, Michigan State University, East Lansing, MI, United States
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, P. R. China
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, P. R. China
- Tianshou Zhou
  - Key Laboratory of Computational Mathematics, Guangdong Province, and School of Mathematics, Sun Yat-sen University, Guangzhou, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, United States
2
Melancon K, Pliushcheuskaya P, Meiler J, Künze G. Targeting ion channels with ultra-large library screening for hit discovery. Front Mol Neurosci 2024; 16:1336004. [PMID: 38249296 PMCID: PMC10796734 DOI: 10.3389/fnmol.2023.1336004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024] Open
Abstract
Ion channels play a crucial role in a variety of physiological and pathological processes, making them attractive targets for drug development in diseases such as diabetes, epilepsy, hypertension, cancer, and chronic pain. Despite the importance of ion channels in drug discovery, the vastness of chemical space and the complexity of ion channels pose significant challenges for identifying drug candidates. The use of in silico methods in drug discovery has dramatically reduced the time and cost of drug development and has the potential to revolutionize the field of medicine. Recent advances in computer hardware and software have enabled the screening of ultra-large compound libraries. Integration of different methods at various scales and dimensions is becoming an inevitable trend in drug development. In this review, we provide an overview of current state-of-the-art computational chemistry methodologies for ultra-large compound library screening and their application to ion channel drug discovery research. We discuss the advantages and limitations of various in silico techniques, including virtual screening, molecular mechanics/dynamics simulations, and machine learning-based approaches. We also highlight several successful applications of computational chemistry methodologies in ion channel drug discovery and provide insights into future directions and challenges in this field.
Affiliation(s)
- Kortney Melancon
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
- Georg Künze
  - Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
3
Liu W, Hopkins AM, Yan P, Du S, Luyt LG, Li Y, Hou J. Can machine learning 'transform' peptides/peptidomimetics into small molecules? A case study with ghrelin receptor ligands. Mol Divers 2023; 27:2239-2255. [PMID: 36331785 DOI: 10.1007/s11030-022-10555-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
There has been considerable interest in transforming peptides into small molecules as peptide-based molecules often present poorer bioavailability and lower metabolic stability. Our studies looked into building machine learning (ML) models to investigate if ML is able to identify the 'bioactive' features of peptides and use the features to accurately discriminate between binding and non-binding small molecules. The ghrelin receptor (GR), a receptor that is implicated in various diseases, was used as an example to demonstrate whether ML models derived from a peptide library can be used to predict small molecule binders. ML models based on three different algorithms, namely random forest, support vector machine, and extreme gradient boosting, were built based on a carefully curated dataset of peptide/peptidomimetic and small molecule GR ligands. The results indicated that ML models trained with a dataset exclusively composed of peptides/peptidomimetics provide limited predictive power for small molecules, but that ML models trained with a diverse dataset composed of an array of both peptides/peptidomimetics and small molecules displayed exceptional results in terms of accuracy and false rates. The diversified models can accurately differentiate the binding small molecules from non-binding small molecules using an external validation set with new small molecules that we synthesized previously. Structural features that are the most critical contributors to binding activity were extracted and are remarkably consistent with the crystallography and mutagenesis studies.
Affiliation(s)
- Wenjie Liu
  - Department of Chemistry, Lakehead University and Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada
- Austin M Hopkins
  - Department of Chemistry, Lakehead University and Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada
- Peizhi Yan
  - Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada
- Shan Du
  - Department of Computer Science, Mathematics, Physics and Statistics, The University of British Columbia, Okanagan, Kelowna, BC, Canada
- Leonard G Luyt
  - Department of Chemistry, University of Western Ontario, London, ON, Canada
  - London Regional Cancer Program, Lawson Health Research Institute, London, ON, Canada
- Yifeng Li
  - Department of Computer Science, Brock University, Saint Catharines, ON, Canada
- Jinqiang Hou
  - Department of Chemistry, Lakehead University and Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada
4
Gu Y, Li J, Kang H, Zhang B, Zheng S. Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning. Molecules 2023; 28:5982. [PMID: 37630234 PMCID: PMC10459669 DOI: 10.3390/molecules28165982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open
Abstract
Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule, which largely influences its bioactivity and physical properties, has rarely been considered in previous deep learning-based LBVS methods. Moreover, a relevant bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data sources and consolidated them into a large-scale bioactivity benchmark dataset, which in total includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS method called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, a graph convolutional network (GCN) and an equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed by attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity for the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed that EquiVS achieved better performance compared with 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation to bioactivity prediction, as well as the soundness and non-redundancy of the deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications.
Affiliation(s)
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Chemistry, New York University, New York, NY 10027, USA
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Hongyu Kang
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing 100081, China
- Bowen Zhang
  - Beijing StoneWise Technology Co., Ltd., Beijing 100080, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China
5
Luo Y, Wang P, Mou M, Zheng H, Hong J, Tao L, Zhu F. A novel strategy for designing the magic shotguns for distantly related target pairs. Brief Bioinform 2023; 24:6984790. [PMID: 36631399 DOI: 10.1093/bib/bbac621] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/02/2022] [Revised: 11/09/2022] [Accepted: 12/17/2022] [Indexed: 01/13/2023] Open
Abstract
Owing to its promising capacity to improve drug efficacy, polypharmacology has emerged as a new theme in drug discovery for complex diseases. In the discovery of novel multi-target drugs (MTDs), in silico strategies have become essential because of their high throughput and low cost. However, current research mostly targets typical, closely related target pairs. Because of the intricate pathogenesis networks of complex diseases, many distantly related targets are found to play a crucial role in synergistic treatment. Therefore, an innovative method for developing drugs that simultaneously act on distantly related target pairs is of utmost importance. At the same time, reducing the false discovery rate in the design of MTDs remains a daunting technical difficulty. In this research, effective small molecule clustering in the positive dataset, together with a putative negative dataset generation strategy, was adopted during model construction. In a comprehensive assessment on 10 target pairs with hierarchical similarity levels, the proposed strategy successfully reduced the false discovery rate. Model types constructed with much smaller numbers of inhibitor molecules achieved considerable yields and showed better false-hit controllability than before. To further evaluate the generalization ability, an in-depth assessment of high-throughput virtual screening on the ChEMBL database was conducted. As a result, this novel strategy could hierarchically improve the enrichment factors for each target pair (especially for distantly related or unrelated target pairs), corresponding to the target pair similarity levels.
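For readers unfamiliar with the enrichment factor mentioned in this abstract, the following is a minimal sketch of the commonly used definition: the hit rate among the top-ranked fraction of a screened library divided by the hit rate in the whole library. The function name, the `fraction` parameter, and the toy data are assumptions for illustration; the abstract does not state which exact variant the authors used.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Common enrichment-factor definition (illustrative, not the paper's code).

    scores: predicted scores, higher means more likely active.
    labels: 1 for known actives, 0 for inactives/decoys.
    fraction: top-ranked share of the library to inspect (hypothetical default).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active is required")

    # Size of the inspected top slice (at least one compound).
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate in the full library.
    return (actives_in_top / n_top) / (total_actives / n)


# Toy library of 10 compounds with 2 actives, both ranked first.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
```

A value above 1 means the ranking concentrates actives near the top better than random selection would.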
Affiliation(s)
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Panpan Wang
  - College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hanqi Zheng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jiajun Hong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
6
Kong W, Huang W, Peng C, Zhang B, Duan G, Ma W, Huang Z. Multiple machine learning methods aided virtual screening of NaV1.5 inhibitors. J Cell Mol Med 2022; 27:266-276. [PMID: 36573431 PMCID: PMC9843531 DOI: 10.1111/jcmm.17652] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 10/30/2022] [Accepted: 12/06/2022] [Indexed: 12/28/2022] Open
Abstract
Nav1.5 sodium channels contribute to the generation of the rapid upstroke of the myocardial action potential and thereby play a central role in the excitability of myocardial cells. At present, the patch clamp method is the gold standard for ion channel inhibitor screening. However, this method has disadvantages such as high technical difficulty, high cost and low speed. In this study, novel machine learning models to screen chemical blockers were developed to overcome these shortcomings. Data from the ChEMBL database were employed to establish the machine learning models. First, six molecular fingerprints together with five machine learning algorithms were used to develop 30 classification models to predict effective inhibitors. A validation set and a test set were used to evaluate the performance of the models. Subsequently, the privileged substructures tightly associated with the inhibition of the Nav1.5 ion channel were extracted using the bioalerts Python package. In the validation set, the RF-Graph model performed best. Similarly, RF-Graph produced the best result in the test set, in which the prediction accuracy (Q) was 0.9309 and the Matthews correlation coefficient was 0.8627, further indicating that the model had high classification ability. The results for the privileged substructures indicated that sulfa structures and fragments with large steric hindrance tend to block Nav1.5. In the unsupervised learning task of identifying sulfa drugs, MACCS and Graph fingerprints gave good results. In summary, effective machine learning models have been constructed that help to screen potential inhibitors of the Nav1.5 ion channel, and key privileged substructures with high affinity were also extracted.
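The two metrics reported in this abstract, prediction accuracy (Q) and the Matthews correlation coefficient, can both be computed from a binary confusion matrix. The sketch below uses the standard textbook formulas; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (Q, MCC) for a binary confusion matrix.

    tp/tn/fp/fn: counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Prediction accuracy: share of correctly classified compounds.
    q = (tp + tn) / total

    # Matthews correlation coefficient; by convention 0.0 when the
    # denominator vanishes (one row or column of the matrix is empty).
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return q, mcc


# Hypothetical counts for a balanced test set of 100 compounds.
example_q, example_mcc = accuracy_and_mcc(tp=45, tn=45, fp=5, fn=5)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when inhibitors and non-inhibitors are imbalanced, which is one reason the two are often reported together.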
Affiliation(s)
- Weikaixin Kong
  - Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Institute Sanqu Technology (Hangzhou) Co., Ltd., Hangzhou, China
- Weiran Huang
  - Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Chao Peng
  - Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Bowen Zhang
  - ComMedX (Computational Medicine Beijing Co., Ltd.), Beijing, China
- Guifang Duan
  - Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
- Weining Ma
  - Department of Neurology, Shengjing Hospital affiliated to China Medical University, Shenyang, China
- Zhuo Huang
  - Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
  - State Key Laboratory of Natural and Biomimetic Drugs, Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University Health Science Center, Beijing, China
7
Gu Y, Zheng S, Yin Q, Jiang R, Li J. REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction. Comput Biol Med 2022; 150:106127. [PMID: 36182762 DOI: 10.1016/j.compbiomed.2022.106127] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/18/2022] [Revised: 07/27/2022] [Accepted: 09/18/2022] [Indexed: 11/03/2022]
Abstract
Computational drug repositioning is an effective way to find new indications for existing drugs, and can thus accelerate drug development and reduce experimental costs. Recently, various deep learning-based repurposing methods have been established to identify potential drug-disease associations (DDA). However, effectively using the relations among biological entities to capture biological interactions and enhance drug-disease association prediction is still challenging. To resolve this problem, we proposed a heterogeneous graph neural network called REDDA (Relations-Enhanced Drug-Disease Association prediction). Assembled with three attention mechanisms, REDDA can sequentially learn drug/disease representations through a general heterogeneous graph convolutional network-based node embedding block, a topological subnet embedding block, a graph attention block, and a layer attention block. Performance comparisons on our proposed benchmark dataset show that REDDA outperforms 8 advanced drug-disease association prediction methods, achieving relative improvements of 0.76% on the area under the receiver operating characteristic curve (AUC) score and 13.92% on the area under the precision-recall curve (AUPR) score compared to the second-best method. On the other benchmark dataset, REDDA also obtains relative improvements of 2.48% on the AUC score and 4.93% on the AUPR score. Case studies also indicate that REDDA can give valid predictions for the discovery of new indications for drugs and new therapies for diseases. The overall results show inspiring potential for REDDA in in silico drug development. The proposed benchmark dataset and source code are available at https://github.com/gu-yaowen/REDDA.
Affiliation(s)
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
  - Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Qijin Yin
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
8
Zhu Z, Deng Z, Wang Q, Wang Y, Zhang D, Xu R, Guo L, Wen H. Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design. Front Pharmacol 2022; 13:939555. [PMID: 35837274 PMCID: PMC9275593 DOI: 10.3389/fphar.2022.939555] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/09/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] Open
Abstract
Ion channels are expressed in almost all living cells, where they control communication into and out of the cell, making them ideal drug targets, especially for central nervous system diseases. However, owing to their dynamic nature and the presence of a membrane environment, ion channels have remained difficult targets over the past decades. Recent advances in cryo-electron microscopy and computational methods have shed light on this issue. An explosion in high-resolution ion channel structures has paved the way for structure-based rational drug design, and state-of-the-art simulation and machine learning techniques have dramatically improved the efficiency and effectiveness of computer-aided drug design. Here we present an overview of how simulation and machine learning-based methods have fundamentally changed ion channel-related drug design at different levels, as well as the emerging trends in the field.
Affiliation(s)
- Zhengdan Zhu
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - Beijing Institute of Big Data Research, Beijing, China
- Zhenfeng Deng
  - DP Technology, Beijing, China
  - School of Pharmaceutical Sciences, Peking University, Beijing, China
- Duo Zhang
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - DP Technology, Beijing, China
- Ruihan Xu
  - DP Technology, Beijing, China
  - National Engineering Research Center of Visual Technology, Peking University, Beijing, China
- Han Wen
  - DP Technology, Beijing, China
9
Gu Y, Zheng S, Xu Z, Yin Q, Li L, Li J. An efficient curriculum learning-based strategy for molecular graph learning. Brief Bioinform 2022; 23:6562682. [DOI: 10.1093/bib/bbac099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Revised: 01/18/2022] [Accepted: 02/27/2022] [Indexed: 12/14/2022] Open
Abstract
Computational methods have been widely applied to resolve various core issues in drug discovery, such as molecular property prediction. In recent years, deep learning, a data-driven computational method, has achieved a number of impressive successes in various domains. In drug discovery, graph neural networks (GNNs) take molecular graph data as input and learn graph-level representations in non-Euclidean space. A large number of well-performing GNNs have been proposed for molecular graph learning. However, the efficient use of molecular data during the training process has not received enough attention. Curriculum learning (CL) has been proposed as a training strategy that rearranges the training queue based on the calculated difficulty of each sample, yet the effectiveness of CL has not been determined in molecular graph learning. In this study, inspired by chemical domain knowledge and task prior information, we proposed a novel CL-based training strategy to improve the training efficiency of molecular graph learning, called CurrMG. Consisting of a difficulty measurer and a training scheduler, CurrMG is designed as a plug-and-play module, which is model-independent and easy to use on molecular data. Extensive experiments demonstrated that molecular graph learning models could benefit from CurrMG, with noticeable improvement on five GNN models and eight molecular property prediction tasks (overall improvement is 4.08%). We further observed CurrMG’s encouraging potential in resource-constrained molecular property prediction. These results indicate that CurrMG can be used as a reliable and efficient training strategy for molecular graph learning.
Availability: The source code is available in https://github.com/gu-yaowen/CurrMG.
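The two components named in this abstract, a difficulty measurer and a training scheduler, can be illustrated with a generic curriculum-learning sketch. This is not CurrMG's implementation: the SMILES-length difficulty proxy, the linear pacing schedule, and all names below are assumptions chosen only to show the general idea of exposing a model to easy samples first.

```python
import math


def difficulty(smiles):
    """Stand-in difficulty measurer (assumption): longer SMILES strings
    are treated as harder. A real measurer would use chemical knowledge."""
    return len(smiles)


def curriculum_subsets(samples, n_epochs, start_fraction=0.25):
    """Stand-in training scheduler with linear pacing (assumption).

    Yields, for each epoch, the easiest share of the data; the share
    grows linearly from start_fraction to the full dataset.
    """
    ranked = sorted(samples, key=difficulty)  # easiest first
    n = len(ranked)
    for epoch in range(n_epochs):
        if n_epochs == 1:
            fraction = 1.0
        else:
            progress = epoch / (n_epochs - 1)
            fraction = start_fraction + (1.0 - start_fraction) * progress
        n_available = max(1, math.ceil(fraction * n))
        yield ranked[:n_available]


# Toy molecules: methane, ethanol, phenol, acetic acid.
toy_molecules = ["CCO", "C", "c1ccccc1O", "CC(=O)O"]
schedule = list(curriculum_subsets(toy_molecules, n_epochs=3))
```

In an actual training loop, each yielded subset would be shuffled and batched for that epoch, so the model architecture itself stays unchanged, which matches the plug-and-play framing in the abstract.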
Affiliation(s)
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Zidu Xu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Qijin Yin
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Liang Li
  - Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
10
Zhu J, Wang J, Wang X, Gao M, Guo B, Gao M, Liu J, Yu Y, Wang L, Kong W, An Y, Liu Z, Sun X, Huang Z, Zhou H, Zhang N, Zheng R, Xie Z. Prediction of drug efficacy from transcriptional profiles with deep learning. Nat Biotechnol 2021; 39:1444-1452. [PMID: 34140681 DOI: 10.1038/s41587-021-00946-z] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 10/08/2020] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Drug discovery focused on target proteins has been a successful strategy, but many diseases and biological processes lack obvious targets to enable such approaches. Here, to overcome this challenge, we describe a deep learning-based efficacy prediction system (DLEPS) that identifies drug candidates using a change in the gene expression profile in the diseased state as input. DLEPS was trained using chemically induced changes in transcriptional profiles from the L1000 project. We found that the changes in transcriptional profiles for previously unexamined molecules were predicted with a Pearson correlation coefficient of 0.74. We examined three disorders and experimentally tested the top drug candidates in mouse disease models. Validation showed that perillen, chikusetsusaponin IV and trametinib confer disease-relevant impacts against obesity, hyperuricemia and nonalcoholic steatohepatitis, respectively. DLEPS can generate insights into pathogenic mechanisms, and we demonstrate that the MEK-ERK signaling pathway is a target for developing agents against nonalcoholic steatohepatitis. Our findings suggest that DLEPS is an effective tool for drug repurposing and discovery.
Collapse
Affiliation(s)
- Jie Zhu
  - Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China; Department of Pharmacology, School of Basic Medical Sciences, Health Science Center, Peking University, Beijing, China
- Jingxiang Wang
  - Beijing & Qingdao Langu Pharmaceutical R&D Platform, Beijing Gigaceuticals Tech. Co. Ltd., Beijing, China
- Xin Wang
  - Department of Pharmacology, School of Basic Medical Sciences, Health Science Center, Peking University, Beijing, China
- Mingjing Gao
  - Beijing & Qingdao Langu Pharmaceutical R&D Platform, Beijing Gigaceuticals Tech. Co. Ltd., Beijing, China
- Bingbing Guo
  - Department of Anatomy, Histology and Embryology, Neuroscience Research Institute, Health Science Center, Peking University, Beijing, China
- Miaomiao Gao
  - Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China
- Jiarui Liu
  - Department of Anatomy, Histology and Embryology, Neuroscience Research Institute, Health Science Center, Peking University, Beijing, China
- Yanqiu Yu
  - Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China
- Liang Wang
  - Department of Pharmacology, School of Basic Medical Sciences, Health Science Center, Peking University, Beijing, China
- Weikaixin Kong
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Health Science Center, Peking University, Beijing, China
- Yongpan An
  - Department of Pharmacology, School of Basic Medical Sciences, Health Science Center, Peking University, Beijing, China
- Zurui Liu
  - Beijing & Qingdao Langu Pharmaceutical R&D Platform, Beijing Gigaceuticals Tech. Co. Ltd., Beijing, China
- Xinpei Sun
  - Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China
- Zhuo Huang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Health Science Center, Peking University, Beijing, China
- Hong Zhou
  - Department of Pharmacology, School of Basic Medical Sciences, Health Science Center, Peking University, Beijing, China
- Ning Zhang
  - Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China
- Ruimao Zheng
  - Department of Anatomy, Histology and Embryology, Neuroscience Research Institute, Health Science Center, Peking University, Beijing, China
- Zhengwei Xie
  - Peking University International Cancer Institute, Health Science Center, Peking University, Beijing, China; Beijing & Qingdao Langu Pharmaceutical R&D Platform, Beijing Gigaceuticals Tech. Co. Ltd., Beijing, China
11
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
12
Kawai K, Tomonou M, Machida Y, Karuo Y, Tarui A, Sato K, Ikeda Y, Kinashi T, Omote M. Effect of Learning Dataset for Identification of Active Molecules: A Case Study of Integrin αIIbβ3 Inhibitors. Mol Inform 2021; 40:e2060040. [PMID: 33738924 DOI: 10.1002/minf.202060040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 01/30/2021] [Indexed: 01/13/2023]
Abstract
Efficient in silico approaches are needed to identify strong integrin αIIbβ3 inhibitors through a small number of measurements. To address the challenge, we investigated the effect of learning dataset on the classification performance of machine learning models focusing on weak and inactive compounds. The structure and activity information of the compounds were obtained from ChEMBL, and pCHEMBL values were used to classify them as active, inactive, or weak. Datasets with various imbalance levels from active:inactive=1 : 1 to 1 : 1000 were used for the machine learning. The prediction scores of the weak samples were found to lie between the predictive values of active and inactive compounds. In addition, another dataset that consists of 149 actives and 6.9 million inactives was screened; the results indicated that the number of positive predictions decreased for models trained with a higher number of inactives. Although there is a trade-off between false positives and false negatives, for determination of compounds with strong activity using a reduced number of measurements, it is better to use a large number of inactives for learning and identifying compounds that score higher than the weak samples.
Affiliation(s)
- Kentaro Kawai
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Mami Tomonou
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yume Machida
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yukiko Karuo
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Atsushi Tarui
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Kazuyuki Sato
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
- Yoshiki Ikeda
  - Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
- Tatsuo Kinashi
  - Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
- Masaaki Omote
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
13
Zhang R, Li X, Zhang X, Qin H, Xiao W. Machine learning approaches for elucidating the biological effects of natural products. Nat Prod Rep 2021; 38:346-361. [PMID: 32869826 DOI: 10.1039/d0np00043d] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 12/15/2022]
Abstract
Covering: 2000 to 2020. Machine learning (ML) is an efficient tool for the prediction of bioactivity and the study of structure-activity relationships. Over the past decade, an emerging trend for combining these approaches with the study of natural products (NPs) has developed in order to manage the challenge of the discovery of bioactive NPs. In the present review, we will introduce the basic principles and protocols for using the ML approach to investigate the bioactivity of NPs, citing a series of practical examples regarding the study of anti-microbial, anti-cancer, and anti-inflammatory NPs, etc. ML algorithms manage a variety of classification and regression problems associated with bioactive NPs, from those that are linear to non-linear and from pure compounds to plant extracts. Inspired by cases reported in the literature and our own experience, a number of key points have been emphasized for reducing modeling errors, including dataset preparation and applicability domain analysis.
Affiliation(s)
- Ruihan Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xiaoli Li
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xingjie Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Huayan Qin
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Weilie Xiao
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
14
Kong W, Wang W, An J. Prediction of 5-hydroxytryptamine transporter inhibitors based on machine learning. Comput Biol Chem 2020; 87:107303. [PMID: 32563857 DOI: 10.1016/j.compbiolchem.2020.107303] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/04/2020] [Revised: 05/29/2020] [Accepted: 06/04/2020] [Indexed: 01/08/2023]
Abstract
In patients with depression, the use of 5-HT reuptake inhibitors can improve the condition. Machine learning methods can be used in ligand-based activity prediction processes. In order to predict SERT inhibitors, the SERT inhibitor data from the ChEMBL database was screened and pre-processed. Then 4 machine learning methods (LR, SVM, RF, and KNN) and 4 molecular fingerprints (CDK, Graph, MACCS, and PubChem) were used to build 16 prediction models. The top 5 models of accuracy (Q) in the cross-validation of training set were used to build three different ensemble learning models. In the test1 set, the VOT_CLF3 model had the largest SP (0.871), Q (0.869), AUC (0.919), and MCC (0.728). In the unbalanced test2 set, VOT_CLF3 had the largest SE (0.857), SP (0.867), Q (0.865) and MCC (0.639). VOT_CLF3 was recommended for the virtual screening process of SERT inhibitors. In addition, 12 molecular structural alerts that frequently appear in SERT inhibitors were found (P < 0.05), which provided important reference value for the design work of SERT inhibitors.
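The abstract above compares models by SE, SP, Q, and MCC. As a minimal sketch of how such figures are typically derived from a binary confusion matrix, the Python below assumes that SE and SP denote sensitivity and specificity and that MCC is the Matthews correlation coefficient (the abstract itself only spells out Q as accuracy). The function name `classification_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return SE, SP, Q, and MCC from binary confusion-matrix counts.

    Assumes at least one positive and one negative sample are present,
    so the sensitivity and specificity denominators are non-zero.
    """
    se = tp / (tp + fn)                    # sensitivity: recall on actives
    sp = tn / (tn + fp)                    # specificity: recall on inactives
    q = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "Q": q, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

On an unbalanced test set, Q can look high even when one class is poorly predicted, which is one reason MCC is often reported alongside it.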
Affiliation(s)
- Weikaixin Kong
  - Department of Molecular and Cellular Pharmacology, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China
- Wenyu Wang
  - School of Nursing, Peking University, Beijing, 100191, China
- Jinbing An
  - Department of Health Informatics and Management, Peking University, Beijing, 100191, China