1
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024]
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. Various machine learning-based methods have become indispensable tools for DTI prediction. This paper first focuses on the data representations these methods employ, describing five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based and deep learning-based approaches, with a discussion of representative approaches in each category and a novel taxonomy for deep neural network models in DTI prediction. Additionally, we summarize commonly used datasets and evaluation metrics to facilitate practical implementation. Finally, we address current challenges and outline potential future directions in this research field.
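The point about representations can be made concrete with one widely used protein encoding, amino acid composition (AAC): the fraction of each of the 20 standard residues in a sequence. The abstract does not list which four protein representations the review covers, so AAC, the hypothetical `pair_features` helper, and the choice to concatenate it with a drug fingerprint are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List

# The 20 standard amino acids in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence` (a 20-dim vector)."""
    counts: Dict[str, int] = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard residues such as 'X'
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(drug_fingerprint: List[int], protein_sequence: str) -> List[float]:
    """Hypothetical DTI pair vector: drug fingerprint bits followed by protein AAC."""
    return [float(bit) for bit in drug_fingerprint] + amino_acid_composition(protein_sequence)
```

For example, `amino_acid_composition("MKTAYIAK")` assigns 0.25 to both A and K, since each appears twice among eight residues. A fixed-length pair vector like this is what traditional classifiers (random forests, SVMs) typically consume.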
Affiliation(s)
- Wen Shi
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Hong Yang
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Linhai Xie
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
- Xiao-Xia Yin
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Yanchun Zhang
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
  - Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China
2
Chen S, Gao N, Li C, Zhai F, Jiang X, Zhang P, Guan J, Li K, Xiang R, Ling G. DrugSK: A Stacked Ensemble Learning Framework for Predicting Drug Combinations of Multiple Diseases. J Chem Inf Model 2024; 64:5317-5327. [PMID: 38900583 DOI: 10.1021/acs.jcim.4c00296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Combination therapy is an important, continuously explored direction in medicine, with the core goals of improving treatment efficacy, reducing adverse reactions, and optimizing clinical outcomes. Machine learning holds great promise for improving the prediction of synergistic drug combinations. However, most studies focus on synergy prediction models oriented to a single disease or involve an excessive number of feature categories, making it difficult to make predictions for most new drugs. To address these challenges, the comprehensive model DrugSK was developed; it uses SMILES-BERT to extract structural information from 3,492 drugs and is trained on the reactions of 48,756 drug combinations. DrugSK is a stacked ensemble learning model capable of predicting interactions among various drug categories. Random forest, support vector machine, and XGBoost models are selected as the primary (level-1) learners, and logistic regression as the secondary (level-2) learner. First, the primary learners are trained on the initial data set. Their predictions then form a new "generated" data set, in which each feature can be thought of as one primary model's prediction, and this data set is used to train the level-2 learner. Finally, logistic regression produces the filtered final results. Furthermore, combinations of the new antibacterial drug Drafloxacin with other antibacterial agents were tested. The synergistic effect of Drafloxacin and Isavuconazonium against Candida albicans was confirmed, offering insight for the clinical treatment of skin infections. DrugSK's predictions are accurate in practical application, and the model can also predict the probability of each outcome. In addition, a tendency toward synergy between Drafloxacin and antifungal drugs was found. DrugSK will provide a new blueprint for predicting synergistic drug combinations.
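The two-level stacking scheme in this abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not DrugSK's published code: the data are synthetic (`make_classification`) rather than SMILES-BERT features of drug pairs, `GradientBoostingClassifier` substitutes for XGBoost to avoid an extra dependency, and all hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for drug-combination features (assumption: NOT DrugSK's data).
X, y = make_classification(n_samples=600, n_features=32, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Level-1 (primary) learners; GradientBoostingClassifier stands in for XGBoost.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Level-2 (secondary) learner: logistic regression fitted on the primary
# learners' out-of-fold predicted probabilities (5-fold cross-validation).
model = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Probability of the positive class, analogous to a predicted synergy probability.
synergy_prob = model.predict_proba(X_test)[:, 1]
print(f"test accuracy: {accuracy:.3f}")
```

Training the level-2 learner on cross-validated rather than in-sample predictions is the usual design choice here: it keeps the logistic regression from simply rewarding whichever base learner overfits the training set most.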
Affiliation(s)
- Siqi Chen
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Nan Gao
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Chunzhi Li
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Fei Zhai
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Xiwei Jiang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Peng Zhang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Jibin Guan
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Kefeng Li
  - Center for Artificial Intelligence-Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SR 999708, China
- Rongwu Xiang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
  - Liaoning Medical Big Data and Artificial Intelligence Engineering Technology Research Center, Shenyang 110016, China
- Guixia Ling
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
3
Sharma R, Saghapour E, Chen JY. An NLP-based technique to extract meaningful features from drug SMILES. iScience 2024; 27:109127. [PMID: 38455979 PMCID: PMC10918220 DOI: 10.1016/j.isci.2024.109127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 09/30/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024]
Abstract
Natural language processing (NLP) is a well-established field in which machine learning (ML) is used to develop language models that capture the sequence of words in a sentence. Similarly, drug molecule structures can be represented as sequences using the SMILES notation. However, unlike in natural-language text, special characters in drug SMILES carry specific meanings and cannot be ignored. We introduce a novel NLP-based method that uses N-grams to extract interpretable sequences and essential features from drug SMILES notation. Our method compares these features with Morgan fingerprint bit-vectors using UMAP-based embedding, and we validate its effectiveness through two personalized drug screening (PSD) case studies. Our NLP-based features are sparse and, when combined with gene expression and disease phenotype features, produce better ML models for PSD. This approach offers a new way to analyze drug molecule structures represented in SMILES notation, which can help accelerate drug discovery. We have also made our method available as a Python library.
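The N-gram idea can be sketched in a few lines: tokenize a SMILES string so that multi-character symbols (Cl, Br, bracket atoms such as [NH4+], two-digit ring closures like %10) stay intact, then count contiguous token N-grams and map them onto a fixed vocabulary to get a sparse count vector. The regular expression and the function names below are assumptions for illustration; they are not the API of the authors' Python library.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical tokenizer: bracket atoms, two-letter halogens, %NN ring closures,
# organic-subset atoms (incl. aromatic lowercase and '*'), bond/branch symbols, digits.
SMILES_TOKEN = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops*]|[()=#$\-\\/.:]|\d)")


def tokenize_smiles(smiles: str) -> List[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:  # some character was not covered by the pattern
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def smiles_ngrams(smiles: str, n: int) -> Counter:
    """Count contiguous token N-grams; special characters are kept, not discarded."""
    tokens = tokenize_smiles(smiles)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_vector(smiles: str, vocabulary: Sequence[Tuple[str, ...]], n: int) -> List[int]:
    """Sparse count vector over a fixed N-gram vocabulary (hypothetical helper)."""
    counts = smiles_ngrams(smiles, n)
    return [counts.get(gram, 0) for gram in vocabulary]
```

For ethanol, `smiles_ngrams("CCO", 2)` yields one ('C', 'C') and one ('C', 'O') bigram. Because most vocabulary entries never occur in a given molecule, vectors built this way are mostly zeros, consistent with the abstract's remark that the features are sparse.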
Affiliation(s)
- Rahul Sharma
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Ehsan Saghapour
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Jake Y. Chen
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
4
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
Discovering new drugs for disease treatment is challenging and requires multidisciplinary effort, time, and resources. To improve hit discovery and lead compound identification, machine learning (ML) approaches are increasingly used in decision-making. Although many ML-based studies have been published, most report only a fraction of the wider range of bioactivities, with each model typically focused on a single disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models covering a diverse range of activities, including neglected tropical diseases (caused by viral, bacterial, and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To identify the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best-performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
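Test-set AUC, the metric FP-MAP reports, is the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one, with ties counting one half. The sketch below computes it directly from that definition; it is a generic illustration, not code from FP-MAP, and the quadratic pairwise loop is chosen for clarity over speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form); labels are 0/1."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75. An AUC of 0.5 corresponds to random ranking, which puts the lower end of the reported 0.62-0.99 range in perspective.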
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
5
Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022; 27:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]
Abstract
With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of the molecular data (1D, 2D, and 3D). In addition, we discuss common DL strategies, such as ensemble learning and transfer learning, and analyze interpretability methods for these models. We also highlight the challenges and opportunities of DL methods for molecular representation and property prediction.
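The 1D/2D distinction can be illustrated with ethanol: its 1D form is the SMILES string "CCO", while its 2D form is a graph with atoms as nodes and bonds as edges, the input that graph neural networks operate on. The sketch below performs one unweighted message-passing step (each atom adds its neighbours' features to its own) followed by a sum readout. The element vocabulary and helper names are hypothetical, and real graph networks add learned weight matrices and nonlinearities that are omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical tiny element vocabulary for one-hot atom features.
ELEMENTS = ["C", "N", "O"]

# 2D (graph) view of ethanol, SMILES "CCO": heavy atoms are nodes, bonds are edges
# (hydrogens implicit, as in a hydrogen-suppressed molecular graph).
ATOMS: List[str] = ["C", "C", "O"]
BONDS: List[Tuple[int, int]] = [(0, 1), (1, 2)]


def one_hot(symbol: str) -> List[int]:
    return [1 if symbol == element else 0 for element in ELEMENTS]


def adjacency(n_atoms: int, bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    adj: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for i, j in bonds:  # bonds are undirected
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_passing_step(features: List[List[int]], adj: Dict[int, List[int]]) -> List[List[int]]:
    """One unweighted sum-aggregation step: h_v' = h_v + sum of h_u over neighbours u of v."""
    updated = []
    for v, h_v in enumerate(features):
        new_h = list(h_v)
        for u in adj[v]:
            new_h = [a + b for a, b in zip(new_h, features[u])]
        updated.append(new_h)
    return updated


def sum_readout(features: List[List[int]]) -> List[int]:
    """Graph-level vector: element-wise sum over all atoms."""
    return [sum(column) for column in zip(*features)]


h0 = [one_hot(atom) for atom in ATOMS]
h1 = message_passing_step(h0, adjacency(len(ATOMS), BONDS))
graph_vector = sum_readout(h1)
```

After one step, the central carbon's vector `[2, 0, 1]` already encodes that it is bonded to another carbon and to the oxygen, information a per-character view of the 1D string does not expose directly.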