1
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024] Open
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Affiliation(s)
- Wen Shi
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Hong Yang
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Linhai Xie
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
- Xiao-Xia Yin
  - Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- Yanchun Zhang
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
  - Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China

2
Stefan SM, Rafehi M. Medicinal polypharmacology: a scientific glossary of terminology and concepts. Front Pharmacol 2024; 15:1419110. [PMID: 39092220 PMCID: PMC11292611 DOI: 10.3389/fphar.2024.1419110] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/17/2024] [Accepted: 04/30/2024] [Indexed: 08/04/2024] Open
Abstract
Medicinal polypharmacology is one answer to the complex reality of multifactorial human diseases that are often unresponsive to single-targeted treatment. It is an admittance that intrinsic feedback mechanisms, crosstalk, and disease networks necessitate drugs with broad modes-of-action and multitarget affinities. Medicinal polypharmacology grew to be an independent research field within the last two decades and stretches from basic drug development to clinical research. It has developed its own terminology embedded in general terms of pharmaceutical drug discovery and development at the intersection of medicinal chemistry, chemical biology, and clinical pharmacology. A clear and precise language of critical terms and a thorough understanding of underlying concepts is imperative; however, no comprehensive work exists to this date that could support researchers in this and adjacent research fields. In order to explore novel options, establish interdisciplinary collaborations, and generate high-quality research outputs, the present work provides a first-in-field glossary to clarify the numerous terms that have originated from various individual disciplines.
Affiliation(s)
- Sven Marcel Stefan
  - Medicinal Chemistry and Systems Polypharmacology, Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein (UKSH), Lübeck, Germany
  - Department of Biopharmacy, Medical University of Lublin, Lublin, Poland
- Muhammad Rafehi
  - Institute of Clinical Pharmacology, University Medical Center Göttingen, Göttingen, Germany
  - Department of Medical Education, Augsburg University Medicine, Augsburg, Germany

3
Chen Y, Liang X, Du W, Liang Y, Wong G, Chen L. Drug-Target Interaction Prediction Based on an Interactive Inference Network. Int J Mol Sci 2024; 25:7753. [PMID: 39062996 PMCID: PMC11277210 DOI: 10.3390/ijms25147753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 06/25/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024] Open
Abstract
Drug-target interactions underlie the actions of chemical substances in medicine. Moreover, drug repurposing can expand use profiles while reducing costs and development time by exploiting potential multi-functional pharmacological properties based upon additional target interactions. Nonetheless, drug repurposing relies on the accurate identification and validation of drug-target interactions (DTIs). In this study, a novel drug-target interaction prediction model was developed. The model, based on an interactive inference network, contains embedding, encoding, interaction, feature extraction, and output layers. In addition, this study used Morgan and PubChem molecular fingerprints as additional information for drug encoding. The interaction layer in our model simulates the drug-target interaction process, which assists in understanding the interaction by representing the interaction space. Our method achieves high levels of predictive performance, as well as interpretability of drug-target interactions. Additionally, we predicted and validated 22 Alzheimer's disease-related targets, suggesting our model is robust and effective and thus may be beneficial for drug repurposing.
Affiliation(s)
- Yuqi Chen
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
- Xiaomin Liang
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
- Wei Du
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yanchun Liang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau SAR 999078, China
- Liang Chen
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China

4
Stefan SM, Rafehi M. Medicinal polypharmacology: Exploration and exploitation of the polypharmacolome in modern drug development. Drug Dev Res 2024; 85:e22125. [PMID: 37920929 DOI: 10.1002/ddr.22125] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/24/2023] [Revised: 09/23/2023] [Accepted: 10/12/2023] [Indexed: 11/04/2023]
Abstract
At the core of complex and multifactorial human diseases, such as cancer, metabolic syndrome, or neurodegeneration, are multiple players that cross-talk in robust biological networks which are intrinsically resilient to alterations. These multifactorial diseases are characterized by sophisticated feedback mechanisms which manifest cellular imbalance and resistance to drug therapy. By adhering to the specificity paradigm ("one target-one drug concept"), research focused for many years on drugs with very narrow mechanisms of action. This narrow focus promoted therapy ineffectiveness and resistance. However, modern drug discovery has evolved over the last years, increasingly emphasizing integral strategies for the development of clinically effective drugs. These integral strategies include the controlled engagement of multiple targets to overcome therapy resistance. Apart from the additive or even synergistic effects in therapy, multitarget drugs harbor molecular-structural attributes to explore orphan targets of which intrinsic substrates/physiological role(s) and/or modulators are unknown for future therapy purposes. We designated this multidisciplinary and translational research field between medicinal chemistry, chemical biology, and molecular pharmacology as 'medicinal polypharmacology'. Medicinal polypharmacology emerged as alternative approach to common single-targeted pharmacology stretching from basic drug and target identification processes to clinical evaluation of multitarget drugs, and the exploration and exploitation of the 'polypharmacolome' is at the forefront of modern drug development research.
Affiliation(s)
- Sven Marcel Stefan
  - Drug Development and Chemical Biology, Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Lübeck, Germany
  - Translational Neurodegeneration Research and Neuropathology Lab, Department of Pathology, Section of Neuropathology and Oslo University Hospital, University of Oslo, Oslo, Norway
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, New South Wales, Australia
- Muhammad Rafehi
  - Department of Medical Education, Augsburg University Medicine, Augsburg, Germany
  - Institute of Clinical Pharmacology, University Medical Center Göttingen, Göttingen, Germany

5
Jiang M, Shao Y, Zhang Y, Zhou W, Pang S. A deep learning method for drug-target affinity prediction based on sequence interaction information mining. PeerJ 2023; 11:e16625. [PMID: 38099302 PMCID: PMC10720480 DOI: 10.7717/peerj.16625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 11/16/2023] [Indexed: 12/17/2023] Open
Abstract
Background A critical aspect of in silico drug discovery involves the prediction of drug-target affinity (DTA). Conducting wet lab experiments to determine affinity is both expensive and time-consuming, making it necessary to find alternative approaches. In recent years, deep learning has emerged as a promising technique for DTA prediction, leveraging the substantial computational power of modern computers. Methods We proposed a novel sequence-based approach, named KC-DTA, for predicting drug-target affinity (DTA). In this approach, we converted the target sequence into two distinct matrices, while representing the molecule compound as a graph. The proposed method utilized k-mers analysis and Cartesian product calculation to capture the interactions and evolutionary information among various residues, enabling the creation of the two matrices for target sequence. For molecule, it was represented by constructing a molecular graph where atoms serve as nodes and chemical bonds serve as edges. Subsequently, the obtained target matrices and molecule graph were utilized as inputs for convolutional neural networks (CNNs) and graph neural networks (GNNs) to extract hidden features, which were further used for the prediction of binding affinity. Results In order to evaluate the effectiveness of the proposed method, we conducted several experiments and made a comprehensive comparison with the state-of-the-art approaches using multiple evaluation metrics. The results of our experiments demonstrated that the KC-DTA method achieves high performance in predicting drug-target affinity (DTA). The findings of this research underscore the significance of the KC-DTA method as a valuable tool in the field of in silico drug discovery, offering promising opportunities for accelerating the drug development process. All the data and code are available for access on https://github.com/syc2017/KCDTA.
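The k-mer and Cartesian-product steps mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify how KC-DTA builds its two matrices, so the functions below (`kmer_counts`, `dipeptide_matrix`) and the choice of a 20 x 20 adjacent-residue count matrix are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_counts(sequence, k=3):
    """Count every overlapping k-mer in a protein sequence (hypothetical helper)."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def dipeptide_matrix(sequence):
    """Build a 20x20 matrix of adjacent-residue pair counts.

    The Cartesian product of the alphabet with itself enumerates all
    400 possible pairs; this layout is an assumption for illustration.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [[counts[a + b] for b in AMINO_ACIDS] for a in AMINO_ACIDS]
```

A matrix of this kind has a fixed size regardless of sequence length, which is one reason such encodings are convenient inputs for a CNN.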
Affiliation(s)
- Mingjian Jiang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Yunchang Shao
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Wei Zhou
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Shunpeng Pang
  - School of Computer Engineering, WeiFang University, Weifang, Shandong, China

6
Yang SQ, Zhang LX, Ge YJ, Zhang JW, Hu JX, Shen CY, Lu AP, Hou TJ, Cao DS. In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences. J Cheminform 2023; 15:48. [PMID: 37088813 PMCID: PMC10123967 DOI: 10.1186/s13321-023-00720-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 04/08/2023] [Indexed: 04/25/2023] Open
Abstract
Identification and validation of bioactive small-molecule targets is a significant challenge in drug discovery. In recent years, various in-silico approaches have been proposed to expedite time- and resource-consuming experiments for target detection. Herein, we developed several chemogenomic models for target prediction based on multi-scale information of chemical structures and protein sequences. By combining the information of a compound with multiple protein targets together and putting these compound-target pairs into a well-established model, the scores to indicate whether there are interactions between compounds and targets can be derived, and thus a target prediction task can be completed by sorting the outputted scores. To improve the prediction performance, we constructed several chemogenomic models using multi-scale information of chemical structures and protein sequences, and the ensemble model with the best performance was used as our final model. The model was validated by various strategies and external datasets and the promising target prediction capability of the model, i.e., the fraction of known targets identified in the top-k (1 to 10) list of the potential target candidates suggested by the model, was confirmed. Compared with multiple state-of-art target prediction methods, our model showed equivalent or better predictive ability in terms of the top-k predictions. It is expected that our method can be utilized as a powerful computational tool to narrow down the potential targets for experimental testing.
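The top-k criterion described in this abstract (the fraction of known targets found among the k highest-scoring candidates) can be sketched as follows. The function name `top_k_hit_fraction` and the dictionary-based score format are assumptions for illustration; the paper's own evaluation code is not shown in the abstract.

```python
def top_k_hit_fraction(scores, known_targets, k):
    """Fraction of a compound's known targets that appear in its top-k list.

    scores: dict mapping target id -> model score (higher = more likely).
    known_targets: set of target ids with experimentally known interactions.
    """
    if not known_targets:
        return 0.0
    # Sort candidate targets by descending score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for target in ranked if target in known_targets)
    return hits / len(known_targets)


# Toy example with made-up scores for four hypothetical targets.
example_scores = {"T1": 0.9, "T2": 0.2, "T3": 0.7, "T4": 0.4}
example_known = {"T1", "T4"}
```

With these toy values, `top_k_hit_fraction(example_scores, example_known, 2)` evaluates the ranked list `T1, T3` and finds one of the two known targets.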
Affiliation(s)
- Su-Qing Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Liu-Xia Zhang
  - The First Hospital of Hunan University of Chinese Medicine, Changsha, 410007, Hunan, People's Republic of China
- You-Jin Ge
  - Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Jin-Wei Zhang
  - Departments of Biomedical Engineering and Pathology, School of Basic Medical Science, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Jian-Xin Hu
  - Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Cheng-Ying Shen
  - Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China
- Ting-Jun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China

7
A Novel Autoencoder-Based Feature Selection Method for Drug-Target Interaction Prediction with Human-Interpretable Feature Weights. Symmetry (Basel) 2023. [DOI: 10.3390/sym15010192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
Drug-target interaction prediction provides important information that could be exploited for drug discovery, drug design, and drug repurposing. Chemogenomic approaches for predicting drug-target interaction assume that similar receptors bind to similar ligands. Capturing this similarity in so-called “fingerprints” and combining the target and ligand fingerprints provide an efficient way to search for protein-ligand pairs that are more likely to interact. In this study, we constructed drug and target fingerprints by employing features extracted from the DrugBank. However, the number of extracted features is quite large, necessitating an effective feature selection mechanism since some features can be redundant or irrelevant to drug-target interaction prediction problems. Although such feature selection methods are readily available in the literature, usually they act as black boxes and do not provide any quantitative information about why a specific feature is preferred over another. To alleviate this lack of human interpretability, we proposed a novel feature selection method in which we used an autoencoder as a symmetric learning method and compared the proposed method to some popular feature selection algorithms, such as Kbest, Variance Threshold, and Decision Tree. The results of a detailed performance study, in which we trained six Multi-Layer Perceptron (MLP) Networks of different sizes and configurations for prediction, demonstrate that the proposed method yields superior results compared to the aforementioned methods.
8
Hua Y, Song X, Feng Z, Wu XJ, Kittler J, Yu DJ. CPInformer for Efficient and Robust Compound-Protein Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:285-296. [PMID: 35044921 DOI: 10.1109/tcbb.2022.3144008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
Recently, deep learning has become the mainstream methodology for Compound-Protein Interaction (CPI) prediction. However, the existing compound-protein feature extraction methods have some issues that limit their performance. First, graph networks are widely used for structural compound feature extraction, but the chemical properties of a compound depend on functional groups rather than graphic structure. Besides, the existing methods lack capabilities in extracting rich and discriminative protein features. Last, the compound-protein features are usually simply combined for CPI prediction, without considering information redundancy and effective feature mining. To address the above issues, we propose a novel CPInformer method. Specifically, we extract heterogeneous compound features, including structural graph features and functional class fingerprints, to reduce prediction errors caused by similar structural compounds. Then, we combine local and global features using dense connections to obtain multi-scale protein features. Last, we apply ProbSparse self-attention to protein features, under the guidance of compound features, to eliminate information redundancy, and to improve the accuracy of CPInformer. More importantly, the proposed method identifies the activated local regions that link a CPI, providing a good visualisation for the CPI state. The results obtained on five benchmarks demonstrate the merits and superiority of CPInformer over the state-of-the-art approaches.
9
Das B, Kutsal M, Das R. A geometric deep learning model for display and prediction of potential drug-virus interactions against SARS-CoV-2. Chemometrics and Intelligent Laboratory Systems: An International Journal Sponsored by the Chemometrics Society 2022; 229:104640. [PMID: 36042844 PMCID: PMC9400382 DOI: 10.1016/j.chemolab.2022.104640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 08/17/2022] [Accepted: 08/19/2022] [Indexed: 05/04/2023]
Abstract
Although the coronavirus epidemic spread rapidly with the Omicron variant, its lethality declined as a result of vaccination and acquired immunity. Hospitalization rates and the demand for intensive care decreased. However, there is no definite information about when the disease will end or how dangerous new variants could be. In addition, it is not possible to eliminate the risk posed by variants that continue to circulate among animals in nature. Drug-virus interactions should therefore be examined in order to prepare for possible new viruses and variants and to rapidly produce drugs or vaccines against them. Whereas experimental methods are expensive, laborious, and time-consuming, geometric deep learning (GDL) is an alternative that can make this process faster and cheaper. In this study, we propose a new model based on geometric deep learning for the prediction of drug-virus interactions against COVID-19. First, we use antiviral drug data in the SMILES molecular structure representation to generate a large number of features and better describe the structure of the chemical species. The data are then converted into a molecular representation and subsequently into a graph structure that the GDL model can process. The node feature vectors are transformed into a different space by a Message Passing Neural Network (MPNN) for training. We develop a geometric neural network architecture in which the graph embedding values are passed through a fully connected layer to produce the prediction. The results indicate that the proposed method outperforms existing methods, with 97% accuracy in predicting drug-virus interactions.
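The graph conversion and message-passing idea in this abstract can be illustrated with a small sketch. This is not the authors' model: it hand-builds the heavy-atom graph of ethanol (SMILES `CCO`) instead of parsing SMILES, uses a two-dimensional one-hot feature per atom (carbon, oxygen), and applies a single unweighted sum-aggregation step, whereas a real MPNN would use learned message and update functions. All names below are hypothetical.

```python
def message_passing_step(features, bonds):
    """One round of sum aggregation: each atom adds its neighbours' vectors to its own."""
    neighbours = {node: [] for node in features}
    for a, b in bonds:  # bonds are undirected edges
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for node, vec in features.items():
        total = list(vec)
        for other in neighbours[node]:
            total = [x + y for x, y in zip(total, features[other])]
        updated[node] = total
    return updated


def readout(features):
    """Sum all node vectors into a single graph-level embedding."""
    dims = len(next(iter(features.values())))
    return [sum(vec[j] for vec in features.values()) for j in range(dims)]


# Ethanol heavy atoms: C(0)-C(1)-O(2); features are one-hot [is_carbon, is_oxygen].
ethanol_atoms = {0: [1, 0], 1: [1, 0], 2: [0, 1]}
ethanol_bonds = [(0, 1), (1, 2)]
```

In a full model, the graph-level vector returned by `readout` would be passed to a fully connected layer to produce the interaction prediction.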
Affiliation(s)
- Bihter Das
  - Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey
- Mucahit Kutsal
  - Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey
- Resul Das
  - Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey

10
Zhao Q, Yang M, Cheng Z, Li Y, Wang J. Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2092-2110. [PMID: 33769935 DOI: 10.1109/tcbb.2021.3069040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
The identification of compound-protein relations (CPRs), which includes compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. A common method for compound-protein relation identification is the use of in vitro screening experiments. However, the number of compounds and proteins is massive, and in vitro screening experiments are labor-intensive, expensive, and time-consuming with high failure rates. Researchers have developed a computational field called virtual screening (VS) to aid experimental drug development. These methods utilize experimentally validated biological interaction information to generate datasets and use the physicochemical and structural properties of compounds and target proteins as input information to train computational prediction models. At present, deep learning has been widely used in computer vision and natural language processing and has experienced epoch-making progress. At the same time, deep learning has also been used in the field of biomedicine widely, and the prediction of CPRs based on deep learning has developed rapidly and has achieved good results. The purpose of this study is to investigate and discuss the latest applications of deep learning techniques in CPR prediction. First, we describe the datasets and feature engineering (i.e., compound and protein representations and descriptors) commonly used in CPR prediction methods. Then, we review and classify recent deep learning approaches in CPR prediction. Next, a comprehensive comparison is performed to demonstrate the prediction performance of representative methods on classical datasets. Finally, we discuss the current state of the field, including the existing challenges and our proposed future directions. We believe that this investigation will provide sufficient references and insight for researchers to understand and develop new deep learning methods to enhance CPR predictions.
11
Zheng J, Xiao X, Qiu WR. DTI-BERT: Identifying Drug-Target Interactions in Cellular Networking Based on BERT and Deep Learning Method. Front Genet 2022; 13:859188. [PMID: 35754843 PMCID: PMC9213727 DOI: 10.3389/fgene.2022.859188] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 04/25/2022] [Indexed: 11/20/2022] Open
Abstract
Drug–target interactions (DTIs) are regarded as an essential part of genomic drug discovery, and computational prediction of DTIs can accelerate the search for lead drugs for a target, compensating for the time and expense of wet-lab techniques. Currently, many computational methods predict DTIs based on the sequential composition or physicochemical properties of the drug and target, but further efforts are needed to improve them. In this article, we propose a new sequence-based method for accurately identifying DTIs. For target proteins, we explore using pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, the Discrete Wavelet Transform (DWT) is employed to generate information from drug molecular fingerprints. We then concatenate the drug and target feature vectors and input them into a feature extraction module consisting of a BRL block (a batch-norm layer, a rectified linear activation layer, and a linear layer) and a Convolutional Neural Network module to extract further DTI features. Subsequently, a BRL block is used as the prediction engine. After optimizing the model with contrastive loss and cross-entropy loss, it gave prediction accuracies of up to 90.1, 94.7, 94.9, and 89% for the target families of G protein-coupled receptors, ion channels, enzymes, and nuclear receptors, respectively, which indicates that the proposed method can outperform existing predictors. To make it as convenient as possible for researchers, the web server for the new predictor is freely accessible at https://bioinfo.jcu.edu.cn/dtibert or http://121.36.221.79/dtibert/. The proposed method may also be a potential option for other DTIs.
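The BRL block named in this abstract (batch normalization, then ReLU, then a linear layer) can be sketched in plain Python to show the order of operations. This is a simplified illustration under stated assumptions: batch normalization here uses batch statistics only, with no learnable scale or shift, and the weights passed to `linear` are arbitrary placeholders rather than trained parameters. The function names are hypothetical.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature column to zero mean and unit variance over the batch."""
    n, dims = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
            for row in batch]


def relu(batch):
    """Rectified linear activation applied element-wise."""
    return [[max(0.0, x) for x in row] for row in batch]


def linear(batch, weights, bias):
    """Affine map; weights has shape (in_dim, out_dim), bias has length out_dim."""
    return [[sum(row[i] * weights[i][o] for i in range(len(row))) + bias[o]
             for o in range(len(bias))]
            for row in batch]


def brl_block(batch, weights, bias):
    """Batch-norm -> ReLU -> linear, in the order given in the abstract."""
    return linear(relu(batch_norm(batch)), weights, bias)
```

In practice such a block would typically be built from a deep learning framework's batch-norm, activation, and linear layers, which also track running statistics for inference.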
Affiliation(s)
- Jie Zheng
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Xuan Xiao
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Wang-Ren Qiu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China

12
DTIP-TC2A: An analytical framework for drug-target interactions prediction methods. Comput Biol Chem 2022; 99:107707. [DOI: 10.1016/j.compbiolchem.2022.107707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 05/01/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022]

13
Yu L, Qiu W, Lin W, Cheng X, Xiao X, Dai J. HGDTI: predicting drug-target interaction by using information aggregation based on heterogeneous graph neural network. BMC Bioinformatics 2022; 23:126. [PMID: 35413800 PMCID: PMC9004085 DOI: 10.1186/s12859-022-04655-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/01/2021] [Accepted: 03/28/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In new drug discovery research, traditional wet-lab experiments take a long time. Predicting drug-target interactions (DTIs) in silico can greatly narrow the search space of candidate medications. A well-designed algorithmic model may be more effective in revealing potential connections between drugs and targets in a bioinformatics network composed of drugs, proteins, and other related data. RESULTS: In this work, we developed a heterogeneous graph neural network model, named HGDTI, which includes a learning phase for network node embedding and a training phase for DTI classification. The method first obtains the molecular fingerprint information of drugs and the pseudo amino acid composition information of proteins, then extracts the initial node features through a Bi-LSTM, and uses an attention mechanism to aggregate heterogeneous neighbors. In several comparative experiments, the overall performance of HGDTI significantly exceeds that of other state-of-the-art DTI prediction models, and negative sampling is employed to further improve the predictive power of the model. In addition, we demonstrated the robustness of HGDTI through heterogeneous network content reduction tests and supported its design rationale through further comparative experiments. These results indicate that HGDTI can utilize heterogeneous information to capture the embeddings of drugs and targets and provide assistance for drug development. CONCLUSIONS: HGDTI, based on a heterogeneous graph neural network model, can utilize heterogeneous information to capture the embeddings of drugs and targets and provide assistance for drug development. For the convenience of related researchers, a user-friendly web server has been established at http://bioinfo.jcu.edu.cn/hgdti.
Affiliation(s)
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Xiang Cheng
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China.
| | - Jiexia Dai
- School of Foreign Languages, Jingdezhen University, Jingdezhen, China
| |
|
14
|
Ru X, Ye X, Sakurai T, Zou Q. NerLTR-DTA: drug-target binding affinity prediction based on neighbor relationship and learning to rank. Bioinformatics 2022; 38:1964-1971. [PMID: 35134828 DOI: 10.1093/bioinformatics/btac048] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/29/2021] [Revised: 12/20/2021] [Accepted: 01/28/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Drug-target interaction prediction plays an important role in new drug discovery and drug repurposing. Binding affinity indicates the strength of drug-target interactions. Predicting drug-target binding affinity is expected to provide promising candidates for biologists, which can effectively reduce the workload of wet laboratory experiments and speed up the entire process of drug research. Given that numerous new proteins are sequenced and compounds are synthesized, several improved computational methods have been proposed for such predictions, but there are still some challenges. (i) Many methods only discuss and implement one application scenario; they focus on drug repurposing and ignore the discovery of new drugs and targets. (ii) Many methods do not consider the priority order of proteins (or drugs) related to each target drug (or protein). Therefore, it is necessary to develop a comprehensive method that can be used in multiple scenarios and focuses on candidate order. RESULTS In this study, we propose a method called NerLTR-DTA that uses the neighbor relationship of similarity and sharing to extract features, and applies a ranking framework with regression attributes to predict affinity values and priority order of query drug (or query target) and its related proteins (or compounds). It is worth noting that using the characteristics of learning to rank to set different queries can smartly realize the multi-scenario application of the method, including the discovery of new drugs and new targets. Experimental results on two commonly used datasets show that NerLTR-DTA outperforms some state-of-the-art competing methods. NerLTR-DTA achieves excellent performance in all application scenarios mentioned in this study, and the r_m^2(test) values indicate that such excellent performance is not obtained by chance.
Moreover, it can be concluded that NerLTR-DTA can provide accurate ranking lists for the relevant results of most queries through the statistics of the association relationship of each query drug (or query protein). In general, NerLTR-DTA is a powerful tool for predicting drug-target associations and can contribute to new drug discovery and drug repurposing. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented in Python and Java. Source codes and datasets are available at https://github.com/RUXIAOQING964914140/NerLTR-DTA.
Affiliation(s)
- Xiaoqing Ru
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
| |
|
15
|
Cong X, Ren W, Pacalon J, Xu R, Xu L, Li X, de March CA, Matsunami H, Yu H, Yu Y, Golebiowski J. Large-Scale G Protein-Coupled Olfactory Receptor-Ligand Pairing. ACS Central Science 2022; 8:379-387. [PMID: 35350604 PMCID: PMC8949627 DOI: 10.1021/acscentsci.1c01495] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 12/07/2021] [Indexed: 05/22/2023]
Abstract
G protein-coupled receptors (GPCRs) conserve common structural folds and activation mechanisms, yet their ligand spectra and functions are highly diverse. This work investigated how the amino-acid sequences of olfactory receptors (ORs), the largest GPCR family, encode diversified responses to various ligands. We established a proteochemometric (PCM) model based on OR sequence similarities and ligand physicochemical features to predict OR responses to odorants using supervised machine learning. The PCM model was constructed with the aid of site-directed mutagenesis, in vitro functional assays, and molecular simulations. We found that the ligand selectivity of the ORs is mostly encoded in the residues up to 8 Å around the orthosteric pocket. Subsequent predictions using Random Forest (RF) showed a hit rate of up to 58%, as assessed by in vitro functional assays of 111 ORs and 7 odorants of distinct scaffolds. Sixty-four new OR-odorant pairs were discovered, and 25 ORs were deorphanized here. The best model demonstrated a 56% deorphanization rate. The PCM-RF approach will accelerate OR-odorant mapping and OR deorphanization.
Affiliation(s)
- Xiaojing Cong
- Université Côte d’Azur, CNRS, Institut de Chimie de Nice UMR7272, Nice 06108, France
| | - Wenwen Ren
- Institutes of Biomedical Sciences, Fudan University, Shanghai 200031, People’s Republic of China
| | - Jody Pacalon
- Université Côte d’Azur, CNRS, Institut de Chimie de Nice UMR7272, Nice 06108, France
| | - Rui Xu
- School of Life Sciences, Shanghai University, Shanghai 200444, People’s Republic of China
| | - Lun Xu
- Ear, Nose & Throat Institute, Department of Otolaryngology, Eye, Ear, Nose & Throat Hospital, Fudan University, Shanghai 200031, People’s Republic of China
| | - Xuewen Li
- School of Life Sciences, Shanghai University, Shanghai 200444, People’s Republic of China
| | - Claire A. de March
- Department of Molecular Genetics and Microbiology, and Department of Neurobiology, and Duke Institute for Brain Sciences, Duke University Medical Center, Research Drive, Durham, North Carolina 27710, United States
| | - Hiroaki Matsunami
- Department of Molecular Genetics and Microbiology, and Department of Neurobiology, and Duke Institute for Brain Sciences, Duke University Medical Center, Research Drive, Durham, North Carolina 27710, United States
| | - Hongmeng Yu
- Ear, Nose & Throat Institute, Department of Otolaryngology, Eye, Ear, Nose & Throat Hospital, Fudan University, Shanghai 200031, People’s Republic of China
- Clinical and Research Center for Olfactory Disorders, Eye, Ear, Nose & Throat Hospital, Fudan University, Shanghai 200031, People’s Republic of China
- Research Units of New Technologies of Endoscopic Surgery in Skull Base Tumor, Chinese Academy of Medical Sciences, Beijing 100730, People’s Republic of China
| | - Yiqun Yu
- Ear, Nose & Throat Institute, Department of Otolaryngology, Eye, Ear, Nose & Throat Hospital, Fudan University, Shanghai 200031, People’s Republic of China
- Clinical and Research Center for Olfactory Disorders, Eye, Ear, Nose & Throat Hospital, Fudan University, Shanghai 200031, People’s Republic of China
| | - Jérôme Golebiowski
- Université Côte d’Azur, CNRS, Institut de Chimie de Nice UMR7272, Nice 06108, France
- Department of Brain and Cognitive Sciences, Daegu Gyeongbuk Institute of Science and Technology, Daegu 711-873, South Korea
| |
|
16
|
Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning. Sci Rep 2022; 12:4751. [PMID: 35306525 PMCID: PMC8934358 DOI: 10.1038/s41598-022-08787-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/22/2021] [Accepted: 03/08/2022] [Indexed: 11/21/2022] Open
Abstract
Drug-target interaction (DTI) prediction plays a crucial role in drug repositioning and virtual drug screening. Most DTI prediction methods cast the problem as a binary classification task to predict if interactions exist or as a regression task to predict continuous values that indicate a drug's ability to bind to a specific target. The regression-based methods provide insight beyond the binary relationship. However, most of these methods require three-dimensional (3D) structural information of the targets, which is still not generally available. Despite this bottleneck, only a few methods address the drug-target binding affinity (DTBA) problem from a non-structure-based approach to avoid the 3D structure limitations. Here we propose Affinity2Vec, a novel regression-based method that formulates the entire task as a graph-based problem. To develop this method, we constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities. Affinity2Vec further combines several computational techniques from feature representation learning, graph mining, and machine learning to generate or extract features, build the model, and predict the binding affinity between the drug and the target with no 3D structural data. We conducted extensive experiments to evaluate and demonstrate the robustness and efficiency of the proposed method on benchmark datasets used in state-of-the-art non-structure-based drug-target binding affinity studies. Affinity2Vec showed superior and competitive results compared to the state-of-the-art methods based on several evaluation metrics, including mean squared error, r_m^2, concordance index, and area under the precision-recall curve.
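Two of the regression metrics named in this abstract, mean squared error and concordance index, can be sketched as follows. This is a minimal pure-Python illustration under the usual definitions (pairs with equal true affinity are skipped, tied predictions count as half-concordant); the function names and the toy affinity values are hypothetical and are not taken from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5  # tied prediction counts as half
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy values for illustration only.
affinities = [5.0, 6.2, 7.1, 8.4]
predictions = [5.3, 6.0, 7.5, 7.4]
mse = mean_squared_error(affinities, predictions)
ci = concordance_index(affinities, predictions)
```

A concordance index of 1.0 means every pair is ranked correctly, while 0.5 corresponds to random ordering.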
|
17
|
Yu L, Xue L, Liu F, Li Y, Jing R, Luo J. The applications of deep learning algorithms on in silico druggable proteins identification. J Adv Res 2022; 41:219-231. [PMID: 36328750 PMCID: PMC9637576 DOI: 10.1016/j.jare.2022.01.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/26/2021] [Revised: 12/21/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022] Open
Abstract
We developed the first deep learning-based druggable protein classifier for fast and accurate identification of potential druggable proteins. Experimental results on a standard dataset demonstrate that the prediction performance of the deep learning model is comparable to that of existing methods. We visualized the representations of druggable proteins learned by deep learning models, which helps us understand how they work. Our analysis reconfirms that the attention mechanism is especially useful for explaining deep learning models.
Introduction The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery. Objectives This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets. Methods Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization. Results Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which is comparable to those of existing methods. Conclusion We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. 
We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.
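The accuracy and Matthews correlation coefficient (MCC) reported in this abstract follow from the confusion matrix of a binary classifier. The sketch below shows the standard formulas on toy labels; the helper names and the label vectors are hypothetical and unrelated to the paper's datasets.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy labels: 1 = druggable, 0 = non-druggable (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
score = mcc(*counts)
```

Unlike accuracy, MCC stays informative when the two classes are imbalanced, which is one reason it is commonly reported alongside accuracy.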
|
18
|
The Discovery of New Drug-Target Interactions for Breast Cancer Treatment. Molecules 2021; 26:7474. [PMID: 34946556 PMCID: PMC8704452 DOI: 10.3390/molecules26247474] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/22/2021] [Revised: 12/07/2021] [Accepted: 12/07/2021] [Indexed: 01/09/2023] Open
Abstract
Drug–target interaction (DTI) prediction plays a vital role in probing new targets for breast cancer research. Considering the multifaceted challenges associated with experimental methods for identifying DTIs, the in silico prediction of such interactions merits exploration. In this study, we develop a feature-based method to infer unknown DTIs, called PsePDC-DTIs, which fuses information regarding protein sequences extracted by pseudo-position specific scoring matrix (PsePSSM), detrended cross-correlation analysis coefficient (DCCA coefficient), and an FP2 format molecular fingerprint descriptor of drug compounds. In addition, the synthetic minority oversampling technique (SMOTE) is employed for dealing with the imbalanced data after Lasso dimensionality reduction. Then, the processed feature vectors are put into a random forest classifier to perform DTI predictions on four gold standard datasets, including nuclear receptors (NR), G-protein-coupled receptors (GPCR), ion channels (IC), and enzymes (E). Furthermore, we use PsePDC-DTIs to explore new targets for breast cancer treatment, starting from risk genes identified in large-scale genome-wide genetic studies. Through five-fold cross-validation, the average values of accuracy in NR, GPCR, IC, and E datasets are 95.28%, 96.19%, 96.74%, and 98.22%, respectively. The PsePDC-DTIs model provides us with 10 potential DTIs for breast cancer treatment, among which erlotinib (DB00530) and FGFR2 (hsa2263), caffeine (DB00201) and KCNN4 (hsa3783), as well as afatinib (DB08916) and FGFR2 (hsa2263) are found with direct or inferred evidence. The PsePDC-DTIs model has achieved good prediction results, establishing the validity and superiority of the proposed method.
|
19
|
Sorkhi AG, Abbasi Z, Mobarakeh MI, Pirgazi J. Drug-target interaction prediction using unifying of graph regularized nuclear norm with bilinear factorization. BMC Bioinformatics 2021; 22:555. [PMID: 34789169 PMCID: PMC8597250 DOI: 10.1186/s12859-021-04464-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 10/29/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Wet-lab experiments for identification of interactions between drugs and target proteins are time-consuming, costly and labor-intensive. The use of computational prediction of drug-target interactions (DTIs), which is one of the significant points in drug discovery, has been considered by many researchers in recent years. It also reduces the search space of interactions by proposing potential interaction candidates. RESULTS In this paper, a new approach based on unifying matrix factorization and nuclear norm minimization is proposed to find a low-rank interaction. In this combined method, to solve the low-rank matrix approximation, the terms in the DTI problem are used in such a way that the nuclear norm regularized problem is optimized by a bilinear factorization based on Rank-Restricted Soft Singular Value Decomposition (RRSSVD). In the proposed method, adjacencies between drugs and targets are encoded by graphs. Drug-target interactions, drug-drug similarity, target-target similarity, and combinations of similarities have also been used as input. CONCLUSIONS The proposed method is evaluated on four benchmark datasets known as Enzymes (E), Ion channels (ICs), G protein-coupled receptors (GPCRs) and nuclear receptors (NRs) based on AUC, AUPR, and time measure. The results show an improvement in the performance of the proposed method compared to the state-of-the-art techniques.
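The AUC and AUPR measures used for evaluation in this abstract can be computed from ranked prediction scores. The following is a minimal pure-Python sketch: AUC is computed as the probability that a positive pair outscores a negative pair, and AUPR is approximated by average precision, which assumes no tied scores. Function names and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos

# Toy drug-target pairs: 1 = known interaction, 0 = no known interaction.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
```

Because known interactions are rare in DTI benchmarks, AUPR is often the more sensitive of the two measures.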
Affiliation(s)
- Ali Ghanbari Sorkhi
- Faculty of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, P.O. Box, 48518-78195 Behshahr, Iran
| | - Zahra Abbasi
- School of Medicine, Faculty of Medical Biotechnology, Shahroud University of Medical Sciences, Shahroud, Iran
| | | | - Jamshid Pirgazi
- Faculty of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, P.O. Box, 48518-78195 Behshahr, Iran
| |
|
20
|
Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021; 14:311-330. [PMID: 34731411 DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]
Abstract
Accurate prediction of drug-target interactions (DTIs), which is often used in the fields of drug discovery and drug repositioning, is regarded as a key challenge in the study of drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, for the target protein, pseudo-position specific score matrix, pseudo amino acid composition and SPIDER3 are used to extract the different feature information of the target protein. Meanwhile, the path-based fingerprint features of each drug are extracted. Then, the synthetic minority oversampling technique (SMOTE) and light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to the deep-stacked ensemble classifier composed of gated recurrent unit (GRU), deep neural network (DNN), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under five-fold cross-validation and compared with existing methods, the proposed method achieves higher prediction accuracy on the gold standard dataset. To evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has better predictive ability than the other methods, and provides novel insights for the prediction of DTIs. A novel method, DeepStack-DTIs, for drug-target interaction prediction. PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information into digital information, respectively. The SMOTE algorithm is used to balance the dataset and the LightGBM feature selection algorithm is employed to remove redundant and irrelevant features to select the optimal feature subset. This optimal feature subset is input into the deep-stacked ensemble classifier to predict drug-target interactions.
The experimental results show that the DeepStack-DTIs method can significantly improve the prediction accuracy of drug-target interactions.
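The data-balancing step named in this abstract, SMOTE, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below is a simplified pure-Python version of that idea, not the authors' implementation; the function name `smote_oversample`, its parameters and the toy points are hypothetical.

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their neighbors.

    Simplification: each call picks a random base sample, one of its k nearest
    minority neighbors, and a random point on the segment joining them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Toy minority class: interacting drug-target pairs as 2-D feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=6)
```

Synthetic points always lie on a segment between two real minority samples, so the oversampled class stays inside the region the original samples span.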
Affiliation(s)
- Yan Zhang
- College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, 266061, China; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Cheng Chen
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
| | - Qinqin Wei
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Haiming Gu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China
| |
|
21
|
Thafar MA, Olayan RS, Albaradei S, Bajic VB, Gojobori T, Essack M, Gao X. DTi2Vec: Drug-target interaction prediction using network embedding and ensemble learning. J Cheminform 2021; 13:71. [PMID: 34551818 PMCID: PMC8459562 DOI: 10.1186/s13321-021-00552-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/24/2020] [Accepted: 09/05/2021] [Indexed: 11/21/2022] Open
Abstract
Drug-target interaction (DTI) prediction is a crucial step in drug discovery and repositioning as it reduces experimental validation costs if done right. Thus, developing in-silico methods to predict potential DTI has become a competitive research niche, with one of its main focuses being improving the prediction accuracy. Using machine learning (ML) models for this task, specifically network-based approaches, is effective and has shown great advantages over the other computational methods. However, ML model development involves upstream hand-crafted feature extraction and other processes that impact prediction accuracy. Thus, network-based representation learning techniques that provide automated feature extraction combined with traditional ML classifiers dealing with downstream link prediction tasks may be better-suited paradigms. Here, we present such a method, DTi2Vec, which identifies DTIs using network representation learning and ensemble learning techniques. DTi2Vec constructs the heterogeneous network, and then it automatically generates features for each drug and target using the nodes embedding technique. DTi2Vec demonstrated its ability in drug-target link prediction compared to several state-of-the-art network-based methods, using four benchmark datasets and large-scale data compiled from DrugBank. DTi2Vec showed a statistically significant increase in the prediction performances in terms of AUPR. We verified the "novel" predicted DTIs using several databases and scientific literature. DTi2Vec is a simple yet effective method that provides high DTI prediction performance while being scalable and efficient in computation, translating into a powerful drug repositioning tool.
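Node-embedding techniques of the kind this abstract refers to are commonly trained on random walks sampled from the network. The sketch below generates uniform random walks over a toy heterogeneous drug-target graph; it covers only the walk-sampling step, the uniform transition rule is an assumption (the embedding method used by DTi2Vec may bias its walks differently), and all node names and helper functions are hypothetical.

```python
import random

def build_adjacency(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Toy heterogeneous network: drug-target, drug-drug and target-target edges.
edges = [
    ("drug:D1", "target:T1"),
    ("drug:D2", "target:T1"),
    ("drug:D1", "drug:D2"),
    ("target:T1", "target:T2"),
]
adj = build_adjacency(edges)
walks = random_walks(adj, walk_length=5, walks_per_node=2)
```

The sampled walks play the role of sentences for a skip-gram style model, which would then produce one feature vector per drug and target node.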
Affiliation(s)
- Maha A Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Kingdom of Saudi Arabia
| | - Rawan S Olayan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
| | - Vladimir B Bajic
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
| |
|
22
|
Mathai N, Chen Y, Kirchmair J. Validation strategies for target prediction methods. Brief Bioinform 2021; 21:791-802. [PMID: 31220208 PMCID: PMC7299289 DOI: 10.1093/bib/bbz026] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/26/2018] [Revised: 01/14/2019] [Accepted: 02/17/2019] [Indexed: 12/11/2022] Open
Abstract
Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode of action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics that are employed to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.
Affiliation(s)
- Neann Mathai
- Department of Chemistry, University of Bergen, Bergen, Norway; Computational Biology Unit (CBU), University of Bergen, Bergen, Norway; Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
| | - Ya Chen
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
| | - Johannes Kirchmair
- Department of Chemistry, University of Bergen, Bergen, Norway; Computational Biology Unit (CBU), University of Bergen, Bergen, Norway; Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
| |
|
23
|
Zhang S, Wang J, Lin Z, Liang Y. Application of Machine Learning Techniques in Drug-target Interactions Prediction. Curr Pharm Des 2021; 27:2076-2087. [PMID: 33238865 DOI: 10.2174/1381612826666201125105730] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/09/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Drug-target interactions (DTIs) are vital for drug design and drug repositioning. However, traditional lab experiments are both expensive and time-consuming. Various computational methods that apply machine learning techniques have performed efficiently and effectively in this field. RESULTS The machine learning methods can broadly be divided into three categories: supervised, semi-supervised and unsupervised methods. We reviewed recent representative methods applying machine learning techniques of each category to DTI prediction and summarized a brief list of databases frequently used in drug discovery. In addition, we compared the advantages and limitations of the methods in each category. CONCLUSION Every prediction model has both strengths and weaknesses and should be adopted in proper ways. Three major problems in DTI prediction should be seriously considered: the lack of data sets of non-interacting drug-target pairs, over-optimistic results due to biases, and the use of regression models for DTI prediction.
Affiliation(s)
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Jiesheng Wang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Zhenhui Lin
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Yunyun Liang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| |
|
24
|
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021; 22:6262238. [PMID: 33940598 DOI: 10.1093/bib/bbab109] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 02/10/2021] [Revised: 03/06/2021] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural network (GNN) has emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel molecular pre-training graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we proposed a powerful GNN for modelling molecular graph named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph-level. After pre-training on 11 million unlabeled molecules, we revealed that MolGNet can capture valuable chemical insights to produce interpretable representation. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Affiliation(s)
- Pengyong Li
- Department of Biomedical Engineering at Tsinghua University, China
| | - Jun Wang
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
| | - Yixuan Qiao
- Operations Research and Cybernetics at Beijing University of Technology, China
| | - Hao Chen
- Cybernetics at Beijing University of Technology, China
| | - Yihuan Yu
- Beijing University of Biomedical Engineering, China
| | - Xiaojun Yao
- Analytical Chemistry and Chemoinformatics at Lanzhou University, China
| | - Peng Gao
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
| | - Guotong Xie
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
| | - Sen Song
- Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China
| |
|
25
|
Mahmud SMH, Chen W, Liu Y, Awal MA, Ahmed K, Rahman MH, Moni MA. PreDTIs: prediction of drug-target interactions based on multiple feature information using gradient boosting framework with data balancing and feature selection techniques. Brief Bioinform 2021; 22:6168499. [PMID: 33709119 PMCID: PMC7989622 DOI: 10.1093/bib/bbab046] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 10/13/2020] [Revised: 01/25/2021] [Accepted: 01/29/2021] [Indexed: 12/13/2022] Open
Abstract
Discovering drug–target (protein) interactions (DTIs) is of great significance for researching and developing novel drugs, with tremendous benefits for the pharmaceutical industry and patients. However, identifying DTIs with wet-lab experimental methods is generally expensive and time-consuming. Therefore, different machine learning-based methods have been developed for this purpose, but a substantial number of unknown interactions remain to be discovered. Furthermore, data imbalance and high feature dimensionality are critical challenges in drug-target datasets that can degrade classifier performance and have not yet been adequately addressed. This paper proposes a novel drug–target interaction prediction method called PreDTIs. First, the feature vectors of the protein sequence are extracted by the pseudo-position-specific scoring matrix (PsePSSM), dipeptide composition (DC) and pseudo amino acid composition (PseAAC), and the drug is encoded with MACCS substructure fingerprints. In addition, we propose a FastUS algorithm to handle the class imbalance problem and develop a MoIFS algorithm to remove irrelevant and redundant features and obtain an optimal feature set. Finally, the balanced and optimal features are provided to the LightGBM classifier to identify DTIs, and 5-fold cross-validation was applied to evaluate the prediction ability of the proposed method. Prediction results indicate that the proposed model PreDTIs is significantly superior to other existing methods in predicting DTIs, and our model could be used to discover new drugs for unknown disorders or infections, such as coronavirus disease 2019, using existing drug compounds and severe acute respiratory syndrome coronavirus 2 protein sequences.
Affiliation(s)
- S M Hasan Mahmud
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Wenyu Chen
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Yongsheng Liu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Md Abdul Awal
- Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
| | - Kawsar Ahmed
- Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh
| | - Md Habibur Rahman
- Department of Computer Science and Engineering, Islamic University, Kushtia-7003, Bangladesh
| | - Mohammad Ali Moni
- UNSW Digital Health, WHO Center for eHealth, School of Public Health and Community Medicine, Faculty of Medicine, The University of New South Wales, Sydney, Australia
| |
|
26
|
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241 DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
Therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on the therapeutic protein targets, while the knowledge of the off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods utilize to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact and relationships between these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit and often combine three types of similarities that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| |
|
27
|
Li P, Li Y, Hsieh CY, Zhang S, Liu X, Liu H, Song S, Yao X. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief Bioinform 2020; 22:5955940. [PMID: 33147620 DOI: 10.1093/bib/bbaa266] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 08/04/2020] [Revised: 09/11/2020] [Accepted: 09/14/2020] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Computational methods accelerate drug discovery and play an important role in biomedicine, such as molecular property prediction and compound-protein interaction (CPI) identification. A key challenge is to learn useful molecular representations. In the early years, molecular properties were mainly calculated by quantum mechanics or predicted by traditional machine learning methods, which required expert knowledge and was often labor-intensive. Nowadays, graph neural networks have received significant attention because of their powerful ability to learn representations from graph data. Nevertheless, current graph-based methods have some limitations that need to be addressed, such as the large number of parameters and insufficient extraction of bond information. RESULTS In this study, we proposed a graph-based approach, named triplet message networks (TrimNet), that employs a novel triplet message mechanism to learn molecular representations efficiently. We show that TrimNet can accurately complete multiple molecular representation learning tasks, including quantum property, bioactivity, physiology and CPI prediction, with a significant reduction in parameters. In the experiments, TrimNet outperforms the previous state-of-the-art method by a significant margin on various datasets. Besides its few parameters and high prediction accuracy, TrimNet can focus on the atoms essential to the target properties, providing a clear interpretation of the prediction tasks. These advantages have established TrimNet as a powerful and useful computational tool for solving the challenging problem of molecular representation learning. AVAILABILITY The quantum and drug datasets are available on the website of MoleculeNet: http://moleculenet.ai. The source code is available on GitHub: https://github.com/yvquanli/trimnet. CONTACT xjyao@lzu.edu.cn, songsen@tsinghua.edu.cn.
Affiliation(s)
- Pengyong Li
- Department of Biomedical Engineering at Tsinghua University
| | - Yuquan Li
- College of Chemistry and Chemical Engineering at Lanzhou University
| |
|
28
|
Hasan Mahmud SM, Chen W, Jahan H, Dai B, Din SU, Dzisoo AM. DeepACTION: A deep learning-based method for predicting novel drug-target interactions. Anal Biochem 2020; 610:113978. [PMID: 33035462 DOI: 10.1016/j.ab.2020.113978] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/04/2020] [Revised: 09/23/2020] [Accepted: 09/25/2020] [Indexed: 12/13/2022]
Abstract
Drug-target interactions (DTIs) play a key role in drug development and discovery processes. Wet-lab identification of DTIs is time-consuming, expensive, and tedious. Fortunately, computational approaches can identify new interactions (drug-target pairs) and accelerate the process of drug repurposing. However, a vast number of interactions remain undiscovered; therefore, we proposed a deep learning-based method (deepACTION) for predicting potential or unknown DTIs. Here, each drug chemical structure and protein sequence is transformed according to structural and sequence information using different descriptors to represent their features correctly. The prediction process faces challenges such as the high dimensionality and class imbalance of the data. To address these problems, we developed the MMIB technique to balance the majority and minority instances in the dataset and utilized a LASSO model to handle the high dimensionality of the data. In addition, we trained a convolutional neural network on the balanced and reduced features for accurate prediction of DTIs. In this study, the AUC is considered the primary evaluation metric for comparing the performance of the deepACTION model with that of existing methods by a 5-fold cross-validation test. Our experimental dataset was obtained from the DrugBank database, and our deepACTION model achieved an AUC of 0.9836 on this dataset. The experimental results indicate that the model can predict significant numbers of new DTIs and provide complete information to motivate scientists to develop drugs.
Affiliation(s)
- S M Hasan Mahmud
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Wenyu Chen
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| | - Hosney Jahan
- College of Computer Science, Sichuan University, Chengdu, 610065, China
| | - Bo Dai
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Salah Ud Din
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Anthony Mackitz Dzisoo
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China
| |
|
29
|
Eslami Manoochehri H, Nourani M. Drug-target interaction prediction using semi-bipartite graph model and deep learning. BMC Bioinformatics 2020; 21:248. [PMID: 32631230 PMCID: PMC7336396 DOI: 10.1186/s12859-020-3518-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Identifying drug-target interactions is a key element in drug discovery. In silico prediction of drug-target interactions can speed up the process of identifying unknown interactions between drugs and target proteins. In recent studies, handcrafted features, similarity metrics and machine learning methods have been proposed for predicting drug-target interactions. However, these methods cannot fully learn the underlying relations between drugs and targets. In this paper, we propose a new framework for drug-target interaction prediction that learns latent features from the drug-target interaction network. RESULTS We present a framework to utilize the network topology and identify interacting and non-interacting drug-target pairs. We model the problem as a semi-bipartite graph in which we are able to use drug-drug and protein-protein similarity in a drug-protein network. We then use a graph labeling method for vertex ordering in our graph embedding process. Finally, we employ a deep neural network to learn the complex pattern of interacting pairs from embedded graphs. We show our approach is able to learn sophisticated drug-target topological features and outperforms other state-of-the-art approaches. CONCLUSIONS The proposed learning model on a semi-bipartite graph can integrate drug-drug and protein-protein similarities, which are semantically different from drug-protein information in a drug-target interaction network. We show our model can determine the interaction likelihood for each drug-target pair and outperform other heuristics.
Affiliation(s)
- Hafez Eslami Manoochehri
- Department of Electrical and Computer Engineering, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX, 75080, USA
| | - Mehrdad Nourani
- Department of Electrical and Computer Engineering, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX, 75080, USA.
| |
|
30
|
Zhao Z, Qin J, Gou Z, Zhang Y, Yang Y. Multi-task learning models for predicting active compounds. J Biomed Inform 2020; 108:103484. [PMID: 32615159 DOI: 10.1016/j.jbi.2020.103484] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/20/2019] [Revised: 05/29/2020] [Accepted: 06/09/2020] [Indexed: 01/21/2023]
Abstract
Computational drug discovery methods can find potential drug-target interactions more efficiently and have been widely studied over the past few decades. Such methods explore the relationship between the structural properties of compounds and their biological activity, with the assumption that similar compounds tend to share similar biological targets and vice versa. However, traditional Quantitative Structure-Activity Relationship (QSAR) methods often do not reach the desired accuracy because compound activity data are insufficient. In this paper, we focus on building Multi-Task Learning (MTL)-based QSAR models that consider multiple similar biological targets together and transfer shared information from one task to another, thereby improving not only the learning efficiency but also the prediction accuracy. This paper selects 6 assay groups with similar biological targets from PubChem and builds their QSAR models simultaneously with MTL. According to the experimental results, our MTL-based QSAR models perform better than prominent traditional machine learning algorithms, and the improvements are even more obvious when the baseline models have low accuracy. The superiority of our models is also supported by Student's t-test at a 5% significance level. Moreover, this paper explores three different assumptions about the underlying pattern in the dataset and finds that the joint feature MTL models further improve the performance of the QSAR models and are more suitable for building QSAR models for multiple similar biological targets.
Affiliation(s)
- Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, 730000 Lanzhou, China.
| | - Jian Qin
- School of Information Science and Engineering, Lanzhou University, 730000 Lanzhou, China
| | - Zhuoyue Gou
- School of Information Science and Engineering, Lanzhou University, 730000 Lanzhou, China
| | - Yanan Zhang
- School of Information Science and Engineering, Lanzhou University, 730000 Lanzhou, China
| | - Yi Yang
- School of Information Science and Engineering, Lanzhou University, 730000 Lanzhou, China
| |
|
31
|
Kaushik AC, Mehmood A, Dai X, Wei DQ. A comparative chemogenic analysis for predicting Drug-Target Pair via Machine Learning Approaches. Sci Rep 2020; 10:6870. [PMID: 32322011 PMCID: PMC7176722 DOI: 10.1038/s41598-020-63842-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/28/2020] [Accepted: 04/04/2020] [Indexed: 12/26/2022] Open
Abstract
Computational prediction of DTIs has become an indispensable step in drug discovery. It narrows the search space for interactions by proposing candidate interactions for validation through wet-lab experiments, which are known to be expensive and time-consuming. Chemogenomics is an emerging research area focused on the systematic examination of the biological effects of a broad range of low-molecular-weight ligands on a broad array of macromolecular targets. Additionally, the complexity of the algorithms is increasing over time, which may soon bring big data technologies such as Spark into this field. In the present work, we aim to offer a comprehensive overview and a realistic evaluation of computational DTI prediction approaches, to serve as a guide and reference for researchers working in a similar direction. Specifically, we first describe the data used in computational DTI prediction. We then categorize and explain the best and most recent techniques for predicting DTIs. Next, a realistic assessment is carried out to show the prediction performance of several representative approaches in various settings. Finally, we highlight opportunities for further improving DTI prediction performance, along with related research objectives.
Affiliation(s)
- Aman Chandra Kaushik
- Wuxi School of Medicine, Jiangnan University, Wuxi, China.
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
| | - Aamir Mehmood
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Xiaofeng Dai
- Wuxi School of Medicine, Jiangnan University, Wuxi, China
| | - Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
| |
|
32
|
Hao M, Bryant SH, Wang Y. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions. Brief Bioinform 2020; 20:1465-1474. [PMID: 29420684 DOI: 10.1093/bib/bby010] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/26/2017] [Revised: 01/18/2018] [Indexed: 12/25/2022] Open
Abstract
While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies of drug discovery. Drug repositioning is a potential approach to solve this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, and these have proved to be successful in drug discovery. The availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial for developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language and compared them by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI prediction is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred.
|
33
|
Rayhan F, Ahmed S, Mousavian Z, Farid DM, Shatabda S. FRnet-DTI: Deep convolutional neural network for drug-target interaction prediction. Heliyon 2020; 6:e03444. [PMID: 32154410 PMCID: PMC7052404 DOI: 10.1016/j.heliyon.2020.e03444] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/10/2018] [Revised: 06/16/2019] [Accepted: 02/14/2020] [Indexed: 01/09/2023] Open
Abstract
The task of drug-target interaction prediction holds significant importance in pharmacology and therapeutic drug design. In this paper, we present FRnet-DTI, an auto-encoder based feature manipulation and a convolutional neural network based classifier for drug-target interaction prediction. Two convolutional neural networks are proposed: FRnet-Encode and FRnet-Predict. Here, one model is used for feature manipulation and the other one for classification. Using the first method, FRnet-Encode, we generate 4096 features for each of the instances in each of the datasets and use the second method, FRnet-Predict, to identify interaction probability employing those features. We have tested our method on four gold standard datasets extensively used by other researchers. Experimental results show that our method significantly improves over the state-of-the-art method on three out of four drug-target interaction gold standard datasets on both the area under the Receiver Operating Characteristic curve (auROC) and the area under the Precision-Recall curve (auPR) metrics. We also introduce twenty new potential drug-target pairs for interaction based on high prediction scores. The source code and implementation details of our methods are available from https://github.com/farshidrayhanuiu/FRnet-DTI/ and are also readily available to use as a web application from http://farshidrayhan.pythonanywhere.com/FRnet-DTI/.
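The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = interacting pair, 0 = non-interacting) and uses two common estimators: the pairwise-ranking form of auROC and average precision as a stand-in for auPR. The function names `au_roc` and `average_precision` are hypothetical, and nothing here is taken from the FRnet-DTI code base.

```python
def au_roc(labels, scores):
    """auROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    y_score = [0.9, 0.8, 0.7, 0.1]
    print(au_roc(y_true, y_score), average_precision(y_true, y_score))
```

On heavily imbalanced DTI datasets the precision-recall view is usually the more informative of the two, which is presumably why the abstract reports both.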
Affiliation(s)
- Farshid Rayhan
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
| | - Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
| | - Zaynab Mousavian
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
| | - Dewan Md Farid
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
| |
|
34
|
Redkar S, Mondal S, Joseph A, Hareesha KS. A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing. Mol Inform 2020; 39:e1900062. [PMID: 32003548 DOI: 10.1002/minf.201900062] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/31/2019] [Accepted: 01/28/2020] [Indexed: 01/19/2023]
Abstract
Drug-Target interaction (DTI) plays a crucial role in drug discovery, drug repositioning and understanding the drug side effects which helps to identify new therapeutic profiles for various diseases. However, the exponential growth in the genomic and drugs data makes it difficult to identify the new associations between drugs and targets. Therefore, we use computational methods as it helps in accelerating the DTI identification process. Usually, available data driven sources consisting of known DTI is used to train the classifier to predict the new DTIs. Such datasets often face the problem of class imbalance. Therefore, in this study we address two challenges faced by such datasets, i. e., class imbalance and high dimensionality to develop a predictive model for DTI prediction. The study is carried out on four protein classes namely Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR) and Nuclear Receptor. We encoded the target protein sequence using the dipeptide composition and drug with a molecular descriptor. A machine learning approach is employed to predict the DTI using wrapper feature selection and synthetic minority oversampling technique (SMOTE). The ensemble approach achieved at the best an accuracy of 95.9 %, 93.4 %, 90.8 % and 90.6 % and 96.3 %, 92.8 %, 90.1 %, and 90.2 % of precision on Enzyme, Ion Channel, GPCR and Nuclear Receptor datasets, respectively, when evaluated excluding SMOTE samples with 10-fold cross validation. Furthermore, our method could predict new drug-target interactions not contained in training dataset. Selected features using wrapper feature selection may be important to understand the DTI for the protein categories under this study. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions. 
We provide the readers with a standalone package available at https://github.com/shwetagithub1/predDTI which will be able to provide the DTI predictions to user for new query DTI pairs.
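The dipeptide composition encoding mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes the usual definition (relative frequency of each of the 400 ordered pairs of the 20 standard amino acids among adjacent residue pairs); the function name `dipeptide_composition` is hypothetical and is not taken from the predDTI package.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Adjacent pairs containing a non-standard residue (e.g. 'X') are skipped,
    which is an assumption of this sketch rather than a documented rule.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    features = dipeptide_composition("MKTAYIAKQR")
    print(len(features), sum(features))
```

Because the vector length is fixed at 400 regardless of sequence length, proteins of different sizes can be fed to the same classifier.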
Affiliation(s)
- Shweta Redkar
- Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
| | - Sukanta Mondal
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K.Birla Goa Campus, 403726, Zuarinagar, Goa, India
| | - Alex Joseph
- Department of Pharmaceutical Chemistry, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
| | - K S Hareesha
- Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
| |
|
35
|
Abstract
Background: Identifying Drug-Target Interactions (DTIs) is a major challenge for current drug discovery and drug repositioning. Compared to traditional experimental approaches, in silico methods are fast and inexpensive. With the increase in open-access experimental data, numerous computational methods have been applied to predict DTIs.
Methods: In this study, we propose an end-to-end learning model of Factorization Machine and Deep Neural Network (FM-DNN), which emphasizes both low-order (first or second order) and high-order (higher than second order) feature interactions without any feature engineering other than raw features. This approach combines the power of FM and DNN for feature learning in a new neural network architecture.
Results: The experimental DTI basic features comprise 609 drug characteristics and 1819 target characteristics, plus the drug ID and target ID, for a total of 2430. Among the 8 models compared, including SVM, GBDT and WIDE-DEEP, the FM-DNN model obtains the best results, with an AUC of 0.8866 and an AUPR of 0.8281.
Conclusion: Feature engineering requires expert knowledge, and achieving good results with it is often difficult and time-consuming. FM-DNN automatically learns low-order feature interactions through FM and high-order feature interactions through DNN. The FM-DNN model has clear advantages over other commonly used models.
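The low-order (first- and second-order) part of such a model can be sketched as below. This assumes the standard second-order factorization machine formulation, in which each feature i has a latent vector V[i] and the weight of the interaction between features i and j is the dot product of V[i] and V[j]; the abstract does not state which FM variant was used, and all function names here are hypothetical. The sketch also checks the well-known reformulation that evaluates the pairwise term in time linear in the number of features.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], computed directly."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity via 0.5 * sum_f [(sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2]."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        s_sq = sum(t * t for t in terms)
        total += s * s - s_sq
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Bias + first-order (linear) term + second-order (pairwise) term."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, V)


if __name__ == "__main__":
    x = [1.0, 0.0, 2.0]                          # toy feature vector
    V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]    # one 2-d latent vector per feature
    print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
    print(fm_predict(x, 0.1, [1.0, 1.0, 1.0], V))
```

In an FM-DNN style architecture, an output of this kind would be combined with the output of a deep network that models the higher-order interactions; that combination is not shown here.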
Affiliation(s)
- Jihong Wang
- School of Data and Computer Science, Sun Yat-Sen University, No.132 Waihuan East Road, 510000 Guangzhou, China
| | - Hao Wang
- School of Data and Computer Science, Sun Yat-Sen University, No.132 Waihuan East Road, 510000 Guangzhou, China
| | - Xiaodan Wang
- School of Pharmaceutical Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, No. 9- 13 Wuguishan Avenue of Life Street, 528458, Zhongshan, China
| | - Huiyou Chang
- School of Data and Computer Science, Sun Yat-Sen University, No.132 Waihuan East Road, 510000 Guangzhou, China
| |
|
36
|
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 172] [Impact Index Per Article: 43.0] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022] Open
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid determining drug–target interactions (DTIs) solely through costly, laborious and not-always-deterministic experiments. These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of machine learning methods and databases that have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in predicting DTIs using machine learning approaches are highlighted, and we conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Elyas Sabeti
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Kai Wang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Maureen A Sartor
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Kayvan Najarian
- Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
| |
Collapse
|
37
|
Chu Y, Kaushik AC, Wang X, Wang W, Zhang Y, Shan X, Salahub DR, Xiong Y, Wei DQ. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform 2019; 22:451-462. [PMID: 31885041 DOI: 10.1093/bib/bbz152] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 06/16/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/18/2022] Open
Abstract
Drug-target interactions (DTIs) play a crucial role in target-based drug discovery and development. Computational prediction of DTIs can effectively complement experimental wet-lab techniques for the identification of DTIs, which are typically time- and resource-consuming. However, current DTI prediction approaches suffer from low precision and a high false-positive rate. In this study, we aim to improve prediction performance with a novel DTI prediction method based on a cascade deep forest (CDF) model, named DTI-CDF, which uses multiple similarity-based features between drugs and similarity-based features between target proteins extracted from a heterogeneous graph that contains known DTIs. In the experiments, we ran five replicates of 10-fold cross-validation under three experimental settings, in which the DTI values of certain drugs (SD), targets (ST), or drug-target pairs (SP) are missing from the training sets but present in the test sets. The experimental results demonstrate that DTI-CDF achieves significantly higher performance than traditional ensemble learning-based methods such as random forest and XGBoost, deep neural networks, and state-of-the-art methods such as DDR. Furthermore, 1352 newly predicted DTIs are confirmed by the KEGG and DrugBank databases. The data sets and source code are freely available at https://github.com//a96123155/DTI-CDF.
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Xiangeng Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Wei Wang
- Mathematical Sciences, Shanghai Jiao Tong University
- Yufang Zhang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
|
38
|
Mahmud SMH, Chen W, Meng H, Jahan H, Liu Y, Hasan SMM. Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting. Anal Biochem 2019; 589:113507. [PMID: 31734254 DOI: 10.1016/j.ab.2019.113507] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 08/21/2019] [Revised: 11/05/2019] [Accepted: 11/08/2019] [Indexed: 12/29/2022]
Abstract
Accurate identification of drug-target interactions (DTIs) is a crucial and challenging task in the drug discovery process, with enormous benefit to patients and pharmaceutical companies. Traditional wet-lab experiments for DTI identification are expensive, time-consuming, and labor-intensive. Many computational techniques have therefore been established for this purpose, yet a huge number of interactions are still undiscovered. Here, we present pdti-EssB, a new computational model for identification of DTIs using protein sequence and drug molecular structure. More specifically, each drug molecule is transformed into a molecular substructure fingerprint. For a protein sequence, different descriptors are utilized to represent its evolutionary, sequence, and structural information. In addition, our proposed method uses data balancing techniques to handle the class imbalance problem and applies a novel feature eliminator to select the best features for accurate prediction. In this paper, four classes of DTI benchmark datasets are used to construct a predictive model with XGBoost. The auROC is utilized as the evaluation metric to compare the performance of pdti-EssB with recent methods under five-fold cross-validation. Finally, the experimental results indicate that our proposed method outperforms other approaches in predicting DTIs and suggests new drug-target interaction samples based on prediction probability scores. The pdti-EssB webserver is available online at http://pdtiessb-uestc.com/.
Affiliation(s)
- S M Hasan Mahmud
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- Wenyu Chen
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- Han Meng
- School of Political Science and Public Administration, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- Hosney Jahan
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
- Yongsheng Liu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- S M Mamun Hasan
- Department of Internal Medicine, Rangpur Medical College, Rangpur, 5400, Bangladesh.
|
39
|
A Multi-Label Learning Framework for Drug Repurposing. Pharmaceutics 2019; 11:pharmaceutics11090466. [PMID: 31505805 PMCID: PMC6781509 DOI: 10.3390/pharmaceutics11090466] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/27/2019] [Revised: 08/22/2019] [Accepted: 09/05/2019] [Indexed: 01/10/2023] Open
Abstract
Drug repurposing plays an important role in screening old drugs for new therapeutic efficacy. Existing methods commonly treat prediction of drug-target interactions as a binary classification problem, which requires a large number of randomly sampled drug-target pairs, accounting for over 50% of the entire training dataset, as negative examples. Such a large number of negative examples that do not come from experimental observations inevitably decreases the credibility of predictions. In this study, we propose a multi-label learning framework to find new uses for old drugs and discover new drugs for known target genes. In the framework, each drug is treated as a class label and its target genes are treated as the class-specific training data to train a supervised learning model of l2-regularized logistic regression. As such, the inter-drug associations are explicitly modelled in the framework and all the class-specific training data come from experimental observations. In addition, the data constraint is less demanding: for instance, the chemical substructures of a drug are no longer needed, and novel target genes are inferred only from the underlying patterns of the known genes targeted by the drug. Stratified multi-label cross-validation shows that 84.9% of known target genes have at least one drug correctly recognized, and the proposed framework correctly recognizes 86.73% of the independent test drug-target interactions (DTIs) from DrugBank. These results show that the proposed framework could generalize well in the large drug/class space without information on drug chemical structures and target protein structures. Furthermore, we use the trained model to predict new drugs for the known target genes, identify new genes for the old drugs, and infer new associations between old drugs and new disease phenotypes via the OMIM database. Gene ontology (GO) enrichment analyses and the disease associations reported in recent literature provide supporting evidence for the computational results, which potentially shed light on new clinical therapies for new and/or old disease phenotypes.
|
40
|
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 2019; 93:103159. [PMID: 30926470 DOI: 10.1016/j.jbi.2019.103159] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 11/13/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/22/2022]
Abstract
Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods of interaction prediction help limit the search space for these experiments. These computational methods can be divided into ligand based approaches, docking approaches and chemogenomic approaches. In this review, we aim to describe the various feature based chemogenomic methods for drug target interaction prediction. It provides a comprehensive overview of the various techniques, datasets, tools and metrics. The feature based methods have been categorized, explained and compared. A novel framework for drug target interaction prediction has also been proposed that aims to improve the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature based methods of drug target interaction.
Affiliation(s)
- Kanica Sachdev
- Computer Science and Engineering Department, SMVDU, J&K, India.
|
41
|
Hao M, Bryant SH, Wang Y. A new chemoinformatics approach with improved strategies for effective predictions of potential drugs. J Cheminform 2018; 10:50. [PMID: 30311095 PMCID: PMC6755712 DOI: 10.1186/s13321-018-0303-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/02/2018] [Accepted: 10/02/2018] [Indexed: 12/24/2022] Open
Abstract
Background: Fast and accurate identification of potential drug candidates against therapeutic targets (i.e., drug–target interactions, DTIs) is a fundamental step in the early drug discovery process. However, experimental determination of DTIs is time-consuming and costly, especially for testing the associations between the entire chemical and genomic spaces. Therefore, computationally efficient algorithms with accurate predictions are required to achieve such a challenging task. In this work, we design a new chemoinformatics approach derived from neighbor-based collaborative filtering (NBCF) to infer potential drug candidates for targets of interest. One of the fundamental steps of NBCF in the application of DTI predictions is to accurately measure the similarity between drugs based solely on their known DTI profiles. However, commonly used similarity calculation methods such as COSINE may be noise-prone because the DTI bipartite network is extremely sparse, which decreases the model performance of NBCF. We herein propose three strategies to remedy this problem: (1) adopting a positive pointwise mutual information (PPMI)-based similarity metric, which is noise-immune to some extent; (2) performing low-rank approximation of the original prediction scores; (3) incorporating auxiliary (complementary) information to produce the final predictions. Results: We test the proposed methods on three benchmark datasets, and the results indicate that our strategies help improve the NBCF performance for DTI predictions. Compared with the prior algorithm, our methods exhibit better results as assessed by a recall-based evaluation metric. Conclusions: A new chemoinformatics approach with improved strategies was successfully developed to predict potential DTIs. Among the proposed models, the one based on the sparsity-resistant PPMI similarity metric exhibits the best performance, which may be helpful to researchers for identifying potential drugs against therapeutic targets of interest, and can also be applied to related research such as identifying candidate disease genes.
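As a rough illustration of strategy (1) in the abstract above, the following sketch computes a PPMI similarity between drugs from a binary drug-target matrix. It is not the authors' implementation: the co-occurrence definition (number of shared targets), the function name `ppmi_similarity`, and the toy data are all assumptions chosen for illustration.

```python
import math


def ppmi_similarity(interactions):
    """Return a drug-by-drug PPMI matrix.

    interactions: list of rows, one per drug, each a list of 0/1 flags
    marking known interactions with each target (hypothetical input format).
    """
    n = len(interactions)
    # Assumed co-occurrence: number of targets shared by two distinct drugs.
    co = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                co[i][j] = sum(a * b for a, b in zip(interactions[i], interactions[j]))
    row = [sum(r) for r in co]   # marginal co-occurrence count per drug
    total = sum(row)             # total co-occurrence mass
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if co[i][j] > 0:
                # PMI = log( p(i, j) / (p(i) * p(j)) ); negative values are clipped to 0.
                pmi = math.log(co[i][j] * total / (row[i] * row[j]))
                sim[i][j] = max(0.0, pmi)
    return sim


# Toy example: drugs 0 and 1 share two targets, as do drugs 2 and 3.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
similarity = ppmi_similarity(toy)
```

Pairs with no shared targets get a similarity of zero, and clipping negative PMI values keeps weak or accidental overlaps from contributing, which is one plausible reading of why such a metric would tolerate sparsity.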
Affiliation(s)
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
|
42
|
Chen R, Liu X, Jin S, Lin J, Liu J. Machine Learning for Drug-Target Interaction Prediction. Molecules 2018; 23:E2208. [PMID: 30200333 PMCID: PMC6225477 DOI: 10.3390/molecules23092208] [Citation(s) in RCA: 123] [Impact Index Per Article: 20.5] [Received: 08/05/2018] [Revised: 08/27/2018] [Accepted: 08/27/2018] [Indexed: 12/18/2022] Open
Abstract
Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.
Affiliation(s)
- Ruolan Chen
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
- Xiangrong Liu
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
- Shuting Jin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
- Jiawei Lin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
- Juan Liu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen 361005, China.
|
43
|
Wang C, Kurgan L. Review and comparative assessment of similarity-based methods for prediction of drug–protein interactions in the druggable human proteome. Brief Bioinform 2018; 20:2066-2087. [DOI: 10.1093/bib/bby069] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 07/10/2018] [Indexed: 12/18/2022] Open
Abstract
Drug–protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them based on three types of similarities and their combinations that they use. We discuss and compare key aspects of these methods including source databases, internal databases and their predictive models. Using our novel benchmark database, we perform comparative empirical analysis of predictive performance of seven types of representative predictors that utilize each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level and we are the first to also include evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer similarities, and that the model combining all three types of similarities secures area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarities between sequences of the drug targets and several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.
Affiliation(s)
- Chen Wang
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
|
44
|
Ezzat A, Wu M, Li XL, Kwoh CK. Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey. Brief Bioinform 2018; 20:1337-1357. [DOI: 10.1093/bib/bby002] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 10/06/2017] [Revised: 12/21/2017] [Indexed: 01/18/2023] Open
Abstract
Computational prediction of drug–target interactions (DTIs) has become an essential task in the drug discovery process. It narrows down the search space for interactions by suggesting potential interaction candidates for validation via wet-lab experiments that are well known to be expensive and time-consuming. In this article, we aim to provide a comprehensive overview and empirical evaluation on the computational DTI prediction techniques, to act as a guide and reference for our fellow researchers. Specifically, we first describe the data used in such computational DTI prediction efforts. We then categorize and elaborate the state-of-the-art methods for predicting DTIs. Next, an empirical comparison is performed to demonstrate the prediction performance of some representative methods under different scenarios. We also present interesting findings from our evaluation study, discussing the advantages and disadvantages of each method. Finally, we highlight potential avenues for further enhancement of DTI prediction performance as well as related research directions.
|
45
|
Machine Learning Approaches Toward Building Predictive Models for Small Molecule Modulators of miRNA and Its Utility in Virtual Screening of Molecular Databases. Methods Mol Biol 2018; 1517:155-168. [PMID: 27924481 DOI: 10.1007/978-1-4939-6563-2_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
The ubiquitous role of microRNAs (miRNAs) in a number of pathological processes has suggested that they could act as potential drug targets. RNA-binding small molecules offer an attractive means for modulating miRNA function. The availability of bioassay data sets for a variety of biological assays and molecules in public domain provides a new opportunity toward utilizing them to create models and further utilize them for in silico virtual screening approaches to prioritize or assign potential functions for small molecules. Here, we describe a computational strategy based on machine learning for creation of predictive models from high-throughput biological screens for virtual screening of small molecules with the potential to inhibit microRNAs. Such models could be potentially used for computational prioritization of small molecules before performing high-throughput biological assay.
|
46
|
Drug-Target Interaction Prediction in Drug Repositioning Based on Deep Semi-Supervised Learning. COMPUTATIONAL INTELLIGENCE AND ITS APPLICATIONS 2018. [DOI: 10.1007/978-3-319-89743-1_27] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
|
47
|
iDTI-ESBoost: Identification of Drug Target Interaction Using Evolutionary and Structural Features with Boosting. Sci Rep 2017; 7:17731. [PMID: 29255285 PMCID: PMC5735173 DOI: 10.1038/s41598-017-18025-2] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 07/21/2017] [Accepted: 12/05/2017] [Indexed: 02/07/2023] Open
Abstract
Prediction of new drug-target interactions is critically important as it can lead researchers to find new uses for old drugs and to disclose their therapeutic profiles or side effects. However, experimental determination of drug-target interactions is expensive and time-consuming. As a result, computational methods for predicting new drug-target interactions have gained tremendous interest in recent times. Here we present iDTI-ESBoost, a prediction model for identification of drug-target interactions using evolutionary and structural features. Our proposed method uses a novel data balancing and boosting technique to predict drug-target interactions. On four benchmark datasets taken from a gold standard dataset, iDTI-ESBoost outperforms the state-of-the-art methods in terms of area under the receiver operating characteristic (auROC) curve. iDTI-ESBoost also outperforms the latest and best-performing method found in the literature in terms of area under the precision-recall (auPR) curve. This is significant because auPR is argued to be a suitable metric for comparison on imbalanced datasets similar to the one studied here. Our reported results show the effectiveness of the classifier, the balancing methods and the novel features incorporated in iDTI-ESBoost. iDTI-ESBoost is a novel prediction method that, for the first time, exploits structural features along with evolutionary features to predict drug-protein interactions. We believe the excellent performance of iDTI-ESBoost in terms of both auROC and auPR will motivate researchers and practitioners to use it to predict drug-target interactions. To facilitate that, iDTI-ESBoost is implemented and made publicly available at: http://farshidrayhan.pythonanywhere.com/iDTI-ESBoost/.
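The remark above about auPR on imbalanced data can be made concrete with a small sketch. It is illustrative only and not taken from the iDTI-ESBoost paper: the labels and scores are invented, and average precision is used here as one common estimator of the area under the precision-recall curve.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented example: 2 interacting pairs among 10 candidates (imbalanced).
labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

roc_value = auroc(labels, scores)             # 13 of 16 pairs correct = 0.8125
ap_value = average_precision(labels, scores)  # (1/2 + 2/4) / 2 = 0.5
```

On this toy ranking the two false positives near the top barely move auROC, because the many easy negatives dominate the pair count, whereas average precision drops to 0.5. This is the usual argument for reporting auPR alongside auROC when positives are rare.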
|
48
|
Drug-target interaction prediction using ensemble learning and dimensionality reduction. Methods 2017; 129:81-88. [DOI: 10.1016/j.ymeth.2017.05.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 01/16/2017] [Revised: 04/03/2017] [Accepted: 05/18/2017] [Indexed: 11/23/2022] Open
|
49
|
Cheng T, Hao M, Takeda T, Bryant SH, Wang Y. Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review. AAPS J 2017; 19:1264-1275. [PMID: 28577120 PMCID: PMC11097213 DOI: 10.1208/s12248-017-0092-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 01/10/2017] [Accepted: 04/25/2017] [Indexed: 11/30/2022] Open
Abstract
The prediction of drug-target interactions (DTIs) is of extraordinary significance to modern drug discovery in terms of suggesting new drug candidates and repositioning old drugs. Despite technological advances, large-scale experimental determination of DTIs is still expensive and laborious. Effective and low-cost computational alternatives remain in strong need. Meanwhile, open-access resources have been rapidly growing with massive amount of bioactivity data becoming available, creating unprecedented opportunities for the development of novel in silico models for large-scale DTI prediction. In this work, we review the state-of-the-art computational approaches for identifying DTIs from a data-centric perspective: what the underlying data are and how they are utilized in each study. We also summarize popular public data resources and online tools for DTI prediction. It is found that various types of data were employed including properties of chemical structures, drug therapeutic effects and side effects, drug-target binding, drug-drug interactions, bioactivity data of drug molecules across multiple biological targets, and drug-induced gene expressions. More often, the heterogeneous data were integrated to offer better performance. However, challenges remain such as handling data imbalance, incorporating negative samples and quantitative bioactivity data, as well as maintaining cross-links among different data sources, which are essential for large-scale and automated information integration.
Affiliation(s)
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Takako Takeda
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
50
|
Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 2017; 32:i18-i27. [PMID: 27307615 PMCID: PMC4908328 DOI: 10.1093/bioinformatics/btw244] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Indexed: 02/06/2023] Open
Abstract
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce the heavy time and financial cost of experiments, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank (LTR) is the most powerful technique among the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by combining the advantages of the two types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in area under the precision-recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingjun Yuan
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Junning Gao
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Dongliang Wu
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Shihua Zhang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan; Department of Computer Science, Aalto University, Finland
- Shanfeng Zhu
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; Centre for Computational System Biology, Fudan University, Shanghai, China
|