51
Varikoti RA, Schultz KJ, Kombala CJ, Kruel A, Brandvold KR, Zhou M, Kumar N. Integrated data-driven and experimental approaches to accelerate lead optimization targeting SARS-CoV-2 main protease. J Comput Aided Mol Des 2023:10.1007/s10822-023-00509-1. [PMID: 37314632 DOI: 10.1007/s10822-023-00509-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 05/23/2023] [Indexed: 06/15/2023]
Abstract
Identification of potential therapeutic candidates can be expedited by integrating computational modeling with domain-aware machine learning (ML) models, followed by experimental validation in an iterative manner. Generative deep learning models can generate thousands of new candidates; however, their physicochemical and biochemical properties are typically not fully optimized. Using our recently developed deep learning models and a scaffold as a starting point, we generated tens of thousands of compounds for SARS-CoV-2 Mpro that preserve the core scaffold. We applied several computational tools to the generated candidates, including structural alert and toxicity analysis, high-throughput virtual screening, ML-based 3D quantitative structure-activity relationships, multi-parameter optimization, and graph neural networks, to predict biological activity and binding affinity in advance. As a result of these combined computational efforts, eight promising candidates were singled out and tested experimentally using native mass spectrometry and FRET-based functional assays. Two of the tested compounds, with quinazoline-2-thiol and acetylpiperidine core moieties, showed IC50 values in the low micromolar range: [Formula: see text] μM and 3.41±0.0015 μM, respectively. Molecular dynamics simulations further indicate that binding of these compounds results in allosteric modulations within chain B and the interface domains of Mpro. Our integrated approach provides a platform for data-driven lead optimization with rapid characterization and experimental validation in a closed loop that could be applied to other potential protein targets.
Affiliation(s)
- Rohith Anand Varikoti
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Katherine J Schultz
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Chathuri J Kombala
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Agustin Kruel
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Kristoffer R Brandvold
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Mowei Zhou
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Neeraj Kumar
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA.
52
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in the posttranscriptional regulation by miRNAs of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based only on molecular sequences. Moreover, established studies have accumulated various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we present a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
- Miaoer Xu
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
- Anna James-Bott
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Xia Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
- Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
53
Wang S, Song X, Zhang Y, Zhang K, Liu Y, Ren C, Pang S. MSGNN-DTA: Multi-Scale Topological Feature Fusion Based on Graph Neural Networks for Drug-Target Binding Affinity Prediction. Int J Mol Sci 2023; 24:ijms24098326. [PMID: 37176031 PMCID: PMC10179712 DOI: 10.3390/ijms24098326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/03/2023] [Accepted: 05/04/2023] [Indexed: 05/15/2023] Open
Abstract
The accurate prediction of drug-target binding affinity (DTA) is an essential step in drug discovery and drug repositioning. Although deep learning methods have been widely adopted for DTA prediction, the complexity of extracting drug and target protein features hampers the accuracy of these predictions. In this study, we propose a novel model for DTA prediction named MSGNN-DTA, which leverages a fused multi-scale topological feature approach based on graph neural networks (GNNs). To address the challenge of accurately extracting drug and target protein features, we introduce a gated skip-connection mechanism during the feature learning process to fuse multi-scale topological features, resulting in information-rich representations of drugs and proteins. Our approach constructs drug atom graphs, motif graphs, and weighted protein graphs to fully extract topological information and provide a comprehensive understanding of underlying molecular interactions from multiple perspectives. Experimental results on two benchmark datasets demonstrate that MSGNN-DTA outperforms the state-of-the-art models in all evaluation metrics, showcasing the effectiveness of the proposed approach. Moreover, the study conducts a case study based on already FDA-approved drugs in the DrugBank dataset to highlight the potential of the MSGNN-DTA framework in identifying drug candidates for specific targets, which could accelerate the process of virtual screening and drug repositioning.
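The abstract does not spell out how the gated skip-connection is computed, so the following is only a minimal sketch of one common form of such a gate: a sigmoid gate decides, per feature, how much of the new layer output to keep versus the previous representation. The function name `gated_skip`, the element-wise weights `w_prev` and `w_new`, and `bias` are all hypothetical and are not taken from MSGNN-DTA.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_skip(
    h_prev: List[float],
    h_new: List[float],
    w_prev: List[float],
    w_new: List[float],
    bias: List[float],
) -> List[float]:
    """Fuse two feature vectors with an element-wise sigmoid gate.

    Assumed form (not the paper's): gate = sigmoid(w_prev*h_prev + w_new*h_new + bias),
    output = gate * h_new + (1 - gate) * h_prev.
    """
    fused = []
    for p, n, wp, wn, b in zip(h_prev, h_new, w_prev, w_new, bias):
        gate = sigmoid(wp * p + wn * n + b)
        fused.append(gate * n + (1.0 - gate) * p)
    return fused


# Toy vectors, invented for illustration only.
h_prev = [1.0, 2.0]
h_new = [3.0, 6.0]
zeros = [0.0, 0.0]

# With zero weights and bias the gate is 0.5, so the output is the plain average.
print(gated_skip(h_prev, h_new, zeros, zeros, zeros))
```

In a trained network the gate parameters would be learned, letting each layer decide how much shallow-scale versus deep-scale topology to pass on.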
Affiliation(s)
- Shudong Wang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Xuanmo Song
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
- Kuijie Zhang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Yingye Liu
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Chuanru Ren
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Shanchen Pang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
54
Yang SQ, Zhang LX, Ge YJ, Zhang JW, Hu JX, Shen CY, Lu AP, Hou TJ, Cao DS. In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences. J Cheminform 2023; 15:48. [PMID: 37088813 PMCID: PMC10123967 DOI: 10.1186/s13321-023-00720-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 04/08/2023] [Indexed: 04/25/2023] Open
Abstract
Identification and validation of bioactive small-molecule targets is a significant challenge in drug discovery. In recent years, various in-silico approaches have been proposed to expedite time- and resource-consuming experiments for target detection. Herein, we developed several chemogenomic models for target prediction based on multi-scale information of chemical structures and protein sequences. A compound is paired with each of multiple protein targets, and these compound-target pairs are fed into a well-established model to derive scores indicating whether the compound and target interact; the target prediction task is then completed by sorting the output scores. To improve the prediction performance, we constructed several chemogenomic models using multi-scale information of chemical structures and protein sequences, and the ensemble model with the best performance was used as our final model. The model was validated by various strategies and external datasets, which confirmed its promising target prediction capability, i.e., the fraction of known targets identified in the top-k (1 to 10) list of the potential target candidates suggested by the model. Compared with multiple state-of-the-art target prediction methods, our model showed equivalent or better predictive ability in terms of the top-k predictions. It is expected that our method can be utilized as a powerful computational tool to narrow down the potential targets for experimental testing.
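The top-k measure described in this abstract (the fraction of known targets that appear among the k highest-scoring candidates) can be sketched as follows. This is an illustrative reading of the metric, not the authors' evaluation code; the function name `top_k_hit_fraction` and the toy scores are assumptions made for the example.

```python
from typing import Dict, Set


def top_k_hit_fraction(
    scores: Dict[str, Dict[str, float]],
    known_targets: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of known compound-target pairs found in each compound's top-k list."""
    hits = 0
    total = 0
    for compound, target_scores in scores.items():
        # Rank targets from highest to lowest score; ties are broken by name
        # so the ranking is deterministic.
        ranked = sorted(target_scores, key=lambda t: (-target_scores[t], t))
        top_k = set(ranked[:k])
        truth = known_targets.get(compound, set())
        hits += len(top_k & truth)
        total += len(truth)
    return hits / total if total else 0.0


# Toy interaction scores and known targets, invented for illustration only.
toy_scores = {
    "cpd1": {"T1": 0.9, "T2": 0.2, "T3": 0.6},
    "cpd2": {"T1": 0.1, "T2": 0.8, "T3": 0.7},
}
toy_truth = {"cpd1": {"T1"}, "cpd2": {"T3"}}

print(top_k_hit_fraction(toy_scores, toy_truth, k=1))  # 0.5
print(top_k_hit_fraction(toy_scores, toy_truth, k=2))  # 1.0
```

Sweeping `k` from 1 to 10, as the abstract describes, would give a curve showing how quickly known targets are recovered as the candidate list grows.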
Affiliation(s)
- Su-Qing Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Liu-Xia Zhang
- The First Hospital of Hunan University of Chinese Medicine, Changsha, 410007, Hunan, People's Republic of China
- You-Jin Ge
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Jin-Wei Zhang
- Departments of Biomedical Engineering and Pathology, School of Basic Medical Science, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Jian-Xin Hu
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Cheng-Ying Shen
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China
- Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China.
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China.
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China.
55
Chen P, Zheng H. Drug-target interaction prediction based on spatial consistency constraint and graph convolutional autoencoder. BMC Bioinformatics 2023; 24:151. [PMID: 37069493 PMCID: PMC10109239 DOI: 10.1186/s12859-023-05275-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 04/05/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND Drug-target interaction (DTI) prediction plays an important role in drug discovery and repositioning. However, most of the computational methods used for identifying relevant DTIs do not consider the invariance of the nearest neighbour relationships between drugs or targets. In other words, they do not take into account the invariance of the topological relationships between nodes during representation learning. This may limit the performance of DTI prediction methods. RESULTS Here, we propose a novel graph convolutional autoencoder-based model, named SDGAE, to predict DTIs. As the graph convolutional network cannot handle isolated nodes in a network, a pre-processing step was applied to reduce the number of isolated nodes in the heterogeneous network and facilitate effective exploitation of the graph convolutional network. By maintaining the graph structure during representation learning, the nearest neighbour relationships between nodes in the embedding space remained as close as possible to those in the original space. CONCLUSIONS Overall, we demonstrated that SDGAE can automatically learn more informative and robust feature vectors of drugs and targets, thus exhibiting significantly improved predictive accuracy for DTIs.
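One way to make the idea of preserving nearest neighbour relationships concrete is to measure how many of each node's k nearest neighbours in the original space are still among its k nearest neighbours after embedding. The sketch below is a diagnostic under that assumption; it is not SDGAE's spatial consistency constraint, whose exact form the abstract does not give, and the names `k_nearest` and `neighbour_preservation` are hypothetical.

```python
import math
from typing import Sequence, Set


def k_nearest(points: Sequence[Sequence[float]], i: int, k: int) -> Set[int]:
    """Indices of the k points closest to points[i] (Euclidean distance)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()  # ties are broken by index, which keeps the result deterministic
    return {j for _, j in dists[:k]}


def neighbour_preservation(
    original: Sequence[Sequence[float]],
    embedded: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Average share of each node's k nearest neighbours kept after embedding."""
    n = len(original)
    kept = sum(
        len(k_nearest(original, i, k) & k_nearest(embedded, i, k)) for i in range(n)
    )
    return kept / (n * k)


# Toy data, invented for illustration: four nodes forming two tight pairs.
original = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
good_embedding = [(0.0,), (0.5,), (2.5,), (3.0,)]  # pairs stay together
bad_embedding = [(0.0,), (5.0,), (1.0,), (6.0,)]   # pairs are split apart

print(neighbour_preservation(original, good_embedding, k=1))  # 1.0
print(neighbour_preservation(original, bad_embedding, k=1))   # 0.0
```

A score near 1 means the embedding keeps the local topology of the original network; a score near 0 means neighbours have been scrambled.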
Affiliation(s)
- Peng Chen
- School of Computer Science and Technology, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
- Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
- Haoran Zheng
- School of Computer Science and Technology, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China.
- Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China.
56
D’Souza S, Prema KV, Balaji S, Shah R. Deep Learning-Based Modeling of Drug–Target Interaction Prediction Incorporating Binding Site Information of Proteins. Interdisciplinary Sciences: Computational Life Sciences 2023; 15:306-315. [PMID: 36967455 PMCID: PMC10148762 DOI: 10.1007/s12539-023-00557-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/24/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/29/2023]
Abstract
Chemogenomics, also known as proteochemometrics, covers various computational methods for predicting interactions between related drugs and targets on large-scale data. Chemogenomics is used in the early stages of drug discovery to predict the off-target effects of proteins against therapeutic candidates. This study aims to predict unknown ligand–target interactions using one-dimensional SMILES as inputs for ligands and binding site residues for proteins in a computationally efficient manner. We first formulate a deep learning CNN model using one-dimensional SMILES for drugs and motif-rich binding pocket subsequences of proteins as inputs. We evaluate and compare the proposed deep learning model trained on expert-based features against shallow feature-based machine learning methods. The proposed method achieved better or similar performance on the MSE and AUPR metrics compared with the shallow methods. Additionally, we show that our deep learning model, DeepPS, is computationally more efficient than the deep learning model trained on full-length raw sequences of proteins. We conclude that a beneficial research approach would be to integrate structural information of proteins when modeling drug-target interaction prediction on large datasets, for more interpretability, high throughput, and broad applicability.
Affiliation(s)
- Sofia D’Souza
- Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
- K. V. Prema
- Department of Computer Science and Engineering, Manipal Academy of Higher Education, Bengaluru, India
- S. Balaji
- Department of Biotechnology, Manipal Academy of Higher Education, Manipal, India
- Ronak Shah
- Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
57
Ji KY, Liu C, Liu ZQ, Deng YF, Hou TJ, Cao DS. Comprehensive assessment of nine target prediction web services: which should we choose for target fishing? Brief Bioinform 2023; 24:6995377. [PMID: 36681902 DOI: 10.1093/bib/bbad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/29/2022] [Accepted: 01/03/2023] [Indexed: 01/23/2023] Open
Abstract
Identification of potential targets for known bioactive compounds and novel synthetic analogs is of considerable significance. In silico target fishing (TF) has become an alternative strategy because of the expensive and laborious wet-lab experiments, explosive growth of bioactivity data and rapid development of high-throughput technologies. However, these TF methods are based on different algorithms, molecular representations and training datasets, which may lead to different results when predicting the same query molecules. This can be confusing for practitioners in practical applications. Therefore, this study systematically evaluated nine popular ligand-based TF methods based on target and ligand-target pair statistical strategies, which will help practitioners choose among multiple TF methods. The evaluation results showed that SwissTargetPrediction was the best method, producing the most reliable predictions while enriching more targets. The high-recall similarity ensemble approach (SEA) was able to find real targets for more compounds than the other TF methods. Therefore, SwissTargetPrediction and SEA can be considered as primary selection methods in future studies. In addition, the results showed that k = 5 was the optimal number of experimental candidate targets. Finally, a novel ensemble TF method based on consensus voting is proposed to improve the prediction performance. The precision of the ensemble TF method exceeds that of the individual TF methods, indicating that the ensemble TF method can more effectively identify real targets within a given top-k threshold. The results of this study can be used as a reference to guide practitioners in selecting the most effective methods in computational drug discovery.
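The abstract names consensus voting but does not describe the voting rule, so the sketch below assumes the simplest variant: each method casts one vote for every target in its own top-k list, and targets are ranked by vote count. The function name `consensus_top_k` and the method and target labels are hypothetical placeholders, not outputs of any real web service.

```python
from collections import Counter
from typing import Dict, List


def consensus_top_k(predictions: Dict[str, List[str]], k: int) -> List[str]:
    """Combine ranked target lists from several methods by simple vote counting."""
    votes: Counter = Counter()
    for ranked_targets in predictions.values():
        # Each method votes once for every target in its own top-k list.
        votes.update(set(ranked_targets[:k]))
    # Sort by vote count (descending), then by name so ties are deterministic.
    ranked = sorted(votes, key=lambda t: (-votes[t], t))
    return ranked[:k]


# Ranked predictions for one query compound, invented for illustration only.
toy_predictions = {
    "method_a": ["T1", "T2", "T3"],
    "method_b": ["T2", "T1", "T4"],
    "method_c": ["T2", "T5", "T1"],
}

print(consensus_top_k(toy_predictions, k=2))  # ['T2', 'T1']
```

Weighted votes (for example, by each method's measured precision) would be a natural extension, but that goes beyond what the abstract states.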
Affiliation(s)
- Kai-Yue Ji
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Chong Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zhao-Qian Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Ya-Feng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
58
Chen W, Liu X, Zhang S, Chen S. Artificial intelligence for drug discovery: Resources, methods, and applications. Molecular Therapy. Nucleic Acids 2023; 31:691-702. [PMID: 36923950 PMCID: PMC10009646 DOI: 10.1016/j.omtn.2023.02.019] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Indexed: 02/19/2023]
Abstract
Conventional wet laboratory testing, validations, and synthetic procedures are costly and time-consuming for drug discovery. Advancements in artificial intelligence (AI) techniques have revolutionized their applications to drug discovery. Combined with accessible data resources, AI techniques are changing the landscape of drug discovery. In the past decades, a series of AI-based models have been developed for various steps of drug discovery. These models have been used as complements of conventional experiments and have accelerated the drug discovery process. In this review, we first introduced the widely used data resources in drug discovery, such as ChEMBL and DrugBank, followed by the molecular representation schemes that convert data into computer-readable formats. Meanwhile, we summarized the algorithms used to develop AI-based models for drug discovery. Subsequently, we discussed the applications of AI techniques in pharmaceutical analysis including predicting drug toxicity, drug bioactivity, and drug physicochemical property. Furthermore, we introduced the AI-based models for de novo drug design, drug-target structure prediction, drug-target interaction, and binding affinity prediction. Moreover, we also highlighted the advanced applications of AI in drug synergism/antagonism prediction and nanomedicine design. Finally, we discussed the challenges and future perspectives on the applications of AI to drug discovery.
Affiliation(s)
- Wei Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Xuesong Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Sanyin Zhang
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Shilin Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
59
Zhao Q, Duan G, Yang M, Cheng Z, Li Y, Wang J. AttentionDTA: Drug-Target Binding Affinity Prediction by Sequence-Based Deep Learning With Attention Mechanism. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:852-863. [PMID: 35471889 DOI: 10.1109/tcbb.2022.3170365] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 05/04/2023]
Abstract
The identification of drug-target relations (DTRs) is important in drug development. A large number of methods treat DTRs as drug-target interactions (DTIs), a binary classification problem. The main drawbacks of these methods are the lack of reliable negative samples and the absence of many important aspects of DTRs, including their dose dependence and quantitative affinities. With the recent increase in published drug-protein binding affinity data, DTR prediction can be viewed as a regression problem over drug-target affinities (DTAs), which reflect how tightly the drug binds to the target and can present more detailed and specific information than DTIs. The growth of affinity data enables the use of deep learning architectures, which have been shown to be among the state-of-the-art methods in binding affinity prediction. Although relatively effective, these models are less biologically interpretable because of the black-box nature of deep learning. In this study, we proposed a deep learning-based model, named AttentionDTA, which uses an attention mechanism to predict DTAs. Unlike models that use 3D structures of drug-target complexes or graph representations of drugs and proteins, the novelty of our work is to use an attention mechanism to focus on the key subsequences in drug and protein sequences that matter when predicting their affinity. We use two separate one-dimensional convolutional neural networks (1D-CNNs) to extract the semantic information of the drug's SMILES string and the protein's amino acid sequence. Furthermore, a two-side multi-head attention mechanism is developed and embedded in our model to explore the relationship between drug features and protein features. We evaluate our model on three established DTA benchmark datasets, Davis, Metz, and KIBA. AttentionDTA outperforms the state-of-the-art deep learning methods under different evaluation metrics. The results show that the attention-based model can effectively extract protein features related to drug information and drug features related to protein information to better predict drug-target affinities. It is worth mentioning that we test our model on an IC50 dataset, which provides the binding sites between drugs and proteins, to evaluate the ability of our model to locate binding sites. Finally, we visualize the attention weights to demonstrate the biological significance of the model. The source code of AttentionDTA can be downloaded from https://github.com/zhaoqichang/AttentionDTA_TCBB.
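Before a 1D-CNN can read a SMILES string or an amino acid sequence, the characters have to be turned into fixed-length integer sequences. The sketch below shows one plausible preprocessing step under stated assumptions: character-level tokens, index 0 reserved for padding, and an arbitrary `max_len`. It is not taken from the AttentionDTA repository, and `build_vocab` and `encode` are hypothetical names.

```python
from typing import Dict, List


def build_vocab(sequences: List[str]) -> Dict[str, int]:
    """Map each character to a positive integer; 0 is reserved for padding."""
    chars = sorted({ch for seq in sequences for ch in seq})
    return {ch: idx + 1 for idx, ch in enumerate(chars)}


def encode(seq: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate or right-pad a sequence to max_len integer tokens.

    Characters missing from the vocabulary fall back to 0, the padding index.
    """
    ids = [vocab.get(ch, 0) for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Two small SMILES strings (ethanol and benzene) used as toy input.
smiles = ["CCO", "c1ccccc1"]
vocab = build_vocab(smiles)

print(vocab)                            # {'1': 1, 'C': 2, 'O': 3, 'c': 4}
print(encode("CCO", vocab, 5))          # [2, 2, 3, 0, 0]
print(encode("c1ccccc1", vocab, 5))     # [4, 1, 4, 4, 4]
```

Character-level tokens split multi-character atoms such as `Cl` or `Br` into two symbols; this is a simplification of the sketch, and a real pipeline may tokenize differently. The same two functions apply unchanged to protein sequences, with amino acid letters as the vocabulary.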
60
Walter W, Pohlkamp C, Meggendorfer M, Nadarajah N, Kern W, Haferlach C, Haferlach T. Artificial intelligence in hematological diagnostics: Game changer or gadget? Blood Rev 2023; 58:101019. [PMID: 36241586 DOI: 10.1016/j.blre.2022.101019] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/07/2022] [Revised: 09/21/2022] [Accepted: 10/03/2022] [Indexed: 11/30/2022]
Abstract
The future of clinical diagnosis and treatment of hematologic diseases will inevitably involve the integration of artificial intelligence (AI)-based systems into routine practice to support the hematologists' decision making. Several studies have shown that AI-based models can already be used to automatically differentiate cells, reliably detect malignant cell populations, support chromosome banding analysis, and interpret clinical variants, contributing to early disease detection and prognosis. However, even the best tool can become useless if it is misapplied or the results are misinterpreted. Therefore, in order to comprehensively judge and correctly apply newly developed AI-based systems, the hematologist must have a basic understanding of the general concepts of machine learning. In this review, we provide the hematologist with a comprehensive overview of various machine learning techniques, their current implementations and approaches in different diagnostic subfields (e.g., cytogenetics, molecular genetics), and the limitations and unresolved challenges of the systems.
Affiliation(s)
- Wencke Walter
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Christian Pohlkamp
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Manja Meggendorfer
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Niroshan Nadarajah
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Wolfgang Kern
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Claudia Haferlach
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Torsten Haferlach
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
61
Chu T, Nguyen TT, Hai BD, Nguyen QH, Nguyen T. Graph Transformer for Drug Response Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1065-1072. [PMID: 36107906 DOI: 10.1109/tcbb.2022.3206888] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 05/04/2023]
Abstract
BACKGROUND Previous models have shown that learning drug features from their graph representation is more efficient than learning from their string or numeric representations. Furthermore, integrating multi-omics data of cell lines increases the performance of drug response prediction. However, these models have drawbacks: they extract drug features from the graph representation inefficiently and incorporate redundant information from multi-omics data. This paper proposes a deep learning model, GraTransDRP, to improve drug representation and reduce information redundancy. First, a graph transformer was utilized to extract the drug representation more efficiently. Next, convolutional neural networks (CNNs) were used to learn the mutation, methylation, and transcriptomics features. However, the dimension of the transcriptomics features was as high as 17737. Therefore, KernelPCA was applied to the transcriptomics features to reduce the dimension and transform them into a dense representation before passing them through the CNN model. Finally, drug and omics features were combined to predict a response value with a fully connected network. Experimental results show that our model outperforms some state-of-the-art methods, including GraphDRP and GraOmicDRP.
|
62
|
Zhu Z, Yao Z, Qi G, Mazur N, Yang P, Cong B. Associative learning mechanism for drug‐target interaction prediction. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2023. [DOI: 10.1049/cit2.12194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023] Open
Affiliation(s)
- Zhiqin Zhu
- College of Automation Chongqing University of Posts and Telecommunications Chongqing China
| | - Zheng Yao
- College of Automation Chongqing University of Posts and Telecommunications Chongqing China
| | - Guanqiu Qi
- Computer Information Systems Department State University of New York at Buffalo State Buffalo New York USA
| | - Neal Mazur
- Computer Information Systems Department State University of New York at Buffalo State Buffalo New York USA
| | - Pan Yang
- Department of Cardiovascular Surgery Chongqing General Hospital University of Chinese Academy of Sciences Chongqing China
- Emergency Department The Second Affiliated Hospital of Chongqing Medical University Chongqing China
| | - Baisen Cong
- Data Scientist Diagnostics Digital DH (Shanghai) Diagnostics Co., Ltd. Danaher Company Shanghai China
| |
|
63
|
DRaW: prediction of COVID-19 antivirals by deep learning-an objection on using matrix factorization. BMC Bioinformatics 2023; 24:52. [PMID: 36793010 PMCID: PMC9931173 DOI: 10.1186/s12859-023-05181-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 02/09/2023] [Indexed: 02/17/2023] Open
Abstract
BACKGROUND Due to the high resource consumption of introducing a new drug, drug repurposing plays an essential role in drug discovery. To do this, researchers examine the current drug-target interactions (DTIs) to predict new interactions for the approved drugs. Matrix factorization methods have received much attention and are widely used in DTI prediction. However, they suffer from some drawbacks. METHODS We explain why matrix factorization is not the best choice for DTI prediction. Then, we propose a deep learning model (DRaW) to predict DTIs without input data leakage. We compare our model with several matrix factorization methods and a deep model on three COVID-19 datasets. In addition, to validate DRaW, we evaluate it on benchmark datasets. Furthermore, as an external validation, we conduct a docking study on the drugs recommended for COVID-19. RESULTS In all cases, the results confirm that DRaW outperforms matrix factorization and deep models. The docking results support the top-ranked recommended drugs for COVID-19. CONCLUSIONS In this paper, we show that matrix factorization may not be the best choice for DTI prediction. Matrix factorization methods suffer from some intrinsic issues, e.g., sparsity in the domain of bioinformatics applications and the fixed size of the matrix-related paradigm. Therefore, we propose an alternative method (DRaW) that uses feature vectors rather than matrix factorization and demonstrates better performance than other well-known methods on three COVID-19 and four benchmark datasets.
|
64
|
Zhang Y, Li S, Xing M, Yuan Q, He H, Sun S. Universal Approach to De Novo Drug Design for Target Proteins Using Deep Reinforcement Learning. ACS OMEGA 2023; 8:5464-5474. [PMID: 36816653 PMCID: PMC9933084 DOI: 10.1021/acsomega.2c06653] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/15/2022] [Accepted: 01/05/2023] [Indexed: 05/28/2023]
Abstract
In drug design, the design and manufacture of safe and effective compounds is a long and complex process. Therefore, developing a new rapid and generalizable drug design method is of great value. This study aimed to propose a general model based on reinforcement learning combined with drug-target interaction, which could be used to design new molecules according to different protein targets. The method adopted recurrent neural network molecular modeling and took the drug-target affinity model as the reward function for optimal molecule generation. It did not need to know the three-dimensional structure and active sites of protein targets but only required the one-dimensional amino acid sequence. This approach was demonstrated to produce molecules highly similar to marketed drugs and to design molecules with a better binding energy.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Miaojuan Xing
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Qing Yuan
- Department of Chemistry and Chemical Engineering, Beijing University of Technology, Beijing 100124, China
| | - Hong He
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
| |
|
65
|
Hu L, Fu C, Ren Z, Cai Y, Yang J, Xu S, Xu W, Tang D. SSELM-neg: spherical search-based extreme learning machine for drug-target interaction prediction. BMC Bioinformatics 2023; 24:38. [PMID: 36737694 PMCID: PMC9896467 DOI: 10.1186/s12859-023-05153-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/03/2022] [Accepted: 01/18/2023] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND The experimental verification of a drug discovery process is expensive and time-consuming. Therefore, efficiently and effectively identifying drug-target interactions (DTIs) has been the focus of research. At present, many machine learning algorithms are used for predicting DTIs. The key idea is to train the classifier using an existing DTI to predict a new or unknown DTI. However, there are various challenges, such as class imbalance and the parameter optimization of many classifiers, that need to be solved before an optimal DTI model is developed. METHODS In this study, we propose a framework called SSELM-neg for DTI prediction, in which we use a screening approach to choose high-quality negative samples and a spherical search approach to optimize the parameters of the extreme learning machine. RESULTS The results demonstrated that the proposed technique outperformed other state-of-the-art methods in 10-fold cross-validation experiments in terms of the area under the receiver operating characteristic curve (0.986, 0.993, 0.988, and 0.969) and AUPR (0.982, 0.991, 0.982, and 0.946) for the enzyme dataset, G-protein coupled receptor dataset, ion channel dataset, and nuclear receptor dataset, respectively. CONCLUSION The screening approach produced high-quality negative samples with the same number of positive samples, which solved the class imbalance problem. We optimized an extreme learning machine using a spherical search approach to identify DTIs. Therefore, our models performed better than other state-of-the-art methods.
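The two metrics reported in this abstract, the area under the ROC curve and AUPR, can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the function names `auroc`, `average_precision`, and `k_fold_indices` are hypothetical, AUPR is approximated here by average precision, and the fold split is a plain contiguous split with no shuffling or stratification.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, used here as an estimate of AUPR.

    Assumption: scores are not tied; ties would make the result
    depend on the sort order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive hit
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

In a 10-fold setting, each fold from `k_fold_indices(n, 10)` would serve once as the held-out test set, and the two metrics would be computed on the held-out predictions.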
Affiliation(s)
- Lingzhi Hu
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
| | - Chengzhou Fu
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
- Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
| | - Zhonglu Ren
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
| | - Yongming Cai
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
- Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
| | - Jin Yang
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
- Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
| | - Siwen Xu
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
| | - Wenhua Xu
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
| | - Deyu Tang
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (grid.411847.f, 0000 0004 1804 4300)
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, People’s Republic of China (grid.79703.3a, 0000 0004 1764 3838)
- Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
| |
|
66
|
Bai P, Miljković F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug–target prediction. NAT MACH INTELL 2023. [DOI: 10.1038/s42256-022-00605-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
67
|
Sun J, Kulandaisamy A, Liu J, Hu K, Gromiha MM, Zhang Y. Machine learning in computational modelling of membrane protein sequences and structures: From methodologies to applications. Comput Struct Biotechnol J 2023; 21:1205-1226. [PMID: 36817959 PMCID: PMC9932300 DOI: 10.1016/j.csbj.2023.01.036] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/16/2022] [Revised: 01/16/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Membrane proteins mediate a wide spectrum of biological processes, such as signal transduction and cell communication. Due to the arduous and costly nature inherent to the experimental process, membrane proteins have long been devoid of well-resolved atomic-level tertiary structures and, consequently, the understanding of their functional roles underlying a multitude of life activities has been hampered. Currently, computational tools dedicated to furthering the structure-function understanding are primarily focused on utilizing intelligent algorithms to address a variety of site-wise prediction problems (e.g., topology and interaction sites), but are scattered across different computing sources. Moreover, the recent advent of deep learning techniques has immensely expedited the development of computational tools for membrane protein-related prediction problems. Given the growing number of applications optimized particularly by manifold deep neural networks, we herein provide a review on the current status of computational strategies mainly in membrane protein type classification, topology identification, interaction site detection, and pathogenic effect prediction. Meanwhile, we provide an overview of how the entire prediction process proceeds, including database collection, data pre-processing, feature extraction, and method selection. This review is expected to be useful for developing more extendable computational tools specific to membrane proteins.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Headington, Oxford OX3 7LD, UK
| | - Arulsamy Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
| | - Jacklyn Liu
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
| | - Kai Hu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
| | - M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India (corresponding author)
| | - Yuan Zhang
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China (corresponding author)
| |
|
68
|
Ren ZH, You ZH, Zou Q, Yu CQ, Ma YF, Guan YJ, You HR, Wang XF, Pan J. DeepMPF: deep learning framework for predicting drug-target interactions based on multi-modal representation with meta-path semantic analysis. J Transl Med 2023; 21:48. [PMID: 36698208 PMCID: PMC9876420 DOI: 10.1186/s12967-023-03876-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/16/2022] [Accepted: 01/05/2023] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, traditional biological experiments are time-consuming and expensive, as there are abundant complex interactions in the large genomic and chemical spaces. To alleviate this problem, many computational methods have been developed to complement biological experiments and narrow the search space to a preferred candidate domain. However, most previous approaches cannot fully consider association behavior semantic information based on several schemas to represent the complex structure of heterogeneous biological networks. Additionally, DTI prediction based on single modalities cannot satisfy the demand for prediction accuracy. METHODS We propose a multi-modal representation framework, 'DeepMPF', based on meta-path semantic analysis, which effectively utilizes heterogeneous information to predict DTI. Specifically, we first construct protein-drug-disease heterogeneous networks composed of three entities. Then the feature information is obtained under three views: sequence modality, heterogeneous structure modality and similarity modality. We propose six representative meta-path schemas to preserve the high-order nonlinear structure and capture hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. RESULTS To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold-standard datasets. Our method obtains competitive performance on all datasets. We also explore the influence of different feature embedding dimensions, learning strategies and classification methods. Notably, drug repositioning experiments on COVID-19 and HIV demonstrate that DeepMPF can be applied to real-world problems and help drug discovery. Further analysis by molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. CONCLUSIONS All the results demonstrate the effective predictive capability of DeepMPF for drug-target interactions. It can be used as a tool to prescreen the most promising drug candidates for a protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/, which can help relevant researchers in further studies.
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, 710100 China (grid.460132.2, 0000 0004 1758 0275)
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China (grid.440588.5, 0000 0001 0307 1240)
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054 China (grid.54549.39, 0000 0004 0369 4060)
| | - Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, 710100 China (grid.460132.2, 0000 0004 1758 0275)
| | - Yan-Fang Ma
- Department of Galactophore, The Third People’s Hospital of Gansu Province, Lanzhou, 730020 China (grid.417234.7, 0000 0004 1808 3203)
| | - Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, 710100 China (grid.460132.2, 0000 0004 1758 0275)
| | - Hai-Ru You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China (grid.440588.5, 0000 0001 0307 1240)
| | - Xin-Fei Wang
- School of Information Engineering, Xijing University, Xi’an, 710100 China (grid.460132.2, 0000 0004 1758 0275)
| | - Jie Pan
- School of Information Engineering, Xijing University, Xi’an, 710100 China (grid.460132.2, 0000 0004 1758 0275)
| |
|
69
|
Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing. Genome Biol 2023; 24:5. [PMID: 36631897 PMCID: PMC9832703 DOI: 10.1186/s13059-022-02841-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 12/21/2022] [Indexed: 01/12/2023] Open
Abstract
Secure multiparty computation (MPC) is a cryptographic tool that allows computation on top of sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use, high-performance framework for developing performant MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications and incorporates the syntax of Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks showing up to 3-4 times increased speed over the existing pipelines with 7-fold reductions in codebase sizes.
|
70
|
Suhartono D, Majiid MRN, Handoyo AT, Wicaksono P, Lucky H. Towards a more general drug target interaction prediction model using transfer learning. PROCEDIA COMPUTER SCIENCE 2023; 216:370-376. [PMID: 36643181 PMCID: PMC9829421 DOI: 10.1016/j.procs.2022.12.148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Drug-Target Interaction (DTI) prediction has attracted growing attention since the COVID-19 outbreak. DTI prediction is one of the stages of finding a new cure for a recent disease. It determines whether a chemical compound would affect a particular protein, known as binding affinity. Recently, significant efforts have been devoted to artificial intelligence (AI) powered DTI. However, the use of transfer learning in DTI has not been explored extensively. This paper aims to build a more general DTI model by investigating DTI prediction using transfer learning. Three popular models are tested: CNN, RNN, and Transformer. These models are combined in several scenarios involving two extensive public DTI datasets (BindingDB and DAVIS) to find the best architecture. In our findings, the CNN model pre-trained on BindingDB as the source data was the most recommended pre-trained model for real DTI cases. This conclusion is supported by a 6% AUPRC increase after fine-tuning the BindingDB pre-trained model on the DAVIS dataset, compared with training without pre-training.
Affiliation(s)
- Derwin Suhartono
- Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
| | - Muhammad Rizki Nur Majiid
- Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
| | - Alif Tri Handoyo
- Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
| | - Pandu Wicaksono
- Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
| | - Henry Lucky
- Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
| |
|
71
|
Drug-disease association prediction based on end-to-end multi-layer heterogeneous graph convolutional encoders. INFORMATICS IN MEDICINE UNLOCKED 2023. [DOI: 10.1016/j.imu.2023.101177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023] Open
|
72
|
Qi R, Zou Q. Trends and Potential of Machine Learning and Deep Learning in Drug Study at Single-Cell Level. RESEARCH (WASHINGTON, D.C.) 2023; 6:0050. [PMID: 36930772 PMCID: PMC10013796 DOI: 10.34133/research.0050] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 09/22/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023]
Abstract
Cancer treatments always face challenging problems, particularly drug resistance due to tumor cell heterogeneity. The existing datasets include the relationship between gene expression and drug sensitivities; however, the majority are based on tissue-level studies. Studying drugs at the single-cell level is a promising way to overcome minimal residual disease caused by subclonal resistant cancer cells retained after initial curative therapy. Fortunately, machine learning techniques can help us understand how different types of cells respond to different cancer drugs from the perspective of single-cell gene expression. Good modeling using single-cell data and drug response information will not only improve machine learning for cell-drug outcome prediction but also facilitate the discovery of drugs for specific cancer subgroups and specific cancer treatments. In this paper, we review machine learning and deep learning approaches in drug research. By analyzing the application of these methods on cancer cell lines and single-cell data and comparing the technical gap between single-cell sequencing data analysis and single-cell drug sensitivity analysis, we hope to explore the trends and potential of drug research at the single-cell data level and provide more inspiration for drug research at the single-cell level. We anticipate that this review will stimulate the innovative use of machine learning methods to address new challenges in precision medicine more broadly.
Affiliation(s)
- Ren Qi
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
73
|
Bae H, Nam H. GraphATT-DTA: Attention-Based Novel Representation of Interaction to Predict Drug-Target Binding Affinity. Biomedicines 2022; 11:biomedicines11010067. [PMID: 36672575 PMCID: PMC9855982 DOI: 10.3390/biomedicines11010067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/06/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022] Open
Abstract
Drug-target binding affinity (DTA) prediction is an essential step in drug discovery. Drug-target protein binding occurs at specific regions between the protein and drug, rather than across the entire protein and drug. However, existing deep-learning DTA prediction methods do not consider the interactions between drug substructures and protein sub-sequences. This work proposes GraphATT-DTA, a DTA prediction model that constructs the essential regions for determining interaction affinity between compounds and proteins, modeled with an attention mechanism for interpretability. We make the model consider the local-to-global interactions between compound and protein with the attention mechanism. As a result, GraphATT-DTA shows improved DTA prediction performance and interpretability compared with state-of-the-art models. The model is trained and evaluated on the Davis dataset, a human kinase dataset; external evaluation is performed on an independently proposed human kinase dataset from BindingDB.
Affiliation(s)
- Haelee Bae
- AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
| | - Hojung Nam
- AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
- Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
| |
|
74
|
Huang D, He H, Ouyang J, Zhao C, Dong X, Xie J. Small molecule drug and biotech drug interaction prediction based on multi-modal representation learning. BMC Bioinformatics 2022; 23:561. [PMID: 36575376 PMCID: PMC9793529 DOI: 10.1186/s12859-022-05101-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/25/2022] [Accepted: 12/06/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Drug-drug interactions (DDIs) occur when two or more drugs are taken simultaneously or successively. Early detection of adverse drug interactions can be essential in preventing medical errors and reducing healthcare costs. Many computational methods already predict interactions between small molecule drugs (SMDs). As the number of biotechnology drugs (BioDs) increases, so does the threat of interactions between SMDs and BioDs. However, few computational methods are available to predict their interactions. RESULTS Considering the structural specificity and relational complexity of SMDs and BioDs, a novel multi-modal representation learning method called Multi-SBI is proposed to predict their interactions. First, multi-modal features are used to adequately represent the heterogeneous structure and complex relationships of SMDs and BioDs. Second, an undersampling method based on positive-unlabeled learning (PU-sampling) is introduced to obtain negative samples with high confidence from the unlabeled data set. Finally, the learned representations of both SMDs and BioDs are fed into DNN classifiers to predict their interaction events. In addition, we conduct a retrospective analysis. CONCLUSIONS Our proposed multi-modal representation learning method can extract drug features more comprehensively from heterogeneous drugs. In addition, PU-sampling can effectively reduce the noise in the sampling procedure. Our proposed method significantly outperforms other state-of-the-art drug interaction prediction methods. In a retrospective analysis of DrugBank 5.1.0, 14 out of the 20 predictions with the highest confidence were validated in the latest version of DrugBank 5.1.8, demonstrating that Multi-SBI is a valuable tool for predicting new drug interactions through effectively extracting and learning heterogeneous drug features.
Affiliation(s)
- Dingkai Huang
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444 China (grid.39436.3b, 0000 0001 2323 5732)
| | - Hongjian He
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444 China (grid.39436.3b, 0000 0001 2323 5732)
| | - Jiaming Ouyang
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444 China (grid.39436.3b, 0000 0001 2323 5732)
| | - Chang Zhao
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444 China (grid.39436.3b, 0000 0001 2323 5732)
| | - Xin Dong
- School of Medicine, Shanghai University, Shanghai, 200444 China (grid.39436.3b, 0000 0001 2323 5732)
| | - Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, 200444 China (grid.39436.3b, 0000 0001 2323 5732)
| |
|
75
|
Wu J, Liu Z, Yang X, Lin Z. Improved compound-protein interaction site and binding affinity prediction using self-supervised protein embeddings. BMC Bioinformatics 2022; 23:543. [PMID: 36526969 PMCID: PMC9756525 DOI: 10.1186/s12859-022-05107-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Compound-protein interaction site and binding affinity predictions are crucial for drug discovery and drug design. In recent years, many deep learning-based methods have been proposed for predictions related to compound-protein interaction. For protein inputs, how protein primary sequence and tertiary structure information are used affects prediction results. RESULTS In this study, we propose a deep learning model based on a multi-objective neural network for compound-protein interaction site and binding affinity prediction. We used several kinds of self-supervised protein embeddings to enrich our protein inputs and used convolutional neural networks to extract features from them. Our results demonstrate that our model improved on previous models in terms of interaction site prediction and affinity prediction. In a case study, our model could better predict binding sites, which also showed its effectiveness. CONCLUSION These results suggest that our model could be a helpful tool for compound-protein related predictions.
Affiliation(s)
- Jialin Wu
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China (grid.79703.3a, 0000 0004 1764 3838)
| | - Zhe Liu
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China (grid.79703.3a, 0000 0004 1764 3838)
| | - Xiaofeng Yang
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China (grid.79703.3a, 0000 0004 1764 3838)
| | - Zhanglin Lin
- School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China (grid.79703.3a, 0000 0004 1764 3838)
| |
|
76
|
A deep learning method for predicting molecular properties and compound-protein interactions. J Mol Graph Model 2022; 117:108283. [PMID: 35994925 DOI: 10.1016/j.jmgm.2022.108283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Revised: 07/19/2022] [Accepted: 07/26/2022] [Indexed: 01/14/2023]
Abstract
Predicting molecular properties and compound-protein interactions (CPIs) are two important areas of drug design and discovery. They are also an essential way to discover lead compounds in virtual screening. Recently, in silico methods based on deep learning have demonstrated excellent performance in various challenges. It is imperative to develop efficient computational methods that use deep learning to accurately predict both molecular properties and CPIs in drug research. In this paper, we propose a deep learning method applicable to both molecular property prediction and CPI prediction, based on the idea that both are generally influenced by chemical structure and sequence information of compounds and proteins. Molecular properties are inferred by integrating the molecular structure and sequence information of compounds, and CPIs are predicted by integrating protein sequence and compound structure. The method combines topological structure and sequence fingerprint information of molecules, adequately extracts raw data features, and generates highly representative features for prediction. Molecular property prediction experiments were conducted on BACE, P53 and hERG datasets, and CPI prediction experiments were conducted on Human, C. elegans and KIBA datasets. MG-S outperforms the baselines in molecular property prediction on P53: the differences in AUC, Precision and MCC are 0.030, 0.050 and 0.100, respectively, over the suboptimal baseline model, and it provides consistently good results on BACE and hERG. The model also achieves impressive performance in CPI prediction: the differences in AUC, Precision and MCC on KIBA are 0.141, 0.138, 0.090 and 0.082, respectively, compared with the state-of-the-art models. The comprehensive results show that the MG-S model has higher performance, better classification ability, and faster convergence. MG-S will serve as a useful method to predict compound properties and CPIs in the early stages of drug design and discovery. Our code and datasets are available at: https://github.com/happay-ending/cpi_cpp.
|
77
|
Huang L, Lin J, Liu R, Zheng Z, Meng L, Chen X, Li X, Wong KC. CoaDTI: multi-modal co-attention based framework for drug-target interaction annotation. Brief Bioinform 2022; 23:6770087. [PMID: 36274236 DOI: 10.1093/bib/bbac446] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/05/2022] [Revised: 08/26/2022] [Accepted: 09/18/2022] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION The identification of drug-target interactions (DTIs) plays a vital role in in silico drug discovery, in which the drug is the chemical molecule, and the target is the protein residues in the binding pocket. Manual DTI annotation approaches remain reliable; however, it is notoriously laborious and time-consuming to test each drug-target pair exhaustively. Recently, the rapid growth of labelled DTI data has catalysed interest in high-throughput DTI prediction. Unfortunately, those methods rely heavily on manual features specified by humans, leading to errors. RESULTS Here, we developed an end-to-end deep learning framework called CoaDTI to significantly improve the efficiency and interpretability of drug target annotation. CoaDTI incorporates the co-attention mechanism to model the interaction information from the drug modality and protein modality. In particular, CoaDTI incorporates a transformer to learn the protein representations from raw amino acid sequences, and GraphSage to extract the molecule graph features from SMILES. Furthermore, we proposed to employ a transfer learning strategy that encodes protein features with a pre-trained transformer to address the issue of scarce labelled data. The experimental results demonstrate that CoaDTI achieves competitive performance on three public datasets compared with state-of-the-art models. In addition, the transfer learning strategy further boosts the performance to an unprecedented level. The extended study reveals that CoaDTI can identify novel DTIs such as reactions between candidate drugs and severe acute respiratory syndrome coronavirus 2-associated proteins. The visualization of co-attention scores can illustrate the interpretability of our model for mechanistic insights. AVAILABILITY Source code is publicly available at https://github.com/Layne-Huang/CoaDTI.
Affiliation(s)
- Lei Huang
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Jiecong Lin
- Department of Pathology, Harvard Medical School, Boston, USA.,Department of Computer Science, The University of Hong Kong, Hong Kong SAR
| | - Rui Liu
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Lingkuan Meng
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR.,Hong Kong Institute for Data Science, City University of Hong Kong, Hong Kong SAR
| |
|
78
|
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly reduces the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. The drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
| | - Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
| | | | - Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
| |
|
79
|
Chandler M, Jain S, Halman J, Hong E, Dobrovolskaia MA, Zakharov AV, Afonin KA. Artificial Immune Cell, AI-cell, a New Tool to Predict Interferon Production by Peripheral Blood Monocytes in Response to Nucleic Acid Nanoparticles. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2022; 18:e2204941. [PMID: 36216772 PMCID: PMC9671856 DOI: 10.1002/smll.202204941] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/11/2022] [Revised: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Nucleic acid nanoparticles, or NANPs, rationally designed to communicate with the human immune system, can offer innovative therapeutic strategies to overcome the limitations of traditional nucleic acid therapies. Each set of NANPs is unique in its architectural parameters and physicochemical properties, which together with the type of delivery vehicles determine the kind and the magnitude of their immune response. Currently, there are no predictive tools that would reliably guide the design of NANPs to the desired immunological outcome, a step crucial for the success of personalized therapies. Through a systematic approach investigating physicochemical and immunological profiles of a comprehensive panel of various NANPs, the research team develops and experimentally validates a computational model based on the transformer architecture able to predict the immune activities of NANPs. It is anticipated that the freely accessible computational tool that is called an "artificial immune cell," or AI-cell, will aid in addressing the current critical public health challenges related to safety criteria of nucleic acid therapies in a timely manner and promote the development of novel biomedical tools.
Affiliation(s)
- Morgan Chandler
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Sankalp Jain
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
| | - Justin Halman
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Enping Hong
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Marina A. Dobrovolskaia
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
| | - Kirill A. Afonin
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
|
80
|
Kurata H, Tsukiyama S. ICAN: Interpretable cross-attention network for identifying drug and target protein interactions. PLoS One 2022; 17:e0276609. [PMID: 36279284 PMCID: PMC9591068 DOI: 10.1371/journal.pone.0276609] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/05/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022] Open
Abstract
Drug-target protein interaction (DTI) identification is fundamental for drug discovery and drug repositioning, because therapeutic drugs act on disease-causing proteins. However, the DTI identification process often requires expensive and time-consuming tasks, including biological experiments involving large numbers of candidate compounds. Thus, a variety of computational approaches have been developed. Of the many approaches available, chemo-genomics feature-based methods have attracted considerable attention. These methods compute the feature descriptors of drugs and proteins as the input data to train machine and deep learning models to enable accurate prediction of unknown DTIs. In addition, attention-based learning methods have been proposed to identify and interpret DTI mechanisms. However, improvements are needed in prediction performance and DTI mechanism elucidation. To address these problems, we developed an attention-based method designated the interpretable cross-attention network (ICAN), which predicts DTIs using the Simplified Molecular Input Line Entry System (SMILES) strings of drugs and amino acid sequences of target proteins. We optimized the attention mechanism architecture by exploring cross-attention versus self-attention, attention layer depth, and selection of the context matrices from the attention mechanism. We found that a plain attention mechanism that decodes drug-related protein context features without any protein-related drug context features effectively achieved high performance. The ICAN outperformed state-of-the-art methods in several metrics on the DAVIS dataset and was the first to reveal with statistical significance that some weighted sites in the cross-attention weight matrix represent experimental binding sites, thus demonstrating the high interpretability of the results. The program is freely available at https://github.com/kuratahiroyuki/ICAN.
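The cross-attention idea in the abstract above can be illustrated with a minimal, dependency-free sketch of scaled dot-product cross-attention. This is not the ICAN implementation: the function names, the toy embeddings, the absence of learned projection matrices, and the choice of drug tokens as queries with protein residues as keys and values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values from the other (no learned projections; illustration only).

    Returns (context, weights): one context vector and one weight row per query.
    """
    dim = len(keys[0])
    context, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the value vectors.
        context.append(
            [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        )
    return context, weights


# Hypothetical toy embeddings: 2 drug tokens, 3 protein residues, dimension 2.
drug_tokens = [[1.0, 0.0], [0.0, 1.0]]
protein_residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = cross_attention(drug_tokens, protein_residues, protein_residues)
```

Each row of `weights` is a probability distribution over protein residues, which is the kind of matrix one would inspect when asking whether highly weighted positions coincide with known binding sites.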
Affiliation(s)
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| | - Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| |
|
81
|
Kakoti BB, Bezbaruah R, Ahmed N. Therapeutic drug repositioning with special emphasis on neurodegenerative diseases: Threats and issues. Front Pharmacol 2022; 13:1007315. [PMID: 36263141 PMCID: PMC9574100 DOI: 10.3389/fphar.2022.1007315] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/30/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022] Open
Abstract
Drug repositioning or repurposing is the process of discovering new indications for authorized or declined/abandoned molecules for use in different diseases. This approach revitalizes the traditional drug discovery method by revealing new therapeutic applications for existing drugs. Numerous studies highlight the success of several drugs as repurposed therapeutics; examples range from sildenafil and aspirin to thalidomide and adalimumab. Millions of people worldwide are affected by neurodegenerative diseases. In a 2021 report, the Alzheimer's Association estimated that 6.2 million Americans have Alzheimer's disease. By 2030, approximately 1.2 million people in the United States may have Parkinson's disease. Drugs that act on a single molecular target benefit people suffering from neurodegenerative diseases. Current pharmacological approaches, on the other hand, are constrained in their capacity to unquestionably alter the course of the disease and provide patients with inadequate and momentary benefits. Drug repositioning-based approaches appear to be very pertinent, cost- and time-saving strategies for expanding treatment options for such diseases in the current era. Kinase inhibitors, for example, which were developed for various oncology indications, demonstrated significant neuroprotective effects in neurodegenerative diseases. This review expounds on the classical and recent examples of drug repositioning at various stages of drug development, with a special focus on neurodegenerative disorders and on the associated threats and issues, viz. the regulatory, scientific, and economic aspects.
Affiliation(s)
- Bibhuti Bhusan Kakoti
- Department of Pharmaceutical Sciences, Faculty of Science and Engineering, Dibrugarh University, Dibrugarh, India
| | | | | |
|
82
|
Chen H, Li D, Liao J, Wei L, Wei L. MultiscaleDTA: a multiscale-based method with a self-attention mechanism for drug-target binding affinity prediction. Methods 2022; 207:103-109. [PMID: 36155250 DOI: 10.1016/j.ymeth.2022.09.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/13/2022] [Revised: 09/15/2022] [Accepted: 09/19/2022] [Indexed: 11/28/2022] Open
Abstract
The task of predicting drug-target affinity (DTA) plays an increasingly important role at the early stage of in silico drug discovery and development. Currently, a variety of machine learning-based methods have been presented for DTA prediction and have achieved outstanding performance, which is beneficial for speeding up the development of new drugs. However, most convolutional neural network (CNN)-based methods ignore the significance to DTA prediction of information from CNN layers with different scales. In addition, each feature contributes differently to the final task. Therefore, in this study, we propose a novel end-to-end deep learning-based framework, called MultiscaleDTA, to predict drug-target binding affinity. MultiscaleDTA incorporates multi-scale CNNs and a self-attention mechanism to capture multi-scale and comprehensive features for characterizing the intrinsic properties of drugs and targets. Extensive experimental results on both regression and binary classification tasks demonstrate that MultiscaleDTA has achieved competitive performance compared to state-of-the-art methods.
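As a rough illustration of what "multi-scale" convolution means here, the sketch below runs parallel 1D convolutions with different kernel widths over the same sequence and concatenates one pooled feature per scale. It is not the MultiscaleDTA architecture: the kernel widths, the fixed averaging kernels (standing in for learned weights), the global max pooling, and all function names are assumptions.

```python
def conv1d_valid(signal, kernel):
    """1D cross-correlation without padding ("valid" mode)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def multiscale_features(signal, kernel_sizes=(3, 5, 7)):
    """Apply one filter per kernel size and keep the global maximum of each.

    The averaging kernels are placeholders for weights that a real model
    would learn during training.
    """
    features = []
    for size in kernel_sizes:
        if len(signal) < size:
            raise ValueError("signal is shorter than kernel size %d" % size)
        kernel = [1.0 / size] * size
        features.append(max(conv1d_valid(signal, kernel)))
    return features


# Hypothetical one-channel encoding of a short sequence.
encoded_sequence = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pooled = multiscale_features(encoded_sequence)
```

Narrow kernels respond strongly to the isolated spike, while wide kernels dilute it, so the concatenated vector carries information about the same input at several receptive-field sizes.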
Affiliation(s)
- Haoyang Chen
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China; School of Software, Shandong University, Jinan, China
| | - Dahe Li
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Jiaqi Liao
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China
| | - Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.
| | - Leyi Wei
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China; School of Software, Shandong University, Jinan, China.
| |
|
83
|
Pu Y, Li J, Tang J, Guo F. DeepFusionDTA: Drug-Target Binding Affinity Prediction With Information Fusion and Hybrid Deep-Learning Ensemble Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2760-2769. [PMID: 34379594 DOI: 10.1109/tcbb.2021.3103966] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 06/13/2023]
Abstract
Identification of drug-target interaction (DTI) is the most important issue in the broad field of drug discovery. Using purely biological experiments to verify drug-target binding profiles takes a great deal of time and effort, so computational technologies for this task have clear benefits in reducing the drug search space. Most computational methods for predicting DTI are framed as a binary classification problem, which ignores the influence of binding strength. Therefore, drug-target binding affinity prediction is still a challenging issue. Currently, many studies only extract sequence information that lacks feature-rich representation, but we consider more spatial features in order to merge various data in drug and target spaces. In this study, we propose a two-stage deep neural network ensemble model for predicting drug-target binding affinity, called DeepFusionDTA, via various information analysis modules. The first stage utilizes sequence and structure information to generate a fusion feature map of a candidate protein and drug pair through various analysis modules based on deep learning. The second stage applies a bagging-based ensemble learning strategy for regression prediction, and we obtain outstanding results by combining the advantages of various algorithms in efficient feature abstraction and regression calculation. Importantly, we evaluate our novel method, DeepFusionDTA, which delivers a 1.5 percent CI increase on the KIBA dataset and a 1.0 percent increase on the Davis dataset compared with an existing prediction tool, DeepDTA. Furthermore, the ideas we have offered can be applied to in-silico screening of the interaction space, to provide novel DTIs which can be experimentally pursued. The codes and data are available from https://github.com/guofei-tju/DeepFusionDTA.
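The "CI" reported above is read here as the concordance index commonly used to score affinity regression, which is an assumption since the abstract does not expand the abbreviation. A minimal sketch of that metric follows; the function name and the half-credit convention for tied predictions are illustrative choices, not taken from the DeepFusionDTA code.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal true affinities carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (t_i - t_j) * (p_i - p_j) > 0:
            concordant += 1.0  # both differences have the same sign
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every pair is ranked correctly and 0.5 is what a constant predictor scores, so a "1.5 percent increase" corresponds to a small shift in the share of correctly ordered drug-target pairs.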
|
84
|
Kang H, Goo S, Lee H, Chae JW, Yun HY, Jung S. Fine-tuning of BERT Model to Accurately Predict Drug–Target Interactions. Pharmaceutics 2022; 14:pharmaceutics14081710. [PMID: 36015336 PMCID: PMC9414546 DOI: 10.3390/pharmaceutics14081710] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/22/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 11/16/2022] Open
Abstract
The identification of optimal drug candidates is very important in drug discovery. Researchers in biology and computational sciences have sought to use machine learning (ML) to efficiently predict drug–target interactions (DTIs). In recent years, following the emerging usefulness of pretrained models in natural language processing (NLP), pretrained models have been developed for chemical compounds and target proteins. This study sought to improve DTI predictive models using a Bidirectional Encoder Representations from Transformers (BERT)-pretrained model, ChemBERTa, for chemical compounds. Pretraining uses simplified molecular-input line-entry system (SMILES) strings. We also employ the pretrained ProtBert for target proteins (pretraining employed the amino acid sequences). The BIOSNAP, DAVIS, and BindingDB databases (DBs) were used (alone or together) for learning. The final model, taught by both ChemBERTa and ProtBert and the integrated DBs, afforded the best DTI predictive performance to date based on the receiver operating characteristic area under the curve (AUC) and precision-recall-AUC values compared with previous models. The performance of the final model was verified using a specific case study on 13 pairs of substrates and the metabolic enzyme cytochrome P450 (CYP). The final model afforded excellent DTI prediction. As the real-world interactions between drugs and target proteins are expected to exhibit specific patterns, pretraining with ChemBERTa and ProtBert could teach such patterns. Learning the patterns of such interactions would enhance DTI accuracy if learning employs large, well-balanced datasets that cover all relationships between drugs and target proteins.
Affiliation(s)
- Hyeunseok Kang
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
| | - Sungwoo Goo
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
| | - Hyunjung Lee
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
| | - Jung-woo Chae
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- College of Pharmacy, Chungnam National University, Daejeon 34134, Korea
- Correspondence: (J.-w.C.); (H.-y.Y.); (S.J.)
| | - Hwi-yeol Yun
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- College of Pharmacy, Chungnam National University, Daejeon 34134, Korea
- Correspondence: (J.-w.C.); (H.-y.Y.); (S.J.)
| | - Sangkeun Jung
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Department of Computer Convergence, Chungnam National University, Daejeon 34134, Korea
- Correspondence: (J.-w.C.); (H.-y.Y.); (S.J.)
| |
|
85
|
Pandey M, Radaeva M, Mslati H, Garland O, Fernandez M, Ester M, Cherkasov A. Ligand Binding Prediction Using Protein Structure Graphs and Residual Graph Attention Networks. Molecules 2022; 27:molecules27165114. [PMID: 36014351 PMCID: PMC9416537 DOI: 10.3390/molecules27165114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 08/03/2022] [Accepted: 08/09/2022] [Indexed: 11/25/2022] Open
Abstract
Computational prediction of ligand–target interactions is a crucial part of modern drug discovery as it helps to bypass the high costs and labor demands of in vitro and in vivo screening. As the wealth of bioactivity data accumulates, it provides opportunities for the development of deep learning (DL) models with increasing predictive power. Conventionally, such models were limited either to very simplified representations of proteins or to ineffective voxelization of their 3D structures. Herein, we present the development of the PSG-BAR (Protein Structure Graph-Binding Affinity Regression) approach that utilizes 3D structural information of the proteins along with 2D graph representations of ligands. The method also introduces attention scores to selectively weight protein regions that are most important for ligand binding. Results: The developed approach demonstrates state-of-the-art performance on several binding affinity benchmarking datasets. The attention-based pooling of protein graphs enables identification of surface residues as critical residues for protein–ligand binding. Finally, we validate our model predictions against an experimental assay on a viral main protease (Mpro), the hallmark target of SARS-CoV-2 coronavirus.
Affiliation(s)
- Mohit Pandey
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Mariia Radaeva
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Hazem Mslati
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Olivia Garland
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Michael Fernandez
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Artem Cherkasov
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| |
|
86
|
El-Behery H, Attia AF, El-Fishawy N, Torkey H. An ensemble-based drug-target interaction prediction approach using multiple feature information with data balancing. J Biol Eng 2022; 16:21. [PMID: 35941686 PMCID: PMC9361677 DOI: 10.1186/s13036-022-00296-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 06/02/2022] [Indexed: 11/16/2022] Open
Abstract
Background Recently, drug repositioning has received considerable attention for its advantage to pharmaceutical industries in drug development. Artificial intelligence techniques have greatly enhanced drug reproduction by discovering therapeutic drug profiles, side effects, and new target proteins. However, as the number of drugs increases, their targets and enormous interactions produce imbalanced data that might not be suitable as direct input to a prediction model. Methods This paper proposes a novel scheme for predicting drug–target interactions (DTIs) based on drug chemical structures and protein sequences. The drug Morgan fingerprint, drug constitutional descriptors, protein amino acid composition, and protein dipeptide composition were employed to extract drug and protein characteristics. Then, the proposed approach for extracting negative samples using a support vector machine one-class classifier was developed to tackle the imbalanced data problem in the feature sets from the drug–target dataset. Negative and positive samples were constructed and fed into different prediction algorithms to identify DTIs. A 10-fold cross-validation procedure was applied to assess the predictive ability of the proposed method, in addition to a study of the effectiveness of the chemical and physical features in the evaluation and discovery of drug–target interactions. Results Our experimental model outperformed existing techniques in terms of area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, F-score, mean square error, and MCC. The results obtained by the AdaBoost classifier enhanced prediction accuracy by 2.74%, precision by 1.98%, AUC by 1.14%, F-score by 3.53%, and MCC by 4.54% over existing methods.
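Two of the protein descriptors named in the Methods above, amino acid composition and dipeptide composition, are simple enough to sketch directly. The code below uses the standard definitions (relative frequency of each of the 20 amino acids, and of each of the 400 ordered residue pairs); the function names, the alphabetical ordering of the output vector, and the choice to ignore non-standard residue codes are assumptions rather than details from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent residue-pair frequencies."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[dp] for dp in DIPEPTIDES)
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[dp] / total for dp in DIPEPTIDES]


# Hypothetical toy sequence; real inputs would be full protein sequences.
aac = amino_acid_composition("ACCA")
dpc = dipeptide_composition("ACCA")
```

Both vectors have fixed length regardless of sequence length, which is what lets them be concatenated with drug descriptors and passed to classical classifiers such as AdaBoost.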
Affiliation(s)
- Heba El-Behery
- Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr_El_Sheikh, Egypt.
| | - Abdel-Fattah Attia
- Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr_El_Sheikh, Egypt
| | - Nawal El-Fishawy
- Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
| | - Hanaa Torkey
- Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
| |
|
87
|
Bui-Thi D, Rivière E, Meysman P, Laukens K. Predicting compound-protein interaction using hierarchical graph convolutional networks. PLoS One 2022; 17:e0258628. [PMID: 35862351 PMCID: PMC9302762 DOI: 10.1371/journal.pone.0258628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 06/12/2022] [Indexed: 11/18/2022] Open
Abstract
Motivation
Convolutional neural networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. They have also drawn much attention from other domains, including drug discovery and drug development. In this study, we develop a computational method based on convolutional neural networks to tackle a fundamental question in drug discovery and development, i.e. the prediction of compound-protein interactions based on compound structure and protein sequence. We propose a hierarchical graph convolutional network (HGCN) to encode small molecules. The HGCN aggregates a molecule embedding from substructure embeddings, which are synthesized from atom embeddings. As small molecules usually share substructures, computing a molecule embedding from those common substructures allows us to learn better generic models. We then combined the HGCN with a one-dimensional convolutional network to construct a complete model for predicting compound-protein interactions. Furthermore, we apply an explanation technique, Grad-CAM, to visualize the contribution of each amino acid to the prediction.
Results
Experiments using different datasets show the improvement of our model compared to other GCN-based methods and a sequence based method, DeepDTA, in predicting compound-protein interactions. Each prediction made by the model is also explainable and can be used to identify critical residues mediating the interaction.
Affiliation(s)
- Danh Bui-Thi
- Adrem Data Lab, University of Antwerp, Antwerp, Belgium
| | | | | | - Kris Laukens
- Adrem Data Lab, University of Antwerp, Antwerp, Belgium
| |
|
88
|
Park S, Han H, Kim H, Choi S. Machine Learning Applications for Chemical Reactions. Chem Asian J 2022; 17:e202200203. [PMID: 35471772 PMCID: PMC9401034 DOI: 10.1002/asia.202200203] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/28/2022] [Revised: 04/26/2022] [Indexed: 11/30/2022]
Abstract
Machine learning (ML) approaches have enabled rapid and efficient molecular property predictions as well as the design of novel materials. In addition to their great success on molecular problems, ML techniques are applied to various chemical reaction problems that are very costly to solve with existing experimental and simulation methods. In this review, starting with basic representations of chemical reactions, we summarize recent achievements of ML studies on two different problems: predicting reaction properties and predicting synthetic routes. Various ML models are used to predict physical properties related to chemical reactions (e.g., thermodynamic changes, activation barriers, and reaction rates). Furthermore, the prediction of reactivity, self-optimization of reactions, and the design of retrosynthetic reaction paths are also tackled by ML approaches. Herein we illustrate the ML strategies used in these various contexts of chemical reaction studies.
Affiliation(s)
- Sanggil Park
- Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
- Herim Han
- Digital Bio R&D Center, Mediazen, Seoul 07789, Republic of Korea
- Department of Polymer Science and Engineering, Dankook University, Yongin, Gyeonggi 16890, Republic of Korea
- Hyungjun Kim
- Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
- Sunghwan Choi
- Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
89
|
Yazdani-Jahromi M, Yousefi N, Tayebi A, Kolanthai E, Neal CJ, Seal S, Garibay OO. AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Brief Bioinform 2022; 23:6640006. [PMID: 35817396 PMCID: PMC9294423 DOI: 10.1093/bib/bbac272] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/18/2022] [Revised: 05/01/2022] [Accepted: 06/10/2022] [Indexed: 11/14/2022]
Abstract
In this study, we introduce an interpretable graph-based deep learning prediction model, AttentionSiteDTI, which utilizes protein binding sites along with a self-attention mechanism to address the problem of drug-target interaction prediction. Our proposed model is inspired by sentence classification models in the field of Natural Language Processing, where the drug-target complex is treated as a sentence with relational meaning between its biochemical entities, namely the protein pockets and the drug molecule. AttentionSiteDTI enables interpretability by identifying the protein binding sites that contribute the most toward the drug-target interaction. Results on three benchmark datasets show improved performance compared with the current state-of-the-art models. More significantly, unlike previous studies, our model shows superior performance when tested on new proteins (i.e. high generalizability). Through multidisciplinary collaboration, we further experimentally evaluate the practical potential of our proposed approach. To achieve this, we first computationally predict the binding interactions between some candidate compounds and a target protein, then experimentally validate the binding interactions for these pairs in the laboratory. The high agreement between the computationally predicted and experimentally observed (measured) drug-target interactions illustrates the potential of our method as an effective pre-screening tool in drug repurposing applications.
Affiliation(s)
- Mehdi Yazdani-Jahromi
- Industrial Engineering and Management Systems, University of Central Florida, Street, 32816, 4000 Central Florida Blvd. Orlando, USA
- Niloofar Yousefi
- Industrial Engineering and Management Systems, University of Central Florida, Street, 32816, 4000 Central Florida Blvd. Orlando, USA
- Aida Tayebi
- Industrial Engineering and Management Systems, University of Central Florida, Street, 32816, 4000 Central Florida Blvd. Orlando, USA
- Elayaraja Kolanthai
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
- Craig J Neal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
- Sudipta Seal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA; Advanced Materials Processing and Analysis Center, Dept. of Materials Science and Engineering, University of Central Florida, 4000 Central Florida Blvd. Orlando, 32816, Florida, USA
- Ozlem Ozmen Garibay
- Industrial Engineering and Management Systems, University of Central Florida, Street, 32816, 4000 Central Florida Blvd. Orlando, USA
90
|
Dasgupta A, Bakshi A, Mukherjee S, Das K, Talukdar S, Chatterjee P, Mondal S, Das P, Ghosh S, Som A, Roy P, Kundu R, Sarkar A, Biswas A, Paul K, Basak S, Manna K, Saha C, Mukhopadhyay S, Bhattacharyya NP, De RK. Epidemiological challenges in pandemic coronavirus disease (COVID-19): Role of artificial intelligence. WILEY INTERDISCIPLINARY REVIEWS. DATA MINING AND KNOWLEDGE DISCOVERY 2022; 12:e1462. [PMID: 35942397 PMCID: PMC9350133 DOI: 10.1002/widm.1462] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2020] [Revised: 03/28/2022] [Accepted: 04/28/2022] [Indexed: 05/02/2023]
Abstract
The world is now experiencing a major health calamity due to the coronavirus disease (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus clade 2. The foremost challenge facing the scientific community is to explore the growth and transmission capability of the virus. Use of artificial intelligence (AI), such as deep learning, in (i) rapid disease detection from x-ray or computed tomography (CT) or high-resolution CT (HRCT) images, (ii) accurate prediction of the epidemic patterns and their saturation throughout the globe, (iii) forecasting the disease and psychological impact on the population from social networking data, and (iv) prediction of drug-protein interactions for repurposing the drugs, has attracted much attention. In the present study, we describe the role of various AI-based technologies for rapid and efficient detection from CT images complementing quantitative real-time polymerase chain reaction and immunodiagnostic assays. AI-based technologies for anticipating the current pandemic pattern, preventing the spread of disease, and detecting face masks are also discussed. We inspect how the virus transmits depending on different factors. We investigate the deep learning technique to assess the affinity of the most probable drugs to treat COVID-19. This article is categorized under: Application Areas > Health Care; Algorithmic Development > Biological Data Mining; Technologies > Machine Learning.
Affiliation(s)
- Abhijit Dasgupta
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Abhisek Bakshi
- Department of Information Technology, Bengal Institute of Technology, Kolkata, West Bengal, India
- Srijani Mukherjee
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Kuntal Das
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Soumyajeet Talukdar
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Pratyayee Chatterjee
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Sagnik Mondal
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Puspita Das
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Subhrojit Ghosh
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Archisman Som
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Pritha Roy
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Rima Kundu
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Akash Sarkar
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Arnab Biswas
- Department of Data Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Karnelia Paul
- Department of Biotechnology, University of Calcutta, Kolkata, West Bengal, India
- Sujit Basak
- Department of Physiology and Biophysics, Stony Brook University, Stony Brook, New York, USA
- Krishnendu Manna
- Department of Food and Nutrition, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Chinmay Saha
- Department of Genome Science, School of Interdisciplinary Studies, University of Kalyani, Kalyani, Nadia, West Bengal, India
- Satinath Mukhopadhyay
- Department of Endocrinology and Metabolism, Institute of Post Graduate Medical Education and Research and Seth Sukhlal Karnani Memorial Hospital, Kolkata, West Bengal, India
- Nitai P. Bhattacharyya
- Department of Endocrinology and Metabolism, Institute of Post Graduate Medical Education and Research and Seth Sukhlal Karnani Memorial Hospital, Kolkata, West Bengal, India
- Rajat K. De
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, West Bengal, India
91
|
Zhao Q, Yang M, Cheng Z, Li Y, Wang J. Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2092-2110. [PMID: 33769935 DOI: 10.1109/tcbb.2021.3069040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
The identification of compound-protein relations (CPRs), which includes compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. A common method for compound-protein relation identification is the use of in vitro screening experiments. However, the number of compounds and proteins is massive, and in vitro screening experiments are labor-intensive, expensive, and time-consuming with high failure rates. Researchers have developed a computational field called virtual screening (VS) to aid experimental drug development. These methods utilize experimentally validated biological interaction information to generate datasets and use the physicochemical and structural properties of compounds and target proteins as input information to train computational prediction models. At present, deep learning has been widely used in computer vision and natural language processing and has experienced epoch-making progress. At the same time, deep learning has also been used in the field of biomedicine widely, and the prediction of CPRs based on deep learning has developed rapidly and has achieved good results. The purpose of this study is to investigate and discuss the latest applications of deep learning techniques in CPR prediction. First, we describe the datasets and feature engineering (i.e., compound and protein representations and descriptors) commonly used in CPR prediction methods. Then, we review and classify recent deep learning approaches in CPR prediction. Next, a comprehensive comparison is performed to demonstrate the prediction performance of representative methods on classical datasets. Finally, we discuss the current state of the field, including the existing challenges and our proposed future directions. We believe that this investigation will provide sufficient references and insight for researchers to understand and develop new deep learning methods to enhance CPR predictions.
92
|
Xu X, Xuan P, Zhang T, Chen B, Sheng N. Inferring Drug-Target Interactions Based on Random Walk and Convolutional Neural Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2294-2304. [PMID: 33729947 DOI: 10.1109/tcbb.2021.3066813] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Computational strategies for identifying new drug-target interactions (DTIs) can guide the process of drug discovery, reduce the cost and time of drug development, and thus promote drug development. Most recently proposed methods predict DTIs via integration of heterogeneous data related to drugs and proteins. However, previous methods have failed to deeply integrate these heterogeneous data and learn deep feature representations of multiple original similarities and interactions related to drugs and proteins. We therefore constructed a heterogeneous network by integrating a variety of connection relationships about drugs and proteins, including drugs, proteins, and drug side effects, as well as their similarities, interactions, and associations. A DTI prediction method based on random walk and convolutional neural network was proposed and referred to as DTIPred. DTIPred not only takes advantage of various original features related to drugs and proteins, but also integrates the topological information of heterogeneous networks. The prediction model is composed of two sides and learns the deep feature representation of a drug-protein pair. On the left side, random walk with restart is applied to learn the topological vectors of drug and protein nodes. The topological representation is further learned by the constructed deep learning frame based on convolutional neural network. The right side of the model focuses on integrating multiple original similarities and interactions of drugs and proteins to learn the original representation of the drug-protein pair. The results of cross-validation experiments demonstrate that DTIPred achieves better prediction performance than several state-of-the-art methods. During the validation process, DTIPred can retrieve more actual drug-protein interactions within the top part of the predicted results, which may be more helpful to biologists. 
In addition, case studies on five drugs further demonstrate the ability of DTIPred to discover potential drug-protein interactions.
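The random walk with restart step mentioned in this abstract can be sketched on a toy network. The sketch assumes the standard iteration p(t+1) = (1 - r) W p(t) + r p(0) with a column-normalized adjacency matrix W; the restart probability, the three-node graph, and the function name are illustrative assumptions, and DTIPred's heterogeneous network and downstream convolutional layers are not reproduced here.

```python
from typing import List


def random_walk_with_restart(
    adj: List[List[float]],
    seed: int,
    restart: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Return the steady-state visiting probabilities of a walk restarting at `seed`."""
    n = len(adj)
    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 (hypothetical nodes), walk seeded at node 0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(adj, seed=0)
print(scores)
```

The resulting vector is larger for nodes close to the seed, which is why such vectors can serve as topological features for drug and protein nodes.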
93
|
Detecting Drug–Target Interactions with Feature Similarity Fusion and Molecular Graphs. BIOLOGY 2022; 11:biology11070967. [PMID: 36101348 PMCID: PMC9312204 DOI: 10.3390/biology11070967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 06/12/2022] [Accepted: 06/24/2022] [Indexed: 12/03/2022]
Abstract
Simple Summary
Accurate identification of potential targets for drugs to interact with can accelerate drug development. The identification of drug–target interactions can provide insights into hidden drug efficacy. This paper presents a prediction model based on feature similarity fusion that can identify crucial features of drugs and targets to help predict drug–target interactions.
Abstract
The key to drug discovery is the identification of a target and a corresponding drug compound. Effective identification of drug–target interactions facilitates the development of drug discovery. In this paper, drug similarity and target similarity are considered, and graphical representations are used to extract internal structural information and intermolecular interaction information about drugs and targets. First, drug similarity and target similarity are fused using the similarity network fusion (SNF) method. Then, the graph isomorphic network (GIN) is used to extract the features with information about the internal structure of drug molecules. For target proteins, feature extraction is carried out using TextCNN to efficiently capture the features of target protein sequences. Three different divisions (CVD, CVP, CVT) are used on the standard dataset, and experiments are carried out separately to validate the performance of the model for drug–target interaction prediction. The experimental results show that our method achieves better results on AUC and AUPR. The docking results also show the superiority of the proposed model in predicting drug–target interactions.
94
|
Sharifabad MM, Sheikhpour R, Gharaghani S. Drug-target interaction prediction using reliable negative samples and effective feature selection methods. J Pharmacol Toxicol Methods 2022; 116:107191. [PMID: 35738316 DOI: 10.1016/j.vascn.2022.107191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Revised: 06/04/2022] [Accepted: 06/14/2022] [Indexed: 11/28/2022]
Abstract
Machine learning-based approaches in the field of drug discovery have dramatically reduced the time and cost of the laboratory process of detecting potential drug-target interactions (DTIs). Standard binary classifiers require both positive and negative samples in the training and validation phases. One of the major challenges in the DTI context is the lack of access to non-interacting pairs as negative samples in the learning process. Many recent studies in this field have randomly selected negative samples from unlabeled drug-target pairs. Because unknown positive samples may be present in a set treated as negative, the model results may be affected and show a high rate of false positives. In this study, an algorithm called Reliable Non-Interacting Drug-Target Pairs (RNIDTP) is proposed to select reliable negative samples, together with an efficient algorithm to select relevant features for drug-target interaction prediction. To validate the performance of the proposed RNIDTP algorithm in the selection of negative samples, a benchmark drug-target interactions dataset is used. The results demonstrate the superiority of the proposed algorithm compared with other algorithms in most cases. The results also indicate that by using an appropriate algorithm for the selection of negative samples, the performance of the learning process is significantly increased compared to random selection.
Affiliation(s)
- Mohammad Morovvati Sharifabad
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Razieh Sheikhpour
- Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
95
|
Yeh SJ, Yeh TY, Chen BS. Systems Drug Discovery for Diffuse Large B Cell Lymphoma Based on Pathogenic Molecular Mechanism via Big Data Mining and Deep Learning Method. Int J Mol Sci 2022; 23:ijms23126732. [PMID: 35743172 PMCID: PMC9224183 DOI: 10.3390/ijms23126732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 06/10/2022] [Accepted: 06/15/2022] [Indexed: 02/01/2023]
Abstract
Diffuse large B cell lymphoma (DLBCL) is an aggressive heterogeneous disease. The most common subtypes of DLBCL include the germinal center B-cell (GCB) type and the activated B-cell (ABC) type. To learn more about the pathogenesis of the two DLBCL subtypes (i.e., DLBCL ABC and DLBCL GCB), we first construct a candidate genome-wide genetic and epigenetic network (GWGEN) by big database mining. With the help of the two DLBCL subtypes' genome-wide microarray data, we identify their real GWGENs via system identification and model order selection approaches. Afterward, the core GWGENs of the two DLBCL subtypes are extracted from the real GWGENs by the principal network projection (PNP) method. By comparing core signaling pathways and investigating pathogenic mechanisms, we are able to identify pathogenic biomarkers as drug targets for DLBCL ABC and DLBCL GCB, respectively. Furthermore, we carry out drug discovery considering drug-target interaction ability, drug regulation ability, and drug toxicity. Among them, a deep neural network (DNN)-based drug-target interaction (DTI) model is trained in advance to predict potential drug candidates with a higher probability of interacting with the identified biomarkers. Consequently, two drug combinations are proposed to alleviate DLBCL ABC and DLBCL GCB, respectively.
96
|
Zhu S, Bai Q, Li L, Xu T. Drug repositioning in drug discovery of T2DM and repositioning potential of antidiabetic agents. Comput Struct Biotechnol J 2022; 20:2839-2847. [PMID: 35765655 PMCID: PMC9189996 DOI: 10.1016/j.csbj.2022.05.057] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/14/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 12/19/2022]
Abstract
Repositioned or repurposed drugs account for a substantial share of the drugs entering the approval pipeline, which indicates that drug repositioning has huge market potential and value. Computational technologies such as machine learning methods have accelerated the process of drug repositioning in the last few decades. The repositioning potential of type 2 diabetes mellitus (T2DM) drugs for various diseases such as cancer, neurodegenerative diseases, and cardiovascular diseases has been widely studied. Hence, a summary of work on repurposing antidiabetic drugs is of great significance. In this review, we focus on the machine learning methods for the development of new T2DM drugs and give an overview of the repurposing potential of the existing antidiabetic agents.
Affiliation(s)
- Sha Zhu
- Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, PR China
- Qifeng Bai
- Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, PR China
- Corresponding author.
97
|
Zhao L, Zhu Y, Wang J, Wen N, Wang C, Cheng L. A brief review of protein-ligand interaction prediction. Comput Struct Biotechnol J 2022; 20:2831-2838. [PMID: 35765652 PMCID: PMC9189993 DOI: 10.1016/j.csbj.2022.06.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/14/2022] [Revised: 05/30/2022] [Accepted: 06/01/2022] [Indexed: 01/21/2023]
Abstract
The task of identifying protein–ligand interactions (PLIs) plays a prominent role in the field of drug discovery. However, it is infeasible to identify potential PLIs via costly and laborious in vitro experiments. There is a need to develop computational PLI prediction approaches to speed up the drug discovery process. In this review, we give a brief introduction to various computational PLI prediction approaches. We discuss these approaches, in particular machine learning-based methods, with illustrations of different emphases based on mainstream trends. Moreover, we analyze three research dynamics that can be further explored in future studies.
Affiliation(s)
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Yan Zhu
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Naifeng Wen
- School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Corresponding authors.
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, China
- Corresponding authors.
98
|
Lee WY, Lee CY, Lee JS, Kim CE. Identifying Candidate Flavonoids for Non-Alcoholic Fatty Liver Disease by Network-Based Strategy. Front Pharmacol 2022; 13:892559. [PMID: 35721123 PMCID: PMC9204489 DOI: 10.3389/fphar.2022.892559] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/09/2022] [Accepted: 04/22/2022] [Indexed: 11/16/2022]
Abstract
Nonalcoholic fatty liver disease (NAFLD) is the most common type of chronic liver disease and lacks guaranteed pharmacological therapeutic options. In this study, we applied a network-based framework for comprehensively identifying candidate flavonoids for the prevention and/or treatment of NAFLD. Flavonoid-target interaction information was obtained by combining experimentally validated data with results from a recently developed machine-learning model, AI-DTI. Flavonoids were then prioritized by calculating the network proximity between flavonoid targets and NAFLD-associated proteins. The preventive effects of the candidate flavonoids were evaluated using FFA-induced hepatic steatosis in HepG2 and AML12 cells. We reconstructed the flavonoid-target network and found that the number of recovered compound-target interactions was significantly higher than the chance level. Proximity scores successfully rediscovered flavonoids, and their potential mechanisms, that are reported to have therapeutic effects on NAFLD. Finally, we revealed that the discovered candidates, particularly glycitin, significantly attenuated lipid accumulation and moderately inhibited intracellular reactive oxygen species production. We further confirmed the affinity of glycitin with the predicted target using molecular docking and found that glycitin targets are closely related to several proteins involved in lipid metabolism, inflammatory responses, and oxidative stress. The predicted network-level effects were validated at the mRNA level. In summary, our study offers and validates network-based methods for the identification of candidate flavonoids for NAFLD.
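The prioritization step in this abstract relies on a network proximity score between compound targets and disease-associated proteins. The abstract does not state which proximity definition was used, so the sketch below assumes one common choice, the "closest" measure: the average, over compound targets, of the shortest-path distance to the nearest disease protein. The protein names and the toy interaction network are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence


def shortest_distances(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_proximity(
    graph: Dict[str, List[str]],
    targets: Sequence[str],
    disease_proteins: Sequence[str],
) -> float:
    """Mean distance from each target to its nearest disease protein (assumed measure)."""
    total = 0
    for target in targets:
        dist = shortest_distances(graph, target)
        reachable = [dist[d] for d in disease_proteins if d in dist]
        if not reachable:
            raise ValueError(f"no disease protein reachable from {target}")
        total += min(reachable)
    return total / len(targets)


# Hypothetical interaction network: a chain P1 - P2 - P3 - P4.
ppi = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]}
proximity = closest_proximity(ppi, targets=["P1", "P3"], disease_proteins=["P4"])
print(proximity)
```

Lower values mean the compound's targets sit closer to the disease module. In practice such a raw score is usually compared against scores for randomly chosen protein sets, a step omitted here.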
Affiliation(s)
- Won-Yung Lee
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam, South Korea
- Department of Herbal Formula, College of Korean Medicine, Dongguk University, Goyang-si, South Korea
- Choong-Yeol Lee
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam, South Korea
- Jin-Seok Lee
- Institute of Bioscience and Integrative Medicine, Daejeon Oriental Hospital of Daejeon University, Daejeon, South Korea
- Chang-Eop Kim
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam, South Korea
99
|
Li Y, Qiao G, Gao X, Wang G. Supervised graph co-contrastive learning for drug-target interaction prediction. Bioinformatics 2022; 38:2847-2854. [PMID: 35561181 DOI: 10.1093/bioinformatics/btac164] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/02/2021] [Revised: 02/05/2022] [Accepted: 03/20/2022] [Indexed: 11/13/2022]
Abstract
Motivation
Identification of Drug-Target Interactions (DTIs) is an essential step in drug discovery and repositioning. DTI prediction based on biological experiments is time-consuming and expensive. In recent years, graph learning-based methods have aroused widespread interest and shown certain advantages on this task, where the DTI prediction is often modeled as a binary classification problem of the nodes composed of drug and protein pairs (DPPs). Nevertheless, in many real applications, labeled data are very limited and expensive to obtain. With only a few thousand labeled data, models could hardly recognize comprehensive patterns of DPP node representations, and are unable to capture enough commonsense knowledge, which is required in DTI prediction. Supervised contrastive learning gives an aligned representation of DPP node representations with the same class label. In embedding space, DPP node representations with the same label are pulled together, and those with different labels are pushed apart.
Results
We propose an end-to-end supervised graph co-contrastive learning model for DTI prediction directly from heterogeneous networks. By contrasting the topology structures and semantic features of the drug-protein-pair network, as well as the new selection strategy of positive and negative samples, SGCL-DTI generates a contrastive loss to guide the model optimization in a supervised manner. Comprehensive experiments on three public datasets demonstrate that our model outperforms the SOTA methods significantly on the task of DTI prediction, especially in the case of cold start. Furthermore, SGCL-DTI provides a new research perspective of contrastive learning for DTI prediction.
Availability and implementation
The research shows that this method has certain applicability in the discovery of drugs, the identification of drug-target pairs and so on.
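The idea that representations with the same label are pulled together and those with different labels are pushed apart can be made concrete with a generic supervised contrastive loss. The sketch below is an assumption for illustration: it uses cosine similarity with a temperature and averages over anchors that have at least one positive. It is not SGCL-DTI's exact loss, and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.5,
) -> float:
    """Generic supervised contrastive loss over a small batch of node embeddings."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking a positive for this anchor.
        losses.append(-sum(sims[p] - log_denom for p in positives) / len(positives))
    if not losses:
        raise ValueError("no anchor has a positive sample")
    return sum(losses) / len(losses)


# Toy drug-protein-pair embeddings (hypothetical): two tight clusters.
pairs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(pairs, labels=[1, 1, 0, 0])
loss_mixed = supervised_contrastive_loss(pairs, labels=[1, 0, 1, 0])
print(loss_aligned, loss_mixed)
```

When the labels agree with the cluster structure the loss is lower than when they cut across it, so minimizing it moves same-label pairs closer in embedding space.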
Affiliation(s)
- Yang Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150004, China
- Guanyu Qiao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150004, China
- Xin Gao
- Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Mathematical and Computer Sciences and Engineering, Thuwal 23955, Kingdom of Saudi Arabia
- Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150004, China
100
|
Decoding the protein-ligand interactions using parallel graph neural networks. Sci Rep 2022; 12:7624. [PMID: 35538084 PMCID: PMC9086424 DOI: 10.1038/s41598-022-10418-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/09/2021] [Accepted: 04/06/2022] [Indexed: 12/13/2022]
Abstract
Protein-ligand interactions (PLIs) are essential for biochemical functionality, and their identification is crucial for estimating biophysical properties for rational therapeutic design. Currently, experimental characterization of these properties is the most accurate method; however, it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) that integrates knowledge representation and reasoning for PLI prediction, performing deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: [Formula: see text] is the base implementation that employs distinct featurization to enhance domain-awareness, while [Formula: see text] is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. The comprehensive evaluation demonstrated that the GNN can successfully capture the binary interactions between a ligand and a protein's 3D structure, with 0.979 test accuracy for [Formula: see text] and 0.958 for [Formula: see text] in predicting the activity of a protein-ligand complex. These models are further adapted for regression tasks to predict experimental binding affinities and [Formula: see text], which are crucial for a compound's potency and efficacy. We achieve Pearson correlation coefficients of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on [Formula: see text] with [Formula: see text] and [Formula: see text], respectively, outperforming similar 2D sequence-based models. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicting the activity, potency, and biophysical properties of lead candidates. To this end, we show the utility of [Formula: see text] on SARS-CoV-2 protein targets by screening a large compound library and comparing the predictions with experimentally measured data.
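The regression results above are reported as Pearson correlation coefficients between predicted and experimentally measured values. As a minimal sketch of how that metric is computed, assuming plain Python lists of predictions and measurements (the function name `pearson_r` and the toy values are hypothetical and are not data from the paper):

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances, each left unnormalized since the 1/n factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


# Toy values only: a perfectly linear and a perfectly inverse relationship.
r_linear = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_inverse = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A coefficient near 1 means predictions rise and fall with the measurements, so values such as 0.66 indicate a moderate positive linear relationship rather than exact agreement.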
|