1
E U, T M, A V G, D P. A comprehensive survey of drug-target interaction analysis in allopathy and siddha medicine. Artif Intell Med 2024; 157:102986. [PMID: 39326289 DOI: 10.1016/j.artmed.2024.102986] [Received: 10/20/2023] [Revised: 08/13/2024] [Accepted: 09/18/2024] [Indexed: 09/28/2024]
Abstract
Effective drug delivery is the cornerstone of modern healthcare, ensuring therapeutic compounds reach their intended targets efficiently. This paper explores the potential of personalized and holistic healthcare, driven by the synergy between traditional and allopathic medicine systems, with a specific focus on the vast reservoir of medicinal compounds found in plants rooted in the historical legacy of traditional medicine. Motivated by the desire to unlock the therapeutic potential of medicinal plants and bridge the gap between traditional and allopathic medicine, this survey delves into in-silico computational approaches for studying Drug-Target Interactions (DTI) within the contexts of allopathy and siddha medicine. The contributions of this survey are multifaceted: it offers a comprehensive overview of in-silico methods for DTI analysis in both systems, identifies common challenges in DTI studies, provides insights into future directions to advance DTI analysis, and includes a comparative analysis of DTI in allopathy and siddha medicine. The findings of this survey highlight the pivotal role of in-silico computational approaches in advancing drug research and development in both allopathy and siddha medicine, emphasizing the importance of integrating these methods to drive the future of personalized healthcare.
Affiliation(s)
- Uma E
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Mala T
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Geetha A V
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Priyanka D
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
2
Michels J, Bandarupalli R, Akbari AA, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. arXiv 2024:arXiv:2409.13057v2. [PMID: 39483353 PMCID: PMC11527106] [Indexed: 11/03/2024]
Abstract
Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted, including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases of existing datasets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
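To make the idea of treating ligands as a "language" concrete, the following is a minimal sketch of how a SMILES string might be split into tokens and mapped to integer ids before being fed to a sequence model such as an LSTM or transformer. It is not taken from the review: the regular expression is one commonly used tokenization pattern, and the names `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Dict, List

# Assumed tokenization pattern (not from the review): bracket atoms,
# two-letter halogens, aromatic atoms, bonds, branches, and ring digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Assign an integer id to every token seen in the corpus."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Map a SMILES string to token ids, using <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize_smiles(smiles)]
```

Protein sequences can be handled the same way with one token per amino-acid letter; the resulting id sequences are what an embedding layer would consume.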
Affiliation(s)
- James Michels
  - Department of Computer Science, University of Mississippi, University, MS
- Ramya Bandarupalli
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS
- Amin Ahangar Akbari
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS
- Thai Le
  - Department of Computer Science, Indiana University, Bloomington, IN
- Hong Xiao
  - Department of Computer Science, University of Mississippi, University, MS
- Jing Li
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS
- Erik F Y Hom
  - Department of Biology and Center for Biodiversity and Conservation Research, University of Mississippi, University, MS
3
Sun X, Huang J, Fang Y, Jin Y, Wu J, Wang G, Jia J. MREDTA: A BERT and transformer-based molecular representation encoder for predicting drug-target binding affinity. FASEB J 2024; 38:e70083. [PMID: 39373982 DOI: 10.1096/fj.202401254r] [Received: 06/05/2024] [Revised: 09/05/2024] [Accepted: 09/18/2024] [Indexed: 10/08/2024]
Abstract
Drug-target binding affinity (DTA) prediction is vital for drug repositioning. The accuracy and generalizability of DTA models remain a major challenge. Here, we develop a model composed of BERT-Trans Block, Multi-Trans Block, and DTI Learning modules, referred to as Molecular Representation Encoder-based DTA prediction (MREDTA). MREDTA has three advantages: (1) extraction of both local and global molecular features simultaneously through skip connections; (2) improved sensitivity to molecular structures through the Multi-Trans Block; (3) enhanced generalizability through the introduction of BERT. In benchmark tests on the KIBA and Davis datasets, MREDTA achieved the best performance among 12 advanced models compared. In a case study, we applied MREDTA to 2034 FDA-approved drugs for treating non-small-cell lung cancer (NSCLC), all of which act on the mutant EGFR T790M protein. The corresponding molecular docking results demonstrated the robustness of MREDTA.
Affiliation(s)
- Xu Sun
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Juanjuan Huang
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
  - State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, College of Basic Medicine, Jilin University, Changchun, China
- Yabo Fang
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Yixuan Jin
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Jiageng Wu
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Guoqing Wang
  - State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, College of Basic Medicine, Jilin University, Changchun, China
- Jiwei Jia
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
  - Jilin National Applied Mathematical Center, Jilin University, Changchun, China
4
Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review. Comput Biol Med 2024; 179:108734. [PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734] [Received: 03/07/2024] [Revised: 06/01/2024] [Accepted: 06/08/2024] [Indexed: 07/06/2024]
Abstract
Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces a few limitations that need to be addressed to harness its full potential in drug discovery. Some notable limitations are insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; a lack of adequate benchmarks; intellectual property rights (IPR)-related hurdles in data sharing; poor understanding of biology; a focus on proxy data and ligands; and a lack of holistic methods for representing input (molecular structures) that would avoid pre-processing of input molecules (feature engineering). The major component of AI infrastructure is input data, as most of the successes of AI-driven efforts to improve drug discovery depend, besides a few other factors, on the quality and quantity of the data used to train and test AI algorithms. Additionally, data-hungry DL approaches may fail to live up to their promise without sufficient data. The current literature suggests a few methods that, to a certain extent, effectively handle limited data to obtain better output from AI models in the context of drug discovery. These are transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), data synthesis (DS), etc. A different method, federated learning (FL), enables sharing of proprietary data on a common platform (without compromising data privacy) to train ML models. In this review, we compare and discuss these methods, their recent applications, and their limitations when modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
Affiliation(s)
- Amit Gangwal
  - Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Azim Ansari
  - Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Iqrar Ahmad
  - Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Gondur, Dhule, 424002, Maharashtra, India
- Abul Kalam Azad
  - Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia
5
Zhao L, Zhu Y, Wen N, Wang C, Wang J, Yuan Y. Drug-Target Binding Affinity Prediction in a Continuous Latent Space Using Variational Autoencoders. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1458-1467. [PMID: 38767996 DOI: 10.1109/tcbb.2024.3402661] [Indexed: 05/22/2024]
Abstract
Accurate prediction of Drug-Target binding Affinity (DTA) is a daunting yet pivotal task in the sphere of drug discovery. Over the years, a plethora of deep learning-based DTA models have emerged, rendering promising results in predicting the binding affinities between drugs and their target proteins. However, in contrast to the conventional approach of modeling binding affinity in vector spaces, we propose a more nuanced modeling process in a continuous space to account for the diversity of input samples. Initially, the drug is encoded using the Simplified Molecular Input Line Entry System (SMILES), while the target sequences are characterized via a pretrained language model. Subsequently, highly correlative information is extracted utilizing residual gated convolutional neural networks. In a departure from existing deep learning-based models, our model learns the hidden representations of the drugs and targets jointly. Instead of employing two vectors, our hidden representations consist of two Gaussian distributions. To validate the effectiveness of our proposal, we conducted evaluations on commonly utilized benchmark datasets. The experimental outcomes corroborated that our method surpasses the state-of-the-art vectorial representation methods in terms of performance. This approach, therefore, offers potential enhancements in the precision of DTA predictions, potentially contributing to more efficient drug discovery processes.
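The abstract states that the hidden representations are Gaussian distributions rather than fixed vectors but does not give the formulation. As a rough illustration only, the sketch below shows the standard variational-autoencoder treatment of a diagonal Gaussian latent: sampling with the reparameterization trick and the closed-form KL divergence to a standard normal prior. Whether the authors use exactly this parameterization is an assumption, and `sample_latent` and `kl_to_standard_normal` are hypothetical names.

```python
import math
import random
from typing import List


def sample_latent(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian.

    Closed form: 0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2)).
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )
```

In a typical VAE-style model, an encoder would output `mu` and `log_var` for the drug and for the target, a sample of each would feed the affinity predictor, and the KL term would be added to the regression loss as a regularizer.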
6
Yang X, Yang G, Chu J. GraphCL-DTA: A Graph Contrastive Learning With Molecular Semantics for Drug-Target Binding Affinity Prediction. IEEE J Biomed Health Inform 2024; 28:4544-4552. [PMID: 38190664 DOI: 10.1109/jbhi.2024.3350666] [Indexed: 01/10/2024]
Abstract
Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, which can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representation relies only on supervised data without considering the information in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning modules, while uniformity used to measure representation quality is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning with molecular semantics for drug-target binding affinity prediction. This graph contrastive learning framework replaces the dropout-based data augmentation strategy by performing data augmentation in the embedding space, thereby better preserving the semantic information of the molecular graph. A more essential and effective drug representation can be learned through this graph contrastive framework without additional supervised data. Next, we design a new loss function that can be directly used to adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of the above innovative elements is verified on two real datasets, KIBA and Davis. Compared with the GraphDTA model, the relative improvement of the GraphCL-DTA model on the two datasets is 2.7% and 4.5%. The graph contrastive learning framework and uniformity function in the GraphCL-DTA model can be embedded into other computational models as independent modules to improve their generalization capability.
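The abstract does not define its uniformity loss. For orientation, the sketch below computes one widely used uniformity measure from the contrastive-learning literature: the logarithm of the average Gaussian potential between all pairs of L2-normalized embeddings. Treating this as the loss used in GraphCL-DTA is an assumption; the function names and the temperature `t = 2.0` are illustrative choices.

```python
import math
from itertools import combinations
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("Cannot normalize a zero vector")
    return [x / norm for x in vector]


def uniformity(embeddings: Sequence[Sequence[float]], t: float = 2.0) -> float:
    """log of the mean pairwise Gaussian potential exp(-t * ||u - v||^2).

    Lower values mean the embeddings are spread more evenly on the sphere;
    a value of 0.0 means all embeddings have collapsed to the same point.
    """
    if len(embeddings) < 2:
        raise ValueError("Need at least two embeddings")
    points = [l2_normalize(v) for v in embeddings]
    potentials = [
        math.exp(-t * sum((a - b) ** 2 for a, b in zip(p, q)))
        for p, q in combinations(points, 2)
    ]
    return math.log(sum(potentials) / len(potentials))
```

Minimizing such a term alongside the affinity regression loss is one way to directly optimize how evenly drug and target representations occupy the embedding space.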
7
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundam Res 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] [Open Access]
Abstract
Drug discovery is costly and time consuming, and modern drug discovery endeavors are progressively reliant on computational methodologies, aiming to mitigate temporal and financial expenditures associated with the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. It has become a concern for researchers how to summarize the existing deep learning in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, large numbers of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model for the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
8
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
- Luiz F. Piochi
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Ana T. Gaspar
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- António J. Preto
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
9
Kalemati M, Zamani Emani M, Koohi S. DCGAN-DTA: Predicting drug-target binding affinity with deep convolutional generative adversarial networks. BMC Genomics 2024; 25:411. [PMID: 38724911 PMCID: PMC11080241 DOI: 10.1186/s12864-024-10326-x] [Received: 02/15/2024] [Accepted: 04/19/2024] [Indexed: 05/13/2024] [Open Access]
Abstract
BACKGROUND: In recent years, there has been growing interest in using computational approaches to predict drug-target binding affinity, aiming to expedite the early drug discovery process. To address the limitations of experimental methods, such as cost and time, several machine learning-based techniques have been developed. However, these methods encounter certain challenges, including the limited availability of training data, reliance on human intervention for feature selection and engineering, and a lack of validation approaches for robust evaluation in real-life applications. RESULTS: To mitigate these limitations, in this study, we propose a method for drug-target binding affinity prediction based on deep convolutional generative adversarial networks. Additionally, we conducted a series of validation experiments and implemented adversarial control experiments using straw models. These experiments serve to demonstrate the robustness and efficacy of our predictive models. We conducted a comprehensive evaluation of our method by comparing it to baselines and state-of-the-art methods. Two recently updated datasets, namely BindingDB and PDBBind, were used for this purpose. Our findings indicate that our method outperforms the alternative methods in terms of three performance measures when using warm-start data splitting settings. Moreover, when considering physicochemical-based cold-start data splitting settings, our method demonstrates superior predictive performance, particularly in terms of the concordance index. CONCLUSION: The results of our study affirm the practical value of our method and its superiority over alternative approaches in predicting drug-target binding affinity across multiple validation sets. This highlights the potential of our approach in accelerating drug repurposing efforts, facilitating novel drug discovery, and ultimately enhancing disease treatment. The data and source code for this study were deposited in the GitHub repository, https://github.com/mojtabaze7/DCGAN-DTA. Furthermore, the web server for our method is accessible at https://dcgan.shinyapps.io/bindingaffinity/.
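Since the abstract singles out the concordance index, here is a minimal sketch of how that metric is commonly computed for affinity regression: over all pairs with different true affinities, it is the fraction whose predicted order agrees with the true order, with tied predictions counted as half. This is a plain reference implementation written for illustration, not code from the DCGAN-DTA repository, and `concordance_index` is a name chosen here.

```python
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs ranked in the correct order.

    Pairs with equal true values are skipped; pairs with equal
    predictions contribute 0.5. Runs in O(n^2) time.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            true_diff = y_true[i] - y_true[j]
            if true_diff == 0:
                continue  # not a comparable pair
            comparable += 1
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("No comparable pairs: all true values are equal")
    return concordant / comparable
```

A value of 1.0 means every pair is ranked correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is inverted.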
Affiliation(s)
- Mahmood Kalemati
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Mojtaba Zamani Emani
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
10
Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through DTA prediction, shifting the field from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representation of models, neural network frameworks, evaluation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of the application of DL frameworks for DTA prediction in the field of drug discovery are discussed.
Affiliation(s)
- Hao Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Xiaoqian Liu
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Wenya Cheng
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Tianshi Wang
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Yuanyuan Chen
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
11
Zhong KY, Wen ML, Meng FF, Li X, Jiang B, Zeng X, Li Y. MMDTA: A Multimodal Deep Model for Drug-Target Affinity with a Hybrid Fusion Strategy. J Chem Inf Model 2024; 64:2878-2888. [PMID: 37610162 DOI: 10.1021/acs.jcim.3c00866] [Indexed: 08/24/2023]
Abstract
The prediction of drug-target affinity (DTA) plays an important role in evaluating molecular druggability. Although deep learning-based models for DTA prediction have been extensively explored, there are few reports on multimodal models that leverage various fusion strategies to exploit heterogeneous information from multiple modalities of drugs and targets. In this study, we proposed a multimodal deep model named MMDTA, which integrated the heterogeneous information from various modalities of drugs and targets using a hybrid fusion strategy to enhance DTA prediction. To achieve this, MMDTA first employed convolutional neural networks (CNNs) and graph convolutional networks (GCNs) to extract diverse heterogeneous information from the sequences and structures of drugs and targets. It then utilized a hybrid fusion strategy to combine and complement the extracted heterogeneous information, resulting in the fused modal information for predicting drug-target affinity through the fully connected (FC) layers. Experimental results demonstrated that MMDTA outperformed the competitive state-of-the-art deep learning models on the widely used benchmark data sets, particularly with a significantly improved key evaluation metric, Root Mean Square Error (RMSE). Furthermore, MMDTA exhibited excellent generalization and practical application performance on multiple different data sets. These findings highlighted MMDTA's accuracy and reliability in predicting the drug-target binding affinity. The source data and code are accessible at http://github.com/dldxzx/MMDTA.
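For readers unfamiliar with the key metric named in the abstract, the short sketch below computes the Root Mean Square Error between measured and predicted affinities. It is a generic definition written for illustration, not code from the MMDTA repository; the function name `rmse` is chosen here.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("At least one sample is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Lower values are better, and the result is in the same units as the affinity labels, so scores are only comparable across models evaluated on the same data set and label scale.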
Affiliation(s)
- Kai-Yang Zhong
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resource in Yunnan, Yunnan University, Kunming 650000, China
- Fan-Fang Meng
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Xin Li
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Bei Jiang
  - Yunnan Key Laboratory of Screening and Research on Anti-pathogenic Plant Resources from Western Yunnan, Dali University, Dali 671000, China
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
12
Visan AI, Negut I. Integrating Artificial Intelligence for Drug Discovery in the Context of Revolutionizing Drug Delivery. Life (Basel) 2024; 14:233. [PMID: 38398742 PMCID: PMC10890405 DOI: 10.3390/life14020233] [Received: 01/09/2024] [Revised: 02/03/2024] [Accepted: 02/06/2024] [Indexed: 02/25/2024] [Open Access]
Abstract
Drug development is expensive, time-consuming, and has a high failure rate. In recent years, artificial intelligence (AI) has emerged as a transformative tool in drug discovery, offering innovative solutions to complex challenges in the pharmaceutical industry. This manuscript covers the multifaceted role of AI in drug discovery, encompassing AI-assisted drug delivery design, the discovery of new drugs, and the development of novel AI techniques. We explore various AI methodologies, including machine learning and deep learning, and their applications in target identification, virtual screening, and drug design. This paper also discusses the historical development of AI in medicine, emphasizing its profound impact on healthcare. Furthermore, it addresses AI's role in the repositioning of existing drugs and the identification of drug combinations, underscoring its potential in revolutionizing drug delivery systems. The manuscript provides a comprehensive overview of the AI programs and platforms currently used in drug discovery, illustrating the technological advancements and future directions of this field. This study not only presents the current state of AI in drug discovery but also anticipates its future trajectory, highlighting the challenges and opportunities that lie ahead.
Affiliation(s)
- Irina Negut
- National Institute for Lasers, Plasma and Radiation Physics, 409 Atomistilor Street, 077125 Magurele, Ilfov, Romania
13
Zhou C, Li Z, Song J, Xiang W. TransVAE-DTA: Transformer and variational autoencoder network for drug-target binding affinity prediction. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 244:108003. [PMID: 38181572 DOI: 10.1016/j.cmpb.2023.108003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/07/2024]
Abstract
BACKGROUND AND OBJECTIVE: Recent studies have emphasized the significance of in silico drug-target binding affinity (DTA) prediction in drug discovery and drug repurposing. However, existing DTA prediction approaches suffer from two major deficiencies that impede their progress. Firstly, while most methods primarily focus on the feature representations of drug-target binding affinity pairs, they fail to consider the long-distance relationships of proteins. Furthermore, many deep learning-based DTA predictors simply model the interaction of drug-target pairs through concatenation, which limits prediction performance. METHODS: To address these issues, this study proposes a novel framework named TransVAE-DTA, which combines the transformer and variational autoencoder (VAE). Inspired by the early success of VAEs, we aim to further investigate the feasibility of VAEs for drug structure encoding, while utilizing the transformer architecture for target feature representation. Additionally, an adaptive attention pooling (AAP) module is designed to fuse the drug and target encoded features. Notably, TransVAE-DTA is proven to maximize the lower bound of the joint likelihood of drug, target, and their DTAs. RESULTS: Experimental results demonstrate the superiority of TransVAE-DTA in drug-target binding affinity prediction tasks on two public datasets, Davis and KIBA. CONCLUSIONS: In this research, the developed TransVAE-DTA opens a new avenue for engineering drug-target interactions.
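The abstract does not specify how the adaptive attention pooling (AAP) module is built. As a rough illustration of the general idea of attention-based fusion only, the sketch below scores each encoded vector against a query vector, turns the scores into softmax weights, and returns the weighted sum. The `query` vector, the function names, and the toy inputs are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Fuse equal-length feature vectors into one vector.

    features: list of vectors (here, a drug encoding and a target encoding)
    query:    hypothetical learned vector used to score each feature vector
    """
    # Dot-product score of every feature vector against the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum of the feature vectors, one dimension at a time.
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Toy encodings (illustrative values, not real model outputs).
drug_vec = [1.0, 0.0, 2.0]
target_vec = [0.0, 1.0, 1.0]
query = [0.5, 0.5, 0.0]
pooled, weights = attention_pool([drug_vec, target_vec], query)
```

Unlike plain concatenation, the fused vector here keeps the original dimensionality and lets the weights shift toward whichever input scores higher against the query.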
Affiliation(s)
- Changjian Zhou
- School of Life Sciences, Northeast Agricultural University, Harbin, PR China; Department of Data and Computing, Northeast Agricultural University, Harbin, PR China
- Zhongzheng Li
- Department of Data and Computing, Northeast Agricultural University, Harbin, PR China; School of Engineering, Northeast Agricultural University, Harbin, PR China
- Jia Song
- Department of Data and Computing, Northeast Agricultural University, Harbin, PR China; School of Plant Protection, Northeast Agricultural University, Harbin, PR China
- Wensheng Xiang
- Department of Data and Computing, Northeast Agricultural University, Harbin, PR China; School of Plant Protection, Northeast Agricultural University, Harbin, PR China; State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, PR China
14
Lee J, Jun DW, Song I, Kim Y. DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning. J Cheminform 2024; 16:14. [PMID: 38297330 PMCID: PMC10832108 DOI: 10.1186/s13321-024-00808-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024] Open
Abstract
The drug discovery process is demanding and time-consuming, and machine learning-based research is increasingly proposed to enhance efficiency. A significant challenge in this field is predicting whether a drug molecule's structure will interact with a target protein. A recent study attempted to address this challenge by utilizing an encoder that leverages prior knowledge of molecular and protein structures, resulting in notable improvements in the prediction performance of the drug-target interactions task. Nonetheless, the target encoders employed in previous studies exhibit computational complexity that increases quadratically with the input length, thereby limiting their practical utility. To overcome this challenge, we adopt a hint-based learning strategy to develop a compact and efficient target encoder. With the adaptation parameter, our model can blend general knowledge and target-oriented knowledge to build features of the protein sequences. This approach yielded considerable performance enhancements and improved learning efficiency on three benchmark datasets: BIOSNAP, DAVIS, and Binding DB. Furthermore, our methodology boasts the merit of necessitating only a minimal Video RAM (VRAM) allocation, specifically 7.7GB, during the training phase (16.24% of the previous state-of-the-art model). This ensures the feasibility of training and inference even with constrained computational resources.
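The abstract names two ingredients without giving formulas: a hint-based training signal and an adaptation parameter that blends general and target-oriented knowledge. The sketch below shows one plausible reading, assuming the hint is a mean squared difference between a compact student encoder's features and a larger teacher's features, and that the blend is a convex combination controlled by a scalar `lam`. Both forms, and all names and values, are assumptions for illustration rather than the paper's definitions.

```python
def blend_features(general, target_specific, lam):
    """Convex combination of two feature vectors.

    lam is a hypothetical adaptation parameter in [0, 1]:
    lam = 1 keeps only the general features, lam = 0 only the target-specific ones.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * g + (1.0 - lam) * t for g, t in zip(general, target_specific)]


def hint_loss(student, teacher):
    """Mean squared difference between student and teacher features."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy feature vectors (illustrative values only).
teacher_features = [0.2, 0.8, -0.4]   # stands in for a large pretrained encoder
student_features = [0.0, 1.0, 0.0]    # stands in for the compact encoder
mixed = blend_features(teacher_features, student_features, lam=0.5)
loss = hint_loss(student_features, teacher_features)
```

For scale, the memory figure quoted in the abstract implies that the previous model needed roughly 7.7 / 0.1624 ≈ 47.4 GB of VRAM.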
Affiliation(s)
- Jonghyun Lee
- Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Dae Won Jun
- Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Department of Internal Medicine, Hanyang University College of Medicine, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Ildae Song
- Department of Pharmaceutical Science and Technology, Kyungsung University, 309, Suyeong-ro, Nam-gu, Busan, 48434, Korea
- Yun Kim
- College of Pharmacy, Daegu Catholic University, 13-13, Hayang-ro, Hayang-eup, Gyeongsan-si, 38430, Gyeongsangbuk-do, Korea
15
Bitencourt-Ferreira G, Villarreal MA, Quiroga R, Biziukova N, Poroikov V, Tarasova O, de Azevedo Junior WF. Exploring Scoring Function Space: Developing Computational Models for Drug Discovery. Curr Med Chem 2024; 31:2361-2377. [PMID: 36944627 DOI: 10.2174/0929867330666230321103731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 03/23/2023]
Abstract
BACKGROUND The idea of scoring function space established a systems-level approach to address the development of models to predict the affinity of drug molecules by those interested in drug discovery. OBJECTIVE Our goal here is to review the concept of scoring function space and how to explore it to develop machine learning models to address protein-ligand binding affinity. METHODS We searched the articles available in PubMed related to the scoring function space. We also utilized crystallographic structures found in the protein data bank (PDB) to represent the protein space. RESULTS The application of systems-level approaches to address receptor-drug interactions allows us to have a holistic view of the process of drug discovery. The scoring function space adds flexibility to the process since it makes it possible to see drug discovery as a relationship involving mathematical spaces. CONCLUSION The application of the concept of scoring function space has provided us with an integrated view of drug discovery methods. This concept is useful during drug discovery, where we see the process as a computational search of the scoring function space to find an adequate model to predict receptor-drug binding affinity.
Affiliation(s)
- Marcos A Villarreal
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
- Rodrigo Quiroga
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
- Nadezhda Biziukova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
- Vladimir Poroikov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
- Olga Tarasova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
- Walter F de Azevedo Junior
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
- Specialization Program in Bioinformatics, The Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre / RS 90619-900, Brazil
16
Qiu W, Liang Q, Yu L, Xiao X, Qiu W, Lin W. LSTM-SAGDTA: Predicting Drug-target Binding Affinity with an Attention Graph Neural Network and LSTM Approach. Curr Pharm Des 2024; 30:468-476. [PMID: 38323613 PMCID: PMC11071654 DOI: 10.2174/0113816128282837240130102817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/08/2024]
Abstract
INTRODUCTION: Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. Drug development requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS: Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS: In this study, a novel model, namely LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity, was developed. We employed SeqVec for characterizing the protein and utilized graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION: Moreover, LSTM-SAGDTA obtained superior accuracy over current state-of-the-art methods while using less training time. The results of experiments suggest that this method represents a high-precision solution for DTA prediction.
Affiliation(s)
- Wenjing Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Qianle Liang
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
17
Jiang M, Shao Y, Zhang Y, Zhou W, Pang S. A deep learning method for drug-target affinity prediction based on sequence interaction information mining. PeerJ 2023; 11:e16625. [PMID: 38099302 PMCID: PMC10720480 DOI: 10.7717/peerj.16625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 11/16/2023] [Indexed: 12/17/2023] Open
Abstract
Background: A critical aspect of in silico drug discovery involves the prediction of drug-target affinity (DTA). Conducting wet lab experiments to determine affinity is both expensive and time-consuming, making it necessary to find alternative approaches. In recent years, deep learning has emerged as a promising technique for DTA prediction, leveraging the substantial computational power of modern computers. Methods: We proposed a novel sequence-based approach, named KC-DTA, for predicting DTA. In this approach, we converted the target sequence into two distinct matrices, while representing the molecular compound as a graph. The proposed method utilized k-mers analysis and Cartesian product calculation to capture the interactions and evolutionary information among various residues, enabling the creation of the two matrices for the target sequence. The molecule was represented as a molecular graph in which atoms serve as nodes and chemical bonds serve as edges. Subsequently, the obtained target matrices and molecule graph were utilized as inputs for convolutional neural networks (CNNs) and graph neural networks (GNNs) to extract hidden features, which were further used for the prediction of binding affinity. Results: To evaluate the effectiveness of the proposed method, we conducted several experiments and made a comprehensive comparison with state-of-the-art approaches using multiple evaluation metrics. The results of our experiments demonstrated that the KC-DTA method achieves high performance in predicting DTA. The findings of this research underscore the significance of the KC-DTA method as a valuable tool in the field of in silico drug discovery, offering promising opportunities for accelerating the drug development process. All the data and code are available at https://github.com/syc2017/KCDTA.
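To make the two input representations more concrete, here is a minimal sketch under stated assumptions. It counts 2-mers in a protein sequence and arranges their frequencies in a 20 x 20 matrix indexed by the Cartesian product of the amino-acid alphabet with itself, and it stores a molecule as an adjacency list with atoms as nodes and bonds as edges. The choice of k = 2, the frequency normalization, and all function names are illustrative assumptions; the exact matrix construction used by KC-DTA is not given in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def pair_matrix(sequence):
    """20 x 20 matrix of 2-mer frequencies.

    Rows and columns follow AMINO_ACIDS; cell (a, b) holds the relative
    frequency of residue a immediately followed by residue b.
    """
    counts = kmer_counts(sequence, 2)
    total = max(len(sequence) - 1, 1)
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in AMINO_ACIDS]
    # The Cartesian product enumerates all 400 possible residue pairs.
    for a, b in product(AMINO_ACIDS, repeat=2):
        matrix[index[a]][index[b]] = counts.get(a + b, 0) / total
    return matrix


def molecule_graph(atoms, bonds):
    """Adjacency list with atom indices as nodes and bonds as undirected edges."""
    graph = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        graph[i].append(j)
        graph[j].append(i)
    return graph


matrix = pair_matrix("MKVLAAGK")                            # toy sequence
ethanol = molecule_graph(["C", "C", "O"], [(0, 1), (1, 2)])  # heavy atoms only
```

A matrix of this shape can be fed to a CNN like a single-channel image, while the adjacency list is the usual starting point for a GNN.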
Affiliation(s)
- Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Yunchang Shao
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Wei Zhou
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China
- Shunpeng Pang
- School of Computer Engineering, WeiFang University, Weifang, Shandong, China
18
Liyaqat T, Ahmad T, Saxena C. TeM-DTBA: time-efficient drug target binding affinity prediction using multiple modalities with Lasso feature selection. J Comput Aided Mol Des 2023; 37:573-584. [PMID: 37777631 DOI: 10.1007/s10822-023-00533-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2023] [Accepted: 09/07/2023] [Indexed: 10/02/2023]
Abstract
Drug discovery, especially virtual screening and drug repositioning, can be accelerated through deeper understanding and prediction of Drug Target Interactions (DTIs). The advancement of deep learning as well as the time and financial costs associated with conventional wet-lab experiments have made computational methods for DTI prediction more popular. However, the majority of these computational methods handle the DTI problem as a binary classification task, ignoring the quantitative binding affinity that determines the drug efficacy to their target proteins. Moreover, computational space as well as execution time of the model is often ignored over accuracy. To address these challenges, we introduce a novel method, called Time-efficient Multimodal Drug Target Binding Affinity (TeM-DTBA), which predicts the binding affinity between drugs and targets by fusing different modalities based on compound structures and target sequences. We employ the Lasso feature selection method, which lowers the dimensionality of feature vectors and speeds up the proposed model training time by more than 50%. The results from two benchmark datasets demonstrate that our method outperforms state-of-the-art methods in terms of performance. The mean squared errors of 18.8% and 23.19%, achieved on the KIBA and Davis datasets, respectively, suggest that our method is more accurate in predicting drug-target binding affinity.
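Lasso feature selection keeps only the features whose coefficients survive an L1 penalty. The sketch below is a generic pure-Python coordinate-descent solver for the objective (1/(2n))·||y − Xw||² + alpha·||w||₁, followed by selection of the non-zero coefficients. It assumes centered data with no intercept, and the value of `alpha`, the toy data, and the function names are illustrative assumptions; the abstract does not describe the solver or penalty strength that TeM-DTBA uses.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core Lasso update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


def select_features(w, tol=1e-8):
    """Indices of features with a non-zero coefficient."""
    return [j for j, w_j in enumerate(w) if abs(w_j) > tol]


# Toy data: y depends only on feature 0; features 1 and 2 are uninformative.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [-1.0, 1.0, 1.0],
     [-1.0, -1.0, 1.0]]
y = [2.0, 2.0, -2.0, -2.0]
w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = select_features(w)
```

Dropping the unselected columns before training shrinks the input dimension, which is the mechanism the abstract credits for the reduced training time.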
Affiliation(s)
- Tanya Liyaqat
- Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
- Tanvir Ahmad
- Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
- Chandni Saxena
- The Chinese University of Hong Kong, Sha Tin, SAR, China
19
Mardikoraem M, Wang Z, Pascual N, Woldring D. Generative models for protein sequence modeling: recent advances and future directions. Brief Bioinform 2023; 24:bbad358. [PMID: 37864295 PMCID: PMC10589401 DOI: 10.1093/bib/bbad358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/08/2023] [Accepted: 09/12/2023] [Indexed: 10/22/2023] Open
Abstract
The widespread adoption of high-throughput omics technologies has exponentially increased the amount of protein sequence data involved in many salient disease pathways and their respective therapeutics and diagnostics. Despite the availability of large-scale sequence data, the lack of experimental fitness annotations underpins the need for self-supervised and unsupervised machine learning (ML) methods. These techniques leverage the meaningful features encoded in abundant unlabeled sequences to accomplish complex protein engineering tasks. Proficiency in the rapidly evolving fields of protein engineering and generative AI is required to realize the full potential of ML models as a tool for protein fitness landscape navigation. Here, we support this work by (i) providing an overview of the architecture and mathematical details of the most successful ML models applicable to sequence data (e.g. variational autoencoders, autoregressive models, generative adversarial neural networks, and diffusion models), (ii) guiding how to effectively implement these models on protein sequence data to predict fitness or generate high-fitness sequences and (iii) highlighting several successful studies that implement these techniques in protein engineering (from paratope regions and subcellular localization prediction to high-fitness sequences and protein design rules generation). By providing a comprehensive survey of model details, novel architecture developments, comparisons of model applications, and current challenges, this study intends to provide structured guidance and robust framework for delivering a prospective outlook in the ML-driven protein engineering field.
Affiliation(s)
- Mehrsa Mardikoraem
- Michigan State University (MSU)’s Department of Chemical Engineering and Materials Science
- Zirui Wang
- Regeneron Pharmaceuticals, Inc. Having received his B.S. in Chemical Engineering from MSU, he is currently pursuing a M.S. in Computer Science from Syracuse University
- Daniel Woldring
- MSU’s Department of Chemical Engineering and Materials Science and a member of MSU’s Institute for Quantitative Health Sciences and Engineering
20
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Affiliation(s)
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
21
Ye Q, Zhang X, Lin X. Drug-Target Interaction Prediction via Graph Auto-Encoder and Multi-Subspace Deep Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2647-2658. [PMID: 36107905 DOI: 10.1109/tcbb.2022.3206907] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Computational prediction of drug-target interaction (DTI) is important for the new drug discovery. Currently, the deep neural network (DNN) has been widely used in DTI prediction. However, parameters of the DNN could be insufficiently trained and features of the data could be insufficiently utilized, because the DTI data is limited and its dimension is very high. To deal with the above problems, in this paper, a graph auto-encoder and multi-subspace deep neural network (GAEMSDNN) is designed. GAEMSDNN enhances its learning ability with a graph auto-encoder, a subspace layer and an ensemble layer. The graph auto-encoder can preserve the reconstruction information. The subspace layer can obtain different strong feature subsets. The ensemble layer in the GAEMSDNN can comprehensively utilize these strong feature subsets in a unified optimization framework. As a result, more features can be extracted from the network input and the DNN network can be better trained. In experiments, the results of GAEMSDNN are significantly improved compared to the previous methods, which validates the effectiveness of our strategies.
22
Voitsitskyi T, Stratiichuk R, Koleiev I, Popryho L, Ostrovsky Z, Henitsoi P, Khropachov I, Vozniak V, Zhytar R, Nechepurenko D, Yesylevskyy S, Nafiiev A, Starosyla S. 3DProtDTA: a deep learning model for drug-target affinity prediction based on residue-level protein graphs. RSC Adv 2023; 13:10261-10272. [PMID: 37006369 PMCID: PMC10065141 DOI: 10.1039/d3ra00281k] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Accepted: 03/26/2023] [Indexed: 04/03/2023] Open
Abstract
Accurate prediction of the drug-target affinity (DTA) in silico is of critical importance for modern drug discovery. Computational methods of DTA prediction, applied in the early stages of drug development, are able to speed it up and cut its cost significantly. A wide range of approaches based on machine learning were recently proposed for DTA assessment. The most promising of them are based on deep learning techniques and graph neural networks to encode molecular structures. The recent breakthrough in protein structure prediction made by AlphaFold made an unprecedented amount of proteins without experimentally defined structures accessible for computational DTA prediction. In this work, we propose a new deep learning DTA model 3DProtDTA, which utilises AlphaFold structure predictions in conjunction with the graph representation of proteins. The model is superior to its rivals on common benchmarking datasets and has potential for further improvement.
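A residue-level protein graph can be built from any 3D structure, predicted or experimental, by treating residues as nodes and connecting those that lie close together in space. The sketch below assumes one C-alpha coordinate per residue and an 8 Å distance cutoff, which is a common convention but an assumption here; the abstract does not state how 3DProtDTA defines its edges or node features.

```python
import math


def residue_contact_graph(ca_coords, cutoff=8.0):
    """Return undirected edges (i, j), i < j, between residues in contact.

    ca_coords: list of (x, y, z) C-alpha coordinates in angstroms, one per residue
    cutoff:    hypothetical contact threshold in angstroms
    """
    edges = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy coordinates: three residues about 3.8 angstroms apart and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = residue_contact_graph(coords)
```

In practice the coordinates would be parsed from a structure file, and each node would also carry residue-level features; both steps are omitted here.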
Affiliation(s)
- Taras Voitsitskyi
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine Nauky Ave. 46 03038 Kyiv Ukraine
- Roman Stratiichuk
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Biophysics and Medical Informatics, Educational and Scientific Centre "Institute of Biology and Medicine", Taras Shevchenko National University of Kyiv 64 Volodymyrska Str. 01601 Kyiv Ukraine
- Ihor Koleiev
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Roman Zhytar
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Semen Yesylevskyy
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences CZ-166 10 Prague 6 Czech Republic
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine Nauky Ave. 46 03038 Kyiv Ukraine
- Alan Nafiiev
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
23
Lee SJ, Cho J, Lee BH, Hwang D, Park JW. Design and Prediction of Aptamers Assisted by In Silico Methods. Biomedicines 2023; 11:356. [PMID: 36830893 PMCID: PMC9953197 DOI: 10.3390/biomedicines11020356] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 12/30/2022] [Revised: 01/21/2023] [Accepted: 01/23/2023] [Indexed: 01/28/2023] Open
Abstract
An aptamer is a single-stranded DNA or RNA that binds to a specific target with high binding affinity. Aptamers are developed through the process of systematic evolution of ligands by exponential enrichment (SELEX), which is repeated to increase the binding power and specificity. However, the SELEX process is time-consuming, and the characterization of aptamer candidates selected through it requires additional effort. Here, we describe in silico methods in order to suggest the most efficient way to develop aptamers and minimize the laborious effort required to screen and optimise aptamers. We investigated several methods for the estimation of aptamer-target molecule binding through conformational structure prediction, molecular docking, and molecular dynamic simulation. In addition, examples of machine learning and deep learning technologies used to predict the binding of targets and ligands in the development of new drugs are introduced. This review will be helpful in the development and application of in silico aptamer screening and characterization.
Affiliation(s)
- Su Jin Lee
- Drug Manufacturing Center, Daegu-Gyeongbuk Medical Innovation Foundation (K-MEDI Hub), Daegu 41061, Republic of Korea
- Junmin Cho
- Medical Device Development Center, Daegu-Gyeongbuk Medical Innovation Foundation (K-MEDI Hub), Daegu 41061, Republic of Korea
- Byung-Hoon Lee
- Medical Device Development Center, Daegu-Gyeongbuk Medical Innovation Foundation (K-MEDI Hub), Daegu 41061, Republic of Korea
- Donghwan Hwang
- Medical Device Development Center, Daegu-Gyeongbuk Medical Innovation Foundation (K-MEDI Hub), Daegu 41061, Republic of Korea
- Jee-Woong Park
- Medical Device Development Center, Daegu-Gyeongbuk Medical Innovation Foundation (K-MEDI Hub), Daegu 41061, Republic of Korea
24
A deep learning method for predicting molecular properties and compound-protein interactions. J Mol Graph Model 2022; 117:108283. [PMID: 35994925 DOI: 10.1016/j.jmgm.2022.108283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Revised: 07/19/2022] [Accepted: 07/26/2022] [Indexed: 01/14/2023]
Abstract
Predicting molecular properties and compound-protein interactions (CPIs) are two important areas of drug design and discovery. They are also an essential way to discover lead compounds in virtual screening. Recently, in silico methods based on deep learning have demonstrated excellent performance in various challenges. It is imperative to develop efficient computational methods to predict accurately both molecular properties and CPIs in drug research using deep learning techniques. In this paper, we propose a deep learning method applicable to both molecular property prediction and CPI prediction based on the idea that both are generally influenced by chemical structure and sequence information of compounds and proteins. Molecular properties are inferred by integrating the molecular structure and sequence information of compounds, and CPIs are predicted by integrating protein sequence and compound structure. The method combines topological structure and sequence fingerprint information of molecules, adequately extracts raw data features, and generates highly representative features for prediction. Molecular property prediction experiments were conducted on BACE, P53 and hERG datasets, and CPI prediction experiments were conducted on Human, C. elegans and KIBA datasets. MG-S outperforms the baselines in molecular property prediction on P53: the differences in AUC, Precision and MCC are 0.030, 0.050 and 0.100, respectively, over the suboptimal baseline model, and it provides consistently good results on BACE and hERG. The model also achieves impressive performance in CPI prediction: the differences in AUC, Precision and MCC on KIBA are 0.141, 0.138, 0.090 and 0.082, respectively, compared with the state-of-the-art models. The comprehensive results show that the MG-S model has higher performance, better classification ability, and faster convergence. MG-S will serve as a useful method to predict compound properties and CPIs in the early stages of drug design and discovery. Our code and datasets are available at: https://github.com/happay-ending/cpi_cpp.
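Two of the reported metrics, Precision and the Matthews correlation coefficient (MCC), follow directly from the confusion matrix of a binary classifier. The sketch below uses their standard definitions; the label vectors are toy values for illustration and are unrelated to the paper's datasets or results. AUC is left out because it needs ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels: 1 = interacting pair, 0 = non-interacting pair.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
prec = precision(tp, fp)
score = mcc(tp, fp, tn, fn)
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced interaction datasets where Precision alone can look favorable.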
25
Ke W, Crist RM, Clogston JD, Stern ST, Dobrovolskaia MA, Grodzinski P, Jensen MA. Trends and patterns in cancer nanotechnology research: A survey of NCI's caNanoLab and nanotechnology characterization laboratory. Adv Drug Deliv Rev 2022; 191:114591. [PMID: 36332724 PMCID: PMC9712232 DOI: 10.1016/j.addr.2022.114591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2022] [Revised: 10/22/2022] [Accepted: 10/27/2022] [Indexed: 11/11/2022]
Abstract
Cancer nanotechnologies possess immense potential as therapeutic and diagnostic treatment modalities and have undergone significant and rapid advancement in recent years. With this emergence, the complexities of data standards in the field are on the rise. Data sharing and reanalysis is essential to more fully utilize this complex, interdisciplinary information to answer research questions, promote the technologies, optimize use of funding, and maximize the return on scientific investments. In order to support this, various data-sharing portals and repositories have been developed which not only provide searchable nanomaterial characterization data, but also provide access to standardized protocols for synthesis and characterization of nanomaterials as well as cutting-edge publications. The National Cancer Institute's (NCI) caNanoLab is a dedicated repository for all aspects pertaining to cancer-related nanotechnology data. The searchable database provides a unique opportunity for data mining and the use of artificial intelligence and machine learning, which aims to be an essential arm of future research studies, potentially speeding the design and optimization of next-generation therapies. It also provides an opportunity to track the latest trends and patterns in nanomedicine research. This manuscript provides the first look at such trends extracted from caNanoLab and compares these to similar metrics from the NCI's Nanotechnology Characterization Laboratory, a laboratory providing preclinical characterization of cancer nanotechnologies to researchers around the globe. Together, these analyses provide insight into the emerging interests of the research community and rise of promising nanoparticle technologies.
Affiliation(s)
- Weina Ke
- Bioinformatics and Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
| | - Rachael M Crist
- Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
| | - Jeffrey D Clogston
- Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
| | - Stephan T Stern
- Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
| | - Marina A Dobrovolskaia
- Nanotechnology Characterization Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States
| | - Piotr Grodzinski
- Nanodelivery Systems and Devices Branch, Cancer Imaging Program, National Cancer Institute, Rockville, MD, United States
| | - Mark A Jensen
- Bioinformatics and Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD, United States.
26
Yan X, Liu Y. Graph-sequence attention and transformer for predicting drug-target affinity. RSC Adv 2022; 12:29525-29534. [PMID: 36320763 PMCID: PMC9562047 DOI: 10.1039/d2ra05566j] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/04/2022] [Accepted: 10/04/2022] [Indexed: 11/30/2022]
Abstract
Drug-target binding affinity (DTA) prediction has drawn increasing interest due to its substantial position in the drug discovery process. The development of new drugs is costly, time-consuming, and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. Therefore, it is of great significance to develop effective computational methods to predict DTAs. The attention mechanisms allow the computational method to focus on the most relevant parts of the input and have been proven to be useful for various tasks. In this study, we proposed a novel model based on self-attention, called GSATDTA, to predict the binding affinity between drugs and targets. For the representation of drugs, we use Bi-directional Gated Recurrent Units (BiGRU) to extract the SMILES representation from SMILES sequences, and graph neural networks to extract the graph representation of the molecular graphs. Then we utilize an attention mechanism to fuse the two representations of the drug. For the target/protein, we utilized an efficient transformer to learn the representation of the protein, which can capture the long-distance relationships in the sequence of amino acids. We conduct extensive experiments to compare our model with state-of-the-art models. Experimental results show that our model outperforms the current state-of-the-art methods on two independent datasets.
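As a rough illustration of the fusion step described in this abstract, the sketch below weights two drug representations by softmax attention scores and sums them. It is not the GSATDTA implementation: the function names, the `query` vector (standing in for a learned parameter), and the dot-product scoring are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(seq_repr, graph_repr, query):
    """Fuse two equal-length drug embeddings into one vector.

    seq_repr   -- embedding derived from the SMILES sequence (hypothetical)
    graph_repr -- embedding derived from the molecular graph (hypothetical)
    query      -- stand-in for a learned attention vector (assumption)
    """
    views = [seq_repr, graph_repr]
    # Score each view by its dot product with the query vector.
    scores = [sum(q * v for q, v in zip(query, view)) for view in views]
    weights = softmax(scores)
    # Weighted sum of the two views, computed dimension by dimension.
    fused = [
        sum(w * view[i] for w, view in zip(weights, views))
        for i in range(len(query))
    ]
    return fused, weights


fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
print(weights, fused)
```

In a trained model the query and both embeddings would be learned jointly; here they are fixed numbers so the arithmetic is easy to follow.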
Affiliation(s)
- Xiangfeng Yan
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Yong Liu
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
27
Modeling DTA by Combining Multiple-Instance Learning with a Private-Public Mechanism. Int J Mol Sci 2022; 23:11136. [PMID: 36232434 PMCID: PMC9569912 DOI: 10.3390/ijms231911136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 11/20/2022]
Abstract
The prediction of the strengths of drug–target interactions, also called drug–target binding affinities (DTA), plays a fundamental role in facilitating drug discovery, where the goal is to find prospective drug candidates. With the increase in the number of drug–protein interactions, machine learning techniques, especially deep learning methods, have become applicable for drug–target interaction discovery because they significantly reduce the required experimental workload. In this paper, we present a spontaneous formulation of the DTA prediction problem as an instance of multi-instance learning. We address the problem in three stages, first organizing given drug and target sequences into instances via a private-public mechanism, then identifying the predicted scores of all instances in the same bag, and finally combining all the predicted scores as the output prediction. A comprehensive evaluation demonstrates that the proposed method outperforms other state-of-the-art methods on three benchmark datasets.
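The three-stage bag formulation in this abstract can be sketched as follows. This is a minimal illustration of generic multi-instance aggregation only: the fixed-size chunking, the `score_fn` callback, and mean pooling are assumptions, and the paper's private-public mechanism is not reproduced here.

```python
from statistics import mean


def make_instances(sequence, size):
    """Split one sequence into consecutive chunks (the bag's instances)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def predict_bag(instances, score_fn, combine=mean):
    """Score every instance in a bag, then combine into one prediction."""
    scores = [score_fn(inst) for inst in instances]
    return combine(scores)


# Toy usage: the scoring function is a placeholder, not a trained model.
bag = make_instances("MKTAYIAKQR", 4)       # ['MKTA', 'YIAK', 'QR']
prediction = predict_bag(bag, score_fn=len)  # mean of [4, 4, 2]
print(bag, prediction)
```

Swapping `combine` for `max` gives the other common pooling choice in multi-instance learning, where a single strongly scoring instance determines the bag label.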
28
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding, including allosteric effects, protein-protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
Affiliation(s)
- Chris Avery
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Tyler Grear
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Theodore Frater
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
29
Pandey M, Radaeva M, Mslati H, Garland O, Fernandez M, Ester M, Cherkasov A. Ligand Binding Prediction Using Protein Structure Graphs and Residual Graph Attention Networks. Molecules 2022; 27:5114. [PMID: 36014351 PMCID: PMC9416537 DOI: 10.3390/molecules27165114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 08/03/2022] [Accepted: 08/09/2022] [Indexed: 11/25/2022]
Abstract
Computational prediction of ligand–target interactions is a crucial part of modern drug discovery as it helps to bypass high costs and labor demands of in vitro and in vivo screening. As the wealth of bioactivity data accumulates, it provides opportunities for the development of deep learning (DL) models with increasing predictive powers. Conventionally, such models were either limited to the use of very simplified representations of proteins or ineffective voxelization of their 3D structures. Herein, we present the development of the PSG-BAR (Protein Structure Graph-Binding Affinity Regression) approach that utilizes 3D structural information of the proteins along with 2D graph representations of ligands. The method also introduces attention scores to selectively weight protein regions that are most important for ligand binding. Results: The developed approach demonstrates the state-of-the-art performance on several binding affinity benchmarking datasets. The attention-based pooling of protein graphs enables identification of surface residues as critical residues for protein–ligand binding. Finally, we validate our model predictions against an experimental assay on a viral main protease (Mpro)—the hallmark target of SARS-CoV-2 coronavirus.
Affiliation(s)
- Mohit Pandey
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Mariia Radaeva
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Hazem Mslati
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Olivia Garland
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Michael Fernandez
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
| | - Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Artem Cherkasov
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC V6T 1Z2, Canada
30
Zeng Y, Chen X, Peng D, Zhang L, Huang H. Multi-scaled self-attention for drug-target interaction prediction based on multi-granularity representation. BMC Bioinformatics 2022; 23:314. [PMID: 35922768 PMCID: PMC9347097 DOI: 10.1186/s12859-022-04857-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022]
Abstract
Background: Drug–target interaction (DTI) prediction plays a crucial role in drug discovery. Although advanced deep learning has shown promising results in predicting DTIs, it still needs improvement in two aspects: (1) the encoding method, in which the existing character encoding overlooks the chemical textual information of multi-character atoms and chemical functional groups; and (2) the architecture of the deep model, which should focus on multiple chemical patterns in drug and target representations. Results: In this paper, we propose a multi-granularity multi-scaled self-attention (SAN) model to alleviate the above problems. Specifically, in the encoding process, we investigate a segmentation method for drug and protein sequences and then label the segmented groups as the multi-granularity representations. Moreover, in order to enhance the various local patterns in these multi-granularity representations, a multi-scaled SAN is built and exploited to generate deep representations of drugs and targets. Finally, our proposed model predicts DTIs based on the fusion of these deep representations. Our proposed model is evaluated on two benchmark datasets, KIBA and Davis. The experimental results reveal that our proposed model yields better prediction accuracy than strong baseline models. Conclusion: Our proposed multi-granularity encoding method and multi-scaled SAN model improve DTI prediction by encoding the chemical textual information of drugs and targets and extracting their various local patterns, respectively.
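To make the character-encoding limitation concrete, the sketch below contrasts per-character splitting of a SMILES string with a tokenizer that keeps multi-character atoms and bracket atoms together. It is only an illustrative assumption of what a coarser-grained encoding could look like; the regular expression and the `tokenize_smiles` name are hypothetical and do not reproduce the paper's segmentation method.

```python
import re

# Bracket atoms such as [Na+] first, then the two-letter halogens,
# then any remaining single character (atoms, bonds, ring digits).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)


smiles = "CC(Cl)Br"
print(list(smiles))             # character encoding splits 'Cl' and 'Br'
print(tokenize_smiles(smiles))  # multi-character atoms stay intact
```

Character encoding yields eight symbols for this molecule, two of which (`l` and `r`) carry no chemical meaning on their own, whereas the tokenizer yields six tokens that each correspond to an atom or a structural symbol.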
Affiliation(s)
- Yuni Zeng
- School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou, China
| | - Xiangru Chen
- College of Computer Science, Sichuan University, Chengdu, China
| | - Dezhong Peng
- College of Computer Science, Sichuan University, Chengdu, China; Shenzhen Peng Cheng Laboratory, Shenzhen, China; Chengdu Sobey Digital Technology Co., Ltd, Chengdu, China
| | - Lijun Zhang
- Sichuan Zhiqian Technology Co., Ltd, Chengdu, China; Chengdu Ruibei Yingte Information Technology Co., Ltd, Chengdu, China
| | - Haixiao Huang
- Sichuan Provincial Commission of Politics and Law, Chengdu, China.
31
Reciprocal perspective as a super learner improves drug-target interaction prediction (MUSDTI). Sci Rep 2022; 12:13237. [PMID: 35918366 PMCID: PMC9344797 DOI: 10.1038/s41598-022-16493-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 07/11/2022] [Indexed: 11/08/2022]
Abstract
The identification of novel drug-target interactions (DTI) is critical to drug discovery and drug repurposing to address contemporary medical and public health challenges presented by emergent diseases. Historically, computational methods have framed DTI prediction as a binary classification problem (indicating whether or not a drug physically interacts with a given protein target); however, framing the problem instead as a regression-based prediction of the physicochemical binding affinity is more meaningful. With growing databases of experimentally derived drug-target interactions (e.g. Davis, Binding-DB, and Kiba), deep learning-based DTI predictors can be effectively leveraged to achieve state-of-the-art (SOTA) performance. In this work, we formulated a DTI competition as part of the coursework for a senior undergraduate machine learning course and challenged students to generate component DTI models that might surpass SOTA models and effectively combine these component models as part of a meta-model using the Reciprocal Perspective (RP) multi-view learning framework. Following 6 weeks of concerted effort, 28 student-produced component deep-learning DTI models were leveraged in this work to produce a new SOTA RP-DTI model, denoted the Meta Undergraduate Student DTI (MUSDTI) model. Through a series of experiments we demonstrate that (1) RP can considerably improve SOTA DTI prediction, (2) our new double-cold experimental design is more appropriate for emergent DTI challenges, (3) our novel MUSDTI meta-model outperforms SOTA models, (4) RP can improve upon individual models as an ensembling method, and finally, (5) RP can be utilized for low-computation transfer learning. This work introduces a number of important revelations for the field of DTI prediction and sequence-based, pairwise prediction in general.
32
Towards computational solutions for precision medicine based big data healthcare system using deep learning models: A review. Comput Biol Med 2022; 149:106020. [DOI: 10.1016/j.compbiomed.2022.106020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Revised: 08/16/2022] [Accepted: 08/20/2022] [Indexed: 12/14/2022]
33
Zhao Q, Yang M, Cheng Z, Li Y, Wang J. Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2092-2110. [PMID: 33769935 DOI: 10.1109/tcbb.2021.3069040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
The identification of compound-protein relations (CPRs), which includes compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. A common method for compound-protein relation identification is the use of in vitro screening experiments. However, the number of compounds and proteins is massive, and in vitro screening experiments are labor-intensive, expensive, and time-consuming with high failure rates. Researchers have developed a computational field called virtual screening (VS) to aid experimental drug development. These methods utilize experimentally validated biological interaction information to generate datasets and use the physicochemical and structural properties of compounds and target proteins as input information to train computational prediction models. At present, deep learning has been widely used in computer vision and natural language processing and has experienced epoch-making progress. At the same time, deep learning has also been used in the field of biomedicine widely, and the prediction of CPRs based on deep learning has developed rapidly and has achieved good results. The purpose of this study is to investigate and discuss the latest applications of deep learning techniques in CPR prediction. First, we describe the datasets and feature engineering (i.e., compound and protein representations and descriptors) commonly used in CPR prediction methods. Then, we review and classify recent deep learning approaches in CPR prediction. Next, a comprehensive comparison is performed to demonstrate the prediction performance of representative methods on classical datasets. Finally, we discuss the current state of the field, including the existing challenges and our proposed future directions. We believe that this investigation will provide sufficient references and insight for researchers to understand and develop new deep learning methods to enhance CPR predictions.
34
Jiang M, Wang S, Zhang S, Zhou W, Zhang Y, Li Z. Sequence-based drug-target affinity prediction using weighted graph neural networks. BMC Genomics 2022; 23:449. [PMID: 35715739 PMCID: PMC9205061 DOI: 10.1186/s12864-022-08648-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/24/2022] [Accepted: 05/23/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Affinity prediction between a molecule and a protein, usually called drug-target affinity (DTA) prediction, is an important step of virtual screening. Its accuracy directly influences the progress of drug development. Sequence-based drug-target affinity prediction can predict the affinity according to the protein sequence, which is fast and can be applied to large datasets. However, due to the lack of protein structure information, the accuracy needs to be improved. RESULTS: The proposed model, called WGNN-DTA, is suited to both drug-target affinity (DTA) and compound-protein interaction (CPI) prediction tasks. Various experiments are designed to verify the performance of the proposed method in different scenarios, which shows that WGNN-DTA has the advantages of simplicity and high accuracy. Moreover, because it does not need complex steps such as multiple sequence alignment (MSA), it executes quickly and is suitable for screening large databases. CONCLUSION: We construct protein and molecular graphs from sequences and SMILES that can effectively reflect their structures. To utilize the detailed contact information of the protein, a graph neural network is used to extract features and predict the binding affinity based on the graphs; the resulting model is called the weighted graph neural networks drug-target affinity predictor (WGNN-DTA). The proposed method has the advantages of simplicity and high accuracy.
Affiliation(s)
- Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266525, China
| | - Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
| | - Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao, 266100, China
| | - Wei Zhou
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266525, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266525, China
| | - Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China.
35
DeepMHADTA: Prediction of Drug-Target Binding Affinity Using Multi-Head Self-Attention and Convolutional Neural Network. Curr Issues Mol Biol 2022; 44:2287-2299. [PMID: 35678684 PMCID: PMC9164023 DOI: 10.3390/cimb44050155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 05/08/2022] [Accepted: 05/14/2022] [Indexed: 11/17/2022]
Abstract
Drug-target interactions provide insight into drug side effects and drug repositioning. However, wet-lab biochemical experiments are time-consuming and labor-intensive, and are insufficient to meet the pressing demand for drug research and development. With the rapid advancement of deep learning, computational methods are increasingly applied to screen drug-target interactions. Many methods consider this problem as a binary classification task (binding or not), but ignore the quantitative binding affinity. In this paper, we propose a new end-to-end deep learning method called DeepMHADTA, which uses the multi-head self-attention mechanism in a deep residual network to predict drug-target binding affinity. On two benchmark datasets, our method outperformed several current state-of-the-art methods in terms of multiple performance measures, including mean squared error (MSE), concordance index (CI), rm2, and area under the precision-recall curve (AUPR). The results demonstrated that our method achieved better performance in predicting drug–target binding affinity.
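Two of the performance measures named in this abstract are simple enough to state in code. The sketch below uses the standard definitions of mean squared error and of CI as the concordance index, as it is commonly defined in the affinity-prediction literature; the function names are illustrative, and the quadratic-time pair loop is chosen for readability rather than speed.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinity."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered correctly.

    A pair (i, j) is comparable when y_true[i] > y_true[j]. It counts 1 if
    the predictions agree with that order and 0.5 if they are tied.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.4]
print(mean_squared_error(measured, predicted))
print(concordance_index(measured, predicted))
```

MSE penalizes the size of each error, while the concordance index depends only on ranking, so a model can score well on one and poorly on the other. That is one reason affinity papers usually report both.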
36
Wang L, Wong L, Chen ZH, Hu J, Sun XF, Li Y, You ZH. MSPEDTI: Prediction of Drug-Target Interactions via Molecular Structure with Protein Evolutionary Information. Biology 2022; 11:740. [PMID: 35625468 PMCID: PMC9138588 DOI: 10.3390/biology11050740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 05/03/2022] [Accepted: 05/04/2022] [Indexed: 11/25/2022]
Abstract
The key to new drug discovery and development is first and foremost the search for molecular targets of drugs, thus advancing drug discovery and drug repositioning. However, traditional identification of drug-target interactions (DTIs) is a costly, lengthy, high-risk, and low-success-rate systematic undertaking. Therefore, more and more pharmaceutical companies are trying to use computational technologies to screen existing drug molecules and mine new drugs, thereby accelerating new drug development. In the current study, we designed a deep learning computational model, MSPEDTI, based on Molecular Structure and Protein Evolutionary information to predict potential DTIs. The model first fuses protein evolutionary information and drug structure information, then uses a deep learning convolutional neural network (CNN) to mine its hidden features, and finally predicts the associated DTIs with an extreme learning machine (ELM). In cross-validation experiments, MSPEDTI achieved 94.19%, 90.95%, 87.95%, and 86.11% prediction accuracy on the gold-standard datasets enzymes, ion channels, G-protein-coupled receptors (GPCRs), and nuclear receptors, respectively. MSPEDTI showed its competitive ability in ablation experiments and in comparison with previous methods. Additionally, 7 of 10 potential DTIs predicted by MSPEDTI were substantiated by the classical database. These outcomes demonstrate the ability of MSPEDTI to provide reliable drug candidate targets and to facilitate drug repositioning and drug development.
Affiliation(s)
- Lei Wang
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
| | - Zhan-Heng Chen
- Computer Science and Technology, Tongji University, Shanghai 200092, China
| | - Jing Hu
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Xiao-Fei Sun
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
37
Bai Q, Liu S, Tian Y, Xu T, Banegas‐Luna AJ, Pérez‐Sánchez H, Huang J, Liu H, Yao X. Application advances of deep learning methods for de novo drug design and molecular dynamics simulation. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1581] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Affiliation(s)
- Qifeng Bai
- Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
| | - Shuo Liu
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu, China
| | - Yanan Tian
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu, China
| | - Tingyang Xu
- Tencent AI Lab, Shenzhen Tencent Computer Ltd, Shenzhen, China
| | - Antonio Jesús Banegas‐Luna
- Structural Bioinformatics and High Performance Computing Research Group (BIO‐HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia, Spain
| | - Horacio Pérez‐Sánchez
- Structural Bioinformatics and High Performance Computing Research Group (BIO‐HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia, Spain
| | - Junzhou Huang
- Tencent AI Lab, Shenzhen Tencent Computer Ltd, Shenzhen, China
| | - Huanxiang Liu
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu, China
| | - Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, Gansu, China
38
Oyelade ON, Ezugwu AE, Almutairi MS, Saha AK, Abualigah L, Chiroma H. A generative adversarial network for synthetization of regions of interest based on digital mammograms. Sci Rep 2022; 12:6166. [PMID: 35418566 PMCID: PMC9008034 DOI: 10.1038/s41598-022-09929-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/10/2021] [Accepted: 03/23/2022] [Indexed: 11/09/2022]
Abstract
Deep learning (DL) models are becoming pervasive and applicable to computer vision, image processing, and synthesis problems. The performance of these models is often improved through architectural configuration, tweaks, the use of enormous training data, and skillful selection of hyperparameters. The application of deep learning models to medical image processing has yielded interesting performance, capable of correctly detecting abnormalities in medical digital images, making them surpass human physicians. However, advancing research in this domain largely relies on the availability of training datasets. These datasets are sometimes not publicly accessible, insufficient for training, and may also be characterized by a class imbalance among samples. As a result, inadequate training samples and difficulty in accessing new datasets for training deep learning models limit performance and research into new domains. Hence, generative adversarial networks (GANs) have been proposed to mediate this gap by synthesizing data similar to real sample images. However, we observed that benchmark datasets with regions of interest (ROIs) for characterizing abnormalities in breast cancer using digital mammography do not contain sufficient data with a fair distribution of all cases of abnormalities. For instance, the architectural distortion and breast asymmetry in digital mammograms are sparsely distributed across most publicly available datasets. This paper proposes a GAN model, named ROImammoGAN, which synthesizes ROI-based digital mammograms. Our approach involves the design of a GAN model consisting of both a generator and a discriminator to learn a hierarchy of representations for abnormalities in digital mammograms. Attention is given to architectural distortion, asymmetry, mass, and microcalcification abnormalities so that training distinctively learns the features of each abnormality and generates sufficient images for each category. 
The proposed GAN model was applied to MIAS datasets, and the performance evaluation yielded a competitive accuracy for the synthesized samples. In addition, the quality of the images generated was also evaluated using PSNR, SSIM, FSIM, BRISQUE, PQUE, NIQUE, FID, and geometry scores. The results showed that ROImammoGAN performed competitively with state-of-the-art GANs. The outcome of this study is a model for augmenting CNN models with ROI-centric image samples for the characterization of abnormalities in breast images.
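Of the image-quality scores listed in this abstract, PSNR has the simplest closed form, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below applies it to two images given as flat lists of pixel intensities. The 8-bit maximum of 255 and the function name are assumptions for illustration, and nothing here reflects how the paper's own evaluation was implemented.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two equal-size images.

    Both images are flat sequences of pixel intensities. Identical images
    have zero error, for which PSNR is defined here as infinity.
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


real = [52, 55, 61, 59]
synthetic = [50, 57, 60, 59]
print(psnr(real, synthetic))
```

Higher PSNR means the synthesized image is closer to the reference pixel by pixel, which is why it is usually reported alongside perceptual measures such as SSIM rather than on its own.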
Affiliation(s)
- Olaide N Oyelade
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
| | - Absalom E Ezugwu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
| | - Mubarak S Almutairi
- University of Hafr Al Batin, College of Computer Science and Engineering, Hafar Al Batin, Saudi Arabia
| | - Apu Kumar Saha
- Department of Mathematics, National Institute of Technology Agartala, Agartala, India
| | - Laith Abualigah
- Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, 11953, Jordan
- School of Computer Sciences, Universiti Sains Malaysia, 11800, Gelugor, Pulau Pinang, Malaysia
| | - Haruna Chiroma
- University of Hafr Al Batin, College of Computer Science and Engineering, Hafar Al Batin, Saudi Arabia.
39
Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning. Sci Rep 2022; 12:4751. [PMID: 35306525 PMCID: PMC8934358 DOI: 10.1038/s41598-022-08787-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/22/2021] [Accepted: 03/08/2022] [Indexed: 11/21/2022]
Abstract
Drug-target interaction (DTI) prediction plays a crucial role in drug repositioning and virtual drug screening. Most DTI prediction methods cast the problem as a binary classification task to predict if interactions exist or as a regression task to predict continuous values that indicate a drug's ability to bind to a specific target. The regression-based methods provide insight beyond the binary relationship. However, most of these methods require three-dimensional (3D) structural information of the targets, which is still not available for many targets. Despite this bottleneck, only a few methods address the drug-target binding affinity (DTBA) problem from a non-structure-based approach to avoid the 3D structure limitations. Here we propose Affinity2Vec, a novel regression-based method that formulates the entire task as a graph-based problem. To develop this method, we constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities. Affinity2Vec further combines several computational techniques from feature representation learning, graph mining, and machine learning to generate or extract features, build the model, and predict the binding affinity between the drug and the target with no 3D structural data. We conducted extensive experiments to evaluate and demonstrate the robustness and efficiency of the proposed method on benchmark datasets used in state-of-the-art non-structure-based drug-target binding affinity studies. Affinity2Vec showed superior and competitive results compared to the state-of-the-art methods based on several evaluation metrics, including mean squared error, rm2, concordance index, and area under the precision-recall curve.
|
40
|
Du BX, Qin Y, Jiang YF, Xu Y, Yiu SM, Yu H, Shi JY. Compound–protein interaction prediction by deep learning: Databases, descriptors and models. Drug Discov Today 2022; 27:1350-1366. [DOI: 10.1016/j.drudis.2022.02.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 11/19/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
|
41
|
Born J, Huynh T, Stroobants A, Cornell WD, Manica M. Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model. J Chem Inf Model 2021; 62:240-257. [PMID: 34905358 DOI: 10.1021/acs.jcim.1c00889] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in the computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases.
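The reduced representation described in this abstract can be illustrated with a minimal Python sketch. It only shows the idea of keeping a fixed set of discontiguous residues from a longer sequence; the function name, the toy sequence, and the toy positions are hypothetical and are not the authors' code or the actual 29 ATP-binding-site positions.

```python
def active_site_sequence(full_sequence, positions):
    """Return the residues at the given 0-based positions, in sequence order.

    `positions` is a hypothetical list of binding-site indices; real kinase
    positions would come from a structural alignment, which is not shown here.
    """
    ordered = sorted(set(positions))
    if ordered and (ordered[0] < 0 or ordered[-1] >= len(full_sequence)):
        raise IndexError("active-site position outside the sequence")
    return "".join(full_sequence[p] for p in ordered)


# Toy data (assumed for illustration only).
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
toy_positions = [14, 2, 9, 5]

reduced = active_site_sequence(toy_sequence, toy_positions)
print(reduced)  # TIRV
```

The shortened string would then be fed to the same sequence encoder as the full primary structure, which is where the reported efficiency gain would come from.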
Affiliation(s)
- Jannis Born
- IBM Research Europe, 8804 Rüschlikon, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Tien Huynh
- IBM Research, Yorktown Heights, New York 10598, United States
| | - Astrid Stroobants
- Department of Chemistry, Imperial College London, SW7 2AZ London, United Kingdom
| | - Wendy D Cornell
- IBM Research, Yorktown Heights, New York 10598, United States
| | | |
|
42
|
Nayarisseri A, Khandelwal R, Tanwar P, Madhavi M, Sharma D, Thakur G, Speck-Planche A, Singh SK. Artificial Intelligence, Big Data and Machine Learning Approaches in Precision Medicine & Drug Discovery. Curr Drug Targets 2021; 22:631-655. [PMID: 33397265 DOI: 10.2174/1389450122999210104205732] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/23/2020] [Revised: 08/21/2020] [Accepted: 09/14/2020] [Indexed: 11/22/2022]
Abstract
Artificial Intelligence revolutionizes the drug development process and can quickly identify potential biologically active compounds from millions of candidates within a short period. The present review is an overview based on some applications of Machine Learning based tools, such as GOLD, Deep PVP, LIB SVM, etc. and the algorithms involved such as support vector machine (SVM), random forest (RF), decision tree and Artificial Neural Network (ANN), etc. at various stages of drug designing and development. These techniques can be employed in SNP discoveries, drug repurposing, ligand-based drug design (LBDD), Ligand-based Virtual Screening (LBVS) and Structure-based Virtual Screening (SBVS), Lead identification, quantitative structure-activity relationship (QSAR) modeling, and ADMET analysis. It is demonstrated that SVM exhibited better performance, indicating that the classification model will have great applications in human intestinal absorption (HIA) predictions. Successful cases have been reported which demonstrate the efficiency of SVM and RF models in identifying JFD00950 as a novel compound targeting a colon cancer cell line, DLD-1, by inhibition of FEN1 cytotoxic and cleavage activity. Furthermore, a QSAR model was also used to predict flavonoid inhibitory effects on AR activity as a potent treatment for diabetes mellitus (DM), using ANN. Hence, in the era of big data, ML approaches have evolved as a powerful and efficient way to deal with the huge amounts of data generated by modern drug discovery, to model small-molecule drugs and gene biomarkers, and to identify novel drug targets for various diseases.
Affiliation(s)
- Anuraj Nayarisseri
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
| | - Ravina Khandelwal
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
| | - Poonam Tanwar
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
| | - Maddala Madhavi
- Department of Zoology, Nizam College, Osmania University, Hyderabad - 500001, Telangana State, India
| | - Diksha Sharma
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
| | - Garima Thakur
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
| | - Alejandro Speck-Planche
- Programa Institucional de Fomento a la Investigacion, Desarrollo e Innovacion, Universidad Tecnologica Metropolitana, Ignacio Valdivieso 2409, P.O. 8940577, San Joaquin, Santiago, Chile
| | - Sanjeev Kumar Singh
- Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630003, Tamil Nadu, India
| |
|
43
|
Zeng Y, Chen X, Luo Y, Li X, Peng D. Deep drug-target binding affinity prediction with multiple attention blocks. Brief Bioinform 2021; 22:6231754. [PMID: 33866349 PMCID: PMC8083346 DOI: 10.1093/bib/bbab117] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 12/16/2021] [Revised: 02/12/2021] [Accepted: 03/13/2021] [Indexed: 11/23/2022] Open
Abstract
Drug-target interaction (DTI) prediction has drawn increasing interest due to its substantial position in the drug discovery process. Many studies have introduced computational models to treat DTI prediction as a regression task, which directly predict the binding affinity of drug-target pairs. However, existing studies (i) ignore the essential correlations between atoms when encoding drug compounds and (ii) model the interaction of drug-target pairs simply by concatenation. Based on those observations, in this study, we propose an end-to-end model with multiple attention blocks to predict the binding affinity scores of drug-target pairs. Our proposed model offers the abilities to (i) encode the correlations between atoms by a relation-aware self-attention block and (ii) model the interaction of drug representations and target representations by the multi-head attention block. Experimental results of DTI prediction on two benchmark datasets show our approach outperforms existing methods, benefiting from the correlation information encoded by the relation-aware self-attention block and the interaction information extracted by the multi-head attention block. Moreover, we conduct experiments on the effects of max relative position length and find the best max relative position length value $k \in \{3, 5\}$. Furthermore, we apply our model to predict the binding affinity of Corona Virus Disease 2019 (COVID-19)-related genome sequences and 3137 FDA-approved drugs.
Affiliation(s)
- Yuni Zeng
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Xiangru Chen
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Yujie Luo
- Shenzhen Peng Cheng Laboratory, Shenzhen, 518052, China
| | - Xuedong Li
- Chengdu Sobey Digital Technology Co., Ltd, Chengdu, 610041, China
| | - Dezhong Peng
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China
| |
|
44
|
Yang S, Zhu F, Ling X, Liu Q, Zhao P. Intelligent Health Care: Applications of Deep Learning in Computational Medicine. Front Genet 2021; 12:607471. [PMID: 33912213 PMCID: PMC8075004 DOI: 10.3389/fgene.2021.607471] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/17/2020] [Accepted: 03/05/2021] [Indexed: 12/24/2022] Open
Abstract
With the progress of medical technology, the biomedical field has entered the era of big data, on the basis of which, driven by artificial intelligence technology, computational medicine has emerged. People need to extract the effective information contained in these big biomedical data to promote the development of precision medicine. Traditionally, machine learning methods are used to mine biomedical data and find features in the data; these methods generally rely on feature engineering and the domain knowledge of experts, requiring tremendous time and human resources. Different from traditional approaches, deep learning, as a cutting-edge machine learning branch, can automatically learn complex and robust features from raw data without the need for feature engineering. The applications of deep learning in medical imaging, electronic health records, genomics, and drug development are studied, and the findings suggest that deep learning has an obvious advantage in making full use of biomedical data and improving the level of medical health. Deep learning plays an increasingly important role in the field of medical health and has broad application prospects. However, the problems and challenges of deep learning in computational medical health still exist, including insufficient data, interpretability, data privacy, and heterogeneity. Analysis and discussion of these problems provide a reference for improving the application of deep learning in medical health.
Affiliation(s)
- Sijie Yang
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Xinghong Ling
- School of Computer Science and Technology, Soochow University, Suzhou, China
- WenZheng College of Soochow University, Suzhou, China
| | - Quan Liu
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Peiyao Zhao
- School of Computer Science and Technology, Soochow University, Suzhou, China
| |
|
45
|
Chen Z, Hu L, Zhang BT, Lu A, Wang Y, Yu Y, Zhang G. Artificial Intelligence in Aptamer-Target Binding Prediction. Int J Mol Sci 2021; 22:3605. [PMID: 33808496 PMCID: PMC8038094 DOI: 10.3390/ijms22073605] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 03/04/2021] [Revised: 03/25/2021] [Accepted: 03/26/2021] [Indexed: 12/18/2022] Open
Abstract
Aptamers are short single-stranded DNA, RNA, or synthetic Xeno nucleic acids (XNA) molecules that can interact with corresponding targets with high affinity. Owing to their unique features, including low cost of production, easy chemical modification, high thermal stability, reproducibility, as well as low levels of immunogenicity and toxicity, aptamers can be used as an alternative to antibodies in diagnostics and therapeutics. Systematic evolution of ligands by exponential enrichment (SELEX), an experimental approach for aptamer screening, allows the selection and identification of in vitro aptamers with high affinity and specificity. However, the SELEX process is time consuming and characterization of the representative aptamer candidates from SELEX is rather laborious. Artificial intelligence (AI) could help to rapidly identify the potential aptamer candidates from a vast number of sequences. This review discusses the advancements of AI pipelines/methods, including structure-based and machine/deep learning-based methods, for predicting the binding ability of aptamers to targets. Structure-based methods are the most used in computer-aided drug design. For this part, we review the secondary and tertiary structure prediction methods for aptamers, molecular docking, as well as molecular dynamic simulation methods for aptamer-target binding. We also performed analysis to compare the accuracy of different secondary and tertiary structure prediction methods for aptamers. On the other hand, advanced machine-/deep-learning models have witnessed successes in predicting the binding abilities between targets and ligands in drug discovery and thus potentially offer a robust and accurate approach to predict the binding between aptamers and targets. The research utilizing machine-/deep-learning techniques for prediction of aptamer-target binding is limited currently. Therefore, perspectives for models, algorithms, and implementation strategies of machine/deep learning-based methods are discussed. This review could facilitate the development and application of high-throughput and less laborious in silico methods in aptamer selection and characterization.
Affiliation(s)
- Zihao Chen
- School of Chinese Medicine, The Chinese University of Hong Kong, Hong Kong, China; (Z.C.); (B.-T.Z.)
| | - Long Hu
- Law Sau Fai Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China;
| | - Bao-Ting Zhang
- School of Chinese Medicine, The Chinese University of Hong Kong, Hong Kong, China; (Z.C.); (B.-T.Z.)
| | - Aiping Lu
- Institute of Integrated Bioinformedicine and Translational Science, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China;
- Guangdong-Hong Kong Macao Greater Bay Area International Research Platform for Aptamer-Based Translational Medicine and Drug Discovery, Hong Kong, China
| | - Yaofeng Wang
- Centre for Regenerative Medicine and Health, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong, China
| | - Yuanyuan Yu
- Law Sau Fai Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China;
- Guangdong-Hong Kong Macao Greater Bay Area International Research Platform for Aptamer-Based Translational Medicine and Drug Discovery, Hong Kong, China
| | - Ge Zhang
- Law Sau Fai Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China;
- Guangdong-Hong Kong Macao Greater Bay Area International Research Platform for Aptamer-Based Translational Medicine and Drug Discovery, Hong Kong, China
| |
|
46
|
Lim S, Lu Y, Cho CY, Sung I, Kim J, Kim Y, Park S, Kim S. A review on compound-protein interaction prediction methods: Data, format, representation and model. Comput Struct Biotechnol J 2021; 19:1541-1556. [PMID: 33841755 PMCID: PMC8008185 DOI: 10.1016/j.csbj.2021.03.004] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 08/31/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 01/27/2023] Open
Abstract
There has recently been rapid progress in computational methods for determining protein targets of small molecule drugs, which is termed compound-protein interaction (CPI). In this review, we comprehensively review topics related to computational prediction of CPI. Data for CPI has been accumulated and curated significantly both in quantity and quality. Computational methods have become more powerful than ever for analyzing such complex data. Thus, recent successes in the improved quality of CPI prediction are due to the use of both sophisticated computational techniques and higher quality information in the databases. The goal of this article is to provide reviews of topics related to CPI, from data, format, and representation to computational models, so that researchers can take full advantage of these resources to develop novel prediction methods. Chemical compounds and protein data from various resources were discussed in terms of data formats and encoding schemes. For the CPI methods, we grouped prediction methods into five categories from traditional machine learning techniques to state-of-the-art deep learning techniques. In closing, we discussed emerging machine learning topics to help both experimental and computational scientists leverage the current knowledge and strategies to develop more powerful and accurate CPI prediction methods.
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
| | - Yijingxiu Lu
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
| | - Chang Yun Cho
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
| | - Inyoung Sung
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
| | - Jungwoo Kim
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
| | - Youngkuk Kim
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
| | - Sungjoon Park
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
| | - Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University, Seoul, Republic of Korea
| |
|
47
|
Shim J, Hong ZY, Sohn I, Hwang C. Prediction of drug-target binding affinity using similarity-based convolutional neural network. Sci Rep 2021; 11:4416. [PMID: 33627791 PMCID: PMC7904939 DOI: 10.1038/s41598-021-83679-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 10/20/2020] [Accepted: 01/18/2021] [Indexed: 12/02/2022] Open
Abstract
Identifying novel drug–target interactions (DTIs) plays an important role in drug discovery. Most of the computational methods developed for predicting DTIs use binary classification, whose goal is to determine whether or not a drug–target (DT) pair interacts. However, it is more meaningful but also more challenging to predict the binding affinity that describes the strength of the interaction between a DT pair. If the binding affinity is not sufficiently large, the drug may not be useful. Therefore, methods for predicting DT binding affinities are very valuable. The increase in novel public affinity data available in the DT-related databases enables advanced deep learning techniques to be used to predict binding affinities. In this paper, we propose a similarity-based model that applies a 2-dimensional (2D) convolutional neural network (CNN) to the outer products between column vectors of two similarity matrices for the drugs and targets to predict DT binding affinities. To the best of our knowledge, this is the first application of 2D CNN in similarity-based DT binding affinity prediction. The validation results on multiple public datasets show that the proposed model is an effective approach for DT binding affinity prediction and can be quite helpful in the drug development process.
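The input construction mentioned in this abstract, an outer product between a column of the drug similarity matrix and a column of the target similarity matrix, can be sketched in plain Python as follows. This is only an illustration of the outer-product step under assumed toy similarity values; the helper names are hypothetical, and the CNN that would consume the resulting 2D map is not shown.

```python
def column(matrix, j):
    """Extract column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def outer_product(u, v):
    """Outer product of two vectors: result[i][k] = u[i] * v[k]."""
    return [[a * b for b in v] for a in u]


def pair_feature_map(drug_sim, target_sim, drug_idx, target_idx):
    """Build a 2D map for one drug-target pair from its similarity columns."""
    drug_vec = column(drug_sim, drug_idx)        # similarity of this drug to all drugs
    target_vec = column(target_sim, target_idx)  # similarity of this target to all targets
    return outer_product(drug_vec, target_vec)


# Toy similarity matrices (assumed values, 3 drugs and 2 targets).
drug_sim = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
target_sim = [
    [1.0, 0.3],
    [0.3, 1.0],
]

feature_map = pair_feature_map(drug_sim, target_sim, drug_idx=1, target_idx=0)
for row in feature_map:
    print(row)
```

With 3 drugs and 2 targets the map has shape 3 x 2; each drug-target pair yields one such "image", which is the kind of input a 2D convolutional layer expects.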
Affiliation(s)
- Jooyong Shim
- Department of Statistics, Institute of Statistical Information, Inje University, Gimhae, Gyeongsangnamdo, South Korea
| | | | | | - Changha Hwang
- Department of Applied Statistics, Dankook University, Yongin, Gyeonggido, 16890, South Korea.
| |
|
48
|
Mahmud M, Kaiser MS, McGinnity TM, Hussain A. Deep Learning in Mining Biological Data. Cognit Comput 2021; 13:1-33. [PMID: 33425045 PMCID: PMC7783296 DOI: 10.1007/s12559-020-09773-x] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Received: 05/01/2020] [Accepted: 09/28/2020] [Indexed: 02/06/2023]
Abstract
Recent technological advancements in data acquisition tools allowed life scientists to acquire multimodal data from different biological application domains. Categorized in three broad types (i.e. images, signals, and sequences), these data are huge in amount and complex in nature. Mining such enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures-known as deep learning (DL)-have been successfully applied to solve many complex pattern recognition problems. To investigate how DL-especially its different architectures-has contributed and been utilized in the mining of biological data pertaining to those three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates different DL architectures' applications to these data. This is followed by an exploration of available open access data sources pertaining to the three data types along with popular open-source DL tools applicable to these data. Also, comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.
Affiliation(s)
- Mufti Mahmud
- Department of Computer Science, Nottingham Trent University, Clifton, NG11 8NS Nottingham, UK
- Medical Technology Innovation Facility, Nottingham Trent University, NG11 8NS Clifton, Nottingham, UK
| | - M. Shamim Kaiser
- Institute of Information Technology, Jahangirnagar University, Savar 1342 Dhaka, Bangladesh
| | - T. Martin McGinnity
- Department of Computer Science, Nottingham Trent University, Clifton, NG11 8NS Nottingham, UK
- Intelligent Systems Research Centre, Ulster University, Northern Ireland BT48 7JL Derry, UK
| | - Amir Hussain
- School of Computing, Edinburgh, EH11 4BN Edinburgh, UK
| |
|
49
|
Abdel-Basset M, Hawash H, Elhoseny M, Chakrabortty RK, Ryan M. DeepH-DTA: Deep Learning for Predicting Drug-Target Interactions: A Case Study of COVID-19 Drug Repurposing. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 8:170433-170451. [PMID: 34786289 PMCID: PMC8545313 DOI: 10.1109/access.2020.3024238] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/19/2020] [Accepted: 09/11/2020] [Indexed: 05/04/2023]
Abstract
The rapid spread of novel coronavirus pneumonia (COVID-19) has led to a dramatically increased mortality rate worldwide. Despite many efforts, the rapid development of an effective vaccine for this novel virus will take considerable time and relies on the identification of drug-target (DT) interactions utilizing commercially available medication to identify potential inhibitors. Motivated by this, we propose a new framework, called DeepH-DTA, for predicting DT binding affinities for heterogeneous drugs. We propose a heterogeneous graph attention (HGAT) model to learn topological information of compound molecules and bidirectional ConvLSTM layers for modeling spatio-sequential information in simplified molecular-input line-entry system (SMILES) sequences of drug data. For protein sequences, we propose a squeezed-excited dense convolutional network for learning hidden representations within amino acid sequences, while utilizing advanced embedding techniques for encoding both kinds of input sequences. The performance of DeepH-DTA is evaluated through extensive experiments against cutting-edge approaches utilising two public datasets (Davis and KIBA), which comprise eclectic samples of the kinase protein family and the pertinent inhibitors. DeepH-DTA attains the highest Concordance Index (CI) of 0.924 and 0.927 and also achieves a mean square error (MSE) of 0.195 and 0.111 on the Davis and KIBA datasets, respectively. Moreover, a study using FDA-approved drugs from the Drug Bank database is performed using DeepH-DTA to predict the affinity scores of drugs against SARS-CoV-2 amino acid sequences, and the results show that the model can predict some of the SARS-CoV-2 inhibitors that have been recently approved in many clinical studies.
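The two metrics reported in this abstract, the Concordance Index (CI) and the mean square error (MSE), can be sketched in plain Python. This assumes the pairwise definition of CI commonly used in binding affinity benchmarks (pairs with equal true affinity are skipped, tied predictions count as half); it is not taken from the DeepH-DTA code, and the affinity values are made up for illustration.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(list(zip(y_true, y_pred)), 2):
        if t_i == t_j:
            continue  # pairs with equal true affinity are not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied predictions count as half
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up affinities (assumed for illustration only).
y_true = [5.0, 6.2, 7.1, 8.3]
y_pred = [5.1, 6.0, 7.5, 7.4]

ci = concordance_index(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(round(ci, 3), round(mse, 3))
```

Here five of the six pairs are ranked correctly, so CI is 5/6, while MSE penalizes the size of each error rather than the ranking. A higher CI and a lower MSE are both better, which is how the figures in the abstract should be read.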
Affiliation(s)
| | - Hossam Hawash
- Faculty of Computers and Informatics, Zagazig University, Zagazig 44519, Egypt
| | - Mohamed Elhoseny
- Department of Computer Science, College of Computer Information Technology, American University in the Emirates, Dubai 503000, United Arab Emirates
- Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt
| | - Ripon K Chakrabortty
- Capability Systems Centre, School of Engineering and IT, University of New South Wales Canberra, Canberra ACT 2612, Australia
| | - Michael Ryan
- Capability Systems Centre, School of Engineering and IT, University of New South Wales Canberra, Canberra ACT 2612, Australia
| |
|