1
Tao W, Lin X, Liu Y, Zeng L, Ma T, Cheng N, Jiang J, Zeng X, Yuan S. Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction. BMC Biol 2024; 22:248. [PMID: 39468510 PMCID: PMC11520867 DOI: 10.1186/s12915-024-02049-y] [Received: 07/15/2024] [Accepted: 10/17/2024] [Indexed: 10/30/2024] Open
Abstract
BACKGROUND Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods aim to learn from the chemical structures of compounds and proteins yet ignore the conceptual knowledge that is the interrelationships among the fundamental elements in the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information like pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, the prevalence of knowledge-missing compounds and proteins is a critical barrier for injecting knowledge into data-driven models. RESULTS Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. The proposed BEACON learns the consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge and predicts the missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with 5.1% and 6.6% performance gain on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. CONCLUSIONS Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.
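The abstract does not say which mutual-information estimator BEACON uses. As a minimal sketch, assuming the widely used InfoNCE contrastive bound stands in for "maximizing the mutual information" between a structure embedding and a knowledge embedding of the same entity, the idea could look like this. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(structure_vecs, knowledge_vecs, temperature=0.1):
    """Mean InfoNCE loss.

    Assumption: row i of each list describes the same compound or protein,
    so (i, i) is the positive pair and every other knowledge vector is a negative.
    A lower loss corresponds to a tighter lower bound on mutual information.
    """
    n = len(structure_vecs)
    total = 0.0
    for i in range(n):
        logits = [dot(structure_vecs[i], k) / temperature for k in knowledge_vecs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_norm - logits[i]
    return total / n


# Toy check: matching views give a much lower loss than mismatched views.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing such a loss pulls the two views of one entity together, which is one way to obtain the "consistent representations" the abstract describes; the conditional-entropy step for knowledge-missing entities is not sketched here.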
Affiliation(s)
- Wen Tao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xuan Lin
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China
  - Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
  - Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, 201109, Shanghai, China
- Tengfei Ma
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Ning Cheng
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, 410208, Hunan, China
- Jing Jiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Sisi Yuan
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, 28223, NC, USA
2
Alazmi M. Enzyme catalytic efficiency prediction: employing convolutional neural networks and XGBoost. Front Artif Intell 2024; 7:1446063. [PMID: 39498388 PMCID: PMC11532030 DOI: 10.3389/frai.2024.1446063] [Received: 06/08/2024] [Accepted: 10/07/2024] [Indexed: 11/07/2024] Open
Abstract
Introduction In the intricate realm of enzymology, the precise quantification of enzyme efficiency, epitomized by the turnover number (kcat), is a paramount yet elusive objective. Existing methodologies, though sophisticated, often grapple with the inherent stochasticity and multifaceted nature of enzymatic reactions. Thus, there arises a necessity to explore avant-garde computational paradigms. Methods In this context, we introduce "enzyme catalytic efficiency prediction (ECEP)," leveraging advanced deep learning techniques to enhance the previous implementation, TurNuP, for predicting the enzyme catalase kcat. Our approach significantly outperforms prior methodologies, incorporating new features derived from enzyme sequences and chemical reaction dynamics. Through ECEP, we unravel the intricate enzyme-substrate interactions, capturing the nuanced interplay of molecular determinants. Results Preliminary assessments, compared against established models like TurNuP and DLKcat, underscore the superior predictive capabilities of ECEP, marking a pivotal shift in in silico enzymatic turnover number estimation. This study enriches the computational toolkit available to enzymologists and lays the groundwork for future explorations in the burgeoning field of bioinformatics. This paper proposes a multi-feature ensemble deep learning approach that predicts enzyme kinetic parameters with an ensemble of convolutional neural networks and XGBoost, taking a weighted average of each feature-based model's output, to outperform traditional machine learning methods. The proposed "ECEP" model significantly outperformed existing methodologies, reducing the mean squared error (MSE) by 0.35, from 0.81 to 0.46, and raising the R-squared score from 0.44 to 0.54, thereby demonstrating its superior accuracy and effectiveness in enzyme catalytic efficiency prediction.
Discussion This improvement underscores the model's potential to enhance the field of bioinformatics, setting a new benchmark for performance.
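The weighted-average step and the two reported metrics can be illustrated with a short sketch. This is not the ECEP implementation: the per-model predictions, the weights, and the function names below are hypothetical, and only the general technique (a weighted mean of per-model outputs, scored by MSE and R-squared) comes from the abstract.

```python
def weighted_average(predictions, weights):
    """Combine per-model predictions (one list per model) into one list."""
    total = sum(weights)
    n = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-scale kcat targets and two feature-based models
# whose errors happen to cancel when averaged with equal weights.
y_true = [1.0, 2.0, 3.0]
model_a = [1.5, 2.5, 3.5]   # overshoots by 0.5
model_b = [0.5, 1.5, 2.5]   # undershoots by 0.5
ensemble = weighted_average([model_a, model_b], [1.0, 1.0])

print(mse(y_true, model_a), mse(y_true, ensemble), r_squared(y_true, ensemble))
```

In practice the weights would be tuned on validation data; the toy case only shows why averaging models with different error patterns can lower MSE and raise R-squared at the same time.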
Affiliation(s)
- Meshari Alazmi
  - College of Computer Science and Engineering, University of Ha’il, Ha’il, Saudi Arabia
3
Lam HYI, Guan JS, Ong XE, Pincket R, Mu Y. Protein language models are performant in structure-free virtual screening. Brief Bioinform 2024; 25:bbae480. [PMID: 39327890 PMCID: PMC11427677 DOI: 10.1093/bib/bbae480] [Received: 05/24/2024] [Revised: 08/17/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Hitherto, virtual screening (VS) has typically been performed using a structure-based drug design paradigm. Such methods typically require molecular docking on high-resolution three-dimensional structures of a target protein, a computationally intensive and time-consuming exercise. This work demonstrates that by employing protein language models and molecular graphs as inputs to a novel graph-to-transformer cross-attention mechanism, a screening power comparable to state-of-the-art structure-based models can be achieved. The implications thereof include highly expedited VS due to the greatly reduced compute required to run this model, and the ability to perform early stages of computer-aided drug design in the complete absence of 3D protein structures.
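The abstract names a graph-to-transformer cross-attention mechanism without giving its equations. The sketch below shows only generic scaled dot-product cross-attention, under the assumption that ligand atom (graph node) embeddings act as queries and protein language model residue embeddings act as keys and values. All names and toy vectors are hypothetical; the paper's actual layer may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """For each query, return a softmax-weighted mix of the value vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return out


# Hypothetical 2-dimensional embeddings: two atoms attend over two residues.
atom_queries = [[0.0, 0.0], [10.0, 0.0]]
residue_keys = [[1.0, 0.0], [0.0, 1.0]]
residue_values = [[1.0, 0.0], [0.0, 1.0]]
mixed = cross_attention(atom_queries, residue_keys, residue_values)
print(mixed)
```

The first atom has a zero query and so averages the residues evenly, while the second aligns with the first residue and draws almost all of its context from it.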
Affiliation(s)
- Hilbert Yuen In Lam
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Jia Sheng Guan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
- Xing Er Ong
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Robbe Pincket
  - Heliovision, Asstraat 5, 3000 Leuven, Leuven, Kingdom of Belgium
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
4
Zhao L, Zhu Y, Wen N, Wang C, Wang J, Yuan Y. Drug-Target Binding Affinity Prediction in a Continuous Latent Space Using Variational Autoencoders. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1458-1467. [PMID: 38767996 DOI: 10.1109/tcbb.2024.3402661] [Indexed: 05/22/2024]
Abstract
Accurate prediction of Drug-Target binding Affinity (DTA) is a daunting yet pivotal task in the sphere of drug discovery. Over the years, a plethora of deep learning-based DTA models have emerged, rendering promising results in predicting the binding affinities between drugs and their target proteins. However, in contrast to the conventional approach of modeling binding affinity in vector spaces, we propose a more nuanced modeling process in a continuous space to account for the diversity of input samples. Initially, the drug is encoded using the Simplified Molecular Input Line Entry System (SMILES), while the target sequences are characterized via a pretrained language model. Subsequently, highly correlative information is extracted utilizing residual gated convolutional neural networks. In a departure from existing deep learning-based models, our model learns the hidden representations of the drugs and targets jointly. Instead of employing two vectors, our hidden representations consist of two Gaussian distributions. To validate the effectiveness of our proposal, we conducted evaluations on commonly utilized benchmark datasets. The experimental outcomes corroborated that our method surpasses the state-of-the-art vectorial representation methods in terms of performance. This approach, therefore, offers potential enhancements in the precision of DTA predictions, potentially contributing to more efficient drug discovery processes.
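Representing a drug and a target as Gaussian distributions rather than fixed vectors is the standard variational autoencoder setup, so a minimal sketch can show the two pieces that usually go with it: reparameterized sampling and the KL term against a standard normal prior. The encoder outputs below are made-up numbers, the function names are hypothetical, and the concatenation at the end is an assumption about how a joint latent might feed an affinity regressor.

```python
import math
import random


def sample_gaussian(mean, log_var, rng):
    """Reparameterization: z = mean + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var))


rng = random.Random(0)

# Hypothetical encoder outputs (mean and log-variance) for one drug and one target.
drug_mean, drug_log_var = [0.5, -1.0], [-2.0, -2.0]
target_mean, target_log_var = [0.0, 0.0], [0.0, 0.0]

z_drug = sample_gaussian(drug_mean, drug_log_var, rng)
z_target = sample_gaussian(target_mean, target_log_var, rng)
joint_latent = z_drug + z_target  # list concatenation, assumed regressor input

print(len(joint_latent), kl_to_standard_normal(target_mean, target_log_var))
```

The KL term is zero when the encoder already outputs the prior and grows as the mean moves away from zero or the variance away from one, which is what keeps the latent space continuous.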
5
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundamental Research 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time consuming, and modern drug discovery endeavors are progressively reliant on computational methodologies, aiming to mitigate temporal and financial expenditures associated with the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. It has become a concern for researchers how to summarize the existing deep learning in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, large numbers of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model for the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
6
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
- Luiz F. Piochi
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Ana T. Gaspar
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- António J. Preto
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
7
Zhang Y, Li J, Lin S, Zhao J, Xiong Y, Wei DQ. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model. J Cheminform 2024; 16:67. [PMID: 38849874 PMCID: PMC11162000 DOI: 10.1186/s13321-024-00862-9] [Received: 11/29/2023] [Accepted: 05/19/2024] [Indexed: 06/09/2024] Open
Abstract
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope of candidates for traditional labor-intensive, time-consuming and expensive experimental techniques. In this study, we proposed an end-to-end approach termed SPVec-SGCN-CPI, which utilized simplified graph convolutional network (SGCN) model with low-dimensional and continuous features generated from our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique, dividing the local neighborhood aggregation and nonlinearity layer-wise propagation steps, effectively aggregates K-order neighbor information while avoiding neighbor explosion and expediting training. The performance of the SPVec-SGCN-CPI method was assessed across three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results revealed that SPVec-SGCN-CPI outperformed all these competing methods, particularly excelling in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, confirming the top five ranked compound-protein interactions through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions within unlabeled compound-protein pairs, carrying substantial implications for drug re-profiling and discovery. 
In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showcasing potential to enhance target identification and streamline drug discovery processes.
Scientific contributions: The methodology presented in this work not only enables the comparatively accurate prediction of compound-protein interactions but also, for the first time, takes both sample imbalance, which is very common in the real world, and computational efficiency into consideration simultaneously, accelerating the target identification and drug discovery process.
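The separation of neighborhood aggregation from the nonlinearity that the abstract attributes to SGCN can be sketched as a feature precomputation: apply a normalized adjacency matrix K times to the node features, then hand the result to any ordinary classifier. The sketch assumes the usual symmetric normalization with self-loops from simplified graph convolution; the toy graph, feature values, and function names are hypothetical and are not taken from SPVec-SGCN-CPI.

```python
import math


def normalized_adjacency(n, edges):
    """Return S = D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(s, x, k):
    """Apply S to the feature matrix x k times, with no nonlinearity in between."""
    for _ in range(k):
        x = [
            [sum(s[i][j] * x[j][c] for j in range(len(x))) for c in range(len(x[0]))]
            for i in range(len(s))
        ]
    return x


# Path graph 0 - 1 - 2; only node 0 starts with a nonzero feature.
s = normalized_adjacency(3, [(0, 1), (1, 2)])
features = [[1.0], [0.0], [0.0]]
one_hop = propagate(s, features, 1)
two_hop = propagate(s, features, 2)
print(one_hop[2][0], two_hop[2][0])
```

Node 2 sees nothing of node 0 after one hop and a small share after two, which is the K-order neighbor information the abstract mentions. Because the propagation has no trainable parameters, it can be computed once before training, which is where the speed-up comes from.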
Affiliation(s)
- Yufang Zhang
  - School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
  - Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China
- Jiayi Li
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
- Shenggeng Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
- Jianwei Zhao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
  - Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
  - Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, 473006, Henan, China
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai JiaoTong University, Shanghai, China
8
Tian T, Li S, Zhang Z, Chen L, Zou Z, Zhao D, Zeng J. Benchmarking compound activity prediction for real-world drug discovery applications. Commun Chem 2024; 7:127. [PMID: 38834746 DOI: 10.1038/s42004-024-01204-4] [Received: 01/23/2024] [Accepted: 05/16/2024] [Indexed: 06/06/2024] Open
Abstract
Identifying active compounds for target proteins is fundamental in early drug discovery. Recently, data-driven computational methods have demonstrated promising potential in predicting compound activities. However, a well-designed benchmark for comprehensively evaluating these methods from a practical perspective is lacking. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). By carefully distinguishing assay types, designing train-test splitting schemes and selecting evaluation metrics, CARA accounts for the biased distribution of current real-world compound activity data and avoids overestimating model performance. We observed that although current models can make successful predictions for certain proportions of assays, their performance varied across different assays. In addition, evaluation of several few-shot training strategies showed that their performance depended on the task type. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.
Affiliation(s)
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Shuya Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Ziting Zhang
  - Department of Automation, Tsinghua University, Beijing, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China
- Lin Chen
  - Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Ziheng Zou
  - Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
  - School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
9
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically a protein. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein - small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION The affinity determination methods continue to develop toward miniaturization, high throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
  - Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
10
Tan D, Jiang H, Li H, Xie Y, Su Y. Prediction of drug-protein interaction based on dual channel neural networks with attention mechanism. Brief Funct Genomics 2024; 23:286-294. [PMID: 37642213 DOI: 10.1093/bfgp/elad037] [Received: 12/02/2022] [Revised: 07/16/2023] [Accepted: 08/08/2023] [Indexed: 08/31/2023] Open
Abstract
The precise identification of drug-protein interactions (DPIs) can significantly speed up the drug discovery process. Bioassay methods are time-consuming and expensive for screening each drug-protein pair. Machine-learning-based methods cannot accurately predict a large number of DPIs. Compared with traditional computing methods, deep learning methods need less domain knowledge and have strong data learning ability. In this study, we construct a DPI prediction model based on dual channel neural networks with an efficient path attention mechanism, called DCA-DPI. The drug molecular graph and protein sequence are used as the data input of the model, and a residual graph neural network and a residual convolution network are used to learn the feature representations of the drug and protein, respectively, yielding the feature vector of the drug and the hidden vectors of the protein. To get a more accurate protein feature vector, a weighted sum of the protein hidden vectors is computed using the neural attention mechanism. Finally, the drug and protein vectors are concatenated and fed into a fully connected layer for classification. To evaluate the performance of DCA-DPI, three widely used public datasets, Human, C. elegans and DUD-E, are used in the experiments. DCA-DPI's evaluation metrics are superior to those of other relevant methods. Experiments show that our model is efficient for DPI prediction.
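The last two steps of this pipeline, an attention-weighted sum over protein hidden vectors followed by concatenation with the drug vector, can be sketched as follows. The abstract does not specify how the attention scores are computed, so scoring each hidden vector by its dot product with the drug vector is an assumption here, and the vectors and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, query):
    """Weighted sum of hidden vectors; weights come from dot products with query."""
    weights = softmax([sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden])
    dim = len(hidden[0])
    pooled = [sum(w * h[c] for w, h in zip(weights, hidden)) for c in range(dim)]
    return pooled, weights


# Hypothetical outputs of the two channels: one drug vector and
# three hidden vectors, one per protein sequence position.
drug_vec = [1.0, 0.0]
protein_hidden = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

protein_vec, weights = attention_pool(protein_hidden, drug_vec)
joint = drug_vec + protein_vec  # concatenation, assumed classifier input

print(weights, joint)
```

The position whose hidden vector aligns with the drug vector receives the largest weight, so the pooled protein vector emphasizes the positions most relevant to that drug before the joint vector reaches the classification layer.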
Affiliation(s)
- Dayu Tan
  - Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Hefei, China
- Haijun Jiang
  - Key Laboratory of Intelligent Computing and Signal Processing, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Hefei, China
- Haitao Li
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Hefei, China
- Ying Xie
  - School of Mechanical, Electrical and Information Engineering, Putian University, China
- Yansen Su
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Hefei, China
11
Tang R, Guo H, Chen JQ, Huang C, Kong XX, Cao L, Wan FH, Han RC. Tandemly expanded OR17b in Himalaya ghost moth facilitates larval food allocation via olfactory reception of plant-derived tricosane. Int J Biol Macromol 2024; 268:131503. [PMID: 38663697 DOI: 10.1016/j.ijbiomac.2024.131503] [Received: 02/18/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/30/2024]
Abstract
Herbivorous insects utilize intricate olfactory mechanisms to locate food plants. Insect-plant chemical communication in primitive lineages offers insights into evolutionary milestones of divergent olfactory modalities. Here, we focus on a system endemic to the Qinghai-Tibetan Plateau to unravel the chemical and molecular basis of food preference in ancestral Lepidoptera. We conducted volatile profiling, neural electrophysiology, and chemotaxis assays with a panel of host plant organs to identify attractants for larvae of the Himalaya ghost moth Thitarodes xiaojinensis, the primitive host of the medicinal fungus Ophiocordyceps sinensis. Using a DREAM approach based on odorant-induced transcriptomes and subsequent deorphanization tests, we elucidated the odorant receptors responsible for coding bioactive volatiles. Contrary to allocation signals in most plant-feeding insects, T. xiaojinensis larvae utilize tricosane from the bulbil as the main attractant for locating the native host plant. We deorphanized TxiaOR17b, an indispensable odorant receptor resulting from tandem duplication of OR17, for transducing olfactory signals in response to tricosane. The discovery of this ligand-receptor pair suggests a survival strategy based on food location via olfaction in ancestral Lepidoptera, which synchronizes both plant asexual reproduction and peak hatch periods of insect larvae.
Affiliation(s)
- Rui Tang
- Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
- Hao Guo
- College of Life Science, Institute of Life Science and Green Development, Hebei University, Baoding 071002, China
- Jia-Qi Chen
- Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
- Cong Huang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiang-Xin Kong
- Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
- Li Cao
- Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
- Fang-Hao Wan
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; College of Plant Health and Medicine, Qingdao Agricultural University, Qingdao 266109, China
- Ri-Chou Han
- Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Guangdong Public Laboratory of Wild Animal Conservation and Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, China
12
Wang X, Quinn D, Moody TS, Huang M. ALDELE: All-Purpose Deep Learning Toolkits for Predicting the Biocatalytic Activities of Enzymes. J Chem Inf Model 2024; 64:3123-3139. [PMID: 38573056 PMCID: PMC11040732 DOI: 10.1021/acs.jcim.4c00058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/15/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Rapidly predicting enzyme properties for catalyzing specific substrates is essential for identifying potential enzymes for industrial transformations. The demand for sustainable production of valuable industry chemicals utilizing biological resources raised a pressing need to speed up biocatalyst screening using machine learning techniques. In this research, we developed an all-purpose deep-learning-based multiple-toolkit (ALDELE) workflow for screening enzyme catalysts. ALDELE incorporates both structural and sequence representations of proteins, alongside representations of ligands by subgraphs and overall physicochemical properties. Comprehensive evaluation demonstrated that ALDELE can predict the catalytic activities of enzymes, and particularly, it identifies residue-based hotspots to guide enzyme engineering and generates substrate heat maps to explore the substrate scope for a given biocatalyst. Moreover, our models notably match empirical data, reinforcing the practicality and reliability of our approach through the alignment with confirmed mutation sites. ALDELE offers a facile and comprehensive solution by integrating different toolkits tailored for different purposes at affordable computational cost and therefore would be valuable to speed up the discovery of new functional enzymes for their exploitation by the industry.
Affiliation(s)
- Xiangwen Wang
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Derek Quinn
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Thomas S. Moody
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Arran Chemical Company Limited, Unit 1 Monksland Industrial Estate, Athlone, Co. Roscommon N37 DN24, Ireland
- Meilan Huang
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
13
Zhong KY, Wen ML, Meng FF, Li X, Jiang B, Zeng X, Li Y. MMDTA: A Multimodal Deep Model for Drug-Target Affinity with a Hybrid Fusion Strategy. J Chem Inf Model 2024; 64:2878-2888. [PMID: 37610162 DOI: 10.1021/acs.jcim.3c00866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
The prediction of drug-target affinity (DTA) plays an important role in evaluating molecular druggability. Although deep learning-based models for DTA prediction have been extensively attempted, there are few reports on multimodal models that leverage various fusion strategies to exploit heterogeneous information from multiple modalities of drugs and targets. In this study, we proposed a multimodal deep model named MMDTA, which integrated the heterogeneous information from various modalities of drugs and targets using a hybrid fusion strategy to enhance DTA prediction. To achieve this, MMDTA first employed convolutional neural networks (CNNs) and graph convolutional networks (GCNs) to extract diverse heterogeneous information from the sequences and structures of drugs and targets. It then utilized a hybrid fusion strategy to combine and complement the extracted heterogeneous information, resulting in the fused modal information for predicting drug-target affinity through the fully connected (FC) layers. Experimental results demonstrated that MMDTA outperformed the competitive state-of-the-art deep learning models on the widely used benchmark data sets, particularly with a significantly improved key evaluation metric, Root Mean Square Error (RMSE). Furthermore, MMDTA exhibited excellent generalization and practical application performance on multiple different data sets. These findings highlighted MMDTA's accuracy and reliability in predicting the drug-target binding affinity. The source data and code are accessible at http://github.com/dldxzx/MMDTA.
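Since the abstract singles out RMSE as its key metric, a short sketch of how that metric is computed may help. The affinity values below are hypothetical and are not taken from the paper or its benchmarks.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical measured and predicted affinities, for illustration only.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 9.0]
score = rmse(measured, predicted)
```

Lower values indicate predictions closer to the measured affinities; an RMSE of 0 would mean a perfect match on every pair.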
Affiliation(s)
- Kai-Yang Zhong
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resource in Yunnan, Yunnan University, Kunming 650000, China
- Fan-Fang Meng
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Xin Li
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Bei Jiang
- Yunnan Key Laboratory of Screening and Research on Anti-pathogenic Plant Resources from Western Yunnan, Dali University, Dali 671000, China
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Yi Li
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
14
Zhang X, Gao H, Wang H, Chen Z, Zhang Z, Chen X, Li Y, Qi Y, Wang R. PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2205-2220. [PMID: 37319418 DOI: 10.1021/acs.jcim.3c00253] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/17/2023]
Abstract
Predicting protein-ligand binding affinity is a central issue in drug design. Various deep learning models have been published in recent years, many of which rely on 3D protein-ligand complex structures as input and tend to focus on the single task of reproducing binding affinity. In this study, we have developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork). This model takes the graph-represented 3D structure of the binding pocket on the target protein and the 2D chemical structure of the ligand molecule as input. It was trained through a multi-objective process with three related tasks: deriving the protein-ligand binding affinity, the protein-ligand contact map, and the ligand distance matrix. Besides the protein-ligand complexes with known binding affinity data retrieved from the PDBbind database, a large number of non-binder decoys were also added to the training data for deriving the final model of PLANET. When tested on the CASF-2016 benchmark, PLANET exhibited a scoring power comparable to the best result yielded by other deep learning models as well as a reasonable ranking power and docking power. In virtual screening trials conducted on the DUD-E benchmark, PLANET's performance was notably better than several deep learning and machine learning models. On the LIT-PCBA benchmark, PLANET achieved accuracy comparable to that of the conventional docking program Glide, but it used less than 1% of Glide's computation time to finish the same job because PLANET did not need exhaustive conformational sampling. Considering the decent accuracy and efficiency of PLANET in binding affinity prediction, it may become a useful tool for conducting large-scale virtual screening.
Affiliation(s)
- Xiangying Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Haotian Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Haojie Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Zhihang Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Zhe Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Xinchong Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Yifei Qi
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
15
Li P, Jiang Z, Liu T, Liu X, Qiao H, Yao X. Improving drug response prediction via integrating gene relationships with deep learning. Brief Bioinform 2024; 25:bbae153. [PMID: 38600666 PMCID: PMC11006795 DOI: 10.1093/bib/bbae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/05/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024] Open
Abstract
Predicting the drug response of cancer cell lines is crucial for advancing personalized cancer treatment, yet remains challenging due to tumor heterogeneity and individual diversity. In this study, we present a deep learning-based framework named Deep neural network Integrating Prior Knowledge (DIPK), which adopts self-supervised techniques to integrate multiple valuable sources of information, including gene interaction relationships, gene expression profiles and molecular topologies, to enhance prediction accuracy and robustness. We demonstrated the superior performance of DIPK compared to existing methods on both known and novel cells and drugs, underscoring the importance of gene interaction relationships in drug response prediction. In addition, DIPK extends its applicability to single-cell RNA sequencing data, showcasing its capability for single-cell-level response prediction and cell identification. Further, we assess the applicability of DIPK on clinical data. DIPK accurately predicted a higher response to paclitaxel in the pathological complete response (pCR) group compared to the residual disease group, affirming the better response of the pCR group to the chemotherapy compound. We believe that the integration of DIPK into clinical decision-making processes has the potential to enhance individualized treatment strategies for cancer patients.
Affiliation(s)
- Pengyong Li
- School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, 519020 Macau, China
- Zhengxiang Jiang
- School of Electronic Engineering, Xidian University, 710126 Xi’an, Shaanxi, China
- Tianxiao Liu
- School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
- Xinyu Liu
- Beijing Laboratory of Biomedical Materials, Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, 100081 Beijing, China
- Hui Qiao
- Department of Oncology, Tai’an Municipal Hospital, 271021 Tai’an, Shandong, China
- Xiaojun Yao
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078 Macao, China
16
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that use sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
17
Kumar YB, Kumar N, John L, Mahanta HJ, Vaikundamani S, Nagamani S, Sastry GM, Sastry GN. Analyzing the cation-aromatic interactions in proteins: Cation-aromatic database V2.0. Proteins 2024; 92:179-191. [PMID: 37789571 DOI: 10.1002/prot.26600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 08/17/2023] [Accepted: 09/07/2023] [Indexed: 10/05/2023]
Abstract
The cation-aromatic database (CAD) is a comprehensive repository of cation-aromatic motifs found in experimentally determined protein structures, first reported in 2007 [Proteins, 2007, 67, 1179]. The present article is an update of CAD that contains information of approximately 27.26 million cation-aromatic motifs. CAD uses three distance parameters (r, d1, and d2) to determine the position of the cation relative to the centroid of the aromatic residue and classifies the motifs as cation-π or cation-σ interactions. As of June 2023, about 193 936 protein structures were retrieved from Protein Data Bank, and this resulted in the identification of an impressive number of 27 255 817 cation-aromatic motifs. Among these motifs, spherical motifs constituted 94.09%, while cylindrical motifs made up the remaining 5.91%. When considering the interaction of metal ions with aromatic residues, 965 564 motifs are identified. Remarkably, 82.08% of these motifs involved the binding of metal ions to the amino acid HIS. Moreover, the analysis of binding preferences between cations and aromatic residues revealed that the HIS-HIS, PHE-ARG, and TRP-ARG pairs exhibited a preferential geometry. The motif pair HIS-HIS was the most prevalent, accounting for 19.87% of the total, closely followed by TYR-LYS at 10.17%. Conversely, the motif pair TRP-HIS had the lowest occurrence, representing only 4.20% of the total. The data generated help in revealing the characteristics and biological functions of cation-aromatic interactions in biological molecules. The updated version of CAD (Cation-Aromatic Database V2.0) can be accessed at https://acds.neist.res.in/cadv2.
Affiliation(s)
- Y Bhargav Kumar
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Nandan Kumar
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Lijo John
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Hridoy Jyoti Mahanta
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- S Vaikundamani
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Selvaraman Nagamani
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- G Narahari Sastry
- Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, Assam, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
18
Yi C, Taylor ML, Ziebarth J, Wang Y. Predictive Models and Impact of Interfacial Contacts and Amino Acids on Protein-Protein Binding Affinity. ACS Omega 2024; 9:3454-3468. [PMID: 38284090 PMCID: PMC10809705 DOI: 10.1021/acsomega.3c06996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/11/2023] [Accepted: 12/14/2023] [Indexed: 01/30/2024]
Abstract
Protein-protein interactions (PPIs) play a central role in nearly all cellular processes. The strength of the binding in a PPI is characterized by the binding affinity (BA) and is a key factor in controlling protein-protein complex formation and defining the structure-function relationship. Despite advancements in understanding protein-protein binding, much remains unknown about the interfacial region and its association with BA. New models are needed to predict BA with improved accuracy for therapeutic design. Here, we use machine learning approaches to examine how well different types of interfacial contacts can be used to predict experimentally determined BA and to reveal the impact of the specific amino acids at the binding interface on BA. We create a series of multivariate linear regression models incorporating different contact features at both residue and atomic levels and examine how different methods of identifying and characterizing these properties impact the performance of these models. Particularly, we introduce a new and simple approach to predict BA based on the quantities of specific amino acids at the protein-protein interface. We found that the numbers of specific amino acids at the protein-protein interface were correlated with BA. We show that the interfacial numbers of amino acids can be used to produce models with consistently good performance across different data sets, indicating the importance of the identities of interfacial amino acids in underlying BA. When trained on a diverse set of complexes from two benchmark data sets, the best performing BA model was generated with an explicit linear equation involving six amino acids. Tyrosine, in particular, was identified as the key amino acid in controlling BA, as it had the strongest correlation with BA and was consistently identified as the most important amino acid in feature importance studies. Glycine and serine were identified as the next two most important amino acids in predicting BA. 
The results from this study further our understanding of PPIs and can be used to make improved predictions of BA, giving them implications for drug design and screening in the pharmaceutical industry.
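The idea of predicting BA from the numbers of specific amino acids at the interface can be sketched as below. This is only an illustration of the form of such a linear equation: the coefficients, the intercept, and the choice of three residues are placeholders invented for the example, because the abstract does not give the fitted values or the full list of six amino acids.

```python
from collections import Counter

# Hypothetical coefficients for illustration only; not the published model.
PLACEHOLDER_COEFFS = {"TYR": -0.30, "GLY": -0.10, "SER": 0.05}
PLACEHOLDER_INTERCEPT = -8.0


def count_interface_residues(interface_residues):
    """Count how often each amino acid (three-letter code) occurs."""
    return Counter(name.upper() for name in interface_residues)


def predict_binding_affinity(interface_residues,
                             coeffs=PLACEHOLDER_COEFFS,
                             intercept=PLACEHOLDER_INTERCEPT):
    """Evaluate intercept + sum(coefficient * residue count)."""
    counts = count_interface_residues(interface_residues)
    return intercept + sum(c * counts.get(aa, 0) for aa, c in coeffs.items())


# Toy interface: two tyrosines, one glycine, one serine, one leucine.
interface = ["TYR", "GLY", "tyr", "SER", "LEU"]
estimate = predict_binding_affinity(interface)
```

Residues without a coefficient (leucine here) contribute nothing, which mirrors a model restricted to a small set of amino acids.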
Affiliation(s)
- Carey Huang Yi
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Mitchell Lee Taylor
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Jesse Ziebarth
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Yongmei Wang
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
19
Zhang C, Zang T, Zhao T. KGE-UNIT: toward the unification of molecular interactions prediction based on knowledge graph and multi-task learning on drug discovery. Brief Bioinform 2024; 25:bbae043. [PMID: 38348746 PMCID: PMC10939374 DOI: 10.1093/bib/bbae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 12/29/2023] [Accepted: 01/23/2024] [Indexed: 02/15/2024] Open
Abstract
The prediction of molecular interactions is vital for drug discovery. Existing methods often focus on individual prediction tasks and overlook the relationships between them. Additionally, certain tasks encounter limitations due to insufficient data availability, resulting in limited performance. To overcome these limitations, we propose KGE-UNIT, a unified framework that combines knowledge graph embedding (KGE) and multi-task learning, for simultaneous prediction of drug-target interactions (DTIs) and drug-drug interactions (DDIs) and enhancing the performance of each task, even when data availability is limited. Via KGE, we extract heterogeneous features from the drug knowledge graph to enhance the structural features of drug and protein nodes, thereby improving the quality of features. Additionally, employing multi-task learning, we introduce an innovative predictor that comprises a task-aware Convolutional Neural Network-based (CNN-based) encoder and a task-aware attention decoder, which can better fuse multimodal features, capture the contextual interactions of molecular tasks and enhance task awareness, leading to improved performance. Experiments on two imbalanced datasets for DTIs and DDIs demonstrate the superiority of KGE-UNIT, achieving high areas under the receiver operating characteristic curve (AUROCs) (0.942, 0.987) and areas under the precision-recall curve (AUPRs) (0.930, 0.980) for DTIs and high AUROCs (0.975, 0.989) and AUPRs (0.966, 0.988) for DDIs. Notably, on the LUO dataset, where the data were more limited, KGE-UNIT exhibited a more pronounced improvement, with increases of 4.32% in AUROC and 3.56% in AUPR for DTIs and 6.56% in AUROC and 8.17% in AUPR for DDIs. The scalability of KGE-UNIT is demonstrated through its extension to protein-protein interaction prediction; ablation studies and case studies further validate its effectiveness.
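For readers less familiar with the AUROC figures quoted in this abstract, the metric can be computed as the fraction of positive-negative pairs that a model ranks correctly. The sketch below uses that pairwise definition in plain Python; the labels and scores are hypothetical and unrelated to the KGE-UNIT experiments.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Counts the share of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical interaction labels (1 = interacts) and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
value = auroc(labels, scores)
```

The pairwise loop is quadratic in the number of examples, so it suits small illustrations rather than large benchmark sets.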
Affiliation(s)
- Chengcheng Zhang
- Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
- Tianyi Zang
- Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
- Tianyi Zhao
- School of Medicine and Health, Harbin Institute of Technology, Harbin, 150001, China
20
Yin Z, Chen Y, Hao Y, Pandiyan S, Shao J, Wang L. FOTF-CPI: A compound-protein interaction prediction transformer based on the fusion of optimal transport fragments. iScience 2024; 27:108756. [PMID: 38230261 PMCID: PMC10790010 DOI: 10.1016/j.isci.2023.108756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 11/05/2023] [Accepted: 12/13/2023] [Indexed: 01/18/2024] Open
Abstract
Compound-protein interaction (CPI) affinity prediction plays an important role in reducing the cost and time of drug discovery. However, the interpretability of how fragments function in CPI is impacted by the fact that current methods ignore the affinity relationships between fragments of compounds and fragments of proteins in CPI modeling. This article introduces an improved Transformer called FOTF-CPI (a Fusion of Optimal Transport Fragments compound-protein interaction prediction model). We use an optimal transport-based fragmentation approach to improve the model's understanding of compound and protein sequences. Additionally, a fused attention mechanism is employed, which combines the features of fragments to capture full affinity information. This fused attention redistributes higher attention scores to fragments with higher affinity. Experimental results show FOTF-CPI achieves an average 2% higher performance than other models on all three datasets. Furthermore, the visualization confirms the potential of FOTF-CPI for drug discovery applications.
Affiliation(s)
- Zeyu Yin
- School of Information Science and Technology, Nantong University, Nantong 226001, China
- Yu Chen
- School of Information Science and Technology, Nantong University, Nantong 226001, China
- Yajie Hao
- School of Information Science and Technology, Nantong University, Nantong 226001, China
- Sanjeevi Pandiyan
- Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
- Jinsong Shao
- School of Information Science and Technology, Nantong University, Nantong 226001, China
- Li Wang
- School of Information Science and Technology, Nantong University, Nantong 226001, China
- Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
21
Guo J. Improving structure-based protein-ligand affinity prediction by graph representation learning and ensemble learning. PLoS One 2024; 19:e0296676. [PMID: 38232063 DOI: 10.1371/journal.pone.0296676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 12/15/2023] [Indexed: 01/19/2024] Open
Abstract
Predicting protein-ligand binding affinity presents a viable solution for accelerating the discovery of new lead compounds. The recent widespread application of machine learning approaches, especially graph neural networks, has brought new advancements in this field. However, some existing structure-based methods treat protein macromolecules and ligand small molecules in the same way and ignore the data heterogeneity, potentially leading to incomplete exploration of the biochemical information of ligands. In this work, we propose LGN, a graph neural network-based fusion model with extra ligand feature extraction to effectively capture local features and global features within the protein-ligand complex, and make use of interaction fingerprints. By combining the ligand-based features and interaction fingerprints, LGN achieves Pearson correlation coefficients of up to 0.842 on the PDBbind 2016 core set, compared to 0.807 when using the features of complex graphs alone. Finally, we verify the rationalization and generalization of our model through comprehensive experiments. We also compare our model with state-of-the-art baseline methods, which validates the superiority of our model. To reduce the impact of data similarity, we increase the robustness of the model by incorporating ensemble learning.
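The Pearson correlation coefficients reported in this abstract measure how linearly the predicted affinities track the experimental ones. A minimal sketch of the calculation follows; the sample values are hypothetical and are not drawn from PDBbind or from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical experimental and predicted affinities, for illustration only.
experimental = [4.0, 5.0, 6.0, 7.0]
predicted = [4.2, 4.9, 6.3, 6.8]
r = pearson_r(experimental, predicted)
```

A value near 1 means the predictions rise and fall with the measurements; it says nothing about a constant offset, which is why it is often reported alongside an error metric.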
Affiliation(s)
- Jia Guo
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Beijing, P.R. China
  - Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
22
Wu H, Liu J, Jiang T, Zou Q, Qi S, Cui Z, Tiwari P, Ding Y. AttentionMGT-DTA: A multi-modal drug-target affinity prediction using graph transformer and attention mechanism. Neural Netw 2024; 169:623-636. [PMID: 37976593 DOI: 10.1016/j.neunet.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
The accurate prediction of drug-target affinity (DTA) is a crucial step in drug discovery and design. Traditional experiments are very expensive and time-consuming. Recently, deep learning methods have achieved notable performance improvements in DTA prediction. However, one challenge for deep learning-based models is obtaining appropriate and accurate representations of drugs and targets; target representations in particular have not been explored effectively. Another challenge is how to comprehensively capture the interaction information between different instances, which is also important for predicting DTA. In this study, we propose AttentionMGT-DTA, a multi-modal attention-based model for DTA prediction. AttentionMGT-DTA represents drugs and targets by a molecular graph and binding pocket graph, respectively. Two attention mechanisms are adopted to integrate information across different protein modalities and to model interactions between drug-target pairs. The experimental results showed that our proposed model outperformed state-of-the-art baselines on two benchmark datasets. In addition, AttentionMGT-DTA also had high interpretability by modeling the interaction strength between drug atoms and protein residues. Our code is available at https://github.com/JK-Liu7/AttentionMGT-DTA.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
- Tengsheng Jiang
  - Gusu School, Nanjing Medical University, Suzhou, 215009, China.
- Quan Zou
  - Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
- Shujie Qi
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
- Prayag Tiwari
  - School of Information Technology, Halmstad University, Sweden.
- Yijie Ding
  - Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
23
Sharma V, Singh A, Chauhan S, Sharma PK, Chaudhary S, Sharma A, Porwal O, Fuloria NK. Role of Artificial Intelligence in Drug Discovery and Target Identification in Cancer. Curr Drug Deliv 2024; 21:870-886. [PMID: 37670704 DOI: 10.2174/1567201821666230905090621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 09/07/2023]
Abstract
Drug discovery and development (DDD) is a highly complex process that necessitates precise monitoring and extensive data analysis at each stage. Furthermore, the DDD process is both time-consuming and costly. To tackle these concerns, artificial intelligence (AI) technology can be used, which facilitates rapid and precise analysis of extensive datasets within a limited timeframe. The pathophysiology of cancer is complicated and requires extensive research for novel drug discovery and development. The first stage in the process of drug discovery and development involves identifying targets. Cell structure and molecular functioning are complex due to the vast number of molecules that function constantly, performing various roles. Furthermore, scientists are continually discovering novel cellular mechanisms and molecules, expanding the range of potential targets. Accurately identifying the correct target is a crucial step in the preparation of a treatment strategy. Various forms of AI, such as machine learning, neural-based learning, deep learning, and network-based learning, are currently being utilised in applications, online services, and databases. These technologies facilitate the identification and validation of targets, ultimately contributing to the success of projects. This review focuses on the different types and subcategories of AI databases utilised in the field of drug discovery and target identification for cancer.
Affiliation(s)
- Vishal Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Amit Singh
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Sanjana Chauhan
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Pramod Kumar Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Shubham Chaudhary
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Astha Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Omji Porwal
  - Department of Pharmacognosy, Faculty of Pharmacy, Tishk International University, Erbil 44001, Iraq
24
Zhao M, Xu SX, Yang Y, Yuan M. GGNpTCR: A Generative Graph Structure Neural Network for Predicting Immunogenic Peptides for T-cell Immune Response. J Chem Inf Model 2023; 63:7557-7567. [PMID: 37990917 DOI: 10.1021/acs.jcim.3c01293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Identifying the interactions between T-cell receptors (TCRs) and human antigens is a crucial step in developing new vaccines, diagnostics, and immunotherapy. Current methods primarily focus on learning binding patterns from known TCR binding repertoires by using sequence information alone without considering the binding specificity of new antigens or exogenous peptides that have not appeared in the training set. Furthermore, the spatial structure of antigens plays a critical role in immune studies and immunotherapy, which should be addressed properly in the identification of interacting TCR-antigen pairs. In this study, we introduced a novel deep learning framework based on generative graph structures, GGNpTCR, for predicting interactions between TCRs and peptides from sequence information. Results of real data analysis indicate that our model achieved excellent prediction for new antigens unseen in the training data set, making significant improvements compared to existing methods. We also applied the model to a large COVID-19 data set with no antigens in the training data set, and the improvement was also significant. Furthermore, through incorporation of additional supervised mechanisms, GGNpTCR demonstrated the ability to precisely forecast the locations of peptide-TCR interactions within 3D configurations. This enhancement substantially improved the model's interpretability. In summary, based on the performance on multiple data sets, GGNpTCR has made significant progress in terms of performance, universality, and interpretability.
Affiliation(s)
- Minghua Zhao
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Steven X Xu
  - Genmab US, Inc., Princeton, New Jersey 08540, United States
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei 230032, China
25
Li H, Wang S, Zheng W, Yu L. Multi-dimensional search for drug-target interaction prediction by preserving the consistency of attention distribution. Comput Biol Chem 2023; 107:107968. [PMID: 37844375 DOI: 10.1016/j.compbiolchem.2023.107968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 09/27/2023] [Accepted: 10/05/2023] [Indexed: 10/18/2023]
Abstract
Predicting drug-target interaction (DTI) is a crucial step in the process of drug repurposing and new drug development. Although the attention mechanism has been widely used to capture the interactions between drugs and targets, it mainly uses the Simplified Molecular Input Line Entry System (SMILES) and two-dimensional (2D) molecular graph features of drugs. In this paper, we propose a neural network model called MdDTI for DTI prediction. The model searches for binding sites that may interact with the target from the multiple dimensions of drug structure, namely the 2D substructures and the three-dimensional (3D) spatial structure. For the 2D substructures, we have developed a novel substructure decomposition strategy based on drug molecular graphs and compared its performance with the SMILES-based decomposition method. For the 3D spatial structure of drugs, we constructed spatial feature representation matrices for drugs based on the Cartesian coordinates of heavy atoms (without hydrogen atoms) in each drug. Finally, to ensure the search results of the model are consistent across multiple dimensions, we construct a consistency loss function. We evaluate MdDTI on four drug-target interaction datasets and three independent compound-protein affinity test sets. The results indicate that our model surpasses a series of state-of-the-art models. Case studies demonstrate that our model is capable of capturing the potential binding regions between drugs and targets, and it shows efficacy in drug repurposing. Our code is available at https://github.com/lhhu1999/MdDTI.
Affiliation(s)
- Huaihu Li
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China; The Key Lab of Intelligent Systems and Computing of Yunnan Province, Yunnan University, Kunming, Yunnan, China.
- Weihua Zheng
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Li Yu
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
26
Wang M, Li W, Yu X, Luo Y, Han K, Wang C, Jin Q. AffinityVAE: A multi-objective model for protein-ligand affinity prediction and drug design. Comput Biol Chem 2023; 107:107971. [PMID: 37852036 DOI: 10.1016/j.compbiolchem.2023.107971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 09/23/2023] [Accepted: 10/08/2023] [Indexed: 10/20/2023]
Abstract
In the prediction of protein-ligand affinity, traditional methods require a large amount of computing resources and have certain limitations in predicting and simulating structural changes. Although data-driven deep learning approaches can yield favorable outcomes, they lack interpretability. Some methods may require additional structural information or domain knowledge to support the interpretation, which may limit their applicability. This paper proposes an affinity variational autoencoder (AffinityVAE) using interaction feature mapping and a variational autoencoder, which consists of a multi-objective model capable of end-to-end affinity prediction and drug discovery. In this study, the limitations of affinity prediction in terms of interpretability are tackled by proposing the concept of a protein-ligand interaction feature map. This increases the diversity and quantity of protein-ligand binding data by designing an adaptive autoencoder of target chemical properties to generate new ligands similar to known ligands and adding them to the original training set. AffinityVAE is then retrained using this extended training set to further validate the protein-ligand binding affinity prediction. Comparisons were conducted between AffinityVAE and recent methods to demonstrate the high efficiency of the proposed model. The experimental results show that AffinityVAE has very high prediction performance, and it has the potential to enhance the diversity and the amount of protein-ligand binding data, which promotes drug development.
Affiliation(s)
- Mengying Wang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China.
- Weimin Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China.
- Xiao Yu
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Yin Luo
  - School of Life Sciences, East China Normal University, China
- Ke Han
  - Medical and Health Center, Liaocheng People's Hospital, LiaoCheng, China.
- Can Wang
  - School of Information and Communication Technology, Griffith University, Australia
- Qun Jin
  - Networked Information System Laboratory, Waseda University, Tokyo, Japan
27
Luo Y, Liu Y, Peng J. Calibrated geometric deep learning improves kinase-drug binding predictions. Nat Mach Intell 2023; 5:1390-1401. [PMID: 38962391 PMCID: PMC11221792 DOI: 10.1038/s42256-023-00751-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/20/2022] [Accepted: 09/29/2023] [Indexed: 07/05/2024]
Abstract
Protein kinases regulate various cellular functions and hold significant pharmacological promise in cancer and other diseases. Although kinase inhibitors are one of the largest groups of approved drugs, much of the human kinome remains unexplored but potentially druggable. Computational approaches, such as machine learning, offer efficient solutions for exploring kinase-compound interactions and uncovering novel binding activities. Despite the increasing availability of three-dimensional (3D) protein and compound structures, existing methods predominantly focus on exploiting local features from one-dimensional protein sequences and two-dimensional molecular graphs to predict binding affinities, overlooking the 3D nature of the binding process. Here we present KDBNet, a deep learning algorithm that incorporates 3D protein and molecule structure data to predict binding affinities. KDBNet uses graph neural networks to learn structure representations of protein binding pockets and drug molecules, capturing the geometric and spatial characteristics of binding activity. In addition, we introduce an algorithm to quantify and calibrate the uncertainties of KDBNet's predictions, enhancing its utility in model-guided discovery in chemical or protein space. Experiments demonstrated that KDBNet outperforms existing deep learning models in predicting kinase-drug binding affinities. The uncertainties estimated by KDBNet are informative and well-calibrated with respect to prediction errors. When integrated with a Bayesian optimization framework, KDBNet enables data-efficient active learning and accelerates the exploration and exploitation of diverse high-binding kinase-drug pairs.
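The abstract above says the estimated uncertainties are well-calibrated with respect to prediction errors. One common way to check this, sketched below, is to compare the nominal level of a predictive interval with the fraction of true values that actually fall inside it. This is a generic check that assumes Gaussian predictive distributions; it is not KDBNet's calibration algorithm, and the function names and toy numbers are hypothetical.

```python
from statistics import NormalDist

def interval_coverage(y_true, mu, sigma, level):
    """Fraction of true values inside the central `level` Gaussian interval."""
    # Half-width of the interval in units of the predicted standard deviation.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(1 for y, m, s in zip(y_true, mu, sigma) if abs(y - m) <= z * s)
    return hits / len(y_true)

def calibration_curve(y_true, mu, sigma, levels=(0.5, 0.8, 0.9, 0.95)):
    """Pairs of (nominal level, observed coverage); equal values mean calibrated."""
    return [(level, interval_coverage(y_true, mu, sigma, level)) for level in levels]

# Hypothetical toy values: measured affinities, predicted means, predicted std devs.
measured = [6.2, 7.9, 5.1, 8.4, 6.8, 7.2]
predicted_mean = [6.0, 7.5, 5.9, 8.1, 6.9, 6.1]
predicted_std = [0.3, 0.5, 0.6, 0.4, 0.2, 0.7]

for level, observed in calibration_curve(measured, predicted_mean, predicted_std):
    print(f"nominal {level:.2f} -> observed {observed:.2f}")
```

Observed coverage well below the nominal level suggests overconfident uncertainties, and coverage well above it suggests underconfident ones.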
Affiliation(s)
- Yunan Luo
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
  - These authors contributed equally: Yunan Luo, Yang Liu
- Yang Liu
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - These authors contributed equally: Yunan Luo, Yang Liu
- Jian Peng
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
28
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). Med Rev (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jingxin Rao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhiyi Chen
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Qinyu Chu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
29
Qian H, Huang W, Tu S, Xu L. KGDiff: towards explainable target-aware molecule generation with knowledge guidance. Brief Bioinform 2023; 25:bbad435. [PMID: 38040493 PMCID: PMC10783868 DOI: 10.1093/bib/bbad435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 10/14/2023] [Accepted: 11/03/2023] [Indexed: 12/03/2023]
Abstract
Designing 3D molecules with high binding affinity for specific protein targets is crucial in drug design. One challenge is that the atomic interaction between molecules and proteins in 3D space has to be taken into account. However, the existing target-aware methods solely model the joint distribution between the molecules and proteins, disregarding the binding affinities between them, which leads to limited performance. In this paper, we propose an explainable diffusion model to generate molecules that can be bound to a given protein target with high affinity. Our method explicitly incorporates the chemical knowledge of protein-ligand binding affinity into the diffusion model, and uses the knowledge to guide the denoising process towards the direction of high binding affinity. Specifically, an SE(3)-invariant expert network is developed to fit the Vina scoring functions and jointly trained with the denoising network, while the domain knowledge is distilled and conveyed from Vina functions to the expert network. Effective guidance is proposed on both continuous atom coordinates and discrete atom types by taking advantage of the gradient of the expert network. Experiments on the benchmark CrossDocked2020 demonstrate the superiority of our method. Additionally, an atom-level explanation of the generated molecules is provided, and the connections with the domain knowledge are established.
Affiliation(s)
- Hao Qian
  - Department of Computer Science and Engineering
  - Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
- Wenjing Huang
  - Department of Computer Science and Engineering
  - Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
- Shikui Tu
  - Department of Computer Science and Engineering
  - Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
- Lei Xu
  - Department of Computer Science and Engineering
  - Centre for Cognitive Machines and Computational Health (CMaCH), Shanghai Jiao Tong University, Shanghai 200240, China
30
Wang J, Xiao Y, Shang X, Peng J. Predicting drug-target binding affinity with cross-scale graph contrastive learning. Brief Bioinform 2023; 25:bbad516. [PMID: 38221904 PMCID: PMC10788681 DOI: 10.1093/bib/bbad516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 12/04/2023] [Accepted: 12/07/2023] [Indexed: 01/16/2024]
Abstract
Identifying the binding affinity between a drug and its target is essential in drug discovery and repurposing. Numerous computational approaches have been proposed for understanding these interactions. However, most existing methods only utilize either the molecular structure information of drugs and targets or the interaction information of drug-target bipartite networks. They may fail to combine the molecule-scale and network-scale features to obtain high-quality representations. In this study, we propose CSCo-DTA, a novel cross-scale graph contrastive learning approach for drug-target binding affinity prediction. The proposed model combines features learned from the molecular scale and the network scale to capture information from both local and global perspectives. We conducted experiments on two benchmark datasets, and the proposed model outperformed existing state-of-the-art methods. The ablation experiment demonstrated the significance and efficacy of multi-scale features and cross-scale contrastive learning modules in improving the prediction performance. Moreover, we applied CSCo-DTA to predict novel potential targets for Erlotinib and validated the predicted targets with molecular docking analysis.
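The abstract above describes contrastive learning between molecule-scale and network-scale features. A minimal sketch of one widely used contrastive objective, InfoNCE, is given below in plain Python. The abstract does not state which loss CSCo-DTA uses, so the choice of InfoNCE, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(mol_views, net_views, temperature=0.5):
    """Mean InfoNCE loss; row i of each list embeds the same entity at two scales."""
    mol = [normalize(v) for v in mol_views]
    net = [normalize(v) for v in net_views]
    total = 0.0
    for i, anchor in enumerate(mol):
        logits = [dot(anchor, other) / temperature for other in net]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-softmax probability assigned to the matching (positive) pair.
        total += log_denominator - logits[i]
    return total / len(mol)

# Hypothetical 2-D embeddings of two drugs at the molecule scale and network scale.
molecule_scale = [[1.0, 0.1], [0.2, 1.0]]
network_scale = [[0.9, 0.2], [0.1, 0.8]]
print(info_nce(molecule_scale, network_scale))
```

The loss is small when each entity's two views are more similar to each other than to the views of other entities, which is the behavior a cross-scale contrastive module is meant to encourage.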
Affiliation(s)
- Jingru Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an, 710072, China
  - The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi’an, 710072, China
- Yihang Xiao
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an, 710072, China
  - The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi’an, 710072, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an, 710072, China
  - The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi’an, 710072, China
  - Research and Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen, 518000, China
31
Qiu S, Zhao S, Yang A. DLTKcat: deep learning-based prediction of temperature-dependent enzyme turnover rates. Brief Bioinform 2023; 25:bbad506. [PMID: 38189538 PMCID: PMC10772988 DOI: 10.1093/bib/bbad506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 11/29/2023] [Accepted: 12/08/2023] [Indexed: 01/09/2024]
Abstract
The enzyme turnover rate, $k_{cat}$, quantifies enzyme kinetics by indicating the maximum efficiency of enzyme catalysis. Despite its importance, $k_{cat}$ values remain scarce in databases for most organisms, primarily because of the cost of experimental measurements. To predict $k_{cat}$ and account for its strong temperature dependence, DLTKcat was developed in this study and demonstrated better performance (log10-scale root mean squared error = 0.88, R-squared = 0.66) than previously published models. Through two case studies, DLTKcat showed its ability to predict the effects of protein sequence mutations and temperature changes on $k_{cat}$ values. Although its quantitative accuracy is not yet high enough to model the responses of cellular metabolism to temperature changes, DLTKcat has the potential to eventually become a computational tool to describe the temperature dependence of biological systems.
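The abstract above reports a log10-scale root mean squared error and an R-squared value. The sketch below shows how such metrics could be computed for turnover rates. It assumes that both metrics are evaluated on log10-transformed values, which the abstract states only for the RMSE; the function names and toy rates are hypothetical.

```python
import math

def log10_rmse(y_true, y_pred):
    """Root mean squared error between log10-transformed positive values."""
    squared = [(math.log10(p) - math.log10(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def log10_r_squared(y_true, y_pred):
    """Coefficient of determination computed on log10-transformed values."""
    log_true = [math.log10(t) for t in y_true]
    log_pred = [math.log10(p) for p in y_pred]
    mean_true = sum(log_true) / len(log_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(log_true, log_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in log_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical turnover rates in 1/s (not from the paper).
measured_kcat = [0.5, 12.0, 150.0, 3.2, 48.0]
predicted_kcat = [1.1, 8.0, 90.0, 6.0, 30.0]
print(log10_rmse(measured_kcat, predicted_kcat))
print(log10_r_squared(measured_kcat, predicted_kcat))
```

Because the error is measured in log10 units, an RMSE of 0.88 corresponds to a typical multiplicative deviation of about 10^0.88, roughly 7.6-fold, which is consistent with the abstract's remark that quantitative accuracy is not yet high.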
Affiliation(s)
- Sizhe Qiu
  - Department of Engineering Science, University of Oxford, OX1 3PJ, United Kingdom
- Simiao Zhao
  - Radcliffe Department of Medicine, University of Oxford, OX3 9DU, United Kingdom
- Aidong Yang
  - Department of Engineering Science, University of Oxford, OX1 3PJ, United Kingdom
32
Shen C, Luo J, Xia K. Molecular geometric deep learning. Cell Rep Methods 2023; 3:100621. [PMID: 37875121 PMCID: PMC10694498 DOI: 10.1016/j.crmeth.2023.100621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/16/2023] [Accepted: 09/28/2023] [Indexed: 10/26/2023]
Abstract
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs for representing molecular topology at the atomic level and totally ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning model to predict the properties of molecules that aims to comprehensively consider the information of covalent and non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve a better performance than state-of-the-art (SOTA) methods. Extensive tests have demonstrated the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
Affiliation(s)
- Cong Shen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China.
- Kelin Xia
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
33
Ran X, Jiang Y, Shao Q, Yang ZJ. EnzyKR: a chirality-aware deep learning model for predicting the outcomes of the hydrolase-catalyzed kinetic resolution. Chem Sci 2023; 14:12073-12082. [PMID: 37969577 PMCID: PMC10631226 DOI: 10.1039/d3sc02752j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 10/16/2023] [Indexed: 11/17/2023]
Abstract
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of a substrate-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Different from existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 substrate-hydrolase complexes, which were constructed by docking. EnzyKR was tested using a held-out dataset of 20 complexes on the task of predicting activation free energy. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal mol$^{-1}$ in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, that is, 64% of the test cases. These results demonstrate EnzyKR to be a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions.
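The abstract above links predicted activation free energies to enantiomeric outcomes. The sketch below illustrates the textbook relation behind that link: under transition state theory, two competing pathways with activation free energies that differ by ΔΔG‡ have a rate ratio of exp(ΔΔG‡/RT). This is an idealized illustration, not the EnzyKR pipeline; it ignores the conversion dependence of observed enantiomeric excess in a real kinetic resolution, and the function names and energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def rate_ratio(dg_a, dg_b, temperature=298.15):
    """k_A / k_B from two activation free energies given in kcal/mol."""
    return math.exp((dg_b - dg_a) / (R_KCAL * temperature))

def ideal_ee(dg_a, dg_b, temperature=298.15):
    """Signed idealized selectivity (k_A - k_B) / (k_A + k_B); positive if A is faster."""
    ratio = rate_ratio(dg_a, dg_b, temperature)
    return (ratio - 1.0) / (ratio + 1.0)

# Hypothetical barriers for the two enantiomers of one substrate (kcal/mol).
barrier_r, barrier_s = 14.0, 16.0
print(rate_ratio(barrier_r, barrier_s))
print(ideal_ee(barrier_r, barrier_s))
```

For scale, an error of 1.54 kcal/mol in a barrier (the MAE quoted above) changes the corresponding rate by roughly 13-fold at 298 K, so predicting the favored enantiomer is a more forgiving task than predicting absolute rates.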
Affiliation(s)
- Xinchun Ran
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
| | - Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
| | - Qianzhen Shao
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
- Center for Structural Biology, Vanderbilt University Nashville Tennessee 37235 USA
- Vanderbilt Institute of Chemical Biology, Vanderbilt University Nashville Tennessee 37235 USA
- Data Science Institute, Vanderbilt University Nashville Tennessee 37235 USA
- Department of Chemical and Biomolecular Engineering, Vanderbilt University Nashville Tennessee 37235 USA
| |
|
34
|
Tang R, Sun C, Huang J, Li M, Wei J, Liu J. Predicting Drug-Protein Interactions by Self-Adaptively Adjusting the Topological Structure of the Heterogeneous Network. IEEE J Biomed Health Inform 2023; 27:5675-5684. [PMID: 37672364 DOI: 10.1109/jbhi.2023.3312374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Many powerful computational methods based on graph neural networks (GNNs) have been proposed to predict drug-protein interactions (DPIs). Such methods can effectively reduce laboratory workload and the cost of drug discovery and drug repurposing. However, many clinical functions of drugs and proteins are unknown due to their unobserved indications. Therefore, it is difficult to establish a reliable drug-protein heterogeneous network that can describe the relationships between drugs and proteins based on the available information. To solve this problem, we propose a DPI prediction method, named SATS, that can self-adaptively adjust the topological structure of the heterogeneous network. SATS establishes a representation learning module based on a graph attention network to model the drug-protein heterogeneous network. It can self-adaptively learn the relationships among the nodes based on their attributes and adjust the topological structure of the network according to the training loss of the model. Finally, SATS predicts the interaction propensity between drugs and proteins based on their embeddings. The experimental results show that SATS can effectively improve the topological structure of the network. SATS outperforms several state-of-the-art DPI prediction methods under various evaluation metrics. These results indicate that SATS is useful for dealing with incomplete data and unreliable networks. Case studies on the top-ranked predictions further demonstrate that SATS is powerful for discovering novel DPIs.
|
35
|
Song N, Dong R, Pu Y, Wang E, Xu J, Guo F. PMF-CPI: assessing drug selectivity with a pretrained multi-functional model for compound-protein interactions. J Cheminform 2023; 15:97. [PMID: 37838703 PMCID: PMC10576287 DOI: 10.1186/s13321-023-00767-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/31/2023] [Accepted: 09/28/2023] [Indexed: 10/16/2023] Open
Abstract
Compound-protein interactions (CPI) play significant roles in drug development. To avoid side effects, it is also crucial to evaluate drug selectivity when binding to different targets. However, most selectivity prediction models are constructed for specific targets with limited data. In this study, we present a pretrained multi-functional model for compound-protein interaction prediction (PMF-CPI) and fine-tune it to assess drug selectivity. This model uses recurrent neural networks to process the protein embedding based on the pretrained language model TAPE, extracts molecular information from a graph encoder, and produces the output from dense layers. PMF-CPI obtained the best performance compared to outstanding approaches on both the binding affinity regression and CPI classification tasks. Meanwhile, we apply the model to analyzing drug selectivity after fine-tuning it on three datasets related to specific targets, including human cytochrome P450s. The study shows that PMF-CPI can accurately predict different drug affinities or opposite interactions toward similar targets, recognizing selective drugs for precise therapeutics.
Affiliation(s)
- Nan Song
- School of New Media and Communication, Tianjin University, Tianjin, Tianjin, 300072, China
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China
| | - Ruihan Dong
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, Beijing, 100871, China
| | - Yuqian Pu
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China
| | - Ercheng Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Zhejiang Laboratory, Hangzhou, 311100, Zhejiang, China.
| | - Junhai Xu
- School of New Media and Communication, Tianjin University, Tianjin, Tianjin, 300072, China.
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China.
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China.
| |
|
36
|
Sivangi KB, Amilpur S, Dasari CM. ReGen-DTI: A novel generative drug target interaction model for predicting potential drug candidates against SARS-COV2. Comput Biol Chem 2023; 106:107927. [PMID: 37499436 DOI: 10.1016/j.compbiolchem.2023.107927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 07/12/2023] [Accepted: 07/13/2023] [Indexed: 07/29/2023]
Abstract
COVID-19 has caused massive numbers of infections and fatalities globally. In response, there has been a large-scale experimental and computational research effort to study and develop drugs. To this end, deep learning techniques, which have proven effective at exploring large molecular search spaces, are used to generate potential novel drug candidates. Recent advances in reinforcement learning, in conjunction with generative techniques, have proven promising for drug discovery. In this regard, we propose a generative drug discovery approach using reinforcement techniques for sampling novel molecules that bind to the main protease of SARS-COV2. The generative method reported significant validity scores for the generated novel molecules and captured the underlying features of the training molecules. Further, the model is fine-tuned on existing repurposed molecules which are active towards specific target proteins based on similarity metrics. Upon fine-tuning, the model generated 92.71% valid, 93.55% unique, and 100% novel molecules. Unlike previous methods, which depend on docking procedures, we propose a novel deep learning-based drug target interaction (DTI) model to find the binding affinity between candidate molecules and the target protease sequence. Finally, the binding affinity of the generated molecules is predicted against the 3CLPro main protease by using the proposed DTI model. Most of the generated molecules have shown binding affinity scores below 100 nM (lower is better), which are significantly better than those of existing commercial drugs including Remdesivir.
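The validity, uniqueness, and novelty percentages quoted in this abstract are commonly defined as nested ratios over the generated set. The sketch below assumes those common conventions (validity over all generated strings, uniqueness over valid ones, novelty over unique valid ones); the abstract does not state the exact definitions used, and the `toy_is_valid` check is a hypothetical stand-in for a real chemistry toolkit parser.

```python
def generation_metrics(generated, training, is_valid):
    """Return validity, uniqueness, and novelty fractions for generated strings.

    Assumed conventions (not confirmed by the abstract):
      validity   = valid / generated
      uniqueness = distinct valid / valid
      novelty    = distinct valid not in training / distinct valid
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles):
    """Hypothetical validity check: non-empty with balanced parentheses.

    A real pipeline would parse the string with a chemistry toolkit instead.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


# Hypothetical toy strings, not molecules from the paper.
generated = ["CCO", "CCO", "C(C", "CCN"]
training = ["CCO"]
print(generation_metrics(generated, training, toy_is_valid))
```

In practice the strings would usually be converted to a canonical form before comparison, since the same molecule can be written as several different SMILES strings.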
Affiliation(s)
- Kaushik Bhargav Sivangi
- Indian Institute of Information Technology, Sri City, Chittoor, 517646, Andhra Pradesh, India
| | - Santhosh Amilpur
- Indian Institute of Information Technology, Sri City, Chittoor, 517646, Andhra Pradesh, India
| | - Chandra Mohan Dasari
- Indian Institute of Information Technology, Sri City, Chittoor, 517646, Andhra Pradesh, India.
| |
|
37
|
Guo L, Qiu T, Wang J. ViTScore: A Novel Three-Dimensional Vision Transformer Method for Accurate Prediction of Protein-Ligand Docking Poses. IEEE Trans Nanobioscience 2023; 22:734-743. [PMID: 37159314 DOI: 10.1109/tnb.2023.3274640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery, and due to the complexity and high cost of experimental methods, there is a great demand for computational approaches, such as protein-ligand docking, to decipher PLI patterns. One of the most challenging aspects of protein-ligand docking is to identify near-native conformations from a set of poses, but traditional scoring functions still have limited accuracy. Therefore, new scoring methods are urgently needed for both methodological and practical purposes. We present a novel deep learning-based scoring function for ranking protein-ligand docking poses based on Vision Transformer (ViT), named ViTScore. To recognize near-native poses from a set of poses, ViTScore voxelizes the protein-ligand interaction pocket into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. This allows ViTScore to capture the subtle differences between spatially and energetically favorable near-native poses and unfavorable non-native poses without needing extra information. ViTScore then outputs a prediction of the root mean square deviation (RMSD) of a docking pose with reference to the native binding pose. ViTScore is extensively evaluated on diverse test sets including PDBbind2019 and CASF2016, and obtains significant improvements over existing methods in terms of RMSE, R and docking power. The results demonstrate that ViTScore is a promising scoring function for protein-ligand docking that can accurately identify near-native poses from a set of poses. Additionally, ViTScore can be used to identify potential drug targets and to design new drugs with improved efficacy and safety.
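The quantity this scoring function is trained to predict, the root mean square deviation of a docking pose from the native pose, can be computed directly when both coordinate sets are known. The sketch below is a minimal illustration under two assumptions: atoms are listed in the same order in both poses, and no superposition is applied because both poses sit in the same receptor frame. The function name and coordinates are hypothetical.

```python
import math


def pose_rmsd(pose, native):
    """RMSD in the input units between two poses with matched atom order.

    Each pose is a sequence of (x, y, z) tuples. No alignment is performed:
    both poses are assumed to share the receptor's coordinate frame.
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_n in zip(pose, native)
        for a, b in zip(atom_p, atom_n)
    )
    return math.sqrt(squared / len(pose))


# Hypothetical three-atom ligand, coordinates in angstroms.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(pose_rmsd(docked, native))
```

A learned scoring function is useful precisely because the native pose is unknown at prediction time, so this direct calculation is only available for labeling training and test poses.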
|
38
|
Pei Q, Wu L, Zhu J, Xia Y, Xie S, Qin T, Liu H, Liu TY, Yan R. Breaking the barriers of data scarcity in drug-target affinity prediction. Brief Bioinform 2023; 24:bbad386. [PMID: 37903413 DOI: 10.1093/bib/bbad386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 09/14/2023] [Accepted: 10/05/2023] [Indexed: 11/01/2023] Open
Abstract
Accurate prediction of drug-target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.
Affiliation(s)
- Qizhi Pei
- Gaoling School of Artificial Intelligence, Renmin University of China, No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing, China
| | - Lijun Wu
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
| | - Jinhua Zhu
- CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China, No.96, JinZhai Road, Baohe District, 230026, Hefei, Anhui Province, China
| | - Yingce Xia
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
| | - Shufang Xie
- Gaoling School of Artificial Intelligence, Renmin University of China, No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing, China
| | - Tao Qin
- Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education
| | - Haiguang Liu
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
| | - Tie-Yan Liu
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
| | - Rui Yan
- Beijing Key Laboratory of Big Data Management and Analysis Methods
| |
|
39
|
Spronk SA, Glick ZL, Metcalf DP, Sherrill CD, Cheney DL. A quantum chemical interaction energy dataset for accurately modeling protein-ligand interactions. Sci Data 2023; 10:619. [PMID: 37699937 PMCID: PMC10497680 DOI: 10.1038/s41597-023-02443-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/02/2023] [Accepted: 08/03/2023] [Indexed: 09/14/2023] Open
Abstract
Fast and accurate calculation of intermolecular interaction energies is desirable for understanding many chemical and biological processes, including the binding of small molecules to proteins. The Splinter ["Symmetry-adapted perturbation theory (SAPT0) protein-ligand interaction"] dataset has been created to facilitate the development and improvement of methods for performing such calculations. Molecular fragments representing commonly found substructures in proteins and small-molecule ligands were paired into >9000 unique dimers, assembled into numerous configurations using an approach designed to adequately cover the breadth of the dimers' potential energy surfaces while enhancing sampling in favorable regions. ~1.5 million configurations of these dimers were randomly generated, and a structurally diverse subset of these were minimized to obtain an additional ~80 thousand local and global minima. For all >1.6 million configurations, SAPT0 calculations were performed with two basis sets to complete the dataset. It is expected that Splinter will be a useful benchmark dataset for training and testing various methods for the calculation of intermolecular interaction energies.
Affiliation(s)
- Steven A Spronk
- Molecular Structure and Design, Bristol Myers Squibb Company, P. O. Box 5400, Princeton, NJ, 08543, USA.
| | - Zachary L Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
| | - Derek P Metcalf
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA
| | - C David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0400, USA.
| | - Daniel L Cheney
- Molecular Structure and Design, Bristol Myers Squibb Company, P. O. Box 5400, Princeton, NJ, 08543, USA
| |
|
40
|
Liu C, Kutchukian P, Nguyen ND, AlQuraishi M, Sorger PK. A Hybrid Structure-Based Machine Learning Approach for Predicting Kinase Inhibition by Small Molecules. J Chem Inf Model 2023; 63:5457-5472. [PMID: 37595065 PMCID: PMC10498990 DOI: 10.1021/acs.jcim.3c00347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/06/2023] [Indexed: 08/20/2023]
Abstract
Kinases have been the focus of drug discovery programs for three decades leading to over 70 therapeutic kinase inhibitors and biophysical affinity measurements for over 130,000 kinase-compound pairs. Nonetheless, the precise target spectrum for many kinases remains only partly understood. In this study, we describe a computational approach to unlocking qualitative and quantitative kinome-wide binding measurements for structure-based machine learning. Our study has three components: (i) a Kinase Inhibitor Complex (KinCo) data set comprising in silico predicted kinase structures paired with experimental binding constants, (ii) a machine learning loss function that integrates qualitative and quantitative data for model training, and (iii) a structure-based machine learning model trained on KinCo. We show that our approach outperforms methods trained on crystal structures alone in predicting binary and quantitative kinase-compound interaction affinities; relative to structure-free methods, our approach also captures known kinase biochemistry and more successfully generalizes to distant kinase sequences and compound scaffolds.
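Component (ii) above, a loss that combines qualitative and quantitative labels, can be illustrated with a simple per-sample rule. The sketch below is a hypothetical construction, not the loss defined in the paper: it assumes affinities on a log scale such as pKd, a squared error for measured values, and a one-sided squared penalty for binder or non-binder labels relative to an assumed threshold of 6.0.

```python
def mixed_label_loss(prediction, kind, value=None, threshold=6.0):
    """Per-sample loss for mixed quantitative and qualitative affinity labels.

    kind == "quantitative": squared error against the measured value.
    kind == "binder":       penalize only predictions below the threshold.
    kind == "non_binder":   penalize only predictions at or above the threshold.
    The threshold and label names are illustrative assumptions.
    """
    if kind == "quantitative":
        if value is None:
            raise ValueError("quantitative labels need a measured value")
        return (prediction - value) ** 2
    if kind == "binder":
        return max(0.0, threshold - prediction) ** 2
    if kind == "non_binder":
        return max(0.0, prediction - threshold) ** 2
    raise ValueError(f"unknown label kind: {kind}")


# Hypothetical mini-batch of (prediction, kind, measured value or None).
batch = [
    (7.0, "quantitative", 5.0),
    (5.0, "binder", None),
    (7.0, "binder", None),
    (7.0, "non_binder", None),
]
mean_loss = sum(mixed_label_loss(p, k, v) for p, k, v in batch) / len(batch)
print(mean_loss)
```

The one-sided terms let a qualitative label constrain the prediction without forcing it toward any particular value, which is one way such data can be used alongside exact measurements.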
Affiliation(s)
- Changchang Liu
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
| | - Peter Kutchukian
- Novartis Institutes for Biomedical Research, Cambridge, Massachusetts 02139, United States
| | - Nhan D. Nguyen
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Mohammed AlQuraishi
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
| | - Peter K. Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
| |
|
41
|
Chen P, Shen H, Zhang Y, Wang B, Gu P. SGNet: Sequence-Based Convolution and Ligand Graph Network for Protein Binding Affinity Prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3257-3266. [PMID: 37030867 DOI: 10.1109/tcbb.2023.3262821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Protein-ligand binding can play an important role in many fields. It is of great importance to accurately predict the binding affinity between molecules by computational methods. Most computational binding affinity methods require molecular structures. However, there are still a large number of protein molecules with known amino acid sequences whose structures have not yet been solved. To address this issue, this paper proposes a sequence-based convolution and ligand graph network, called SGNet, to fuse the molecular graph information and the amino acid sequence information. This method integrates Conjoint Triad (CT) encoding of the amino acid sequence with a one-dimensional convolutional neural network module to extract protein features, uses a graph attention network to extract molecular features of the ligand, and then fuses the two feature sets to predict the binding affinity between molecules through a fully connected layer. As a result, SGNet achieves good prediction performance on both KIKD and IC50 data sets, with prediction error RMSEs of 1.287 and 1.58, and correlation Pearson Rs of 0.687 and 0.592, respectively. Comparative experimental results under the same conditions showed that SGNet outperformed Kdeep and GraphDTA in predicting binding affinities between protein-ligand molecules.
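Conjoint Triad encoding, mentioned in this abstract, maps a protein sequence to a fixed-length vector by counting triads of residue classes. The sketch below assumes the widely used seven-class grouping of the 20 standard amino acids and a min-max normalization; the abstract does not specify which grouping or normalization SGNet uses, so both are assumptions, and the function name is hypothetical.

```python
# Assumed seven-class grouping of the 20 standard amino acids.
RESIDUE_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {
    residue: index
    for index, group in enumerate(RESIDUE_CLASSES)
    for residue in group
}


def conjoint_triad(sequence):
    """Return a 343-dimensional (7 * 7 * 7) normalized triad-frequency vector.

    Windows containing a residue outside the 20 standard amino acids are
    skipped. Counts are min-max scaled to [0, 1]; a sequence with no usable
    window yields an all-zero vector.
    """
    classes = [RESIDUE_TO_CLASS.get(residue) for residue in sequence.upper()]
    counts = [0] * 343
    # Slide a window of three consecutive residues along the sequence.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue
        counts[a * 49 + b * 7 + c] += 1
    low, high = min(counts), max(counts)
    if high == low:
        return [0.0] * 343
    return [(value - low) / (high - low) for value in counts]


# Hypothetical short peptide used only to show the call.
vector = conjoint_triad("MKTAYIAKQR")
print(len(vector), sum(1 for v in vector if v > 0))
```

Because the output length is fixed at 343 regardless of sequence length, the encoding can feed a convolutional or fully connected module without padding or truncation.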
|
42
|
Wang Y, Zhang R, Zhang S, Guo L, Zhou Q, Zhao B, Mo X, Yang Q, Huang Y, Li K, Fan Y, Huang L, Zhou F. OCMR: A comprehensive framework for optical chemical molecular recognition. Comput Biol Med 2023; 163:107187. [PMID: 37393787 DOI: 10.1016/j.compbiomed.2023.107187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/10/2023] [Accepted: 06/19/2023] [Indexed: 07/04/2023]
Abstract
Artificial intelligence (AI) has achieved significant progress in the field of drug discovery. AI-based tools have been used in all aspects of drug discovery, including chemical structure recognition. We propose a chemical structure recognition framework, Optical Chemical Molecular Recognition (OCMR), to improve the data extraction capability in practical scenarios compared with the rule-based and end-to-end deep learning models. The proposed OCMR framework enhances the recognition performances via the integration of local information in the topology of molecular graphs. OCMR handles complex tasks like non-canonical drawing and atomic group abbreviation and substantially improves the current state-of-the-art results on multiple public benchmark datasets and one internally curated dataset.
Affiliation(s)
- Yan Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Ruochi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China
| | - Shengde Zhang
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Liming Guo
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Qiong Zhou
- School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130012, China
| | - Bowen Zhao
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Xiaotong Mo
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Qian Yang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yajuan Huang
- Machine Learning Department, Silexon AI Technology Co, Ltd, Beijing, 100084, China
| | - Kewei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| | - Yusi Fan
- College of Software, Jilin University, Changchun, Jilin, 130012, China
| | - Lan Huang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Artificial Intelligence, Jilin University, Changchun, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| |
|
43
|
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. 
By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
- Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
| | - Nabanita Ghosh
- Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
| | - Parames C Sil
- Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
| |
|
44
|
Li S, Tian T, Zhang Z, Zou Z, Zhao D, Zeng J. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst 2023; 14:692-705.e6. [PMID: 37516103 DOI: 10.1016/j.cels.2023.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/07/2022] [Revised: 11/25/2022] [Accepted: 05/19/2023] [Indexed: 07/31/2023]
Abstract
Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.
Affiliation(s)
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Ziting Zhang
- Department of Automation, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
| | - Ziheng Zou
- Silexon AI Technology, Nanjing, Jiangsu Province 210023, China
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
| |
|
45
|
Chen L, Fan Z, Chang J, Yang R, Hou H, Guo H, Zhang Y, Yang T, Zhou C, Sui Q, Chen Z, Zheng C, Hao X, Zhang K, Cui R, Zhang Z, Ma H, Ding Y, Zhang N, Lu X, Luo X, Jiang H, Zhang S, Zheng M. Sequence-based drug design as a concept in computational drug design. Nat Commun 2023; 14:4217. [PMID: 37452028 PMCID: PMC10349078 DOI: 10.1038/s41467-023-39856-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 07/18/2022] [Accepted: 06/27/2023] [Indexed: 07/18/2023] Open
Abstract
Drug development based on target proteins has been a successful approach in recent decades. However, the conventional structure-based drug design (SBDD) pipeline is a complex, human-engineered process with multiple independently optimized steps. Here, we propose a sequence-to-drug concept for computational drug design based on protein sequence information by end-to-end differentiable learning. We validate this concept in three stages. First, we design TransformerCPI2.0 as a core tool for the concept, which demonstrates generalization ability across proteins and compounds. Second, we interpret the binding knowledge that TransformerCPI2.0 learned. Finally, we use TransformerCPI2.0 to discover new hits for challenging drug targets, and identify a new target for an existing drug based on an inverse application of the concept. Overall, this proof-of-concept study shows that the sequence-to-drug concept adds a perspective on drug design. It can serve as an alternative method to SBDD, particularly for proteins that do not yet have high-quality 3D structures available.
Collapse
Affiliation(s)
- Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zisheng Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Jie Chang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Ruirui Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Hui Hou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hao Guo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yinghui Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Tianbiao Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chenmao Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Qibang Sui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhengyang Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chen Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xinyue Hao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Keke Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Rongrong Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hudson Ma
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yiluan Ding
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Naixia Zhang
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaojie Lu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China.
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China.
| |
|
46
|
Varikoti RA, Schultz KJ, Kombala CJ, Kruel A, Brandvold KR, Zhou M, Kumar N. Integrated data-driven and experimental approaches to accelerate lead optimization targeting SARS-CoV-2 main protease. J Comput Aided Mol Des 2023:10.1007/s10822-023-00509-1. [PMID: 37314632 DOI: 10.1007/s10822-023-00509-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 05/23/2023] [Indexed: 06/15/2023]
Abstract
Identification of potential therapeutic candidates can be expedited by integrating computational modeling with domain-aware machine learning (ML) models, followed by experimental validation in an iterative manner. Generative deep learning models can generate thousands of new candidates; however, their physicochemical and biochemical properties are typically not fully optimized. Using our recently developed deep learning models and a scaffold as a starting point, we generated tens of thousands of compounds for SARS-CoV-2 Mpro that preserve the core scaffold. We applied several computational tools to the generated candidates, including structural alert and toxicity analysis, high-throughput virtual screening, ML-based 3D quantitative structure-activity relationships, multi-parameter optimization, and graph neural networks, to predict biological activity and binding affinity in advance. As a result of these combined computational efforts, eight promising candidates were singled out and tested experimentally using native mass spectrometry and FRET-based functional assays. Two of the tested compounds, with quinazoline-2-thiol and acetylpiperidine core moieties, showed IC50 values in the low micromolar range: [Formula: see text] μM and 3.41±0.0015 μM, respectively. Molecular dynamics simulations further highlight that binding of these compounds results in allosteric modulations within chain B and the interface domains of Mpro. Our integrated approach provides a platform for data-driven lead optimization with rapid characterization and experimental validation in a closed loop that could be applied to other potential protein targets.
Affiliation(s)
- Rohith Anand Varikoti
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
| | - Katherine J Schultz
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
| | - Chathuri J Kombala
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
| | - Agustin Kruel
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
| | - Kristoffer R Brandvold
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
| | - Mowei Zhou
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
| | - Neeraj Kumar
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA.
| |
|
47
|
Wen J, Gan H, Yang Z, Zhou R, Zhao J, Ye Z. Mutual-DTI: A mutual interaction feature-based neural network for drug-target protein interaction prediction. Math Biosci Eng 2023; 20:10610-10625. [PMID: 37322951 DOI: 10.3934/mbe.2023469] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
The prediction of drug-target protein interaction (DTI) is a crucial task in the development of new drugs in modern medicine. Accurately identifying DTI through computer simulations can significantly reduce development time and costs. In recent years, many sequence-based DTI prediction methods have been proposed, and introducing attention mechanisms has improved their predictive performance. However, these methods have some shortcomings. For example, inappropriate dataset partitioning during data preprocessing can lead to overly optimistic prediction results. Additionally, only single non-covalent intermolecular interactions are considered in the DTI simulation, ignoring the complex interactions between their internal atoms and amino acids. In this paper, we propose a network model called Mutual-DTI that predicts DTI based on the interaction properties of sequences and a Transformer model. We use multi-head attention to extract the long-distance interdependent features of the sequence and introduce a module that extracts the sequences' mutual interaction features to mine the complex reaction processes of atoms and amino acids. We evaluate the model on two benchmark datasets, and the results show that Mutual-DTI significantly outperforms the latest baseline. In addition, we conduct ablation experiments on a label-inversion dataset that is split more rigorously. The results show a significant improvement in the evaluation metrics after introducing the sequence interaction feature extraction module. This suggests that Mutual-DTI may contribute to modern medical drug development research. The code for Mutual-DTI can be downloaded from https://github.com/a610lab/Mutual-DTI.
Affiliation(s)
- Jiahui Wen
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
| | - Haitao Gan
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei University, Wuhan 430062, China
| | - Zhi Yang
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei University, Wuhan 430062, China
| | - Ran Zhou
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
| | - Jing Zhao
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei University, Wuhan 430062, China
| | - Zhiwei Ye
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
| |
|
48
|
Steinmetz B, Smok I, Bikaki M, Leitner A. Protein-RNA interactions: from mass spectrometry to drug discovery. Essays Biochem 2023; 67:175-186. [PMID: 36866608 PMCID: PMC10070478 DOI: 10.1042/ebc20220177] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2022] [Revised: 01/25/2023] [Accepted: 01/26/2023] [Indexed: 03/04/2023]
Abstract
Proteins and RNAs are fundamental parts of biological systems, and their interactions affect many essential cellular processes. Therefore, it is crucial to understand at a molecular and at a systems level how proteins and RNAs form complexes and mutually affect their functions. In the present mini-review, we will first provide an overview of different mass spectrometry (MS)-based methods to study the RNA-binding proteome (RBPome), most of which are based on photochemical cross-linking. As we will show, some of these methods are also able to provide higher-resolution information about binding sites, which are important for the structural characterisation of protein-RNA interactions. In addition, classical structural biology techniques such as nuclear magnetic resonance (NMR) spectroscopy and biophysical methods such as electron paramagnetic resonance (EPR) spectroscopy and fluorescence-based methods contribute to a detailed understanding of the interactions between these two classes of biomolecules. We will discuss the relevance of such interactions in the context of the formation of membrane-less organelles (MLOs) by liquid-liquid phase separation (LLPS) processes and their emerging importance as targets for drug discovery.
Affiliation(s)
- Benjamin Steinmetz
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zurich, Switzerland
- RNA Biology PhD Program, University of Zurich and ETH Zürich, Zurich, Switzerland
| | - Izabela Smok
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zurich, Switzerland
- RNA Biology PhD Program, University of Zurich and ETH Zürich, Zurich, Switzerland
| | - Maria Bikaki
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zurich, Switzerland
| | - Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zurich, Switzerland
| |
|
49
|
Guo B, Zheng H, Jiang H, Li X, Guan N, Zuo Y, Zhang Y, Yang H, Wang X. Enhanced compound-protein binding affinity prediction by representing protein multimodal information via a coevolutionary strategy. Brief Bioinform 2023; 24:6995409. [PMID: 36682005 DOI: 10.1093/bib/bbac628] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/15/2022] [Revised: 12/12/2022] [Accepted: 12/25/2022] [Indexed: 01/23/2023] Open
Abstract
Due to the lack of a method to efficiently represent the multimodal information of a protein, including its structure and sequence information, predicting compound-protein binding affinity (CPA) still suffers from low accuracy when applying machine-learning methods. To overcome this limitation, in a novel end-to-end architecture (named FeatNN), we develop a coevolutionary strategy to jointly represent the structure and sequence features of proteins and ultimately optimize the mathematical models for predicting CPA. Furthermore, from the perspective of a data-driven approach, we propose a rational method that can utilize both high- and low-quality databases to optimize the accuracy and generalization ability of FeatNN in CPA prediction tasks. Notably, we visually interpret the feature interaction process between sequence and structure in the rationally designed architecture. As a result, FeatNN considerably outperforms the state-of-the-art (SOTA) baseline in virtual drug evaluation tasks, indicating the feasibility of this approach for practical use. FeatNN provides a method for higher CPA prediction accuracy and better generalization ability by efficiently representing multimodal information of proteins via a coevolutionary strategy.
Affiliation(s)
- Binjie Guo
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Hanyu Zheng
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Haohan Jiang
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Xiaodan Li
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Naiyu Guan
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Yanming Zuo
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Yicheng Zhang
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
| | - Hengfu Yang
- School of Computer Science, Hunan First Normal University, Changsha, 410205 Hunan, China
| | - Xuhua Wang
- Department of Neurobiology and Department of Rehabilitation Medicine, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou 310058, China
- Co-innovation Center of Neuroregeneration, Nantong University, Nantong, 226001 Jiangsu, China
| |
|
50
|
Zhao Q, Duan G, Yang M, Cheng Z, Li Y, Wang J. AttentionDTA: Drug-Target Binding Affinity Prediction by Sequence-Based Deep Learning With Attention Mechanism. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:852-863. [PMID: 35471889 DOI: 10.1109/tcbb.2022.3170365] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 05/04/2023]
Abstract
The identification of drug-target relations (DTRs) is important in drug development. A large number of methods treat DTRs as drug-target interactions (DTIs), a binary classification problem. The main drawbacks of these methods are the lack of reliable negative samples and the absence of many important aspects of DTRs, including their dose dependence and quantitative affinities. With the recent increase in published drug-protein binding affinity data, DTR prediction can be viewed as a regression problem of drug-target affinities (DTAs), which reflect how tightly the drug binds to the target and can present more detailed and specific information than DTIs. The growth of affinity data enables the use of deep learning architectures, which have been shown to be among the state-of-the-art methods in binding affinity prediction. Although relatively effective, these models are less biologically interpretable due to the black-box nature of deep learning. In this study, we propose a deep learning-based model, named AttentionDTA, which uses an attention mechanism to predict DTAs. Unlike models that use 3D structures of drug-target complexes or graph representations of drugs and proteins, the novelty of our work is to use an attention mechanism to focus on the key subsequences in drug and protein sequences that are important when predicting their affinity. We use two separate one-dimensional convolutional neural networks (1D-CNNs) to extract the semantic information of the drug's SMILES string and the protein's amino acid sequence. Furthermore, a two-side multi-head attention mechanism is developed and embedded in our model to explore the relationship between drug features and protein features. We evaluate our model on three established DTA benchmark datasets: Davis, Metz, and KIBA. AttentionDTA outperforms the state-of-the-art deep learning methods under different evaluation metrics. The results show that the attention-based model can effectively extract protein features related to drug information and drug features related to protein information to better predict drug-target affinities. It is worth mentioning that we test our model on an IC50 dataset, which provides the binding sites between drugs and proteins, to evaluate the ability of our model to locate binding sites. Finally, we visualize the attention weights to demonstrate the biological significance of the model. The source code of AttentionDTA can be downloaded from https://github.com/zhaoqichang/AttentionDTA_TCBB.
|