1
Odugbemi AI, Nyirenda C, Christoffels A, Egieyeh SA. Artificial intelligence in antidiabetic drug discovery: The advances in QSAR and the prediction of α-glucosidase inhibitors. Comput Struct Biotechnol J 2024; 23:2964-2977. [PMID: 39148608 PMCID: PMC11326494 DOI: 10.1016/j.csbj.2024.07.003] [Received: 04/16/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 08/17/2024] [Open access]
Abstract
Artificial intelligence is transforming drug discovery, particularly the hit identification phase for therapeutic compounds. One tool that has been instrumental in this transformation is Quantitative Structure-Activity Relationship (QSAR) analysis. This computer-aided drug design tool uses machine learning to predict the biological activity of new compounds against various biological targets from numerical representations of their chemical structures. With diabetes mellitus becoming a significant health challenge in recent times, there is intense research interest in modulating antidiabetic drug targets. α-Glucosidase is an antidiabetic target that has gained attention because its inhibition suppresses postprandial hyperglycaemia, a key contributor to diabetic complications. This review explores in detail how QSAR models are developed, focusing on strategies for generating input variables (molecular descriptors) and on computational approaches ranging from classical machine learning algorithms to modern deep learning algorithms. We also highlight studies that have used these approaches to develop predictive models for α-glucosidase inhibitors, which modulate this critical antidiabetic drug target.
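QSAR, as described above, maps a numerical representation of each compound to a measured activity. A minimal Python sketch of one classical variant, similarity-weighted k-nearest-neighbour regression over binary fingerprints, follows. It is not taken from the review: the fingerprints are plain sets of "on" bit indices standing in for real molecular descriptors, and the activity values (e.g. pIC50 against α-glucosidase) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: Fingerprint,
                training: List[Tuple[Fingerprint, float]],
                k: int = 3) -> float:
    """Similarity-weighted mean activity of the k most similar training compounds."""
    if not training:
        raise ValueError("training set is empty")
    neighbours = sorted(((tanimoto(query, fp), activity) for fp, activity in training),
                        key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0.0:
        # No structural overlap at all: fall back to the unweighted neighbour mean.
        return sum(activity for _, activity in neighbours) / len(neighbours)
    return sum(sim * activity for sim, activity in neighbours) / total


if __name__ == "__main__":
    # Made-up fingerprints paired with made-up pIC50 values.
    training_set = [
        (frozenset({1, 2, 3}), 6.0),
        (frozenset({1, 2, 3, 4}), 7.0),
        (frozenset({10, 11}), 4.0),
    ]
    print(round(knn_predict(frozenset({1, 2, 3}), training_set, k=2), 2))  # prints 6.43
```

With `k=2`, the query's two nearest neighbours have similarities 1.0 and 0.75, so the prediction is their weighted mean, (6.0 + 0.75 × 7.0) / 1.75 ≈ 6.43. A real QSAR study would compute descriptors with a cheminformatics toolkit and validate on held-out compounds.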
Affiliation(s)
- Adeshina I Odugbemi
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- School of Pharmacy, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- National Institute for Theoretical and Computational Sciences (NITheCS), South Africa
- Clement Nyirenda
- Department of Computer Science, University of the Western Cape, Cape Town 7535, South Africa
- Alan Christoffels
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- Africa Centres for Disease Control and Prevention, African Union, Addis Ababa, Ethiopia
- Samuel A Egieyeh
- School of Pharmacy, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- National Institute for Theoretical and Computational Sciences (NITheCS), South Africa
2
Guichaoua G, Pinel P, Hoffmann B, Azencott CA, Stoven V. Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset. J Chem Inf Model 2024; 64:6938-6956. [PMID: 39237105 PMCID: PMC11423346 DOI: 10.1021/acs.jcim.4c00422] [Indexed: 09/07/2024]
Abstract
Drug-target interaction (DTI) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as the deorphanization of a new therapeutic target or target identification for a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction relies heavily on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing new interactions to be predicted from the learned patterns. To enable reliable predictions, the algorithms must be broadly applicable, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges to fulfill these goals: building large, high-quality training datasets and designing prediction methods that scale well enough to be trained on such large datasets. First, we introduce LCIdb, a large, curated dataset of DTIs offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains many more molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet uses a three-step framework that incorporates efficient computational choices tailored to large datasets, including the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants of DTIs and whose structure allows reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets without compromising performance. Our method is implemented in open-source software and uses GPU parallel computation for efficiency.
We demonstrate the value of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset and on the publicly available LH benchmark designed for scaffold hopping problems. Komet is available open source at https://komet.readthedocs.io and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
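The abstract attributes part of Komet's scalability to the structure of its Kronecker interaction module. The identity behind that can be sketched in Python: the Kronecker (flattened outer) product of a molecule feature vector and a protein feature vector has d_m × d_p entries, yet the inner product of two such pair vectors equals the product of the molecule and protein inner products, which needs only about d_m + d_p multiplications. This is a generic illustration under that assumption, not Komet's code; it leaves out the Nyström feature construction and the quasi-Newton training described above, and the feature vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def kron(u: Vector, v: Vector) -> Vector:
    """Kronecker product of two vectors (the flattened outer product)."""
    return [x * y for x in u for y in v]


def pair_kernel(mol_a: Vector, prot_a: Vector, mol_b: Vector, prot_b: Vector) -> float:
    """Similarity of (mol_a, prot_a) and (mol_b, prot_b) under a Kronecker
    interaction, computed without materialising the d_m * d_p joint vectors."""
    return dot(mol_a, mol_b) * dot(prot_a, prot_b)


if __name__ == "__main__":
    mol_a, prot_a = [1.0, 2.0], [3.0, 0.0, 1.0]   # hypothetical feature vectors
    mol_b, prot_b = [2.0, 1.0], [1.0, 1.0, 4.0]
    explicit = dot(kron(mol_a, prot_a), kron(mol_b, prot_b))  # cost ~ d_m * d_p
    factored = pair_kernel(mol_a, prot_a, mol_b, prot_b)      # cost ~ d_m + d_p
    print(explicit, factored)  # prints 28.0 28.0
```

Both routes give the same value; only the factored one stays cheap as the molecule and protein feature dimensions grow.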
Affiliation(s)
- Gwenn Guichaoua
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- Philippe Pinel
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- Iktos SAS, 75017 Paris, France
- Chloé-Agathe Azencott
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- Véronique Stoven
- Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
- Institut Curie, Université PSL, 75005 Paris, France
- INSERM U900, 75005 Paris, France
3
Yang Y, Qiu Y, Hu J, Rosen-Zvi M, Guan Q, Cheng F. A deep learning framework combining molecular image and protein structural representations identifies candidate drugs for pain. Cell Rep Methods 2024:100865. [PMID: 39341201 DOI: 10.1016/j.crmeth.2024.100865] [Received: 01/23/2024] [Revised: 07/11/2024] [Accepted: 09/03/2024] [Indexed: 09/30/2024]
Abstract
Artificial intelligence (AI) and deep learning technologies hold promise for identifying effective drugs for human diseases, including pain. Here, we present LISA-CPI, an interpretable deep-learning-based ligand image- and receptor three-dimensional (3D)-structure-aware framework for predicting compound-protein interactions (CPIs). LISA-CPI integrates an unsupervised deep-learning-based molecular image representation of ligands (ImageMol) with an advanced AlphaFold2-based algorithm (Evoformer). We demonstrated that LISA-CPI achieved a ∼20% improvement in average mean absolute error (MAE) over state-of-the-art models on experimental CPIs connecting 104,969 ligands and 33 G-protein-coupled receptors (GPCRs). Using LISA-CPI, we prioritized potential repurposable drugs (e.g., methylergometrine) and identified candidate gut-microbiota-derived metabolites (e.g., citicoline) for the potential treatment of pain by specifically targeting human GPCRs. In summary, we showed that integrating molecular image and protein 3D structural representations in a deep learning framework offers a powerful computational drug discovery tool for pain and, if broadly applied, for other complex diseases.
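For reference, the mean absolute error reported above is the average of |y_i − ŷ_i| over the test pairs. The Python sketch below reads a "∼20% improvement" as a relative reduction of the baseline MAE; that reading is an assumption, and the affinity values are made up for illustration rather than taken from the LISA-CPI study.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_mae: float, new_mae: float) -> float:
    """Fractional MAE reduction versus a baseline (0.2 means 20% lower error)."""
    return (baseline_mae - new_mae) / baseline_mae


if __name__ == "__main__":
    measured = [6.0, 7.5, 5.0, 8.0]          # hypothetical affinities
    baseline_pred = [5.0, 7.0, 6.0, 8.5]
    new_pred = [5.5, 7.0, 5.5, 8.0]
    base = mean_absolute_error(measured, baseline_pred)   # 0.75
    new = mean_absolute_error(measured, new_pred)         # 0.375
    print(f"{relative_improvement(base, new):.0%}")       # prints 50%
```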
Affiliation(s)
- Yuxin Yang
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Computer Science, Kent State University, Kent, OH 44242, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Yunguang Qiu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Jianying Hu
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Michal Rosen-Zvi
- AI for Accelerated Healthcare and Life Sciences Discovery, IBM Research-Israel, Haifa 3498825, Israel
- Qiang Guan
- Department of Computer Science, Kent State University, Kent, OH 44242, USA
- Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
4
Zeng X, Zhong KY, Meng PY, Li SJ, Lv SQ, Wen ML, Li Y. MvGraphDTA: multi-view-based graph deep model for drug-target affinity prediction by introducing the graphs and line graphs. BMC Biol 2024; 22:182. [PMID: 39183297 PMCID: PMC11346193 DOI: 10.1186/s12915-024-01981-3] [Received: 05/23/2024] [Accepted: 08/13/2024] [Indexed: 08/27/2024] [Open access]
Abstract
BACKGROUND Accurately identifying drug-target affinity (DTA) plays a pivotal role in drug screening, design, and repurposing in the pharmaceutical industry. It not only reduces the time, labor, and economic costs associated with biological experiments but also expedites the drug development process. However, achieving the desired level of computational accuracy for DTA identification methods remains a significant challenge. RESULTS We proposed a novel multi-view-based graph deep model, MvGraphDTA, for DTA prediction. MvGraphDTA employed a graph convolutional network (GCN) to extract structural features from the original graphs of drugs and targets, respectively. It went a step further by constructing line graphs, in which the edges of the original drug and target graphs become vertices. A GCN was also used to extract relationship features within these line graphs. To enhance the complementarity between the features extracted from the original graphs and the line graphs, MvGraphDTA fused the extracted multi-view features of drugs and targets, respectively. Finally, these fused features were concatenated and passed through a fully connected (FC) network to predict DTA. CONCLUSIONS During the experiments, we performed data augmentation on all the training sets used. Experimental results showed that MvGraphDTA outperformed competitive state-of-the-art methods on benchmark datasets for DTA prediction. Additionally, we evaluated the universality and generalization performance of MvGraphDTA on additional datasets. Experimental outcomes revealed that MvGraphDTA exhibited good universality and generalization capability, making it a reliable tool for drug-target interaction prediction.
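MvGraphDTA complements each molecular graph with its line graph, in which every edge (e.g. a chemical bond) becomes a vertex and two vertices are adjacent when their edges share an endpoint. A minimal Python sketch of that construction follows. The atom indexing in the example is hypothetical, and the GCN layers that the model then applies to both graphs are not shown.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def line_graph(edges: List[Edge]) -> Dict[Edge, Set[Edge]]:
    """Build the line graph of an undirected graph given as an edge list.

    Every original edge becomes a vertex of the line graph, and two such
    vertices are joined when the original edges share an endpoint.
    """
    # Store each undirected edge once, with its endpoints in sorted order.
    vertices = sorted({(min(u, v), max(u, v)) for u, v in edges})
    adjacency: Dict[Edge, Set[Edge]] = {e: set() for e in vertices}
    for e1, e2 in combinations(vertices, 2):
        if set(e1) & set(e2):
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency


if __name__ == "__main__":
    # Hypothetical heavy-atom graph of propan-2-ol: C0-C1, C1-C2, C1-O3.
    bonds = [(0, 1), (1, 2), (1, 3)]
    for bond, neighbours in line_graph(bonds).items():
        print(bond, sorted(neighbours))
```

Because all three bonds of propan-2-ol meet at the central carbon, its line graph is a triangle: each bond vertex is adjacent to the other two.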
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Kai-Yang Zhong
- College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Pei-Yan Meng
- College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Shu-Juan Li
- Yunnan Institute of Endemic Diseases Control & Prevention, Dali, 671000, China
- Shuang-Qing Lv
- Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, 671000, China
- Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, 650000, China
- Yi Li
- College of Mathematics and Computer Science, Dali University, Dali, 671003, China
5
Xu S, Shen L, Zhang M, Jiang C, Zhang X, Xu Y, Liu J, Liu X. Surface-based multimodal protein-ligand binding affinity prediction. Bioinformatics 2024; 40:btae413. [PMID: 38905501 DOI: 10.1093/bioinformatics/btae413] [Received: 04/01/2024] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 06/23/2024]
Abstract
MOTIVATION In drug discovery, accurately and efficiently predicting the binding affinity between proteins and ligands is crucial for drug screening and optimization. However, current research primarily uses sequence- or structure-based representations to predict protein-ligand binding affinity, and protein surface information, which is crucial for protein-ligand interactions, has received relatively little study. Moreover, when handling multimodal protein information, traditional approaches typically concatenate features from different modalities directly without considering their heterogeneity, and therefore cannot effectively exploit the complementarity between modalities. RESULTS We introduce a novel multimodal feature extraction (MFE) framework that, for the first time, incorporates information from protein surfaces, 3D structures, and sequences, and uses a cross-attention mechanism to align features across modalities. Experimental results show that our method achieves state-of-the-art performance in predicting protein-ligand binding affinity. Furthermore, ablation studies demonstrate the effectiveness and necessity of protein surface information and multimodal feature alignment within the framework. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/Sultans0fSwing/MFE.
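MFE aligns protein modalities with cross-attention. The Python sketch below shows plain scaled dot-product cross-attention, softmax(QKᵀ/√d)V, where the queries come from one modality (here, hypothetical protein surface patches) and the keys and values from another (hypothetical residue features). It deliberately omits the learned query, key, and value projections and the multiple heads a real model would use, and it is not the MFE implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of an (n x k) and a (k x m) list-of-lists matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention in which the queries come from one modality
    and the keys and values come from another.

    queries: (n_q x d), keys: (n_k x d), values: (n_k x d_v) -> (n_q x d_v).
    """
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]  # transpose to (d x n_k)
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


if __name__ == "__main__":
    surface_patches = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical query features
    residues = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # hypothetical key features
    residue_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for row in cross_attention(surface_patches, residues, residue_values):
        print([round(x, 3) for x in row])
```

Each output row is a weighted average of the residue values, with weights set by how well that surface patch matches each residue; this is what lets one modality's features be expressed in terms of the other's.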
Affiliation(s)
- Shiyu Xu
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Lian Shen
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Menglong Zhang
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Changzhi Jiang
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Xinyi Zhang
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Yanni Xu
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Juan Liu
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, China
- Xiangrong Liu
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China
- Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen 361005, China
6
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundam Res 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] [Open access]
Abstract
Drug discovery is costly and time-consuming, and modern drug discovery increasingly relies on computational methods to reduce the time and cost of the process. In particular, the time required for vaccine and drug discovery is prolonged during emergencies such as the coronavirus disease 2019 pandemic. Recently, deep learning methods have performed particularly well in drug virtual screening. Researchers now face the questions of how to summarize existing deep learning approaches to drug virtual screening, select appropriate models for different drug screening problems, exploit the advantages of deep learning models, and further improve their capability in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, a large number of common deep learning methods for drug virtual screening are compared and analyzed. In addition, datasets of different sizes are constructed independently to evaluate the performance of each deep learning model on the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
7
Tian T, Li S, Zhang Z, Chen L, Zou Z, Zhao D, Zeng J. Benchmarking compound activity prediction for real-world drug discovery applications. Commun Chem 2024; 7:127. [PMID: 38834746 DOI: 10.1038/s42004-024-01204-4] [Received: 01/23/2024] [Accepted: 05/16/2024] [Indexed: 06/06/2024] [Open access]
Abstract
Identifying active compounds for target proteins is fundamental in early drug discovery. Recently, data-driven computational methods have shown promising potential for predicting compound activities. However, a well-designed benchmark for comprehensively evaluating these methods from a practical perspective is lacking. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). By carefully distinguishing assay types, designing train-test splitting schemes, and selecting evaluation metrics, CARA accounts for the biased distribution of current real-world compound activity data and avoids overestimating model performance. We observed that although current models can make successful predictions for certain proportions of assays, their performance varied across assays. In addition, an evaluation of several few-shot training strategies showed that their performance depended on the task type. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.
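One way a benchmark can avoid overestimating performance, as CARA aims to, is to keep related measurements from landing on both sides of a split. The Python sketch below shows a generic assay-level (group) split in which every record from a given assay goes entirely to either the training or the test set. It is an illustrative assumption, not CARA's published splitting scheme, which also distinguishes assay types; the record layout (assay_id, compound_id, activity) is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (assay_id, compound_id, activity).
Record = Tuple[str, str, float]


def split_by_assay(records: Sequence[Record],
                   test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split records so that no assay contributes to both training and test sets."""
    assays = sorted({assay for assay, _, _ in records})
    if len(assays) < 2:
        raise ValueError("need at least two assays to split")
    rng = random.Random(seed)
    rng.shuffle(assays)
    # Hold out at least one assay, but never all of them.
    n_test = min(len(assays) - 1, max(1, round(len(assays) * test_fraction)))
    test_assays = set(assays[:n_test])
    train = [r for r in records if r[0] not in test_assays]
    test = [r for r in records if r[0] in test_assays]
    return train, test
```

A purely random compound-level split would instead let measurements from the same assay appear on both sides, which can make test scores look better than real-world performance on new assays.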
Affiliation(s)
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Ziting Zhang
- Department of Automation, Tsinghua University, Beijing, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China
- Lin Chen
- Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Ziheng Zou
- Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
8
Yang R, Zhang L, Bu F, Sun F, Cheng B. AI-based prediction of protein-ligand binding affinity and discovery of potential natural product inhibitors against ERK2. BMC Chem 2024; 18:108. [PMID: 38831341 PMCID: PMC11145815 DOI: 10.1186/s13065-024-01219-x] [Received: 02/24/2024] [Accepted: 05/29/2024] [Indexed: 06/05/2024] [Open access]
Abstract
Determining protein-ligand binding affinity (PLA) is a key technological tool in hit discovery and lead optimization, which are critical to the drug development process. PLA can be measured directly by experimental methods, but these are time-consuming and costly. In recent years, deep learning has been widely applied to PLA prediction, where the key lies in comprehensive and accurate representations of proteins and ligands. In this study, we proposed DeepLIP, a multi-modal deep learning model based on an early fusion strategy, to improve PLA prediction by integrating multi-level information, and further used it for virtual screening against extracellular signal-regulated protein kinase 2 (ERK2), an ideal target for cancer treatment. Experimental results from model evaluation showed that DeepLIP achieved superior performance compared to state-of-the-art methods on the widely used benchmark dataset. In addition, by combining previously developed machine learning models with molecular dynamics simulation, we identified three novel hits from a drug-like natural product library. These compounds not only had favorable physicochemical properties but also bound stably to the target protein. We believe they have the potential to serve as starting molecules for the development of ERK2 inhibitors.
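DeepLIP is described as using an early fusion strategy. In general terms, early fusion joins the feature vectors of all modalities before any joint modelling, so a single downstream network sees them together. The Python sketch below illustrates only that idea: the three modality names are hypothetical placeholders for DeepLIP's multi-level protein and ligand information, and a single linear layer stands in for the actual deep network.

```python
from typing import List, Sequence

Vector = List[float]


def early_fusion(*modalities: Sequence[float]) -> Vector:
    """Concatenate per-modality feature vectors into a single joint input."""
    fused: Vector = []
    for features in modalities:
        fused.extend(features)
    return fused


def linear_score(weights: Sequence[float], features: Sequence[float], bias: float = 0.0) -> float:
    """One linear layer standing in for the downstream affinity regressor."""
    if len(weights) != len(features):
        raise ValueError("weight and feature dimensions differ")
    return bias + sum(w * x for w, x in zip(weights, features))


if __name__ == "__main__":
    protein_sequence = [0.2, 0.1]   # hypothetical per-modality embeddings
    binding_pocket = [0.5]
    ligand = [0.3, 0.4]
    joint = early_fusion(protein_sequence, binding_pocket, ligand)
    print(len(joint), linear_score([1.0] * len(joint), joint))
```

In a late-fusion design, each modality would instead be scored by its own model and only the outputs combined; early fusion lets a single model learn interactions between modalities from the start.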
Affiliation(s)
- Ruoqi Yang
- Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250011, China
- Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Lili Zhang
- Jinan Central Hospital Affiliated to Shandong First Medical University, Jinan, 250013, China
- Fanyou Bu
- Qingdao Municipal Hospital Group, Qingdao, 266000, China
- Fuqiang Sun
- Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Bin Cheng
- Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250011, China
9
Ma J, Zhao Z, Li T, Liu Y, Ma J, Zhang R. GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction. Interdiscip Sci 2024; 16:361-377. [PMID: 38457109 DOI: 10.1007/s12539-024-00609-y] [Received: 08/07/2023] [Revised: 01/01/2024] [Accepted: 01/08/2024] [Indexed: 03/09/2024]
Abstract
Accurately predicting compound-protein interactions (CPIs) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures and uses novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules into deep molecular representations. To capture the vital associations between compound atoms and protein residues, we devise a dual-attention mechanism that extracts relational features through cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and making extensive use of attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including the human, C. elegans, Davis, and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models on classification datasets and achieves competitive performance on regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves average improvements of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvements in concordance index (CI) and mean squared error (MSE) are 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms.
Our research holds practical significance for effectively predicting CPIs and binding affinities, identifying key atoms and residues, and enhancing model interpretability.
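The concordance index used above for the KIBA results measures how often a model ranks pairs of measured affinities in the correct order. The Python sketch below follows the definition commonly used on DTA benchmarks: pairs with equal measured affinity are skipped, and tied predictions on a comparable pair earn half credit. It is a straightforward O(n²) illustration, not GraphsformerCPI's evaluation code.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the measured order.

    Pairs with equal measured values are not comparable and are skipped;
    tied predictions on a comparable pair count as half a concordance.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue
        comparable += 1
        # Orient the pair so that `hi` has the larger measured value.
        hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
        if y_pred[hi] > y_pred[lo]:
            concordant += 1.0
        elif y_pred[hi] == y_pred[lo]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Predictions that preserve the measured order everywhere score 1.0, a fully reversed ranking scores 0.0, and constant predictions score 0.5, which is why 0.5 is the chance baseline for this metric.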
Affiliation(s)
- Jun Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou, 730020, China
- Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Computer College, Qinghai Normal University, Xining, 810016, China
- Yunwu Liu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Jun Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
10
Feng BM, Zhang YY, Zhou XC, Wang JL, Feng YF. MolLoG: A Molecular Level Interpretability Model Bridging Local to Global for Predicting Drug Target Interactions. J Chem Inf Model 2024; 64:4348-4358. [PMID: 38709146 DOI: 10.1021/acs.jcim.4c00171] [Indexed: 05/07/2024]
Abstract
Developing new pharmaceuticals is a costly and time-consuming endeavor fraught with significant safety risks. A critical aspect of drug research and disease therapy is determining whether drugs and proteins interact. In recent years, advances in deep learning (DL) have markedly aided this task. Yet two challenges remain: (i) extracting deep, locally cohesive features while avoiding vanishing gradients and (ii) globally representing and understanding the interactions between drug and target local attributes, which is vital for delivering the molecular-level insights indispensable to drug development. In response to these challenges, we propose a DL network structure, MolLoG, comprising two main modules: local feature encoders (LFE) and global interactive learning (GIL). Within the LFE module, graph convolution networks and leap blocks capture the local features of drug and protein molecules, respectively. The GIL module efficiently combines feature information, facilitating global learning of feature structural semantics and obtaining multihead attention weights for abstract features from the two modalities, which provide biologically pertinent explanations for black-box results. Finally, predictions are obtained by decoding the unified representation with a multilayer perceptron. Our experimental analysis reveals that MolLoG outperforms several cutting-edge baselines across four data sets, delivering superior overall performance and providing satisfactory results when elucidating various facets of drug-target interaction predictions.
Affiliation(s)
- Bao-Ming Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Yuan-Yuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Xiao-Chen Zhou
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Jin-Long Wang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Yin-Fei Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
11
Wang X, Quinn D, Moody TS, Huang M. ALDELE: All-Purpose Deep Learning Toolkits for Predicting the Biocatalytic Activities of Enzymes. J Chem Inf Model 2024; 64:3123-3139. [PMID: 38573056 PMCID: PMC11040732 DOI: 10.1021/acs.jcim.4c00058] [Received: 01/11/2024] [Revised: 02/15/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Rapidly predicting enzyme properties for catalyzing specific substrates is essential for identifying potential enzymes for industrial transformations. The demand for sustainable production of valuable industrial chemicals from biological resources has created a pressing need to speed up biocatalyst screening using machine learning techniques. In this research, we developed an all-purpose deep-learning-based multiple-toolkit (ALDELE) workflow for screening enzyme catalysts. ALDELE incorporates both structural and sequence representations of proteins, alongside representations of ligands by subgraphs and overall physicochemical properties. Comprehensive evaluation demonstrated that ALDELE can predict the catalytic activities of enzymes; in particular, it identifies residue-based hotspots to guide enzyme engineering and generates substrate heat maps to explore the substrate scope of a given biocatalyst. Moreover, our models closely match empirical data, and their alignment with confirmed mutation sites reinforces the practicality and reliability of our approach. ALDELE offers a facile and comprehensive solution by integrating toolkits tailored for different purposes at affordable computational cost, and it would therefore be valuable for speeding up the discovery of new functional enzymes for industrial exploitation.
Affiliation(s)
- Xiangwen Wang
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Derek Quinn
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Thomas S. Moody
- Department of Biocatalysis and Isotope Chemistry, Almac Sciences, Craigavon BT63 5QD, Northern Ireland, U.K.
- Arran Chemical Company Limited, Unit 1 Monksland Industrial Estate, Athlone, Co. Roscommon N37 DN24, Ireland
- Meilan Huang
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, Northern Ireland, U.K.
12
Yao S, Song J, Jia L, Cheng L, Zhong Z, Song M, Feng Z. Fast and effective molecular property prediction with transferability map. Commun Chem 2024; 7:85. [PMID: 38632308 PMCID: PMC11024153 DOI: 10.1038/s42004-024-01169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 04/05/2024] [Indexed: 04/19/2024]
Abstract
Effective transfer learning for molecular property prediction has shown considerable strength in addressing the scarcity of labeled molecules. Many existing methods either disregard the quantitative relationship between source and target properties, risking negative transfer, or require intensive training on target tasks. To quantify transferability in terms of task-relatedness, we propose Principal Gradient-based Measurement (PGM) for transferring molecular property prediction ability. First, we design an optimization-free scheme to calculate a principal gradient that approximates the direction of model optimization on a molecular property prediction dataset. We analyze the close connection between the principal gradient and model optimization through a mathematical proof. PGM measures transferability as the distance between the principal gradient obtained from the source dataset and that derived from the target dataset. Then, we apply PGM to various molecular property prediction datasets to build a quantitative transferability map for source dataset selection. Finally, we evaluate PGM on multiple combinations of transfer learning tasks across 12 benchmark molecular property prediction datasets and demonstrate that it can serve as fast and effective guidance to improve the performance of a target task. This work contributes to more efficient discovery of drugs, materials, and catalysts by quantifying task-relatedness prior to transfer learning and by clarifying the relationship between chemical properties.
Affiliation(s)
- Shaolun Yao
- Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, Zhejiang University, 310027, Hangzhou, China
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Jie Song
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- School of Software Technology, Zhejiang University, 315048, Ningbo, China
- Lingxiang Jia
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Lechao Cheng
- School of Computer Science and Information Engineering, Hefei University of Technology, 230009, Hefei, China
- Zipeng Zhong
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Mingli Song
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Zunlei Feng
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- School of Software Technology, Zhejiang University, 315048, Ningbo, China
13
Tan LH, Kwoh CK, Mu Y. RmsdXNA: RMSD prediction of nucleic acid-ligand docking poses using machine-learning method. Brief Bioinform 2024; 25:bbae166. [PMID: 38695120 PMCID: PMC11063749 DOI: 10.1093/bib/bbae166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 05/04/2024]
Abstract
Small molecule drugs can be used to target nucleic acids (NA) to regulate biological processes. Computational modeling methods, such as molecular docking or scoring functions, are commonly employed to facilitate drug design. However, scoring functions are often inaccurate at identifying the closest-to-native docking pose. To overcome this problem, a machine learning model, RmsdXNA, was developed to predict the root-mean-square deviation (RMSD) of ligand docking poses in NA complexes. The versatility of RmsdXNA has been demonstrated by its successful application to various complexes involving different types of NA receptors and ligands, including metal complexes and short peptides. The RMSD predicted by RmsdXNA was strongly correlated with the actual RMSD of the docked poses. RmsdXNA also outperformed the rDock scoring function in ranking and identifying closest-to-native docking poses across different structural groups and on the testing dataset. Using experimentally validated results for the polyadenylated nuclear element for nuclear expression triplex, RmsdXNA demonstrated better screening power than rDock for the RNA-small molecule complex. Molecular dynamics simulations were subsequently employed to validate the binding of top-scoring ligand candidates selected by RmsdXNA and rDock on MALAT1. The results showed that RmsdXNA has a higher success rate in identifying promising ligands that bind well to the receptor. An accurate docking score for NA-ligand complexes can advance drug discovery and development. The code to use RmsdXNA is available at the GitHub repository https://github.com/laiheng001/RmsdXNA.
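RmsdXNA predicts the RMSD between a docked ligand pose and the native pose. The following minimal Python sketch illustrates that target quantity by computing a plain pose RMSD. It rests on two assumptions not taken from the abstract. First, both poses list the same atoms in the same order. Second, no superposition or symmetry correction is applied, so poses are compared in the receptor frame. The function name `pose_rmsd` is hypothetical and is not part of the RmsdXNA code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (in the coordinate units, e.g. Angstrom) between two ligand poses.

    Assumes identical atom ordering; applies no superposition and no
    symmetry correction, so both poses must share the receptor frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(pose))


# Every atom displaced by 1 unit along z gives an RMSD of 1.0.
print(pose_rmsd([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

With labels computed this way, a commonly used (though not universal) convention treats poses within 2 Å of the native pose as near-native.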
Affiliation(s)
- Lai Heng Tan
- Interdisciplinary Graduate School, Nanyang Technological University, 61 Nanyang Drive, 637335 Singapore, Singapore
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798 Singapore, Singapore
- Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551 Singapore, Singapore
14
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is expensive and time-consuming. Accurately predicting the interactions between drugs and targets will likely change how drugs are discovered, and machine learning-based protein-ligand interaction prediction has demonstrated significant potential. This paper examines computational methods that use sequence and structure to study protein-ligand interactions. It begins with an overview of the data sets used in this area and the various approaches for representing proteins and ligands. Sequence-based and structure-based classification criteria are then used to categorize and summarize both the classical machine learning models and the deep learning models employed in protein-ligand interaction studies. The evaluation methods and interpretability of these models are also discussed, followed by the diverse applications of protein-ligand interaction models in drug research. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
15
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS Polymers Au 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
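The sketch below illustrates why EMD resolves differences that averaging hides. It makes a strong simplification that is not the authors' formulation: each ensemble is reduced to a distribution over a single variable, the degree of polymerization, with ground distance |i - j|. In one dimension, EMD reduces to the area between the two cumulative distribution functions. The paper compares full chain ensembles (sequence, chemistry, topology), which instead requires solving a general optimal-transport problem. The function names here are hypothetical.

```python
from typing import Dict


def mean_length(dist: Dict[int, float]) -> float:
    """Weight-averaged degree of polymerization (the 'averaging' comparison)."""
    total = sum(dist.values())
    return sum(length * weight for length, weight in dist.items()) / total


def emd_1d(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Earth mover's distance between two chain-length distributions.

    p and q map degree of polymerization -> weight; each is normalized here.
    With ground distance |i - j| in one dimension, EMD equals the area
    between the two cumulative distribution functions.
    """
    support = sorted(set(p) | set(q))
    total_p, total_q = sum(p.values()), sum(q.values())
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for left, right in zip(support, support[1:]):
        cdf_p += p.get(left, 0.0) / total_p
        cdf_q += q.get(left, 0.0) / total_q
        distance += abs(cdf_p - cdf_q) * (right - left)
    return distance


monodisperse = {100: 1.0}
blend = {90: 0.5, 110: 0.5}
print(mean_length(monodisperse), mean_length(blend))  # same mean: 100.0 100.0
print(emd_1d(monodisperse, blend))                    # 10.0
```

Consider two ensembles: monodisperse 100-mers and a 50:50 blend of 90-mers and 110-mers. Their mean chain lengths are identical, so an averaging comparison reports no difference. The EMD, by contrast, is 10, because half of the mass must move 10 units in each direction.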
Affiliation(s)
- Jiale Shi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Dylan Walsh
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Weizhong Zou
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Nathan J. Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael E. Deagen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Katharina A. Fransen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xian Gao
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Bradley D. Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Debra J. Audus
- Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
16
Qu X, Du G, Hu J, Cai Y. Graph-DTI: A New Model for Drug-target Interaction Prediction Based on Heterogenous Network Graph Embedding. Curr Comput Aided Drug Des 2024; 20:1013-1024. [PMID: 37448360 DOI: 10.2174/1573409919666230713142255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 05/04/2023] [Accepted: 05/26/2023] [Indexed: 07/15/2023]
Abstract
BACKGROUND In this study, we aimed to develop a new end-to-end learning model called Graph-Drug-Target Interaction (DTI), which integrates various types of information in the heterogeneous network data, and to explore automatic learning of the topology-maintaining representations of drugs and targets, thereby effectively contributing to the prediction of DTI. Precise predictions of DTI can guide drug discovery and development. Most machine learning algorithms integrate multiple data sources and combine them with common embedding methods. However, the relationship between the drugs and target proteins is not well reported. Although some existing studies have used heterogeneous network graphs for DTI prediction, the neighborhood information between nodes in these graphs is used in limited ways. We studied the drug-drug interaction (DDI) and DTI from DrugBank Version 3.0, protein-protein interaction (PPI) from the human protein reference database Release 9, drug structure similarity computed with RDKit from Morgan fingerprints of radius 2, and protein sequence similarity from Smith-Waterman scores. METHODS Our study consists of three major components. First, various drugs and target proteins were integrated, and a heterogeneous network was established based on a series of data sets. Second, a graph auto-encoding method inspired by graph neural networks was used to extract high-order structural information from the heterogeneous networks, thereby revealing the description of nodes (drugs and proteins) and their topological neighbors. Finally, potential DTIs were predicted, and the obtained samples were sent to the classifier for secondary classification. RESULTS The performance of Graph-DTI and all baseline methods was evaluated using the sums of the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC).
The results indicated that Graph-DTI outperformed the baseline methods on both metrics. CONCLUSION Compared with other baseline DTI prediction methods, Graph-DTI showed better prediction performance. Additionally, in this study, we effectively classified drugs corresponding to different targets and vice versa. These findings show that Graph-DTI provides a powerful tool for drug research, development, and repositioning. Graph-DTI can serve as a drug development and repositioning tool more effectively than previous approaches that did not use heterogeneous network graph embedding.
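Graph-DTI computes its drug-drug structural similarity with RDKit from Morgan fingerprints of radius 2. The sketch below covers only the final Tanimoto step and makes two assumptions. First, each fingerprint has already been reduced to the set of its on-bit indices, which RDKit bit vectors can provide. Second, the similarity threshold that turns scores into network edges is a hypothetical parameter, not a value from the study. The function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_edges(
    fingerprints: Dict[str, Set[int]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Drug-drug edges whose Tanimoto similarity meets a (hypothetical) threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[name_a], fingerprints[name_b])
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


fps = {"drugA": {1, 2, 3}, "drugB": {2, 3, 4}, "drugC": {7, 8}}
print(similarity_edges(fps, 0.4))  # [('drugA', 'drugB', 0.5)]
```

In the example, drugA and drugB share 2 of 4 distinct on-bits, giving a similarity of 0.5. drugC shares no bits with either, so it receives no similarity edge.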
Affiliation(s)
- Xiaohan Qu
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Guoxia Du
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Jing Hu
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Yongming Cai
- School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Guangdong Provincial Traditional Chinese Medicine Precision Medicine Big Data Engineering Technology Research Center, Guangzhou, China
17
Qiu S, Zhao S, Yang A. DLTKcat: deep learning-based prediction of temperature-dependent enzyme turnover rates. Brief Bioinform 2023; 25:bbad506. [PMID: 38189538 PMCID: PMC10772988 DOI: 10.1093/bib/bbad506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 11/29/2023] [Accepted: 12/08/2023] [Indexed: 01/09/2024]
Abstract
The enzyme turnover rate, ${k}_{cat}$, quantifies enzyme kinetics by indicating the maximum efficiency of enzyme catalysis. Despite its importance, ${k}_{cat}$ values remain scarce in databases for most organisms, primarily because of the cost of experimental measurements. To predict ${k}_{cat}$ and account for its strong temperature dependence, DLTKcat was developed in this study and outperformed previously published models (log10-scale root mean squared error = 0.88, R-squared = 0.66). Through two case studies, DLTKcat showed its ability to predict the effects of protein sequence mutations and temperature changes on ${k}_{cat}$ values. Although its quantitative accuracy is not yet high enough to model the responses of cellular metabolism to temperature changes, DLTKcat has the potential to eventually become a computational tool for describing the temperature dependence of biological systems.
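Because the reported errors are on a log10 scale, an RMSE of 0.88 corresponds to errors of roughly $10^{0.88} \approx 7.6$-fold in ${k}_{cat}$ in a root-mean-square sense. The helper below is a minimal sketch of how such metrics can be computed from paired measured and predicted ${k}_{cat}$ values. It assumes that R-squared means the coefficient of determination ($1 - SS_{res}/SS_{tot}$) on the log-transformed values, which the abstract does not specify. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def log10_metrics(measured: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (RMSE, R^2) computed on log10-transformed kcat values.

    Assumes all values are positive and R^2 is the coefficient of determination.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need at least two paired values")
    y = [math.log10(v) for v in measured]
    y_hat = [math.log10(v) for v in predicted]
    n = len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    mean_y = sum(y) / n
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Predicting a constant 10 s^-1 for measured values 1, 10 and 100 s^-1:
print(log10_metrics([1.0, 10.0, 100.0], [10.0, 10.0, 10.0]))  # RMSE ~0.816, R^2 0.0
```

A constant prediction at the log-mean scores an R-squared of zero. This makes it a useful baseline when reading the reported value of 0.66.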
Affiliation(s)
- Sizhe Qiu
- Department of Engineering Science, University of Oxford, OX1 3PJ, United Kingdom
- Simiao Zhao
- Radcliffe Department of Medicine, University of Oxford, OX3 9DU, United Kingdom
- Aidong Yang
- Department of Engineering Science, University of Oxford, OX1 3PJ, United Kingdom
18
Barsbey M, Özçelik R, Bağ A, Atil B, Özgür A, Ozkirimli E. A Computational Software for Training Robust Drug-Target Affinity Prediction Models: pydebiaseddta. J Comput Biol 2023; 30:1240-1245. [PMID: 37988394 DOI: 10.1089/cmb.2023.0194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Robust generalization of drug-target affinity (DTA) prediction models is a notoriously difficult problem in computational drug discovery. In this article, we present pydebiaseddta: a computational software for improving the generalizability of DTA prediction models to novel ligands and/or proteins. pydebiaseddta serves as the practical implementation of the DebiasedDTA training framework, which advocates modifying the training distribution to mitigate the effect of spurious correlations in the training data set that lead to substantially degraded performance for novel ligands and proteins. Written in Python, pydebiaseddta combines a user-friendly, streamlined interface with a feature-rich and highly modifiable architecture. In this article, we introduce our software, showcase its main functionalities, and describe practical ways for new users to engage with it.
Affiliation(s)
- Melih Barsbey
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Riza Özçelik
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Alperen Bağ
- Technical University of Munich, Munich, Germany
- Berk Atil
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
- Elif Ozkirimli
- Roche Informatics, F. Hoffmann-La Roche AG, Basel, Switzerland
19
Shan W, Chen L, Xu H, Zhong Q, Xu Y, Yao H, Lin K, Li X. GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47. Front Chem 2023; 11:1292869. [PMID: 37927570 PMCID: PMC10623438 DOI: 10.3389/fchem.2023.1292869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023]
Abstract
Identifying compound-protein interactions plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in compound-protein interaction (CPI) prediction. However, ML relies on learning from large amounts of data, whereas CPI data for a specific target are often scarce. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors from the SMILES of compounds and the amino acid sequences of proteins, and a modified multi-grained cascade forest (gcForest) is used as the classifier. The proposed method can construct a model from raw data, adjusts model complexity according to the scale of the dataset (especially for small-scale datasets), and is robust, with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on the constructed challenging dataset. We finally predicted 2 new inhibitors of cluster of differentiation 47 (CD47), which has few known inhibitors. The enzyme-activity IC50 values of these 2 new small-molecule inhibitors targeting the CD47-SIRPα interaction are 3.57 and 4.79 μM, respectively. These results fully demonstrate the competence of this concise but efficient tool for CPI prediction.
Affiliation(s)
- Wenying Shan
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Faculty of Health Sciences, University of Macau, Macau, China
- Lvqi Chen
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Hao Xu
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry, Nanjing, China
- National Engineering Laboratory for Biomass Chemical Utilization, Nanjing, China
- Qinghao Zhong
- School of Humanities and Social Sciences, The Chinese University of Hong Kong, Shenzhen, China
- Yinqiu Xu
- Department of Pharmacy, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China
- Hequan Yao
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Kejiang Lin
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Xuanyi Li
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
20
Liu C, Kutchukian P, Nguyen ND, AlQuraishi M, Sorger PK. A Hybrid Structure-Based Machine Learning Approach for Predicting Kinase Inhibition by Small Molecules. J Chem Inf Model 2023; 63:5457-5472. [PMID: 37595065 PMCID: PMC10498990 DOI: 10.1021/acs.jcim.3c00347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/06/2023] [Indexed: 08/20/2023]
Abstract
Kinases have been the focus of drug discovery programs for three decades, leading to over 70 therapeutic kinase inhibitors and biophysical affinity measurements for over 130,000 kinase-compound pairs. Nonetheless, the precise target spectrum for many kinases remains only partly understood. In this study, we describe a computational approach to unlocking qualitative and quantitative kinome-wide binding measurements for structure-based machine learning. Our study has three components: (i) a Kinase Inhibitor Complex (KinCo) data set comprising in silico predicted kinase structures paired with experimental binding constants, (ii) a machine learning loss function that integrates qualitative and quantitative data for model training, and (iii) a structure-based machine learning model trained on KinCo. We show that our approach outperforms methods trained on crystal structures alone in predicting binary and quantitative kinase-compound interaction affinities; relative to structure-free methods, our approach also captures known kinase biochemistry and more successfully generalizes to distant kinase sequences and compound scaffolds.
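The abstract mentions a loss function that combines qualitative and quantitative binding data but does not give its form. One common way to combine them is a censored squared error. It is shown here purely as an illustrative assumption, not as the KinCo loss. Exact measurements are penalized as usual. Qualitative labels that only bound the affinity (for example, "pKd below 5" for a non-binder) are penalized only when the prediction falls on the wrong side of the bound. The relation codes `'='`, `'>'` and `'<'` and the function name are hypothetical conventions.

```python
from typing import Sequence


def censored_mse(
    predicted: Sequence[float], values: Sequence[float], relations: Sequence[str]
) -> float:
    """Mean squared error that treats '>' and '<' labels as one-sided bounds.

    relation '=' : exact measurement, penalize any deviation
    relation '>' : true affinity lies above `value`, penalize only pred < value
    relation '<' : true affinity lies below `value`, penalize only pred > value
    """
    if not (len(predicted) == len(values) == len(relations)) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for pred, value, rel in zip(predicted, values, relations):
        if rel == "=":
            err = pred - value
        elif rel == ">":
            err = min(0.0, pred - value)  # zero once the lower bound is satisfied
        elif rel == "<":
            err = max(0.0, pred - value)  # zero once the upper bound is satisfied
        else:
            raise ValueError(f"unknown relation {rel!r}")
        total += err ** 2
    return total / len(predicted)


# An exact label off by 1, plus two bounds each violated by 1 -> mean loss 1.0
print(censored_mse([7.0, 5.0, 6.0], [6.0, 6.0, 5.0], ["=", ">", "<"]))
```

The appeal of this design is that inactive or thresholded screening results still contribute training signal without forcing the model toward an arbitrary exact value.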
Affiliation(s)
- Changchang Liu
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
- Peter Kutchukian
- Novartis Institutes for Biomedical Research, Cambridge, Massachusetts 02139, United States
- Nhan D. Nguyen
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Mohammed AlQuraishi
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
- Peter K. Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
21
Yi J, Lee S, Lim S, Cho C, Piao Y, Yeo M, Kim D, Kim S, Lee S. Exploring chemical space for lead identification by propagating on chemical similarity network. Comput Struct Biotechnol J 2023; 21:4187-4195. [PMID: 37680266 PMCID: PMC10480321 DOI: 10.1016/j.csbj.2023.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 08/08/2023] [Accepted: 08/20/2023] [Indexed: 09/09/2023]
Abstract
Motivation Lead identification is a fundamental step in prioritizing candidate compounds for the downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical properties and experimental information. However, ML and DL methods rarely consider compound similarity information directly, because these models use abstract representations of molecules for model construction. Alternatively, data mining approaches are used to explore chemical space for drug candidates by screening out undesirable compounds. A major challenge for data mining approaches is to develop efficient methods that search a large chemical space for desirable lead compounds with a low false positive rate. Results In this work, we developed a network propagation (NP) based data mining method for lead identification that searches an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug-target interaction model to narrow down compound candidates, and then use network propagation to prioritize drug candidates that are highly correlated with a drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered intentionally unlabeled compounds for given targets. To further demonstrate the predictive power of our approach, we identified 24 candidate leads for CLK1. Two of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
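The abstract does not specify which propagation rule is used. A widely used form of network propagation is random walk with restart (RWR), and the sketch below applies it to a single weighted, undirected similarity network. It assumes a symmetric adjacency dictionary in which every node has at least one edge, and a restart probability of 0.5. Node names, the restart value and the function name are illustrative assumptions. For an ensemble such as the 14 fingerprint networks described, one simple (assumed) aggregation would be to average each compound's score across networks.

```python
from typing import Dict


def random_walk_with_restart(
    adj: Dict[str, Dict[str, float]],
    seeds: Dict[str, float],
    restart: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Propagate seed scores over a symmetric weighted graph by RWR.

    adj[u][v] is the similarity weight of edge u-v and must equal adj[v][u].
    Returns a steady-state score per node; scores sum to 1 when every node
    has at least one edge.
    """
    nodes = sorted(adj)
    seed_total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / seed_total for n in nodes}  # restart distribution
    degree = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to edge weight.
            inflow = sum(p[m] * w / degree[m] for m, w in adj[n].items() if degree[m] > 0)
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Chain a - b - c with compound "a" as the only known active seed.
chain = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
scores = random_walk_with_restart(chain, {"a": 1.0})
print({k: round(v, 3) for k, v in scores.items()})  # {'a': 0.583, 'b': 0.333, 'c': 0.083}
```

Scores fall off with network distance from the seed, which is the property used to rank unlabeled compounds that are structurally close to known actives.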
Affiliation(s)
- Jungseob Yi
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sangseon Lee
- Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sangsoo Lim
- School of AI Software Convergence, Dongguk University, Pildong-ro 1-gil, Jung-gu, Seoul, South Korea
- Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Yinhua Piao
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Marie Yeo
- PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
- Dongkyu Kim
- PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
- Sun Kim
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sunho Lee
- AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
22
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. 
By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
- Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
- Nabanita Ghosh
- Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
- Parames C Sil
- Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
23
Li S, Tian T, Zhang Z, Zou Z, Zhao D, Zeng J. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst 2023; 14:692-705.e6. [PMID: 37516103 DOI: 10.1016/j.cels.2023.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/07/2022] [Revised: 11/25/2022] [Accepted: 05/19/2023] [Indexed: 07/31/2023]
Abstract
Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.
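The abstract describes PocketAnchor's message passing only at a high level. To illustrate the underlying idea of gathering local information from the atoms around each probe point, here is a rough, hypothetical sketch: it averages atom feature vectors within a fixed distance cutoff of each anchor. The function name, the 4 Å default radius and the mean aggregation are all assumptions and should not be read as the published PocketAnchor scheme.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def aggregate_to_anchors(
    anchors: Sequence[Point],
    atoms: Sequence[Point],
    atom_features: Sequence[Sequence[float]],
    radius: float = 4.0,  # hypothetical cutoff, in the same units as the coordinates
) -> List[List[float]]:
    """For each anchor, average the feature vectors of atoms within `radius`.

    One round of distance-cutoff mean aggregation (illustrative only);
    an anchor with no neighbouring atom receives a zero vector.
    """
    dim = len(atom_features[0])
    out = []
    for a in anchors:
        acc = [0.0] * dim
        count = 0
        for pos, feat in zip(atoms, atom_features):
            if math.dist(a, pos) <= radius:  # Euclidean distance (Python 3.8+)
                for i, v in enumerate(feat):
                    acc[i] += v
                count += 1
        out.append([v / count for v in acc] if count else acc)
    return out
```

A learned model would replace the plain mean with trainable message and update functions and would stack several rounds. The cutoff-based neighbourhood is the part this sketch is meant to show.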
Affiliation(s)
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Ziting Zhang
- Department of Automation, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Ziheng Zou
- Silexon AI Technology, Nanjing, Jiangsu Province 210023, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
24
Oršolić D, Šmuc T. Dynamic applicability domain (dAD): compound-target binding affinity estimates with local conformal prediction. Bioinformatics 2023; 39:btad465. [PMID: 37594752 PMCID: PMC10457664 DOI: 10.1093/bioinformatics/btad465] [Received: 08/30/2022] [Revised: 04/26/2023] [Accepted: 08/17/2023] [Indexed: 08/19/2023]
Abstract
MOTIVATION Increasing efforts are being made in machine learning to learn robust and accurate models from experimentally measured data and to enable more efficient drug discovery processes. Binding affinity prediction is one of the most frequent tasks in compound bioactivity modelling. Learned models for binding affinity prediction are assessed by their average performance on unseen samples, but their point predictions typically come without a rigorous confidence assessment. Approaches such as the conformal predictor framework equip conventional models with a more rigorous assessment of confidence for individual point predictions. In this article, we extend the inductive conformal prediction framework to interaction data, in particular to the compound-target binding affinity prediction task. The new framework is based on dynamically defined calibration sets that are specific to each test pair. It assesses each prediction in the context of calibration pairs from the pair's compound-target neighbourhood, which improves estimates by exploiting the local properties of the prediction model. RESULTS The effectiveness of the approach is benchmarked on several publicly available datasets and tested in realistic use-case scenarios of increasing difficulty on a complex compound-target binding affinity space. We demonstrate that in such scenarios the novel approach, which combines the applicability domain paradigm with the conformal prediction framework, produces superior confidence assessment, with valid and more informative prediction regions, compared with other state-of-the-art conformal prediction approaches. AVAILABILITY AND IMPLEMENTATION The dataset and code are available on GitHub (https://github.com/mlkr-rbi/dAD).
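The sketch below shows the basic mechanics that dAD builds on. It computes the standard split (inductive) conformal quantile with the usual finite-sample correction, and a variant that calibrates each test point only on its k nearest calibration examples. It is a minimal illustration under stated assumptions, not the dAD algorithm. The absolute-residual nonconformity score, the Euclidean k-nearest-neighbour notion of "neighbourhood", and the function names are all hypothetical choices; dAD defines its neighbourhood over compound-target pairs, as described in the paper and repository. Restricting calibration to a local subset also changes the coverage guarantee relative to plain split conformal prediction, so the local variant is for intuition only.

```python
import math
from typing import List, Sequence, Tuple

def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def local_conformal_interval(
    x: Sequence[float],
    y_hat: float,
    calib: List[Tuple[Sequence[float], float]],  # (features, |y - y_hat|) per calibration pair
    k_neighbours: int,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Prediction interval for one test point, calibrated on its nearest neighbours only."""
    nearest = sorted(calib, key=lambda c: math.dist(x, c[0]))[:k_neighbours]
    q = conformal_quantile([score for _, score in nearest], alpha)
    return (y_hat - q, y_hat + q)
```

With `k_neighbours` equal to the full calibration set size, this reduces to ordinary split conformal regression with a symmetric interval.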
Affiliation(s)
- Davor Oršolić
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Tomislav Šmuc
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
25
Saldinger JC, Raymond M, Elvati P, Violi A. Domain-agnostic predictions of nanoscale interactions in proteins and nanoparticles. Nat Comput Sci 2023; 3:393-402. [PMID: 38177838 DOI: 10.1038/s43588-023-00438-x] [Received: 08/02/2022] [Accepted: 03/24/2023] [Indexed: 01/06/2024]
Abstract
Although challenging, the accurate and rapid prediction of nanoscale interactions has broad applications for numerous biological processes and material properties. While several models have been developed to predict the interaction of specific biological components, they use system-specific information that hinders their application to more general materials. Here we present NeCLAS, a general and efficient machine learning pipeline that predicts the location of nanoscale interactions, providing human-intelligible predictions. NeCLAS outperforms current nanoscale prediction models for generic nanoparticles up to 10-20 nm, reproducing interactions for biological and non-biological systems. Two aspects contribute to these results: a low-dimensional representation of nanoparticles and molecules (to reduce the effect of data uncertainty), and environmental features (to encode the physicochemical neighborhood at multiple scales). This framework has several applications, from basic research to rapid prototyping and design in nanobiotechnology.
Affiliation(s)
- Matt Raymond
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Paolo Elvati
- Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Angela Violi
- Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Biophysics Program, University of Michigan, Ann Arbor, MI, USA
26
Zhang T, Bi Y, Zhu X, Gao X. Identification and Classification of Small Sample Desert Grassland Vegetation Communities Based on Dynamic Graph Convolution and UAV Hyperspectral Imagery. Sensors (Basel) 2023; 23:2856. [PMID: 36905067 PMCID: PMC10006976 DOI: 10.3390/s23052856] [Received: 02/03/2023] [Revised: 02/28/2023] [Accepted: 03/03/2023] [Indexed: 06/18/2023]
Abstract
Desert steppes are the last barrier protecting the steppe ecosystem. However, grassland monitoring still relies mainly on traditional methods, which have limitations in practice. In addition, existing deep learning models for desert grassland classification still use traditional convolutional neural networks, which cannot adapt to the classification of irregular ground objects, and this limits their classification performance. To address these problems, this paper uses a UAV hyperspectral remote sensing platform for data acquisition and proposes a spatial neighborhood dynamic graph convolution network (SN_DGCN) for degraded grassland vegetation community classification. The results show that the proposed model achieved the highest classification accuracy among the compared models (MLP, 1DCNN, 2DCNN, 3DCNN, Resnet18, Densenet121, and SN_GCN). With only 10 samples per class, its overall accuracy (OA), average accuracy (AA), and kappa were 97.13%, 96.50%, and 96.05%, respectively. Its classification performance was stable across different numbers of training samples, it generalized better in small-sample classification tasks, and it was more effective at classifying irregular features. The latest desert grassland classification models were also compared, which confirmed the superior classification performance of the proposed model. The proposed model provides a new method for classifying vegetation communities in desert grasslands, which is helpful for the management and restoration of desert steppes.
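OA, AA and kappa are standard accuracy measures in hyperspectral classification. The sketch below gives their textbook definitions, computed from a confusion matrix. It is not the study's evaluation code, and the function name is hypothetical. Kappa is Cohen's kappa, which compares observed agreement with the agreement expected by chance from the row and column marginals. The function returns fractions, whereas the abstract reports percentages.

```python
from typing import Sequence, Tuple

def classification_scores(cm: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Return (OA, AA, kappa) from a square confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(n_classes))
    oa = diag / total  # overall accuracy: fraction of all samples classified correctly
    # average accuracy: mean of per-class recalls (diagonal over row sums)
    aa = sum(cm[i][i] / sum(cm[i]) for i in range(n_classes)) / n_classes
    # chance agreement p_e from true-class (row) and predicted-class (column) marginals
    col_sums = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
    pe = sum(sum(cm[i]) * col_sums[i] for i in range(n_classes)) / total ** 2
    kappa = (oa - pe) / (1 - pe)  # Cohen's kappa
    return oa, aa, kappa
```

AA differs from OA when classes have different sizes, which is why both are usually reported together.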
|
27
Kour S, Biswas I, Sheoran S, Arora S, Sheela P, Duppala SK, Murthy DK, Pawar SC, Singh H, Kumar D, Prabhu D, Vuree S, Kumar R. Artificial intelligence and nanotechnology for cervical cancer treatment: Current status and future perspectives. J Drug Deliv Sci Technol 2023. [DOI: 10.1016/j.jddst.2023.104392] [Indexed: 04/05/2023]
28
Campana P, Nikoloski Z. Self- and cross-attention accurately predicts metabolite-protein interactions. NAR Genom Bioinform 2023; 5:lqad008. [PMID: 36733400 PMCID: PMC9887643 DOI: 10.1093/nargab/lqad008] [Received: 08/18/2022] [Revised: 12/20/2022] [Accepted: 01/17/2023] [Indexed: 02/04/2023]
Abstract
Metabolites regulate the activity of proteins and thereby affect cellular processes in all organisms. Despite extensive experimental and computational efforts to catalogue the metabolite-protein interactome in different organisms, coverage of such interactions remains fragmented, particularly for eukaryotes. Here, we use the two most comprehensive collections of metabolite-protein interactions, BioSnap and STITCH, covering seven eukaryotes, as gold standards to train a deep learning model that relies on self- and cross-attention over protein sequences. This protein-centric approach yields interaction-specific features derived from protein sequence alone. In addition, we designed and assessed a first double-blind evaluation protocol for metabolite-protein interactions, demonstrating the generalizability of the model. Our results indicate that the model's excellent performance relative to simpler alternatives and randomized baselines is due to the local and global features generated by the attention mechanisms. As a result, the predictions from the deep learning model provide a valuable resource for studying metabolite-protein interactions in eukaryotes.
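The abstract names self- and cross-attention but does not specify the architecture: which sequences are attended over, the projection matrices, or the number of heads. For orientation, here is a minimal sketch of generic scaled dot-product attention in plain Python. It omits learned projections, and which representations play the query and key/value roles in the published model is not assumed here. In cross-attention, queries come from one sequence and keys/values from another. Self-attention is the special case in which all three come from the same sequence.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows.

    Pass rows from two different sequences for cross-attention, or the same
    rows for queries, keys and values for self-attention.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # weighted average of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out
```

A query that is equally similar to every key averages the values uniformly. A query strongly aligned with one key effectively copies that key's value row.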
Affiliation(s)
- Pedro Alonso Campana
- Machine Learning, Department of Computer Science, University of Potsdam, 14476 Potsdam, Germany
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany
29
Zhang Y, Li S, Xing M, Yuan Q, He H, Sun S. Universal Approach to De Novo Drug Design for Target Proteins Using Deep Reinforcement Learning. ACS Omega 2023; 8:5464-5474. [PMID: 36816653 PMCID: PMC9933084 DOI: 10.1021/acsomega.2c06653] [Received: 10/15/2022] [Accepted: 01/05/2023] [Indexed: 05/28/2023]
Abstract
In drug design, the design and manufacture of safe and effective compounds is a long and complex process. A rapid and generalizable drug design method is therefore of great value. This study proposed a general model based on reinforcement learning combined with drug-target interaction prediction, which can be used to design new molecules for different protein targets. The method used a recurrent neural network for molecular modeling and took the drug-target affinity model as the reward function for optimal molecular generation. It did not require the three-dimensional structure or active sites of the protein targets, only their one-dimensional amino acid sequences. The approach was shown to generate molecules highly similar to marketed drugs and to design molecules with better binding energy.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
- Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
- Miaojuan Xing
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
- Qing Yuan
- Department of Chemistry and Chemical Engineering, Beijing University of Technology, Beijing 100124, China
- Hong He
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
- Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, PR China
30
Bai P, Miljković F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug–target prediction. Nat Mach Intell 2023. [DOI: 10.1038/s42256-022-00605-1] [Indexed: 02/05/2023]
31
Walther D. Specifics of Metabolite-Protein Interactions and Their Computational Analysis and Prediction. Methods Mol Biol 2023; 2554:179-197. [PMID: 36178627 DOI: 10.1007/978-1-0716-2624-5_12] [Indexed: 06/16/2023]
Abstract
Computational approaches to the characterization and prediction of compound-protein interactions have a long research history and are well established, driven primarily by the needs of drug development. In principle, many of the computational methods developed for drug development can also be applied directly to metabolite-protein interactions. However, the interactions of metabolites with proteins (enzymes) have a number of particularities that stem from their natural evolutionary origin and their biological and biochemical roles, as well as from the different problem setting in which they are investigated. This review highlights these special aspects and presents recent research on them, the computational approaches developed, and available resources. The aspects include binding promiscuity, allostery, the role of posttranslational modifications, molecular steering and crowding effects, and metabolic conversion rate prediction. Recent breakthroughs in protein structure prediction and newly developed machine learning techniques are discussed as a tremendous opportunity for developing a more detailed molecular understanding of metabolism.
Affiliation(s)
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany.
32
Protein-ligand binding affinity prediction with edge awareness and supervised attention. iScience 2022; 26:105892. [PMID: 36691617 PMCID: PMC9860494 DOI: 10.1016/j.isci.2022.105892] [Received: 06/21/2022] [Revised: 11/12/2022] [Accepted: 12/23/2022] [Indexed: 12/29/2022]
Abstract
Accurate prediction of protein-ligand binding affinity is crucial in structure-based drug design, but it still faces challenges despite recent advances in deep learning: (1) existing methods neglect the edge information in protein and ligand structure data; (2) current attention mechanisms struggle to capture true binding interactions in small datasets. Herein, we propose SEGSA_DTA, a SuperEdge Graph convolution-based and Supervised Attention-based Drug-Target Affinity prediction method. Its super edge graph convolution makes comprehensive use of node and edge information, and its multi-supervised attention module efficiently learns an attention distribution consistent with real protein-ligand interactions. Results on multiple datasets show that SEGSA_DTA outperforms current state-of-the-art methods. We also applied SEGSA_DTA to repurposing FDA-approved drugs to identify potential coronavirus disease 2019 (COVID-19) treatments. In addition, using SHapley Additive exPlanations (SHAP), we found that SEGSA_DTA is interpretable and provides a new quantitative analytical solution for structure-based lead optimization.
|
33
A deep learning method for predicting molecular properties and compound-protein interactions. J Mol Graph Model 2022; 117:108283. [PMID: 35994925 DOI: 10.1016/j.jmgm.2022.108283] [Received: 04/11/2022] [Revised: 07/19/2022] [Accepted: 07/26/2022] [Indexed: 01/14/2023]
Abstract
Predicting molecular properties and predicting compound-protein interactions (CPIs) are two important tasks in drug design and discovery, and both are essential for discovering lead compounds in virtual screening. In silico methods based on deep learning have recently demonstrated excellent performance in various challenges, so efficient deep learning methods that accurately predict both molecular properties and CPIs are needed in drug research. In this paper, we propose MG-S, a deep learning method applicable to both molecular property prediction and CPI prediction. It is based on the idea that both tasks are generally governed by the chemical structure and sequence information of compounds and proteins. Molecular properties are inferred by integrating the molecular structure and sequence information of compounds, and CPIs are predicted by integrating protein sequence and compound structure. The method combines the topological structure and sequence fingerprint information of molecules, adequately extracts features from the raw data, and generates highly representative features for prediction. Molecular property prediction experiments were conducted on the BACE, P53 and hERG datasets, and CPI prediction experiments on the Human, C. elegans and KIBA datasets. On P53, MG-S outperforms the suboptimal baseline model by 0.030, 0.050 and 0.100 in AUC, Precision and MCC, respectively, and it provides consistently good results on BACE and hERG. The model also achieves impressive performance in CPI prediction; compared with the state-of-the-art models, the differences in AUC, Precision and MCC on KIBA are 0.141, 0.138, 0.090 and 0.082, respectively. The comprehensive results show that MG-S has higher performance, better classification ability, and faster convergence. 
MG-S will serve as a useful method for predicting compound properties and CPIs in the early stages of drug design and discovery. Our code and datasets are available at: https://github.com/happay-ending/cpi_cpp.
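The abstract compares models by AUC, Precision and MCC. The sketch below gives the standard definitions of these three binary-classification metrics. It is not MG-S's evaluation code, and the function names are hypothetical. AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
import math
from typing import Sequence

def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are true positives (0.0 if none predicted)."""
    return tp / (tp + fp) if tp + fp else 0.0

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

This pairwise loop is quadratic in the number of samples. A sort-based implementation would be preferable for large screening libraries.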
|
34
Yaseen A, Amin I, Akhter N, Ben-Hur A, Minhas F. Insights into performance evaluation of compound-protein interaction prediction methods. Bioinformatics 2022; 38:ii75-ii81. [PMID: 36124806 DOI: 10.1093/bioinformatics/btac496] [Indexed: 12/25/2022]
Abstract
MOTIVATION Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications of increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. RESULTS We systematically analyze the impact of several factors, overlooked in existing work, that affect the generalization performance of CPI predictors: (i) similarity between training and test examples in cross-validation; (ii) synthesis of negative examples in the absence of experimentally verified negative examples; and (iii) alignment of the evaluation protocol and performance metrics with the real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers and a simple kernel-based baseline, we found that effective assessment of the generalization performance of CPI predictors requires careful control over the similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that generating synthetic negative examples by random pairing, for both training and performance evaluation, yields models that generalize better than the more sophisticated strategies used in existing studies. Our analyses indicate that the proposed experiment design strategies can significantly improve CPI prediction, enabling effective target compound screening for drug repurposing and the discovery of putative chemical ligands of the SARS-CoV-2 Spike and human ACE2 proteins. AVAILABILITY AND IMPLEMENTATION Code and supplementary material are available at https://github.com/adibayaseen/HKRCPI. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
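The abstract reports that random pairing is an effective way to synthesize negative examples. The sketch below shows a minimal version of that idea under stated assumptions. It draws compound and protein identifiers uniformly from those appearing in the known positives and keeps only pairs that are not known interactions. The function name, the uniform sampling and the fixed seed are hypothetical choices, not code from the HKRCPI repository. Unobserved pairs are only presumed negative, and some may be true but unmeasured interactions.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (compound_id, protein_id)

def random_pair_negatives(positives: Set[Pair], n_negatives: int, seed: int = 0) -> List[Pair]:
    """Sample distinct (compound, protein) pairs that are not known positives."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    compounds = sorted({c for c, _ in positives})
    proteins = sorted({p for _, p in positives})
    max_neg = len(compounds) * len(proteins) - len(positives)
    if n_negatives > max_neg:
        raise ValueError("not enough unobserved pairs to sample from")
    negatives: Set[Pair] = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positives:  # presumed negative: no recorded interaction
            negatives.add(pair)
    return sorted(negatives)
```

The same paper stresses controlling train/test similarity. In practice, negatives sampled this way would still need to be split with that control in mind.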
Affiliation(s)
- Adiba Yaseen
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
- Imran Amin
- National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan
- Naeem Akhter
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
- Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
- Fayyaz Minhas
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
35
Sajjan M, Li J, Selvarajan R, Sureshbabu SH, Kale SS, Gupta R, Singh V, Kais S. Quantum machine learning for chemistry and physics. Chem Soc Rev 2022; 51:6475-6573. [PMID: 35849066 DOI: 10.1039/d2cs00203e] [Indexed: 12/12/2022]
Abstract
Machine learning (ML) has emerged as a formidable tool for identifying hidden but pertinent patterns within a given data set, with the objective of subsequently generating automated predictive behavior. In recent years, it is safe to conclude, ML and its close cousin, deep learning (DL), have ushered in unprecedented developments in all areas of the physical sciences, especially chemistry. Not only classical variants of ML but also variants trainable on near-term quantum hardware have been developed, with promising outcomes. Such algorithms have revolutionized materials design and the performance of photovoltaics, electronic structure calculations of ground and excited states of correlated matter, the computation of force fields and potential energy surfaces informing chemical reaction dynamics, reactivity-inspired rational drug design strategies, and even the classification of phases of matter with accurate identification of emergent criticality. In this review we explicate a subset of these topics and delineate the contributions made by both classical and quantum-computing-enhanced machine learning algorithms over the past few years. We present a brief overview of the well-known techniques and also highlight their learning strategies using statistical-physics insight. The objective of the review is not only to expound these techniques but also to promote cross-pollination among future research in all areas of chemistry that can benefit from ML, which in turn can accelerate the growth of such algorithms.
Affiliation(s)
- Manas Sajjan
- Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA; Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Junxu Li
- Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA; Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
- Raja Selvarajan
- Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA; Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
- Shree Hari Sureshbabu
- Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA; Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN-47907, USA
- Sumit Suresh Kale
- Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA; Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Rishabh Gupta
- Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA; Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Vinit Singh
- Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA; Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Sabre Kais
- Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA; Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA; Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA; Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN-47907, USA
36
Dong L, Qu X, Wang B. XLPFE: A Simple and Effective Machine Learning Scoring Function for Protein-Ligand Scoring and Ranking. ACS Omega 2022; 7:21727-21735. [PMID: 35785279 PMCID: PMC9245135 DOI: 10.1021/acsomega.2c01723] [Received: 03/22/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Prediction of protein-ligand binding affinities is a central issue in structure-based computer-aided drug design. In recent years, much effort has been devoted to predicting the binding affinity of protein-ligand complexes using machine learning (ML). Owing to the remarkable nonlinear fitting ability of ML methods, ML-based scoring functions (SFs) can deliver much better performance than classical SFs on a selected test set, such as the comparative assessment of scoring functions (CASF). However, the performance of ML-based SFs relies heavily on the overall similarity between the training and test sets. To improve the performance and transferability of an SF, we combined various features, including energy terms from X-score and AutoDock Vina, ligand properties, and statistical sequence-related information from either the binding site or the full protein. Using these features with extremely randomized trees (ET), an ML model, we developed XLPFE, a new SF. Compared with other tested methods such as X-score, AutoDock Vina, ΔvinaXGB, PSH-ML, and CNN-score, XLPFE achieves consistently better scoring and ranking power for various types of protein-ligand complex structures beyond the CASF, suggesting that XLPFE has superior transferability. XLPFE performs particularly well on metalloenzymes. With its faster speed, improved accuracy, and better transferability, XLPFE could be usefully applied to a diverse range of protein-ligand complexes.
Affiliation(s)
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
37
Zheng J, Xiao X, Qiu WR. DTI-BERT: Identifying Drug-Target Interactions in Cellular Networking Based on BERT and Deep Learning Method. Front Genet 2022; 13:859188. [PMID: 35754843 PMCID: PMC9213727 DOI: 10.3389/fgene.2022.859188] [Received: 01/21/2022] [Accepted: 04/25/2022] [Indexed: 11/20/2022]
Abstract
Drug–target interactions (DTIs) are regarded as an essential part of genomic drug discovery. Computational prediction of DTIs can speed up the search for lead drugs for a target and reduce reliance on time-consuming and expensive wet-lab techniques. Many current computational methods predict DTIs from the sequence composition or physicochemical properties of drugs and targets, but further effort is needed to improve them. In this article, we propose a new sequence-based method for accurately identifying DTIs. For target proteins, we use pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, the Discrete Wavelet Transform (DWT) is used to extract information from molecular fingerprints. We then concatenate the drug and target feature vectors and feed them into a feature extraction module made of a batch-norm layer, a rectified linear activation layer and a linear layer (called a BRL block), followed by a convolutional neural network module that extracts further DTI features. A final BRL block serves as the prediction engine. After optimization with contrastive loss and cross-entropy loss, the model achieved prediction accuracies of up to 90.1%, 94.7%, 94.9%, and 89% for the G protein-coupled receptor, ion channel, enzyme, and nuclear receptor target families, respectively, indicating that the proposed method can outperform existing predictors. For the convenience of researchers, the web server for the new predictor is freely accessible at https://bioinfo.jcu.edu.cn/dtibert or http://121.36.221.79/dtibert/. The proposed method may also be an option for other DTI prediction tasks.
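DTI-BERT applies a discrete wavelet transform to drug fingerprints, but this abstract does not state the wavelet family, the decomposition depth, or how the coefficients are pooled into features. As a hedged illustration only, the sketch below applies one level of the orthonormal Haar transform (an assumed choice) to a binary fingerprint treated as a numeric signal. It splits the signal into approximation (local averages) and detail (local differences) coefficients. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficients; the input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)  # orthonormal scaling preserves the signal's energy
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail
```

Applied to a fingerprint bit vector such as `[1, 1, 0, 1]`, the approximation coefficients summarize which neighbouring bit pairs are set. The detail coefficients flag pairs whose bits differ. Further levels can be obtained by transforming the approximation again.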
Affiliation(s)
- Jie Zheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
38
Protein–Protein Interaction Prediction for Targeted Protein Degradation. Int J Mol Sci 2022; 23:ijms23137033. [PMID: 35806036 PMCID: PMC9266413 DOI: 10.3390/ijms23137033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 06/17/2022] [Accepted: 06/18/2022] [Indexed: 02/04/2023]
Abstract
Protein–protein interactions (PPIs) play a fundamental role in various biological functions; thus, detecting PPI sites is essential for understanding diseases and developing new drugs. PPI prediction is of particular relevance for the development of drugs employing targeted protein degradation, as their efficacy relies on the formation of a stable ternary complex involving two proteins. However, experimental methods to detect PPI sites are both costly and time-intensive. In recent years, machine learning-based methods have been developed as screening tools. While they are computationally more efficient than traditional docking methods and thus allow rapid execution, these tools have so far primarily been based on sequence information, and they are therefore limited in their ability to address spatial requirements. In addition, they have to date not been applied to targeted protein degradation. Here, we present a new deep learning architecture based on the concept of graph representation learning that can predict interaction sites and interactions of proteins based on their surface representations. We demonstrate that our model reaches state-of-the-art performance using AUROC scores on the established MaSIF dataset. We furthermore introduce a new dataset with more diverse protein interactions and show that our model generalizes well to this new data. These generalization capabilities allow our model to predict the PPIs relevant for targeted protein degradation, which we show by demonstrating the high accuracy of our model for PPI prediction on the available ternary complex data. Our results suggest that PPI prediction models can be a valuable tool for screening protein pairs while developing new drugs for targeted protein degradation.
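The abstract reports performance as AUROC on the MaSIF dataset. As a reminder of what that number measures, here is a minimal sketch of AUROC computed as the probability that a randomly chosen positive (for example, a true interaction site) is scored above a randomly chosen negative, with ties counted as one half. This is the textbook rank-based definition, not the authors' evaluation code; the quadratic pairwise loop is chosen for clarity over speed.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for positives, 0 for negatives; scores: higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))
```

For example, `auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` ranks three of the four positive/negative pairs correctly and returns 0.75.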
39
Wan X, Wu X, Wang D, Tan X, Liu X, Fu Z, Jiang H, Zheng M, Li X. An inductive graph neural network model for compound-protein interaction prediction based on a homogeneous graph. Brief Bioinform 2022; 23:6547264. [PMID: 35275993 PMCID: PMC9310259 DOI: 10.1093/bib/bbac073] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/03/2021] [Revised: 02/09/2022] [Accepted: 02/11/2022] [Indexed: 01/10/2023]
Abstract
Identifying potential compound–protein interactions (CPIs) plays an essential role in drug development. Computational approaches for CPI prediction can reduce the time and cost of experimental methods and have benefited from continuously improving graph representation learning. However, most network-based methods use heterogeneous graphs, which are challenging to model because of their complex structures and heterogeneous attributes. Therefore, in this work, we transformed the compound–protein heterogeneous graph into a homogeneous graph by integrating ligand-based protein representations and overall similarity associations. We then proposed an Inductive Graph AggrEgator-based framework, named CPI-IGAE, for CPI prediction. CPI-IGAE learns low-dimensional representations of compounds and proteins from the homogeneous graph in an end-to-end manner. The results show that CPI-IGAE performs better than several state-of-the-art methods. A further ablation study and visualization of the embeddings reveal the advantages of the model architecture and its role in feature extraction, and some of the top-ranked CPIs predicted by CPI-IGAE have been validated by a review of recent literature. The data and source code are available at https://github.com/wanxiaozhe/CPI-IGAE.
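The key idea in this abstract is putting compounds and proteins into one feature space so that a single graph aggregator can treat them as nodes of the same type. The sketch below illustrates that idea under loudly stated assumptions: a protein is described by the element-wise mean of its known ligands' fingerprints (one plausible reading of "ligand-based protein representation"), and one GraphSAGE-style mean-aggregation step concatenates each node's features with its neighbours' mean. Both helper names are hypothetical; a real layer would also apply a learned weight matrix and a nonlinearity, which are omitted here.

```python
from typing import Dict, List

Vector = List[float]


def ligand_based_protein_vector(ligand_fps: List[Vector]) -> Vector:
    """Hypothetical: represent a protein by the mean fingerprint of its known ligands."""
    if not ligand_fps:
        raise ValueError("a ligand-based representation needs at least one ligand")
    n = len(ligand_fps)
    return [sum(col) / n for col in zip(*ligand_fps)]


def mean_aggregate(features: Dict[str, Vector],
                   neighbors: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One unweighted mean-aggregator step on a homogeneous graph.

    Each node's output is its own features concatenated with the mean of its
    neighbours' features (zeros if it has no neighbours).
    """
    out: Dict[str, Vector] = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(x))]
        else:
            agg = [0.0] * len(x)
        out[node] = x + agg
    return out
```

Because both node types live in the same fingerprint space, compounds and proteins can share one aggregator, and an unseen compound can be embedded from its features and neighbours alone, which is the inductive property the model name refers to.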
Affiliation(s)
- Xiaozhe Wan
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Xiaolong Wu
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Dingyan Wang
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Xiaohong Liu
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou 215128, China
- Zunyun Fu
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Hualiang Jiang
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China; School of Life Science and Technology, ShanghaiTech University, 393 Huaxiazhong Road, Shanghai 200031, China
- Mingyue Zheng
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Xutong Li
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
40
Du BX, Qin Y, Jiang YF, Xu Y, Yiu SM, Yu H, Shi JY. Compound–protein interaction prediction by deep learning: Databases, descriptors and models. Drug Discov Today 2022; 27:1350-1366. [DOI: 10.1016/j.drudis.2022.02.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 11/19/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
41
Dhakal A, McKay C, Tanner JJ, Cheng J. Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions. Brief Bioinform 2022; 23:bbab476. [PMID: 34849575 PMCID: PMC8690157 DOI: 10.1093/bib/bbab476] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Received: 08/05/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022]
Abstract
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein-ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein-ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein-ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein-ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein-ligand interactions.
Affiliation(s)
- Ashwin Dhakal
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Cole McKay
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- Department of Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
42
Thieme S, Walther D. Biclique extension as an effective approach to identify missing links in metabolic compound-protein interaction networks. Bioinformatics Advances 2022; 2:vbac001. [PMID: 36699348 PMCID: PMC9710583 DOI: 10.1093/bioadv/vbac001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2021] [Revised: 11/26/2021] [Accepted: 01/10/2022] [Indexed: 01/28/2023]
Abstract
Motivation: Metabolic networks are complex systems of chemical reactions proceeding via physical interactions between metabolites and proteins. We aimed to predict previously unknown compound-protein interactions (CPIs) in metabolic networks by applying biclique extension, a network-structure-based prediction method.
Results: We developed a workflow, named BiPredict, to predict CPIs based on biclique extension and applied it to Escherichia coli and human, using their respective known CPI networks as input. Depending on the chosen biclique size and using a STITCH-derived E. coli CPI network as input, a sensitivity of 39% and an associated precision of 59% were reached. For the larger human STITCH network, a sensitivity of 78% with a false-positive rate of <5% and a precision of 75% was obtained. High performance was also achieved when using KEGG metabolic-reaction networks as input. Prediction performance significantly exceeded that of randomized controls and compared favorably with state-of-the-art deep-learning methods. Regarding metabolic process involvement, TCA-cycle and ribosomal processes were found to be enriched among the predicted interactions. BiPredict can be used for network curation, may help increase the efficiency of experimental testing of CPIs, and can readily be applied to other species.
Availability and implementation: BiPredict and related datasets are available at https://github.com/SandraThieme/BiPredict.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
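The abstract does not spell out BiPredict's extension rule, so the following is only a simplified illustration of the general biclique-extension idea, under the assumption that a node almost completing a known biclique is missing exactly one link. Given a complete biclique of compounds × proteins in a known CPI network, any outside compound linked to all but one of the biclique's proteins is predicted to bind the missing one, and symmetrically for outside proteins. `extend_biclique` is a hypothetical name, and the minimum 2 × 2 size is an assumed parameter, not BiPredict's.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (compound, protein)


def extend_biclique(edges: Set[Edge], compounds: Set[str],
                    proteins: Set[str]) -> List[Edge]:
    """Predict links that would let outside nodes join a known biclique.

    `compounds` x `proteins` must be a complete biclique in `edges`.
    """
    if len(compounds) < 2 or len(proteins) < 2:
        raise ValueError("use a biclique of at least 2 x 2 nodes")
    if any((c, p) not in edges for c in compounds for p in proteins):
        raise ValueError("the given node sets do not form a complete biclique")
    all_compounds = {c for c, _ in edges}
    all_proteins = {p for _, p in edges}
    predicted: Set[Edge] = set()
    # Outside compound linked to every biclique protein except one.
    for c in all_compounds - compounds:
        missing = [p for p in proteins if (c, p) not in edges]
        if len(missing) == 1:
            predicted.add((c, missing[0]))
    # Outside protein linked to every biclique compound except one.
    for p in all_proteins - proteins:
        missing = [c for c in compounds if (c, p) not in edges]
        if len(missing) == 1:
            predicted.add((missing[0], p))
    return sorted(predicted)
```

Larger bicliques make the rule stricter (more shared partners are required before a link is predicted), which is consistent with the abstract's observation that sensitivity and precision depend on the chosen biclique size.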
Affiliation(s)
- Sandra Thieme
- Max Planck Institute of Molecular Plant Physiology, Potsdam 14476, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Potsdam 14476, Germany. To whom correspondence should be addressed.
43
Jung YS, Kim Y, Cho YR. Comparative analysis of network-based approaches and machine learning algorithms for predicting drug-target interactions. Methods 2021; 198:19-31. [PMID: 34737033 DOI: 10.1016/j.ymeth.2021.10.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/31/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 01/06/2023]
Abstract
Computational prediction of drug-target interactions (DTIs) is of particular importance in drug repositioning because it efficiently selects potential DTI candidates. A variety of computational methods for predicting DTIs have been proposed over the past decade. We sought to determine which methods or techniques are most advantageous for increasing prediction accuracy. This article provides a comprehensive overview of network-based, machine learning, and integrated DTI prediction methods. Network-based methods represent a DTI network, along with drug and target similarities, in matrix form and apply graph-theoretic algorithms to identify new DTIs. Machine learning methods use known DTIs and the features of drugs and target proteins as training data to build a predictive model. Integrated methods combine these two techniques. We assessed the prediction performance of selected state-of-the-art methods on two different benchmark datasets. Our experimental results demonstrate that the integrated methods generally outperform the others. Some previous methods showed low accuracy in predicting interactions of unknown drugs, i.e. drugs that do not appear in the training dataset. Combining similarity matrices from multiple features by data fusion did not increase prediction accuracy. Finally, we analyzed future directions for further improvements in DTI prediction.
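To make the "DTI network plus similarity matrix" formulation concrete, here is a minimal sketch of one generic network-based scoring rule: each drug–target pair is scored by the similarity-weighted average of the interaction profiles of the other drugs. This is an illustrative assumption, not any specific method benchmarked in the article; `similarity_weighted_scores` is a hypothetical name, `A` is the binary drug × target adjacency matrix and `S` the drug × drug similarity matrix.

```python
from typing import List

Matrix = List[List[float]]


def similarity_weighted_scores(A: Matrix, S: Matrix) -> Matrix:
    """Score every drug-target pair from the interaction profiles of similar drugs.

    A[i][j] = 1 if drug i is known to interact with target j, else 0.
    S[i][k] = similarity between drugs i and k; the diagonal is ignored so that
    a drug's own known interactions do not count as evidence for itself.
    """
    n_drugs, n_targets = len(A), len(A[0])
    scores = [[0.0] * n_targets for _ in range(n_drugs)]
    for i in range(n_drugs):
        weights = [S[i][k] if k != i else 0.0 for k in range(n_drugs)]
        total = sum(weights)
        if total == 0:
            continue  # no similar drugs: leave all scores at zero
        for j in range(n_targets):
            scores[i][j] = sum(weights[k] * A[k][j] for k in range(n_drugs)) / total
    return scores
```

Because a drug's score depends only on its similarity to other drugs, this kind of rule can still rank targets for a drug with no known interactions, provided a similarity to training drugs can be computed; it breaks down when no similar drug exists, which is one reason performance on unknown drugs is a useful stress test.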
Affiliation(s)
- Yi-Sue Jung
- Division of Software, Yonsei University - Mirae Campus, Republic of Korea
- Yoonbee Kim
- Division of Software, Yonsei University - Mirae Campus, Republic of Korea
- Young-Rae Cho
- Division of Software, Yonsei University - Mirae Campus, Republic of Korea; Division of Digital Healthcare, Yonsei University - Mirae Campus, Republic of Korea.
44
Rácz A, Bajusz D, Miranda-Quintana RA, Héberger K. Machine learning models for classification tasks related to drug safety. Mol Divers 2021; 25:1409-1424. [PMID: 34110577 PMCID: PMC8342376 DOI: 10.1007/s11030-021-10239-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/01/2021] [Accepted: 05/27/2021] [Indexed: 12/23/2022]
Abstract
In this review, we outline the current trends in machine learning-driven classification studies related to ADME (absorption, distribution, metabolism and excretion) and toxicity endpoints from the past six years (2015-2021). The study focuses only on classification models built on large datasets (i.e. more than a thousand compounds). A comprehensive literature search and meta-analysis were carried out for nine different targets: hERG-mediated cardiotoxicity, blood-brain barrier penetration, permeability glycoprotein (P-gp) substrate/inhibitor, cytochrome P450 enzyme family, acute oral toxicity, mutagenicity, carcinogenicity, respiratory toxicity and irritation/corrosion. The best classification models were compared to reveal differences between machine learning algorithms and modeling types, endpoint-specific performance, dataset sizes and validation protocols. Based on this evaluation, tree-based algorithms (still) dominate the field, and consensus modeling is an increasing trend in drug safety prediction. Although classification models with strong performance can already be found for hERG-mediated cardiotoxicity and the isoenzymes of the cytochrome P450 enzyme family, these targets remain central to ADMET-related research efforts.
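The review names consensus modeling as a growing trend but, in this abstract, does not say which consensus scheme each study used. A common variant is soft voting: average the positive-class probabilities predicted by several independently trained classifiers and threshold the mean. The sketch below assumes that scheme; `consensus_probability` and `consensus_label` are hypothetical names, and the lambdas in the usage note are toy stand-ins for trained models such as tree ensembles.

```python
from statistics import mean
from typing import Callable, Sequence

# A classifier maps a descriptor vector to P(class = 1), e.g. P(hERG blocker).
Classifier = Callable[[Sequence[float]], float]


def consensus_probability(models: Sequence[Classifier], x: Sequence[float]) -> float:
    """Soft-voting consensus: the mean positive-class probability across models."""
    if not models:
        raise ValueError("consensus needs at least one model")
    return mean([m(x) for m in models])


def consensus_label(models: Sequence[Classifier], x: Sequence[float],
                    threshold: float = 0.5) -> int:
    """Binary consensus call: 1 if the averaged probability reaches the threshold."""
    return int(consensus_probability(models, x) >= threshold)
```

With three toy models returning 0.9, 0.4 and 0.7 for a compound, the consensus probability is about 0.67 and the compound is labelled positive even though one model disagrees; averaging of this kind tends to smooth out the idiosyncratic errors of individual models.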
Affiliation(s)
- Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary.
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary
- Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, Budapest, 1117, Hungary.