101
|
Tang C, Zhong C, Wang M, Zhou F. FMGNN: A Method to Predict Compound-Protein Interaction With Pharmacophore Features and Physicochemical Properties of Amino Acids. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1030-1040. [PMID: 35503835 DOI: 10.1109/tcbb.2022.3172340] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Identifying interactions between compounds and proteins is an essential task in drug discovery. To recommend compounds as new drug candidates, applying the computational approaches has a lower cost than conducting the wet-lab experiments. Machine learning-based methods, especially deep learning-based methods, have advantages in learning complex feature interactions between compounds and proteins. However, deep learning models will over-generalize and lead to the problem of predicting less relevant compound-protein pairs when the compound-protein feature interactions are high-dimensional sparse. This problem can be overcome by learning both low-order and high-order feature interactions. In this paper, we propose a novel hybrid model with Factorization Machines and Graph Neural Network called FMGNN to extract the low-order and high-order features, respectively. Then, we design a compound-protein interactions (CPIs) prediction method with pharmacophore features of compound and physicochemical properties of amino acids. The pharmacophore features can ensure that the prediction results much more fit the expectation of biological experiment and the physicochemical properties of amino acids are loaded into the embedding layer to improve the convergence speed and accuracy of protein feature learning. The experimental results on several datasets, especially on an imbalanced large-scale dataset, showed that our proposed method outperforms other existing methods for CPI prediction. The western blot experiment results on wogonin and its candidate target proteins also showed that our proposed method is effective and accurate for finding target proteins. The computer program of implementing the model FMGNN is available at https://github.com/tcygxu2021/FMGNN.
Collapse
|
102
|
Yu T, Liu M, Ren Z, Zhang J. A Fast Approximate Method for k-Edge Connected Component Detection in Graphs with High Accuracy. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2023]
|
103
|
Huang C, Gao W, Zheng Y, Wang W, Zhang Y, Liu K. Universal machine-learning algorithm for predicting adsorption performance of organic molecules based on limited data set: Importance of feature description. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 859:160228. [PMID: 36402319 DOI: 10.1016/j.scitotenv.2022.160228] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Revised: 11/09/2022] [Accepted: 11/12/2022] [Indexed: 06/16/2023]
Abstract
Adsorption of organic molecules from aqueous solution offers a simple and effective method for their removal. Recently, there have been several attempts to apply machine learning (ML) for this problem. To this end, polyparameter linear free energy relationships (pp-LFERs) were employed, and poor prediction results were observed outside model applicability domain of pp-LFERs. In this study, we improved the applicability of ML methods by adopting a chemical-structure (CS) based approach. We used the prediction of adsorption of organic molecules on carbon-based adsorbents as an example. Our results show that this approach can fully differentiate the structural differences between any organic molecules, while providing significant information that is relevant to their interaction with the adsorbents. We compared two CS feature descriptors: 3D-coordination and simplified molecular-input line-entry system (SMILES). We then built CS-ML models based on neural networks (NN) and extreme gradient boosting (XGB). They all outperformed pp-LFERs based models and are capable to accurately predict adsorption isotherm of isomers with similar physiochemical properties such as chiral molecules, even though they are trained with achiral molecules and racemates. We found for predicting adsorption isotherm, XGB shows better performance than NN, and 3D-coordinations allow effective differentiation between organic molecules.
Collapse
Affiliation(s)
- Chaoyi Huang
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Wenyang Gao
- Division of Artificial Intelligence and Data Science, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Yingdie Zheng
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Wei Wang
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Yue Zhang
- Division of Artificial Intelligence and Data Science, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Kai Liu
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China.
| |
Collapse
|
104
|
Hua Y, Song X, Feng Z, Wu X. MFR-DTA: a multi-functional and robust model for predicting drug-target binding affinity and region. Bioinformatics 2023; 39:7008321. [PMID: 36708000 PMCID: PMC9900210 DOI: 10.1093/bioinformatics/btad056] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2022] [Revised: 12/31/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION Recently, deep learning has become the mainstream methodology for drug-target binding affinity prediction. However, two deficiencies of the existing methods restrict their practical applications. On the one hand, most existing methods ignore the individual information of sequence elements, resulting in poor sequence feature representations. On the other hand, without prior biological knowledge, the prediction of drug-target binding regions based on attention weights of a deep neural network could be difficult to verify, which may bring adverse interference to biological researchers. RESULTS We propose a novel Multi-Functional and Robust Drug-Target binding Affinity prediction (MFR-DTA) method to address the above issues. Specifically, we design a new biological sequence feature extraction block, namely BioMLP, that assists the model in extracting individual features of sequence elements. Then, we propose a new Elem-feature fusion block to refine the extracted features. After that, we construct a Mix-Decoder block that extracts drug-target interaction information and predicts their binding regions simultaneously. Last, we evaluate MFR-DTA on two benchmarks consistently with the existing methods and propose a new dataset, sc-PDB, to better measure the accuracy of binding region prediction. We also visualize some samples to demonstrate the locations of their binding sites and the predicted multi-scale interaction regions. The proposed method achieves excellent performance on these datasets, demonstrating its merits and superiority over the state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION https://github.com/JU-HuaY/MFR.
Collapse
Affiliation(s)
- Yang Hua
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
| | - Xiaoning Song
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
| | - Zhenhua Feng
- School of Computer Science and Electronic Engineering, University of Surrey, Guildford GU2 7XH, UK
| | - Xiaojun Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
| |
Collapse
|
105
|
Bai P, Miljković F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug–target prediction. NAT MACH INTELL 2023. [DOI: 10.1038/s42256-022-00605-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
|
106
|
Xu W, Yang B. Microkinetic modeling with machine learning predicted binding energies of reaction intermediates of ethanol steam reforming: The limitations. MOLECULAR CATALYSIS 2023. [DOI: 10.1016/j.mcat.2023.112940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
|
107
|
Vemula D, Jayasurya P, Sushmitha V, Kumar YN, Bhandari V. CADD, AI and ML in drug discovery: A comprehensive review. Eur J Pharm Sci 2023; 181:106324. [PMID: 36347444 DOI: 10.1016/j.ejps.2022.106324] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 10/26/2022] [Accepted: 11/03/2022] [Indexed: 11/06/2022]
Abstract
Computer-aided drug design (CADD) is an emerging field that has drawn a lot of interest because of its potential to expedite and lower the cost of the drug development process. Drug discovery research is expensive and time-consuming, and it frequently took 10-15 years for a drug to be commercially available. CADD has significantly impacted this area of research. Further, the combination of CADD with Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) technologies to handle enormous amounts of biological data has reduced the time and cost associated with the drug development process. This review will discuss how CADD, AI, ML, and DL approaches help identify drug candidates and various other steps of the drug discovery process. It will also provide a detailed overview of the different in silico tools used and how these approaches interact.
Collapse
Affiliation(s)
- Divya Vemula
- National Institute of Pharmaceutical Education and Research- Hyderabad, India
| | - Perka Jayasurya
- National Institute of Pharmaceutical Education and Research- Hyderabad, India
| | - Varthiya Sushmitha
- National Institute of Pharmaceutical Education and Research- Hyderabad, India
| | | | - Vasundhra Bhandari
- National Institute of Pharmaceutical Education and Research- Hyderabad, India.
| |
Collapse
|
108
|
Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023; 21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open
Abstract
The immense structural diversity of products and intermediates of plant specialized metabolism (specialized metabolites) makes them rich sources of therapeutic medicine, nutrients, and other useful materials. With the rapid accumulation of reactome data that can be accessible on biological and chemical databases, along with recent advances in machine learning, this review sets out to outline how supervised machine learning can be used to design new compounds and pathways by exploiting the wealth of said data. We will first examine the various sources from which reactome data can be obtained, followed by explaining the different machine learning encoding methods for reactome data. We then discuss current supervised machine learning developments that can be employed in various aspects to help redesign plant specialized metabolism.
Collapse
Affiliation(s)
- Peng Ken Lim
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Irene Julca
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| |
Collapse
|
109
|
Hua Y, Song X, Feng Z, Wu XJ, Kittler J, Yu DJ. CPInformer for Efficient and Robust Compound-Protein Interaction Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:285-296. [PMID: 35044921 DOI: 10.1109/tcbb.2022.3144008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Recently, deep learning has become the mainstream methodology for Compound-Protein Interaction (CPI) prediction. However, the existing compound-protein feature extraction methods have some issues that limit their performance. First, graph networks are widely used for structural compound feature extraction, but the chemical properties of a compound depend on functional groups rather than graphic structure. Besides, the existing methods lack capabilities in extracting rich and discriminative protein features. Last, the compound-protein features are usually simply combined for CPI prediction, without considering information redundancy and effective feature mining. To address the above issues, we propose a novel CPInformer method. Specifically, we extract heterogeneous compound features, including structural graph features and functional class fingerprints, to reduce prediction errors caused by similar structural compounds. Then, we combine local and global features using dense connections to obtain multi-scale protein features. Last, we apply ProbSparse self-attention to protein features, under the guidance of compound features, to eliminate information redundancy, and to improve the accuracy of CPInformer. More importantly, the proposed method identifies the activated local regions that link a CPI, providing a good visualisation for the CPI state. The results obtained on five benchmarks demonstrate the merits and superiority of CPInformer over the state-of-the-art approaches.
Collapse
|
110
|
Molecule generation toward target protein (SARS-CoV-2) using reinforcement learning-based graph neural network via knowledge graph. NETWORK MODELING AND ANALYSIS IN HEALTH INFORMATICS AND BIOINFORMATICS 2023; 12:13. [PMID: 36627927 PMCID: PMC9817447 DOI: 10.1007/s13721-023-00409-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/27/2022] [Revised: 11/23/2022] [Accepted: 12/31/2022] [Indexed: 01/07/2023]
Abstract
AI-driven approaches are widely used in drug discovery, where candidate molecules are generated and tested on a target protein for binding affinity prediction. However, generating new compounds with desirable molecular properties such as Quantitative Estimate of Drug-likeness (QED) and Dopamine Receptor D2 activity (DRD2) while adhering to distinct chemical laws is challenging. To address these challenges, we proposed a graph-based deep learning framework to generate potential therapeutic drugs targeting the SARS-CoV-2 protein. Our proposed framework consists of two modules: a novel reinforcement learning (RL)-based graph generative module with knowledge graph (KG) and a graph early fusion approach (GEFA) for binding affinity prediction. The first module uses a gated graph neural network (GGNN) model under the RL environment for generating novel molecular compounds with desired properties and a custom-made KG for molecule screening. The second module uses GEFA to predict binding affinity scores between the generated compounds and target proteins. Experiments show how fine-tuning the GGNN model under the RL environment enhances the molecules with desired properties to generate 100 % valid and 100 % unique compounds using different scoring functions. Additionally, KG-based screening reduces the search space of generated candidate molecules by 96.64 % while retaining 95.38 % of promising binding molecules against SARS-CoV-2 protein, i.e., 3C-like protease (3CLpro). We achieved a binding affinity score of 8.185 from the top rank of generated compound. In addition, we compared top-ranked generated compounds to Indinavir on different parameters, including drug-likeness and medicinal chemistry, for qualitative analysis from a drug development perspective. Supplementary Information The online version contains supplementary material available at 10.1007/s13721-023-00409-2.
Collapse
|
111
|
Protein-ligand binding affinity prediction with edge awareness and supervised attention. iScience 2022; 26:105892. [PMID: 36691617 PMCID: PMC9860494 DOI: 10.1016/j.isci.2022.105892] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 11/12/2022] [Accepted: 12/23/2022] [Indexed: 12/29/2022] Open
Abstract
Accurate prediction of protein-ligand binding affinity is crucial in structure-based drug design but remains some challenges even with recent advances in deep learning: (1) Existing methods neglect the edge information in protein and ligand structure data; (2) current attention mechanisms struggle to capture true binding interactions in the small dataset. Herein, we proposed SEGSA_DTA, a SuperEdge Graph convolution-based and Supervised Attention-based Drug-Target Affinity prediction method, where the super edge graph convolution can comprehensively utilize node and edge information and the multi-supervised attention module can efficiently learn the attention distribution consistent with real protein-ligand interactions. Results on the multiple datasets show that SEGSA_DTA outperforms current state-of-the-art methods. We also applied SEGSA_DTA in repurposing FDA-approved drugs to identify potential coronavirus disease 2019 (COVID-19) treatments. Besides, by using SHapley Additive exPlanations (SHAP), we found that SEGSA_DTA is interpretable and further provides a new quantitative analytical solution for structure-based lead optimization.
Collapse
|
112
|
Wu J, Liu Z, Yang X, Lin Z. Improved compound-protein interaction site and binding affinity prediction using self-supervised protein embeddings. BMC Bioinformatics 2022; 23:543. [PMID: 36526969 PMCID: PMC9756525 DOI: 10.1186/s12859-022-05107-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Compound-protein interaction site and binding affinity predictions are crucial for drug discovery and drug design. In recent years, many deep learning-based methods have been proposed for predications related to compound-protein interaction. For protein inputs, how to make use of protein primary sequence and tertiary structure information has impact on prediction results. RESULTS In this study, we propose a deep learning model based on a multi-objective neural network, which involves a multi-objective neural network for compound-protein interaction site and binding affinity prediction. We used several kinds of self-supervised protein embeddings to enrich our protein inputs and used convolutional neural networks to extract features from them. Our results demonstrate that our model had improvements in terms of interaction site prediction and affinity prediction compared to previous models. In a case study, our model could better predict binding sites, which also showed its effectiveness. CONCLUSION These results suggest that our model could be a helpful tool for compound-protein related predictions.
Collapse
Affiliation(s)
- Jialin Wu
- grid.79703.3a0000 0004 1764 3838School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China
| | - Zhe Liu
- grid.79703.3a0000 0004 1764 3838School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China
| | - Xiaofeng Yang
- grid.79703.3a0000 0004 1764 3838School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China
| | - Zhanglin Lin
- grid.79703.3a0000 0004 1764 3838School of Biology and Biological Engineering, South China University of Technology, 382 East Outer Loop Road, University Park, Guangzhou, 510006 Guangdong China
| |
Collapse
|
113
|
Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:15490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
N6-methyladenosine (m6A) is the most abundant within eukaryotic messenger RNA modification, which plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, the existing computational models are usually only for m6A site prediction in a single species, without considering the tissue level of species, while most of them are constructed based on low-confidence level data generated by an m6A antibody immunoprecipitation (IP)-based sequencing method, thereby restricting reliability and generalizability of proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence level data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation test to achieve an effective stacked model. Our model im6APred can produce the area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that our model has the ability to learn general methylation rules on RNA bases and generalize to m6A transcriptome-wide identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
Collapse
Affiliation(s)
| | | | | | - Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| |
Collapse
|
114
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks-or graphs-are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Collapse
Affiliation(s)
- Michelle M Li
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kexin Huang
- Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
| |
Collapse
|
115
|
A deep learning method for predicting molecular properties and compound-protein interactions. J Mol Graph Model 2022; 117:108283. [PMID: 35994925 DOI: 10.1016/j.jmgm.2022.108283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 07/19/2022] [Accepted: 07/26/2022] [Indexed: 01/14/2023]
Abstract
Predicting molecular properties and compound-protein interactions (CPIs) are two important areas of drug design and discovery. They are also an essential way to discover lead compounds in virtual screening. Recently, in silico methods based on deep learning have demonstrated excellent performance in various challenges. It is imperative to develop efficient computational methods to predict accurately both molecular properties and CPIs in drug research using deep learning techniques. In this paper, we propose a deep learning method applicable to both molecular property prediction and CPI prediction based on the idea that both are generally influenced by chemical structure and sequence information of compounds and proteins. Molecular properties are inferred by integrating the molecular structure and sequence information of compounds, and CPIs are predicted by integrating protein sequence and compound structure. The method combines topological structure and sequence fingerprint information of molecules, extracts adequately raw data features, and generates highly representative features for prediction. Molecular property prediction experiments were conducted on BACE, P53 and hERG datasets, and CPI prediction experiments were conducted on Human, C. elegans and KIBA datasets. MG-S achieves outperformance in molecular property prediction on P53, the differences in AUC, Precision and MCC are 0.030, 0.050 and 0.100, respectively, over the suboptimal baseline model, and provides consistently good results on BACE and hERG.The model also achieves impressive performance in CPI prediction, the differences in AUC, Precision and MCC on KIBA are 0.141, 0.138, 0.090 and 0.082, respectively, compared with the state-of-the-art models. The comprehensive results show that the MG-S model has higher performance, better classification ability, and faster convergence. MG-S will serve as a useful method to predict compound properties and CPIs in the early stages of drug design and discovery.Our code and datasets are available at: https://github.com/happay-ending/cpi_cpp.
Collapse
|
116
|
Abdullah M, Chellappan Lethesh K, Baloch AA, Bamgbopa MO. Comparison of molecular and structural features towards prediction of ionic liquid ionic conductivity for electrochemical applications. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.120620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
117
|
Sun J, Wen M, Wang H, Ruan Y, Yang Q, Kang X, Zhang H, Zhang Z, Lu H. Prediction of drug-likeness using graph convolutional attention network. Bioinformatics 2022; 38:5262-5269. [PMID: 36222555 DOI: 10.1093/bioinformatics/btac676] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 09/22/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION The drug-likeness has been widely used as a criterion to distinguish drug-like molecules from non-drugs. Developing reliable computational methods to predict the drug-likeness of compounds is crucial to triage unpromising molecules and accelerate the drug discovery process. RESULTS In this study, a deep learning method was developed to predict the drug-likeness based on the graph convolutional attention network (D-GCAN) directly from molecular structures. Results showed that the D-GCAN model outperformed other state-of-the-art models for drug-likeness prediction. The combination of graph convolution and attention mechanism made an important contribution to the performance of the model. Specifically, the application of the attention mechanism improved accuracy by 4.0%. The utilization of graph convolution improved the accuracy by 6.1%. Results on the dataset beyond Lipinski's rule of five space and the non-US dataset showed that the model had good versatility. Then, the billion-scale GDB-13 database was used as a case study to screen SARS-CoV-2 3C-like protease inhibitors. Sixty-five drug candidates were screened out, most substructures of which are similar to these of existing oral drugs. Candidates screened from S-GDB13 have higher similarity to existing drugs and better molecular docking performance than those from the rest of GDB-13. The screening speed on S-GDB13 is significantly faster than screening directly on GDB-13. In general, D-GCAN is a promising tool to predict the drug-likeness for selecting potential candidates and accelerating drug discovery by excluding unpromising candidates and avoiding unnecessary biological and clinical testing. AVAILABILITY AND IMPLEMENTATION The source code, model and tutorials are available at https://github.com/JinYSun/D-GCAN. The S-GDB13 database is available at https://doi.org/10.5281/zenodo.7054367. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Huabei Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yuezhe Ruan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
118
|
Using Artificial Intelligence for Drug Discovery: A Bibliometric Study and Future Research Agenda. Pharmaceuticals (Basel) 2022; 15:ph15121492. [PMID: 36558943 PMCID: PMC9785219 DOI: 10.3390/ph15121492] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Revised: 11/23/2022] [Accepted: 11/27/2022] [Indexed: 12/03/2022] Open
Abstract
Drug discovery is usually a rule-based process that is carefully carried out by pharmacists. However, a new trend is emerging in research and practice where artificial intelligence is being used for drug discovery to increase efficiency or to develop new drugs for previously untreatable diseases. Nevertheless, so far, no study takes a holistic view of AI-based drug discovery research. Given the importance and potential of AI for drug discovery, this lack of research is surprising. This study aimed to close this research gap by conducting a bibliometric analysis to identify all relevant studies and to analyze interrelationships among algorithms, institutions, countries, and funding sponsors. For this purpose, a sample of 3884 articles was examined bibliometrically, including studies from 1991 to 2022. We utilized various qualitative and quantitative methods, such as performance analysis, science mapping, and thematic analysis. Based on these findings, we furthermore developed a research agenda that aims to serve as a foundation for future researchers.
Collapse
|
119
|
Dong R, Yang H, Ai C, Duan G, Wang J, Guo F. DeepBLI: A Transferable Multichannel Model for Detecting β-Lactamase-Inhibitor Interaction. J Chem Inf Model 2022; 62:5830-5840. [PMID: 36245217 DOI: 10.1021/acs.jcim.2c01008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Pathogens producing β-lactamase pose a great challenge to antibiotic-resistant infection treatment; thus, it is urgent to discover novel β-lactamase inhibitors for drug development. Conventional high-throughput screening is very costly, and structure-based virtual screening is limited with mechanisms. In this study, we construct a novel multichannel deep neural network (DeepBLI) for β-lactamase inhibitor screening, pretrained with a label reversal KIBA data set and fine-tuned on β-lactamase-inhibitor pairs from BindingDB. First, the pairs of encoders (Conv and Att) fuse the information spatially and sequentially for both enzymes and inhibitors. Then, a co-attention module creates the connection between the inhibitor and enzyme embeddings. Finally, multichannel outputs fuse with an element-wise product and then are fed into 3-layer fully connected networks to predict interactions. Comparing the state-of-the-art methods, DeepBLI yields an AUROC of 0.9240 and an AUPRC of 0.9715, which indicates that it can identify new β-lactamase-inhibitor interactions. To demonstrate its prediction ability, an application of DeepBLI is described to screen potential inhibitor compounds for metallo-β-lactamase AIM-1 and repurpose rottlerin for four classes of β-lactamase targets, showing the possibility of being a broad-spectrum inhibitor. DeepBLI provides an effective way for antibacterial drug development, contributing to antibiotic-resistant therapeutics.
Collapse
Affiliation(s)
- Ruihan Dong
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing100871, China
| | - Hongpeng Yang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina29208, United States
| | - Chengwei Ai
- College of Intelligence and Computing, Tianjin University, Tianjin300350, China
| | - Guihua Duan
- School of Computer Science and Engineering, Central South University, Changsha410083, China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha410083, China
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha410083, China
| |
Collapse
|
120
|
Li P, Luo H, Ji B, Nielsen J. Machine learning for data integration in human gut microbiome. Microb Cell Fact 2022; 21:241. [PMID: 36419034 PMCID: PMC9685977 DOI: 10.1186/s12934-022-01973-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2022] [Accepted: 11/15/2022] [Indexed: 11/25/2022] Open
Abstract
Recent studies have demonstrated that gut microbiota plays critical roles in various human diseases. High-throughput technology has been widely applied to characterize the microbial ecosystems, which led to an explosion of different types of molecular profiling data, such as metagenomics, metatranscriptomics and metabolomics. For analysis of such data, machine learning algorithms have shown to be useful for identifying key molecular signatures, discovering potential patient stratifications, and particularly for generating models that can accurately predict phenotypes. In this review, we first discuss how dysbiosis of the intestinal microbiota is linked to human disease development and how potential modulation strategies of the gut microbial ecosystem can be used for disease treatment. In addition, we introduce categories and workflows of different machine learning approaches, and how they can be used to perform integrative analysis of multi-omics data. Finally, we review advances of machine learning in gut microbiome applications and discuss related challenges. Based on this we conclude that machine learning is very well suited for analysis of gut microbiome and that these approaches can be useful for development of gut microbe-targeted therapies, which ultimately can help in achieving personalized and precision medicine.
Collapse
Affiliation(s)
- Peishun Li
- grid.5371.00000 0001 0775 6028Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Hao Luo
- grid.5371.00000 0001 0775 6028Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Boyang Ji
- grid.5371.00000 0001 0775 6028Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden ,grid.510909.4BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen, Denmark
| | - Jens Nielsen
- grid.5371.00000 0001 0775 6028Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden ,grid.510909.4BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen, Denmark
| |
Collapse
|
121
|
Asada M, Miwa M, Sasaki Y. Integrating heterogeneous knowledge graphs into drug-drug interaction extraction from the literature. Bioinformatics 2022; 39:6842324. [PMID: 36416141 PMCID: PMC9805562 DOI: 10.1093/bioinformatics/btac754] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Revised: 10/17/2022] [Accepted: 11/22/2022] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION Most of the conventional deep neural network-based methods for drug-drug interaction (DDI) extraction consider only context information around drug mentions in the text. However, human experts use heterogeneous background knowledge about drugs to comprehend pharmaceutical papers and extract relationships between drugs. Therefore, we propose a novel method that simultaneously considers various heterogeneous information for DDI extraction from the literature. RESULTS We first construct drug representations by conducting the link prediction task on a heterogeneous pharmaceutical knowledge graph (KG) dataset. We then effectively combine the text information of input sentences in the corpus and the information on drugs in the heterogeneous KG (HKG) dataset. Finally, we evaluate our DDI extraction method on the DDIExtraction-2013 shared task dataset. In the experiment, integrating heterogeneous drug information significantly improves the DDI extraction performance, and we achieved an F-score of 85.40%, which results in state-of-the-art performance. We evaluated our method on the DrugProt dataset and improved the performance significantly, achieving an F-score of 77.9%. Further analysis showed that each type of node in the HKG contributes to the performance improvement of DDI extraction, indicating the importance of considering multiple pieces of information. AVAILABILITY AND IMPLEMENTATION Our code is available at https://github.com/tticoin/HKG-DDIE.git.
Collapse
Affiliation(s)
- Masaki Asada
- Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
| | | | - Yutaka Sasaki
- Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
| |
Collapse
|
122
|
Huang L, Lin J, Liu R, Zheng Z, Meng L, Chen X, Li X, Wong KC. CoaDTI: multi-modal co-attention based framework for drug-target interaction annotation. Brief Bioinform 2022; 23:6770087. [PMID: 36274236 DOI: 10.1093/bib/bbac446] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 08/26/2022] [Accepted: 09/18/2022] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION The identification of drug-target interactions (DTIs) plays a vital role for in silico drug discovery, in which the drug is the chemical molecule, and the target is the protein residues in the binding pocket. Manual DTI annotation approaches remain reliable; however, it is notoriously laborious and time-consuming to test each drug-target pair exhaustively. Recently, the rapid growth of labelled DTI data has catalysed interests in high-throughput DTI prediction. Unfortunately, those methods highly rely on the manual features denoted by human, leading to errors. RESULTS Here, we developed an end-to-end deep learning framework called CoaDTI to significantly improve the efficiency and interpretability of drug target annotation. CoaDTI incorporates the Co-attention mechanism to model the interaction information from the drug modality and protein modality. In particular, CoaDTI incorporates transformer to learn the protein representations from raw amino acid sequences, and GraphSage to extract the molecule graph features from SMILES. Furthermore, we proposed to employ the transfer learning strategy to encode protein features by pre-trained transformer to address the issue of scarce labelled data. The experimental results demonstrate that CoaDTI achieves competitive performance on three public datasets compared with state-of-the-art models. In addition, the transfer learning strategy further boosts the performance to an unprecedented level. The extended study reveals that CoaDTI can identify novel DTIs such as reactions between candidate drugs and severe acute respiratory syndrome coronavirus 2-associated proteins. The visualization of co-attention scores can illustrate the interpretability of our model for mechanistic insights. AVAILABILITY Source code are publicly available at https://github.com/Layne-Huang/CoaDTI.
Collapse
Affiliation(s)
- Lei Huang
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Jiecong Lin
- Department of Pathology, Harvard Medical School, Boston, USA.,Department of Computer Science, The University of Hong Kong, Hong Kong SAR
| | - Rui Liu
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Lingkuan Meng
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR.,Hong Kong Institute for Data Science, City University of Hong Kong, Hong Kong SAR
| |
Collapse
|
123
|
Zeng X, Xiang H, Yu L, Wang J, Li K, Nussinov R, Cheng F. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00557-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
124
|
Liao J, Chen H, Wei L, Wei L. GSAML-DTA: An interpretable drug-target binding affinity prediction model based on graph neural networks with self-attention mechanism and mutual information. Comput Biol Med 2022; 150:106145. [PMID: 37859276 DOI: 10.1016/j.compbiomed.2022.106145] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2022] [Revised: 08/23/2022] [Accepted: 09/24/2022] [Indexed: 11/03/2022]
Abstract
Identifying drug-target affinity (DTA) has great practical importance in the process of designing efficacious drugs for known diseases. Recently, numerous deep learning-based computational methods have been developed to predict drug-target affinity and achieved impressive performance. However, most of them construct the molecule (drug or target) encoder without considering the weights of features of each node (atom or residue). Besides, they generally combine drug and target representations directly, which may contain irrelevant-task information. In this study, we develop GSAML-DTA, an interpretable deep learning framework for DTA prediction. GSAML-DTA integrates a self-attention mechanism and graph neural networks (GNNs) to build representations of drugs and target proteins from the structural information. In addition, mutual information is introduced to filter out redundant information and retain relevant information in the combined representations of drugs and targets. Extensive experimental results demonstrate that GSAML-DTA outperforms state-of-the-art methods for DTA prediction on two benchmark datasets. Furthermore, GSAML-DTA has the interpretation ability to analyze binding atoms and residues, which may be conducive to chemical biology studies from data. Overall, GSAML-DTA can serve as a powerful and interpretable tool suitable for DTA modelling.
Collapse
Affiliation(s)
- Jiaqi Liao
- School of Software, Shandong University, Jinan, China
| | - Haoyang Chen
- School of Software, Shandong University, Jinan, China
| | - Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China.
| |
Collapse
|
125
|
Kurata H, Tsukiyama S. ICAN: Interpretable cross-attention network for identifying drug and target protein interactions. PLoS One 2022; 17:e0276609. [PMID: 36279284 PMCID: PMC9591068 DOI: 10.1371/journal.pone.0276609] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022] Open
Abstract
Drug-target protein interaction (DTI) identification is fundamental for drug discovery and drug repositioning, because therapeutic drugs act on disease-causing proteins. However, the DTI identification process often requires expensive and time-consuming tasks, including biological experiments involving large numbers of candidate compounds. Thus, a variety of computation approaches have been developed. Of the many approaches available, chemo-genomics feature-based methods have attracted considerable attention. These methods compute the feature descriptors of drugs and proteins as the input data to train machine and deep learning models to enable accurate prediction of unknown DTIs. In addition, attention-based learning methods have been proposed to identify and interpret DTI mechanisms. However, improvements are needed for enhancing prediction performance and DTI mechanism elucidation. To address these problems, we developed an attention-based method designated the interpretable cross-attention network (ICAN), which predicts DTIs using the Simplified Molecular Input Line Entry System of drugs and amino acid sequences of target proteins. We optimized the attention mechanism architecture by exploring the cross-attention or self-attention, attention layer depth, and selection of the context matrixes from the attention mechanism. We found that a plain attention mechanism that decodes drug-related protein context features without any protein-related drug context features effectively achieved high performance. The ICAN outperformed state-of-the-art methods in several metrics on the DAVIS dataset and first revealed with statistical significance that some weighted sites in the cross-attention weight matrix represent experimental binding sites, thus demonstrating the high interpretability of the results. The program is freely available at https://github.com/kuratahiroyuki/ICAN.
Collapse
Affiliation(s)
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- * E-mail:
| | - Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| |
Collapse
|
126
|
Zhang Y, Hu Y, Li H, Liu X. Drug-protein interaction prediction via variational autoencoders and attention mechanisms. Front Genet 2022; 13:1032779. [PMID: 36313473 PMCID: PMC9614151 DOI: 10.3389/fgene.2022.1032779] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Accepted: 09/30/2022] [Indexed: 09/29/2023] Open
Abstract
During the process of drug discovery, exploring drug-protein interactions (DPIs) is a key step. With the rapid development of biological data, computer-aided methods are much faster than biological experiments. Deep learning methods have become popular and are mainly used to extract the characteristics of drugs and proteins for further DPIs prediction. Since the prediction of DPIs through machine learning cannot fully extract effective features, in our work, we propose a deep learning framework that uses variational autoencoders and attention mechanisms; it utilizes convolutional neural networks (CNNs) to obtain local features and attention mechanisms to obtain important information about drugs and proteins, which is very important for predicting DPIs. Compared with some machine learning methods on the C.elegans and human datasets, our approach provides a better effect. On the BindingDB dataset, its accuracy (ACC) and area under the curve (AUC) reach 0.862 and 0.913, respectively. To verify the robustness of the model, multiclass classification tasks are performed on Davis and KIBA datasets, and the ACC values reach 0.850 and 0.841, respectively, thus further demonstrating the effectiveness of the model.
Collapse
Affiliation(s)
- Yue Zhang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
| | | | | | | |
Collapse
|
127
|
Yan X, Liu Y. Graph-sequence attention and transformer for predicting drug-target affinity. RSC Adv 2022; 12:29525-29534. [PMID: 36320763 PMCID: PMC9562047 DOI: 10.1039/d2ra05566j] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2022] [Accepted: 10/04/2022] [Indexed: 11/30/2022] Open
Abstract
Drug-target binding affinity (DTA) prediction has drawn increasing interest due to its substantial position in the drug discovery process. The development of new drugs is costly, time-consuming, and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. Therefore, it is of great significance to develop effective computational methods to predict DTAs. The attention mechanisms allow the computational method to focus on the most relevant parts of the input and have been proven to be useful for various tasks. In this study, we proposed a novel model based on self-attention, called GSATDTA, to predict the binding affinity between drugs and targets. For the representation of drugs, we use Bi-directional Gated Recurrent Units (BiGRU) to extract the SMILES representation from SMILES sequences, and graph neural networks to extract the graph representation of the molecular graphs. Then we utilize an attention mechanism to fuse the two representations of the drug. For the target/protein, we utilized an efficient transformer to learn the representation of the protein, which can capture the long-distance relationships in the sequence of amino acids. We conduct extensive experiments to compare our model with state-of-the-art models. Experimental results show that our model outperforms the current state-of-the-art methods on two independent datasets.
Collapse
Affiliation(s)
- Xiangfeng Yan
- School of Computer Science and Technology, Heilongjiang University Harbin China
| | - Yong Liu
- School of Computer Science and Technology, Heilongjiang University Harbin China
| |
Collapse
|
128
|
Shimizu H, Kodama M, Matsumoto M, Orba Y, Sasaki M, Sato A, Sawa H, Nakayama KI. LIGHTHOUSE illuminates therapeutics for a variety of diseases including COVID-19. iScience 2022; 25:105314. [PMID: 36246574 PMCID: PMC9549714 DOI: 10.1016/j.isci.2022.105314] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Revised: 08/08/2022] [Accepted: 10/05/2022] [Indexed: 11/26/2022] Open
Abstract
One of the bottlenecks in the application of basic research findings to patients is the enormous cost, time, and effort required for high-throughput screening of potential drugs for given therapeutic targets. Here we have developed LIGHTHOUSE, a graph-based deep learning approach for discovery of the hidden principles underlying the association of small-molecule compounds with target proteins. Without any 3D structural information for proteins or chemicals, LIGHTHOUSE estimates protein-compound scores that incorporate known evolutionary relations and available experimental data. It identified therapeutics for cancer, lifestyle related disease, and bacterial infection. Moreover, LIGHTHOUSE predicted ethoxzolamide as a therapeutic for coronavirus disease 2019 (COVID-19), and this agent was indeed effective against alpha, beta, gamma, and delta variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that are rampant worldwide. We envision that LIGHTHOUSE will help accelerate drug discovery and fill the gap between bench side and bedside. LIGHTHOUSE discovers therapeutics solely on the basis of the primary sequence The predictions of LIGHTHOUSE against multiple diseases were experimentally correct LIGHTHOUSE facilitates optimization of lead compounds as well
Collapse
Affiliation(s)
- Hideyuki Shimizu
- Department of Molecular and Cellular Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan,Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA,Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, MA 02115, USA,Department of AI Systems Medicine, M&D Data Science Center, Tokyo Medical and Dental University, Tokyo 113-8510, Japan,Corresponding author
| | - Manabu Kodama
- Department of Molecular and Cellular Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan
| | - Masaki Matsumoto
- Department of Omics and Systems Biology, Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
| | - Yasuko Orba
- Division of Molecular Pathobiology, International Institute for Zoonosis Control, Hokkaido University, Sapporo 060-8638, Japan
| | - Michihito Sasaki
- Division of Molecular Pathobiology, International Institute for Zoonosis Control, Hokkaido University, Sapporo 060-8638, Japan
| | - Akihiko Sato
- Division of Molecular Pathobiology, International Institute for Zoonosis Control, Hokkaido University, Sapporo 060-8638, Japan,Drug Discovery and Disease Research Laboratory, Shionogi & Co. Ltd., Osaka 561-0825, Japan
| | - Hirofumi Sawa
- Division of Molecular Pathobiology, International Institute for Zoonosis Control, Hokkaido University, Sapporo 060-8638, Japan,International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo 060-8638, Japan,One Health Research Center, Hokkaido University, Sapporo 060-8638, Japan,Global Virus Network, Baltimore, MD 21201, USA,Hokkaido University, Institute for Vaccine Research and Development (HU-IVReD)
| | - Keiichi I. Nakayama
- Department of Molecular and Cellular Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan,Corresponding author
| |
Collapse
|
129
|
Xu D, Liu B, Wang J, Zhang Z. Bibliometric analysis of artificial intelligence for biotechnology and applied microbiology: Exploring research hotspots and frontiers. Front Bioeng Biotechnol 2022; 10:998298. [PMID: 36277390 PMCID: PMC9585160 DOI: 10.3389/fbioe.2022.998298] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
Background: In the biotechnology and applied microbiology sectors, artificial intelligence (AI) has been extensively used in disease diagnostics, drug research and development, functional genomics, biomarker recognition, and medical imaging diagnostics. In our study, from 2000 to 2021, science publications focusing on AI in biotechnology were reviewed, and quantitative, qualitative, and modeling analyses were performed. Methods: On 6 May 2022, the Web of Science Core Collection (WoSCC) was screened for AI applications in biotechnology and applied microbiology; 3,529 studies were identified between 2000 and 2022, and analyzed. The following information was collected: publication, country or region, references, knowledgebase, institution, keywords, journal name, and research hotspots, and examined using VOSviewer and CiteSpace V bibliometric platforms. Results: We showed that 128 countries published articles related to AI in biotechnology and applied microbiology; the United States had the most publications. In addition, 584 global institutions contributed to publications, with the Chinese Academy of Science publishing the most. Reference clusters from studies were categorized into ten headings: deep learning, prediction, support vector machines (SVM), object detection, feature representation, synthetic biology, amyloid, human microRNA precursors, systems biology, and single cell RNA-Sequencing. Research frontier keywords were represented by microRNA (2012–2020) and protein-protein interactions (PPIs) (2012–2020). Conclusion: We systematically, objectively, and comprehensively analyzed AI-related biotechnology and applied microbiology literature, and additionally, identified current hot spots and future trends in this area. Our review provides researchers with a comprehensive overview of the dynamic evolution of AI in biotechnology and applied microbiology and identifies future key research areas.
Collapse
Affiliation(s)
- Dongyu Xu
- Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning, China
| | - Bing Liu
- Department of Bone Oncology, The People’s Hospital of Liaoning Province, Shenyang, Liaoning, China
| | - Jian Wang
- Department of Pathogenic Biology, School of Basic Medicine, China Medical University, Shenyang, Liaoning, China
| | - Zhichang Zhang
- Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning, China
- *Correspondence: Zhichang Zhang,
| |
Collapse
|
130
|
Chen H, Li D, Liao J, Wei L, Wei L. MultiscaleDTA: a multiscale-based method with a self-attention mechanism for drug-target binding affinity prediction. Methods 2022; 207:103-109. [PMID: 36155250 DOI: 10.1016/j.ymeth.2022.09.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2022] [Revised: 09/15/2022] [Accepted: 09/19/2022] [Indexed: 11/28/2022] Open
Abstract
The task of predicting drug-target affinity (DTA) plays an increasingly important role at the early stage of in silico drug discovery and development. Currently, a variety of machine learning-based methods have been presented for DTA prediction and achieved outstanding performance, which is beneficial for speeding up the development of new drugs. However, most convolutional neural networks (CNNs) based methods ignore the significance of information from CNN layers with different scales to DTA prediction. In addition, each feature provides different contributions to the final task. Therefore, in this study, we propose a novel end-to-end deep learning-based framework, called MultiscaleDTA, to predict drug-target binding affinity. MultiscaleDTA incorporates multi-scale CNNs and a self-attention mechanism to capture multi-scale and comprehensive features for characterizing the intrinsic properties of drugs and targets. Extensive experimental results on both regression and binary classification tasks demonstrate that MultiscaleDTA has achieved competitive performance compared to state-of-the-art methods.
Collapse
Affiliation(s)
- Haoyang Chen
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China; School of Software, Shandong University, Jinan, China
| | - Dahe Li
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Jiaqi Liao
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China
| | - Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.
| | - Leyi Wei
- School of Mathematics and Statistics, Hainan Normal University, Hainan, China; School of Software, Shandong University, Jinan, China.
| |
Collapse
|
131
|
Yaseen A, Amin I, Akhter N, Ben-Hur A, Minhas F. Insights into performance evaluation of compound-protein interaction prediction methods. Bioinformatics 2022; 38:ii75-ii81. [PMID: 36124806 DOI: 10.1093/bioinformatics/btac496] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publication with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. RESULTS We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins. AVAILABILITY AND IMPLEMENTATION Code and supplementary material available at https://github.com/adibayaseen/HKRCPI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Adiba Yaseen
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
| | - Imran Amin
- National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan
| | - Naeem Akhter
- Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
| | - Fayyaz Minhas
- Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
| |
Collapse
|
132
|
You Y, Shen Y. Cross-modality and self-supervised protein embedding for compound-protein affinity and contact prediction. Bioinformatics 2022; 38:ii68-ii74. [PMID: 36124802 PMCID: PMC9486597 DOI: 10.1093/bioinformatics/btac470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Computational methods for compound-protein affinity and contact (CPAC) prediction aim at facilitating rational drug discovery by simultaneous prediction of the strength and the pattern of compound-protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often makes structure-free methods rely on protein sequence inputs alone. The scarcity of compound-protein pairs with affinity and contact labels further limits the accuracy and the generalizability of CPAC models. RESULTS To overcome the aforementioned challenges of structure naivety and labeled-data scarcity, we introduce cross-modality and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in both modalities of 1D amino-acid sequences and predicted 2D contact maps that are separately embedded with recurrent and graph neural networks, respectively, as well as jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pre-trained under various self-supervised learning strategies, by leveraging massive amount of unlabeled protein data. Our results indicate that individual protein modalities differ in their strengths of predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins. AVAILABILITY AND IMPLEMENTATION Data and source codes are available at https://github.com/Shen-Lab/CPAC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yuning You
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA
| |
Collapse
|
133
|
Hu JX, Yang Y, Xu YY, Shen HB. GraphLoc: a graph neural network model for predicting protein subcellular localization from immunohistochemistry images. Bioinformatics 2022; 38:4941-4948. [DOI: 10.1093/bioinformatics/btac634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/07/2022] [Accepted: 09/15/2022] [Indexed: 11/14/2022] Open
Abstract
Abstract
Motivation
Recognition of protein subcellular distribution patterns and identification of location biomarker proteins in cancer tissues are important for understanding protein functions and related diseases. Immunohistochemical (IHC) images enable visualizing the distribution of proteins at the tissue level, providing an important resource for the protein localization studies. In the past decades, several image-based protein subcellular location prediction methods have been developed, but the prediction accuracies still have much space to improve due to the complexity of protein patterns resulting from multi-label proteins and variation of location patterns across cell types or states.
Results
Here, we propose a multi-label multi-instance model based on deep graph convolutional neural networks, GraphLoc, to recognize protein subcellular location patterns. GraphLoc builds a graph of multiple IHC images for one protein, learns protein-level representations by graph convolutions, and predicts multi-label information by a dynamic threshold method. Our results show that GraphLoc is a promising model for image-based protein subcellular location prediction with model interpretability. Furthermore, we apply GraphLoc to the identification of candidate location biomarkers and potential members for protein networks. A large portion of the predicted results have supporting evidence from the existing literatures and the new candidates also provide guidance for further experimental screening.
Availability
The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/GraphLoc.
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jin-Xian Hu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing , Ministry of Education of China, Shanghai 200240, China
| | - Yang Yang
- Shanghai Jiao Tong University Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, , Shanghai 200240, China
| | - Ying-Ying Xu
- Southern Medical University School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, , Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University , Guangzhou 510515, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing , Ministry of Education of China, Shanghai 200240, China
| |
Collapse
|
134
|
Qian Y, Wu J, Zhang Q. CAT-CPI: Combining CNN and transformer to learn compound image features for predicting compound-protein interactions. Front Mol Biosci 2022; 9:963912. [PMID: 36188230 PMCID: PMC9520300 DOI: 10.3389/fmolb.2022.963912] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Accepted: 08/30/2022] [Indexed: 11/13/2022] Open
Abstract
Compound-protein interaction (CPI) prediction is a foundational task for drug discovery, which process is time-consuming and costly. The effectiveness of CPI prediction can be greatly improved using deep learning methods to accelerate drug development. Large number of recent research results in the field of computer vision, especially in deep learning, have proved that the position, geometry, spatial structure and other features of objects in an image can be well characterized. We propose a novel molecular image-based model named CAT-CPI (combining CNN and transformer to predict CPI) for CPI task. We use Convolution Neural Network (CNN) to learn local features of molecular images and then use transformer encoder to capture the semantic relationships of these features. To extract protein sequence feature, we propose to use a k-gram based method and obtain the semantic relationships of sub-sequences by transformer encoder. In addition, we build a Feature Relearning (FR) module to learn interaction features of compounds and proteins. We evaluated CAT-CPI on three benchmark datasets—Human, Celegans, and Davis—and the experimental results demonstrate that CAT-CPI presents competitive performance against state-of-the-art predictors. In addition, we carry out Drug-Drug Interaction (DDI) experiments to verify the strong potential of the methods based on molecular images and FR module.
Collapse
|
135
|
Aldeghi M, Coley CW. A graph representation of molecular ensembles for polymer property prediction. Chem Sci 2022; 13:10486-10498. [PMID: 36277616 PMCID: PMC9473492 DOI: 10.1039/d2sc02839e] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.
Collapse
Affiliation(s)
- Matteo Aldeghi
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
Collapse
|
136
|
Wang J, Hu J, Sun H, Xu M, Yu Y, Liu Y, Cheng L. MGPLI: Exploring Multigranular Representations for Protein-Ligand Interaction Prediction. Bioinformatics 2022; 38:4859-4867. [PMID: 36094335 DOI: 10.1093/bioinformatics/btac597] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2022] [Revised: 07/22/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The capability to predict the potential drug binding affinity against a protein target has always been a fundamental challenge in-silico drug discovery. The traditional experiments in vitro and in vivo are costly and time-consuming which need to search over large compound space. Recent years have witnessed significant success on deep learning-based models for drug-target binding affinity (DTA) prediction task. RESULTS Following the recent success of the Transformer model, we propose a multi-granularity protein ligand interaction (MGPLI) model, which adopts the Transformer encoders to represent the character-level features and fragment-level features, modeling the possible interaction between residues and atoms or their segments. In addition, we use the Convolutional Neural Network (CNN) to extract higher-level features based on transformer encoder outputs and a highway layer to fuse the protein and drug features. We evaluate MGPLI on different protein ligand interaction datasets and show the improvement of prediction performance compared to state-of-the-art baselines. AVAILABILITY The model scripts are available at https://github.com/IILab-Resource/MGDTA.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Jie Hu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Huiting Sun
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - MengDie Xu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Yun Yu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Yun Liu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China.,Institute of Medical Informatics and Management, Nanjing Medical University, No.300 Guang Zhou Road, Nanjing, Jiangsu, China.,Department of Information, the First Affiliated Hospital, Nanjing Medical University, No.300 Guang Zhou Road, Nanjing, Jiangsu, China
| | - Liang Cheng
- NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, HeiLongJiang, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, HeiLongJiang, China
| |
Collapse
|
137
|
Baranwal M, Magner A, Saldinger J, Turali-Emre ES, Elvati P, Kozarekar S, VanEpps JS, Kotov NA, Violi A, Hero AO. Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions. BMC Bioinformatics 2022; 23:370. [PMID: 36088285 PMCID: PMC9464414 DOI: 10.1186/s12859-022-04910-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2022] [Accepted: 08/26/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Development of new methods for analysis of protein-protein interactions (PPIs) at molecular and nanometer scales gives insights into intracellular signaling pathways and will improve understanding of protein functions, as well as other nanoscale structures of biological and abiological origins. Recent advances in computational tools, particularly the ones involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most of the existing works on PPI predictions use protein-sequence information, and thus have difficulties in accounting for the three-dimensional organization of the protein chains. RESULTS In this study, we address this problem and describe a PPI analysis based on a graph attention network, named Struct2Graph, for identifying PPIs directly from the structural data of folded protein globules. Our method is capable of predicting the PPI with an accuracy of 98.89% on the balanced set consisting of an equal number of positive and negative pairs. On the unbalanced set with the ratio of 1:10 between positive and negative pairs, Struct2Graph achieves a fivefold cross validation average accuracy of 99.42%. Moreover, Struct2Graph can potentially identify residues that likely contribute to the formation of the protein-protein complex. The identification of important residues is tested for two different interaction types: (a) Proteins with multiple ligands competing for the same binding area, (b) Dynamic protein-protein adhesion interaction. Struct2Graph identifies interacting residues with 30% sensitivity, 89% specificity, and 87% accuracy. CONCLUSIONS In this manuscript, we address the problem of prediction of PPIs using a first of its kind, 3D-structure-based graph attention network (code available at https://github.com/baranwa2/Struct2Graph ). Furthermore, the novel mutual attention mechanism provides insights into likely interaction sites through its unsupervised knowledge selection process. This study demonstrates that a relatively low-dimensional feature embedding learned from graph structures of individual proteins outperforms other modern machine learning classifiers based on global protein features. In addition, through the analysis of single amino acid variations, the attention mechanism shows preference for disease-causing residue variations over benign polymorphisms, demonstrating that it is not limited to interface residues.
Collapse
Affiliation(s)
- Mayank Baranwal
- Division of Data and Decision Sciences, Tata Consultancy Services Research, Mumbai, India
- Systems and Control Engineering Group, Indian Institute of Technology, Bombay, India
| | - Abram Magner
- Department of Computer Science, University of Albany, SUNY, Albany, USA
| | - Jacob Saldinger
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
| | | | - Paolo Elvati
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, USA
| | - Shivani Kozarekar
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
| | - J. Scott VanEpps
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Department of Emergency Medicine, University of Michigan, Ann Arbor, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, USA
| | - Nicholas A. Kotov
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, USA
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, USA
| | - Angela Violi
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, USA
- Biophysics Program, University of Michigan, Ann Arbor, USA
| | - Alfred O. Hero
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA
- Department of Statistics, University of Michigan, Ann Arbor, USA
- Program in Applied Interdisciplinary Mathematics, University of Michigan, Ann Arbor, USA
- Program in Bioinformatics, University of Michigan, Ann Arbor, USA
| |
Collapse
|
138
|
Wang C, Chen Y, Zhang Y, Li K, Lin M, Pan F, Wu W, Zhang J. A reinforcement learning approach for protein-ligand binding pose prediction. BMC Bioinformatics 2022; 23:368. [PMID: 36076158 PMCID: PMC9454149 DOI: 10.1186/s12859-022-04912-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022] Open
Abstract
Protein ligand docking is an indispensable tool for computational prediction of protein functions and screening drug candidates. Despite significant progress over the past two decades, it is still a challenging problem, characterized by the still limited understanding of the energetics between proteins and ligands, and the vast conformational space that has to be searched to find a satisfactory solution. In this project, we developed a novel reinforcement learning (RL) approach, the asynchronous advantage actor-critic model (A3C), to address the protein ligand docking problem. The overall framework consists of two models. During the search process, the agent takes an action selected by the actor model based on the current location. The critic model then evaluates this action and predict the distance between the current location and true binding site. Experimental results showed that in both single- and multi-atom cases, our model improves binding site prediction substantially compared to a naïve model. For the single-atom ligand, copper ion (Cu2+), the model predicted binding sites have a median root-mean-square-deviation (RMSD) of 2.39 Å to the true binding sites when starting from random starting locations. For the multi-atom ligand, sulfate ion (SO42-), the predicted binding sites have a median RMSD of 3.82 Å to the true binding sites. The ligand-specific models built in this study can be used in solvent mapping studies and the RL framework can be readily scaled up to larger and more diverse sets of ligands.
Collapse
Affiliation(s)
- Chenran Wang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Yang Chen
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Yuan Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Keqiao Li
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Menghan Lin
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Feng Pan
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Wei Wu
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA.
| | - Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA.
| |
Collapse
|
139
|
Li Q, Zhang X, Wu L, Bo X, He S, Wang S. PLA-MoRe: A Protein-Ligand Binding Affinity Prediction Model via Comprehensive Molecular Representations. J Chem Inf Model 2022; 62:4380-4390. [PMID: 36054653 DOI: 10.1021/acs.jcim.2c00960] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Accurately predicting the binding affinity of protein-ligand pairs is an essential part of drug discovery. Since wet laboratory experiments to determine the binding affinity are expensive and time-consuming, several computational methods for binding affinity prediction have been proposed. In the representation of compounds, most methods only focus on the structural properties such as SMILES and ignore the bioactive properties. In this study, we proposed a novel model named PLA-MoRe to predict protein-ligand binding affinity, which represents compounds based on both structural and bioactive properties and mainly contains three feature extractors. First, a structure feature extractor based on the graph isomorphism network was constructed to learn the representations of the molecular graphs. Second, we designed an Autoencoder-based bioactive feature extractor to integrate the multisource bioactive information including chemical, target, network, cellular, and clinical. The above two parts aimed to learn representations of compounds in terms of structures and bioactivities, respectively. Then, we constructed a sequence feature extractor to learn embeddings for protein sequences. The output of the three extractors was concatenated and fed into a fully connected network for affinity prediction. We compared PLA-MoRe with three state-of-the-art methods, and an ablation study was conducted to test the role of each part of the model. Further attention visualization showed that our model had the potential to locate the binding sites, which might help explain the mechanism of interaction. These results prove that PLA-MoRe is competitive and reliable. The resource codes are freely available at the GitHub repository https://github.com/QingyuLiaib/PLA-MoRe.
Collapse
Affiliation(s)
- Qingyu Li
- Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China
| | - Xiaochang Zhang
- Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China
| | - Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China.,Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Song He
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Shengqi Wang
- Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China
| |
Collapse
|
140
|
Cheng Z, Zhao Q, Li Y, Wang J. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 2022; 38:4153-4161. [PMID: 35801934 DOI: 10.1093/bioinformatics/btac485] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 05/02/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Identifying drug-target interactions is a crucial step for drug discovery and design. Traditional biochemical experiments are credible to accurately validate drug-target interactions. However, they are also extremely laborious, time-consuming and expensive. With the collection of more validated biomedical data and the advancement of computing technology, the computational methods based on chemogenomics gradually attract more attention, which guide the experimental verifications. RESULTS In this study, we propose an end-to-end deep learning-based method named IIFDTI to predict drug-target interactions (DTIs) based on independent features of drug-target pairs and interactive features of their substructures. First, the interactive features of substructures between drugs and targets are extracted by the bidirectional encoder-decoder architecture. The independent features of drugs and targets are extracted by the graph neural networks and convolutional neural networks, respectively. Then, all extracted features are fused and inputted into fully connected dense layers in downstream tasks for predicting DTIs. IIFDTI takes into account the independent features of drugs/targets and simulates the interactive features of the substructures from the biological perspective. Multiple experiments show that IIFDTI outperforms the state-of-the-art methods in terms of the area under the receiver operating characteristics curve (AUC), the area under the precision-recall curve (AUPR), precision, and recall on benchmark datasets. In addition, the mapped visualizations of attention weights indicate that IIFDTI has learned the biological knowledge insights, and two case studies illustrate the capabilities of IIFDTI in practical applications. AVAILABILITY AND IMPLEMENTATION The data and codes underlying this article are available in Github at https://github.com/czjczj/IIFDTI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Zhongjian Cheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
141
|
Osaki K, Ekimoto T, Yamane T, Ikeguchi M. 3D-RISM-AI: A Machine Learning Approach to Predict Protein-Ligand Binding Affinity Using 3D-RISM. J Phys Chem B 2022; 126:6148-6158. [PMID: 35969673 PMCID: PMC9421647 DOI: 10.1021/acs.jpcb.2c03384] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Revised: 07/27/2022] [Indexed: 11/30/2022]
Abstract
Hydration free energy (HFE) is a key factor in improving protein-ligand binding free energy (BFE) prediction accuracy. The HFE itself can be calculated using the three-dimensional reference interaction model (3D-RISM); however, the BFE predictions solely evaluated using 3D-RISM are not correlated to the experimental BFE for abundant protein-ligand pairs. In this study, to predict the BFE for multiple sets of protein-ligand pairs, we propose a machine learning approach incorporating the HFEs obtained using 3D-RISM, termed 3D-RISM-AI. In the learning process, structural metrics, intra-/intermolecular energies, and HFEs obtained via 3D-RISM of ∼4000 complexes in the PDBbind database (ver. 2018) were used. The BFEs predicted using 3D-RISM-AI were well correlated to the experimental data (Pearson's correlation coefficient of 0.80 and root-mean-square error of 1.91 kcal/mol). As important factors for the prediction, the difference in the solvent accessible surface area between the bound and unbound structures and the hydration properties of the ligands were detected during the learning process.
Collapse
Affiliation(s)
- Kazu Osaki
- Graduate
School of Medical Life Science, Yokohama
City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Toru Ekimoto
- Graduate
School of Medical Life Science, Yokohama
City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Tsutomu Yamane
- Center
for Computational Science, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Mitsunori Ikeguchi
- Graduate
School of Medical Life Science, Yokohama
City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Center
for Computational Science, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
| |
Collapse
|
142
|
Yan H, Xie Y, Liu Y, Yuan L, Sheng R. ComABAN: refining molecular representation with the graph attention mechanism to accelerate drug discovery. Brief Bioinform 2022; 23:6674166. [PMID: 35998925 DOI: 10.1093/bib/bbac350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 07/16/2022] [Accepted: 07/27/2022] [Indexed: 11/14/2022] Open
Abstract
An unsolved challenge in developing molecular representation is determining an optimal method to characterize the molecular structure. Comprehension of intramolecular interactions is paramount toward achieving this goal. In this study, ComABAN, a new graph-attention-based approach, is proposed to improve the accuracy of molecular representation by simultaneously considering atom-atom, bond-bond and atom-bond interactions. In addition, we benchmark models extensively on 8 public and 680 proprietary industrial datasets spanning a wide variety of chemical end points. The results show that ComABAN has higher prediction accuracy compared with the classical machine learning method and the deep learning-based methods. Furthermore, the trained neural network was used to predict a library of 1.5 million molecules and picked out compounds with a classification result of grade I. Subsequently, these predicted molecules were scored and ranked using cascade docking, molecular dynamics simulations to generate five potential candidates. All five molecules showed high similarity to nanomolar bioactive inhibitors suppressing the expression of HIF-1α, and we synthesized three compounds (Y-1, Y-3, Y-4) and tested their inhibitory ability in vitro. Our results indicate that ComABAN is an effective tool for accelerating drug discovery.
Collapse
Affiliation(s)
- Huihui Yan
- Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.,College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
| | - Yuanyuan Xie
- Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
| | - Yao Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
| | - Leer Yuan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
| | - Rong Sheng
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, P. R. China Fax/Tel: 86-571-8820-845 E-mail:
| |
Collapse
|
143
|
Prediction of Potential Commercially Available Inhibitors against SARS-CoV-2 by Multi-Task Deep Learning Model. Biomolecules 2022; 12:biom12081156. [PMID: 36009050 PMCID: PMC9405964 DOI: 10.3390/biom12081156] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 08/16/2022] [Accepted: 08/18/2022] [Indexed: 11/16/2022] Open
Abstract
The outbreak of COVID-19 caused millions of deaths worldwide, and the number of total infections is still rising. It is necessary to identify some potentially effective drugs that can be used to prevent the development of severe symptoms, or even death for those infected. Fortunately, many efforts have been made and several effective drugs have been identified. The rapidly increasing amount of data is of great help for training an effective and specific deep learning model. In this study, we propose a multi-task deep learning model for the purpose of screening commercially available and effective inhibitors against SARS-CoV-2. First, we pretrained a model on several heterogenous protein-ligand interaction datasets. The model achieved competitive results on some benchmark datasets. Next, a coronavirus-specific dataset was collected and used to fine-tune the model. Then, the fine-tuned model was used to select commercially available drugs against SARS-CoV-2 protein targets. Overall, twenty compounds were listed as potential inhibitors. We further explored the model interpretability and exhibited the predicted important binding sites. Based on this prediction, molecular docking was also performed to visualize the binding modes of the selected inhibitors.
Collapse
|
144
|
Hu F, Jiang J, Yin P. Prediction of Potential Commercially Available Inhibitors against SARS-CoV-2 by Multi-Task Deep Learning Model. Biomolecules 2022. [PMID: 36009050 DOI: 10.48550/arxiv.2003.00728] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/13/2023] Open
Abstract
The outbreak of COVID-19 caused millions of deaths worldwide, and the number of total infections is still rising. It is necessary to identify some potentially effective drugs that can be used to prevent the development of severe symptoms, or even death for those infected. Fortunately, many efforts have been made and several effective drugs have been identified. The rapidly increasing amount of data is of great help for training an effective and specific deep learning model. In this study, we propose a multi-task deep learning model for the purpose of screening commercially available and effective inhibitors against SARS-CoV-2. First, we pretrained a model on several heterogenous protein-ligand interaction datasets. The model achieved competitive results on some benchmark datasets. Next, a coronavirus-specific dataset was collected and used to fine-tune the model. Then, the fine-tuned model was used to select commercially available drugs against SARS-CoV-2 protein targets. Overall, twenty compounds were listed as potential inhibitors. We further explored the model interpretability and exhibited the predicted important binding sites. Based on this prediction, molecular docking was also performed to visualize the binding modes of the selected inhibitors.
Collapse
Affiliation(s)
- Fan Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Jiaxin Jiang
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Peng Yin
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| |
Collapse
|
145
|
ML helps predict enzyme turnover rates. Nat Catal 2022. [DOI: 10.1038/s41929-022-00827-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
146
|
Duan B, Peng J, Zhang Y. IMSE: interaction information attention and molecular structure based drug drug interaction extraction. BMC Bioinformatics 2022; 23:338. [PMID: 35965308 PMCID: PMC9375903 DOI: 10.1186/s12859-022-04876-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Accepted: 08/03/2022] [Indexed: 11/10/2022] Open
Abstract
Background Extraction of drug drug interactions from biomedical literature and other textual data is an important component to monitor drug-safety and this has attracted attention of many researchers in healthcare. Existing works are more pivoted around relation extraction using bidirectional long short-term memory networks (BiLSTM) and BERT model which does not attain the best feature representations. Results Our proposed DDI (drug drug interaction) prediction model provides multiple advantages: (1) The newly proposed attention vector is added to better deal with the problem of overlapping relations, (2) The molecular structure information of drugs is integrated into the model to better express the functional group structure of drugs, (3) We also added text features that combined the T-distribution and chi-square distribution to make the model more focused on drug entities and (4) it achieves similar or better prediction performance (F-scores up to 85.16%) compared to state-of-the-art DDI models when tested on benchmark datasets. Conclusions Our model that leverages state of the art transformer architecture in conjunction with multiple features can bolster the performances of drug drug interation tasks in the biomedical domain. In particular, we believe our research would be helpful in identification of potential adverse drug reactions.
Collapse
|
147
|
Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022; 20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022] Open
Abstract
A large number of chemical compounds are available in databases such as PubChem and ZINC. However, currently known compounds, though large, represent only a fraction of possible compounds, which is known as chemical space. Many of these compounds in the databases are annotated with properties and assay data that can be used for drug discovery efforts. For this goal, a number of machine learning algorithms have been developed and recent deep learning technologies can be effectively used to navigate chemical space, especially for unknown chemical compounds, in terms of drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting annotated properties and assay data in the chemical compounds databases. We first compile what kind of tasks are trying to be accomplished by machine learning methods. Then, we survey deep learning technologies to show their modeling power and current applications for accomplishing drug related tasks. Next, we survey deep learning techniques to address the insufficiency issue of annotated data for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug related tasks, thus we survey what kind of information, such as assay and gene expression data, can be used to improve the prediction power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into computational analysis of chemical information.
Collapse
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Sangseon Lee
- Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Yinhua Piao
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - MinGyu Choi
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
| |
Collapse
|
148
|
Monteiro NR, Oliveira JL, Arrais JP. DTITR: End-to-end drug–target binding affinity prediction with transformers. Comput Biol Med 2022; 147:105772. [DOI: 10.1016/j.compbiomed.2022.105772] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2022] [Revised: 06/07/2022] [Accepted: 06/19/2022] [Indexed: 11/03/2022]
|
149
|
Yang Z, Zhong W, Lv Q, Yu-Chian Chen C. Learning size-adaptive molecular substructures for explainable drug-drug interaction prediction by substructure-aware graph neural network. Chem Sci 2022; 13:8693-8703. [PMID: 35974769 PMCID: PMC9337739 DOI: 10.1039/d2sc02023h] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 07/06/2022] [Indexed: 01/03/2023] Open
Abstract
Drug-drug interactions (DDIs) can trigger unexpected pharmacological effects on the body, and the causal mechanisms are often unknown. Graph neural networks (GNNs) have been developed to better understand DDIs. However, identifying key substructures that contribute most to the DDI prediction is a challenge for GNNs. In this study, we presented a substructure-aware graph neural network, a message passing neural network equipped with a novel substructure attention mechanism and a substructure-substructure interaction module (SSIM) for DDI prediction (SA-DDI). Specifically, the substructure attention was designed to capture size- and shape-adaptive substructures based on the chemical intuition that the sizes and shapes are often irregular for functional groups in molecules. DDIs are fundamentally caused by chemical substructure interactions. Thus, the SSIM was used to model the substructure-substructure interactions by highlighting important substructures while de-emphasizing the minor ones for DDI prediction. We evaluated our approach in two real-world datasets and compared the proposed method with the state-of-the-art DDI prediction models. The SA-DDI surpassed other approaches on the two datasets. Moreover, the visual interpretation results showed that the SA-DDI was sensitive to the structure information of drugs and was able to detect the key substructures for DDIs. These advantages demonstrated that the proposed method improved the generalization and interpretation capability of DDI prediction modeling.
Collapse
Affiliation(s)
- Ziduo Yang
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +86 02039332153
| | - Weihe Zhong
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +86 02039332153
| | - Qiujie Lv
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +86 02039332153
| | - Calvin Yu-Chian Chen
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +86 02039332153
- Department of Medical Research, China Medical University Hospital Taichung 40447 Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University Taichung 41354 Taiwan
| |
Collapse
|
150
|
Bui-Thi D, Rivière E, Meysman P, Laukens K. Predicting compound-protein interaction using hierarchical graph convolutional networks. PLoS One 2022; 17:e0258628. [PMID: 35862351 PMCID: PMC9302762 DOI: 10.1371/journal.pone.0258628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Accepted: 06/12/2022] [Indexed: 11/18/2022] Open
Abstract
Motivation
Convolutional neural networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. They have also drawn much attention from other domains, including drug discovery and drug development. In this study, we develop a computational method based on convolutional neural networks to tackle a fundamental question in drug discovery and development, i.e. the prediction of compound-protein interactions based on compound structure and protein sequence. We propose a hierarchical graph convolutional network (HGCN) to encode small molecules. The HGCN aggregates a molecule embedding from substructure embeddings, which are synthesized from atom embeddings. As small molecules usually share substructures, computing a molecule embedding from those common substructures allows us to learn better generic models. We then combined the HGCN with a one-dimensional convolutional network to construct a complete model for predicting compound-protein interactions. Furthermore we apply an explanation technique, Grad-CAM, to visualize the contribution of each amino acid into the prediction.
Results
Experiments using different datasets show the improvement of our model compared to other GCN-based methods and a sequence based method, DeepDTA, in predicting compound-protein interactions. Each prediction made by the model is also explainable and can be used to identify critical residues mediating the interaction.
Collapse
Affiliation(s)
- Danh Bui-Thi
- Adrem Data Lab, University of Antwerp, Antwerp, Belgium
| | | | | | - Kris Laukens
- Adrem Data Lab, University of Antwerp, Antwerp, Belgium
- * E-mail:
| |
Collapse
|