201
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023]
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently because it significantly reduces the time and cost required to develop novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous DL-based methodologies are emerging at every step of the drug development process. In particular, pharmaceutical chemists face significant challenges in selecting and designing potential drugs for a target of interest before they enter preclinical testing. The two major challenges are predicting interactions between drugs and druggable targets and generating novel molecular structures suitable for a target of interest. We therefore review recent DL applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we provide a comprehensive summary of drug and protein representations, DL models, and the benchmark datasets and tools commonly used for model training and testing. Finally, we present the challenges that remain for DL-based DTI prediction and de novo drug design, a field with a promising future.
Affiliation(s)
- Jintae Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea
- Sera Park
- KaiPharm Co., Ltd., Seoul 03759, Korea
- Dongbo Min
- Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
- Wankyu Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea
- System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea

202
Zuo Z, Wang P, Chen X, Tian L, Ge H, Qian D. SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures. BMC Bioinformatics 2021; 22:434. [PMID: 34507532 PMCID: PMC8434731 DOI: 10.1186/s12859-021-04352-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Accepted: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Background: One of the major challenges in precision medicine is accurately predicting an individual patient's response to drugs. Many computational methods have been developed to predict compound activity from genomic profiles or chemical structures, but combining genetic mutation, gene expression, and cheminformatics in a single machine learning model remains underexplored. Results: Here we present a novel deep learning model that integrates gene expression, genetic mutation, and the chemical structure of compounds in a multi-task convolutional architecture. We applied the model to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. We selected relevant cancer-related genes based on an oncology genetics database and the L1000 landmark genes, and used their expression and mutations as genomic features in model training. Cheminformatics features for compounds were obtained from PubChem or ChEMBL. We find that combining gene expression, genetic mutation, and cheminformatics features greatly enhances predictive performance. Conclusion: We implemented an extended graph neural network for molecular graphs and a convolutional neural network for gene features. Using multi-task learning and self-attention to monitor the similarity between compounds, our model outperforms recently published methods on the same training and testing datasets. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04352-9.
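SWnet is described as using self-attention to monitor the similarity between compounds. As a hedged illustration of what a compound-similarity signal can look like, the sketch below computes Tanimoto similarity between binary fingerprints (stored as sets of on-bit indices) and turns one query compound's similarities into softmax weights. It is not SWnet's implementation; the function names, the set-based fingerprint format, and the `temperature` parameter are assumptions made for this example.

```python
import math


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def similarity_weights(query: set[int], library: list[set[int]], temperature: float = 1.0) -> list[float]:
    """Softmax over Tanimoto similarities: compounds more similar to `query` get larger weights."""
    scores = [tanimoto(query, fp) / temperature for fp in library]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    query = {1, 4, 7, 9}
    library = [{1, 4, 7, 9}, {1, 4, 20}, {30, 31}]
    print([tanimoto(query, fp) for fp in library])  # [1.0, 0.4, 0.0]
    print(similarity_weights(query, library))
```

Lowering `temperature` sharpens the weights toward the most similar compound, which is one simple way to let similarity control how much each neighbour contributes.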
Affiliation(s)
- Zhaorui Zuo
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
- Penglei Wang
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
- Xiaowei Chen
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Li Tian
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Hui Ge
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Dahong Qian
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China

203
Zheng J, Xiao X, Qiu WR. iCDI-W2vCom: Identifying the Ion Channel-Drug Interaction in Cellular Networking Based on word2vec and node2vec. Front Genet 2021; 12:738274. [PMID: 34567088 PMCID: PMC8458815 DOI: 10.3389/fgene.2021.738274] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/08/2021] [Accepted: 08/02/2021] [Indexed: 12/04/2022]
Abstract
Ion channels are the second-largest family of drug targets. Ion channel dysfunction may lead to a number of diseases, such as Alzheimer's disease, epilepsy, cephalalgia, and type II diabetes. For predicting ion channel-drug interactions, computational approaches are effective and efficient compared with costly, labor-intensive, and time-consuming experimental methods. Most existing methods can only handle ion channels with known 3D structures; however, the 3D structures of most ion channels are still unknown. Many sequence-based predictors have been developed to address this challenge, but most of their results leave room for improvement, or they lack a prediction web server. In this paper, a sequence-based classifier called "iCDI-W2vCom" was developed to identify interactions between ion channels and drugs. In the predictor, each drug compound is represented by a 1184-dimensional vector built from SMILES-word2vec, FP2-word2vec, SMILES-node2vec, and ECFP features; each ion channel is represented by a 64-dimensional word2vec vector; and a LightGBM classifier serves as the prediction engine. Under fivefold cross-validation, iCDI-W2vCom achieved an accuracy of 91.95% and an AUC of 0.9703, outperforming other existing predictors in this area. A user-friendly web server for iCDI-W2vCom is available at http://www.jci-bioinfo.cn/icdiw2v. The proposed method may also be useful for predicting other target-drug interactions.
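word2vec-style encoders treat a sequence as a "sentence" of overlapping k-mer "words". The sketch below is an assumption about that preprocessing step and about how the two representations are joined, not the iCDI-W2vCom code: it splits a protein or SMILES string into k-mers and concatenates a 1184-dimensional drug vector with a 64-dimensional ion-channel vector into one pair feature that a classifier such as LightGBM could consume. The helper names are hypothetical, and training the word2vec/node2vec embeddings is not shown.

```python
DRUG_DIM = 1184    # drug vector size reported in the abstract
CHANNEL_DIM = 64   # ion-channel vector size reported in the abstract


def kmers(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers (the 'words' a word2vec model would be trained on)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pair_feature(drug_vec: list[float], channel_vec: list[float]) -> list[float]:
    """Concatenate drug and ion-channel embeddings into one feature vector for a drug-channel pair."""
    if len(drug_vec) != DRUG_DIM or len(channel_vec) != CHANNEL_DIM:
        raise ValueError("unexpected embedding dimensions")
    return list(drug_vec) + list(channel_vec)


if __name__ == "__main__":
    print(kmers("MKTAYIAK", 3))  # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
    x = pair_feature([0.0] * DRUG_DIM, [1.0] * CHANNEL_DIM)
    print(len(x))  # 1248
```

Each labelled drug-channel pair then becomes one row of a 1248-column feature matrix, which is the tabular input gradient-boosting classifiers expect.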
Affiliation(s)
- Xuan Xiao
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wang-Ren Qiu
- Department of Computer Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China

204
Graph classification based on skeleton and component features. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]

205
Zhang S, Jiang M, Wang S, Wang X, Wei Z, Li Z. SAG-DTA: Prediction of Drug-Target Affinity Using Self-Attention Graph Network. Int J Mol Sci 2021; 22:8993. [PMID: 34445696 PMCID: PMC8396496 DOI: 10.3390/ijms22168993] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/07/2021] [Revised: 08/14/2021] [Accepted: 08/17/2021] [Indexed: 11/16/2022]
Abstract
Predicting drug–target affinity (DTA) is a crucial step in drug screening and discovery. In this study, a new graph-based prediction model named SAG-DTA (self-attention graph drug–target affinity) was implemented. Unlike previous graph-based methods, the proposed model applies self-attention mechanisms to the drug molecular graph to obtain effective drug representations for DTA prediction. The features of each atom node in the molecular graph are weighted by an attention score before being aggregated into the molecule representation. Various self-attention scoring methods were compared in this study. In addition, two pooling architectures, global and hierarchical, were presented and evaluated on benchmark datasets. Comparative experiments on both regression and binary classification tasks showed that SAG-DTA outperformed previous sequence-based and other graph-based methods and generalized well.
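The attention-weighted readout described above (atom features weighted by an attention score, then aggregated into a molecule vector) can be sketched as follows. This is a minimal illustration, assuming the score is a plain dot product between each atom's features and a learnable vector `w`; SAG-DTA compares several scoring methods and computes its scores with graph layers, which are not reproduced here.

```python
import math


def attention_readout(atom_feats: list[list[float]], w: list[float]) -> list[float]:
    """Softmax-weighted sum of atom feature vectors (an attention-based graph readout).

    Assumed scoring: score_i = dot(atom_feats[i], w). The scores are softmax-normalized
    over atoms and used to weight each atom before summing into one molecule vector.
    """
    scores = [sum(f * wi for f, wi in zip(feat, w)) for feat in atom_feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # one attention weight per atom, summing to 1
    dim = len(atom_feats[0])
    return [sum(a * feat[d] for a, feat in zip(alphas, atom_feats)) for d in range(dim)]


if __name__ == "__main__":
    atoms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical 2-D atom features
    print(attention_readout(atoms, [0.0, 0.0]))   # equal scores: the readout is the mean
    print(attention_readout(atoms, [3.0, 0.0]))   # attention shifted toward atom 0
```

With all scores equal the readout reduces to mean pooling, so attention can be read as a learned generalization of the simple global pooling that earlier graph-based DTA models used.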
Affiliation(s)
- Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Zhiqiang Wei
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
- Correspondence: Tel./Fax: +86-532-85953086

206
Asada M, Miwa M, Sasaki Y. Using drug descriptions and molecular structures for drug-drug interaction extraction from literature. Bioinformatics 2021; 37:1739-1746. [PMID: 33098410 PMCID: PMC8289381 DOI: 10.1093/bioinformatics/btaa907] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/28/2020] [Revised: 10/07/2020] [Accepted: 10/09/2020] [Indexed: 12/04/2022]
Abstract
Motivation: Neural methods for extracting drug–drug interactions (DDIs) from the literature require a large number of annotations. In this study, we propose a novel method that effectively uses external drug database information as well as information from large-scale plain text for DDI extraction. Specifically, we use drug descriptions and molecular structures as the drug database information. Results: We evaluated our approach on the DDIExtraction 2013 shared task dataset and obtained the following results. First, combining large-scale raw text information with the existing model greatly improves DDI extraction performance and achieves state-of-the-art results. Second, drug descriptions and molecular structures each further improve performance for some specific DDI types. Finally, using drug descriptions and molecular structures simultaneously significantly improves performance on all DDI types. We showed that plain text, drug description, and molecular structure information are complementary and that their effective combination is essential for the improvement. Availability and implementation: Our code is available at https://github.com/tticoin/DESC_MOL-DDIE.
Affiliation(s)
- Masaki Asada
- Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya 468-8511, Japan
- Makoto Miwa
- Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya 468-8511, Japan
- Yutaka Sasaki
- Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya 468-8511, Japan

207
Vaz JM, Balaji S. Convolutional neural networks (CNNs): concepts and applications in pharmacogenomics. Mol Divers 2021; 25:1569-1584. [PMID: 34031788 PMCID: PMC8342355 DOI: 10.1007/s11030-021-10225-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/06/2021] [Accepted: 04/21/2021] [Indexed: 12/17/2022]
Abstract
Convolutional neural networks (CNNs) have been used to extract information from datasets of various dimensions. This approach has led to accurate interpretations in several subfields of biological research, such as pharmacogenomics, addressing issues previously faced by other computational methods. With rising attention to personalized and precision medicine, scientists and clinicians have turned to artificial intelligence systems for solutions in therapeutics development, and CNNs have already provided valuable insights into the transformation of biological data. In this review, we give a brief overview of the possibilities of implementing CNNs as an effective tool for analyzing one-dimensional biological data, such as nucleotide and protein sequences, as well as small-molecule representations such as SMILES (simplified molecular-input line-entry system) strings, InChI, and binary fingerprints; we categorize the models by objective and highlight various challenges. For a more comprehensive understanding, the review is organized by the specific research domains that contribute to pharmacogenomics. Finally, future directions for deep learning are outlined.
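As a rough sketch of the one-dimensional setting the review covers, the code below one-hot encodes a protein sequence over the 20 standard amino acids and slides a single convolutional filter along it. The helper names are hypothetical; as in most deep learning libraries, the "convolution" is implemented as cross-correlation, and a real CNN would learn many filters rather than use a hand-set one.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq: str) -> list[list[float]]:
    """Encode a protein sequence as an L x 20 one-hot matrix (unknown residues become all-zero rows)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d(x: list[list[float]], kernel: list[list[float]]) -> list[float]:
    """'Valid' 1D convolution (cross-correlation) of one k x C filter over an L x C input, then ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = sum(x[start + j][c] * kernel[j][c] for j in range(k) for c in range(len(kernel[j])))
        out.append(max(0.0, s))  # ReLU activation
    return out


if __name__ == "__main__":
    motif_filter = one_hot("AC")  # hand-set filter that matches the dipeptide "AC"
    print(conv1d(one_hot("ACAC"), motif_filter))  # [2.0, 0.0, 2.0]
```

A filter equal to the one-hot encoding of a motif responds most strongly where that motif occurs, which is the intuition behind interpreting learned CNN filters as sequence motifs.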
Affiliation(s)
- Joel Markus Vaz
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- S Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

208
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022]
Abstract
Graph neural networks (GNNs), a branch of deep learning in non-Euclidean space, perform particularly well on tasks that process graph-structured data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are presented, corresponding to the three levels of structural information that GNNs can learn: node classification, link prediction, and graph generation. Meanwhile, according to their applications to various omics data, we categorize and discuss related studies in three areas: disease prediction, drug discovery, and biomedical imaging. Based on this analysis, we outline the shortcomings of current studies and their prospects for development. Although GNNs have achieved excellent results in many biological tasks, they still face challenges in low-quality data processing, methodology, and interpretability, and have a long road ahead. We believe that GNNs have the potential to be an excellent method for solving various biological problems in bioinformatics research.
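To make the idea of message passing concrete, here is a minimal sketch of one GNN layer: each node averages its own features with its neighbours' features, then applies a linear map and ReLU. Mean aggregation with a self-loop is an assumed, simplified choice; widely used models such as GCN use a symmetric degree normalization instead, and the function name is hypothetical.

```python
def message_passing_layer(
    features: list[list[float]],
    adjacency: dict[int, list[int]],
    weight: list[list[float]],
) -> list[list[float]]:
    """One message-passing layer: mean over the node and its neighbours, then linear map + ReLU.

    features:  N x D node feature matrix
    adjacency: node -> list of neighbour nodes (list undirected edges in both directions)
    weight:    D x D_out weight matrix
    """
    out = []
    for v, h_v in enumerate(features):
        group = [v] + adjacency.get(v, [])  # self-loop plus neighbours
        agg = [sum(features[u][d] for u in group) / len(group) for d in range(len(h_v))]
        out.append([
            max(0.0, sum(agg[d] * weight[d][j] for d in range(len(agg))))  # linear map + ReLU
            for j in range(len(weight[0]))
        ])
    return out


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with one scalar feature per node and an identity weight.
    feats = [[1.0], [2.0], [3.0]]
    adj = {0: [1], 1: [0, 2], 2: [1]}
    print(message_passing_layer(feats, adj, [[1.0]]))  # [[1.5], [2.0], [2.5]]
```

Stacking k such layers lets information travel k hops, which is how GNNs build the node-, edge-, and graph-level representations used for node classification, link prediction, and graph generation.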
Affiliation(s)
- Xiao-Meng Zhang
- School of Information, Yunnan Normal University, Kunming, China
- Li Liang
- School of Information, Yunnan Normal University, Kunming, China
- Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Ming-Jing Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- School of Life Sciences, Yunnan Normal University, Kunming, China

209
Wang S, Jiang M, Zhang S, Wang X, Yuan Q, Wei Z, Li Z. MCN-CPI: Multiscale Convolutional Network for Compound-Protein Interaction Prediction. Biomolecules 2021; 11:1119. [PMID: 34439785 PMCID: PMC8392217 DOI: 10.3390/biom11081119] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/05/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 01/09/2023]
Abstract
In drug discovery, identifying interactions between proteins and novel compounds plays an important role. Deep learning methods have shown excellent performance in a variety of settings. However, compound-protein interactions are complicated, and the features extracted by most deep models are not comprehensive, which limits performance to a certain extent. In this paper, we propose a multiscale convolutional network that extracts local and global features of the protein and topological features of the compound using different types of convolutional networks. The results showed that our model achieved the best performance among the existing deep learning methods it was compared with.
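The multiscale idea, convolving the same input with kernels of different widths so that short and long patterns are both captured, can be sketched on a one-dimensional numeric signal. This is an assumption-laden toy rather than MCN-CPI's architecture: the helper names are hypothetical, each kernel's response is reduced by global max pooling, and the pooled values are concatenated into one feature vector.

```python
def conv1d_valid(seq: list[float], kernel: list[float]) -> list[float]:
    """Slide a kernel over a 1D signal without padding (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k)) for i in range(len(seq) - k + 1)]


def multiscale_features(seq: list[float], kernels: list[list[float]]) -> list[float]:
    """Apply kernels of different widths and global-max-pool each response into one feature."""
    feats = []
    for kernel in kernels:
        response = conv1d_valid(seq, kernel)
        feats.append(max(response) if response else 0.0)  # 0.0 if the kernel is longer than seq
    return feats


if __name__ == "__main__":
    signal = [1.0, 1.0, 1.0, 0.0, 0.0, 2.0]       # hypothetical per-residue values
    kernels = [[1.0], [1.0, 1.0], [1.0, 1.0, 1.0]]  # widths 1, 2 and 3
    print(multiscale_features(signal, kernels))     # [2.0, 2.0, 3.0]
```

In the example the width-1 and width-2 kernels are dominated by the single large value, while only the width-3 kernel picks up the run of three consecutive ones; different scales therefore expose different patterns in the same input.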
Affiliation(s)
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
- Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Xiaofeng Wang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Qing Yuan
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhiqiang Wei
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China

210
Zhou D, Xu Z, Li W, Xie X, Peng S. MultiDTI: Drug-target interaction prediction based on multi-modal representation learning to bridge the gap between new chemical entities and known heterogeneous network. Bioinformatics 2021; 37:4485-4492. [PMID: 34180970 DOI: 10.1093/bioinformatics/btab473] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 02/24/2021] [Revised: 05/27/2021] [Accepted: 06/27/2021] [Indexed: 12/14/2022]
Abstract
Motivation: Predicting new drug-target interactions (DTIs) is an important step in new drug development, in understanding drug side effects, and in drug repositioning. Heterogeneous data sources can provide comprehensive information and different perspectives for DTI prediction, so many computational methods rely on heterogeneous networks. Most of them use graph-related algorithms to characterize nodes in heterogeneous networks in order to predict new DTIs. However, these methods can only make predictions within known heterogeneous network datasets and cannot support prediction for new chemical entities outside the network, which hinders further drug discovery and development. Results: To solve this problem, we propose a multi-modal DTI prediction model named 'MultiDTI', which uses our proposed joint learning framework based on heterogeneous networks. It combines the interaction or association information of the heterogeneous network with drug/target sequence information and maps the drug, target, side-effect, and disease nodes of the heterogeneous network into a common space. In this way, 'MultiDTI' can map a new chemical entity into the learned common space based on its chemical structure, bridging the gap between new chemical entities and the known heterogeneous network. Our model shows strong predictive performance: under 10-fold cross-validation, the area under the receiver operating characteristic curve (AUC) is 0.961 and the area under the precision-recall curve (AUPRC) is 0.947. In addition, some predicted new DTIs have been confirmed in the ChEMBL database. Our results indicate that 'MultiDTI' is a powerful and practical tool for predicting new DTIs, which can promote drug discovery and drug repositioning. Availability: Python code and datasets are available at https://github.com/Deshan-Zhou/MultiDTI/.
Supplementary information: Supplementary data are available at Bioinformatics online.
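MultiDTI is evaluated by AUC (and AUPRC) under 10-fold cross-validation. As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. This pairwise form is a standard equivalent of the area under the ROC curve, but it costs O(P*N) comparisons, so it is meant for illustration; it is not the authors' evaluation code, and the function name is hypothetical.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = interacting pair) and predicted scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the AUC depends only on the ranking of scores, it is insensitive to score calibration; on heavily imbalanced DTI data the AUPRC reported alongside it is usually the more demanding metric.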
Affiliation(s)
- Deshan Zhou
- Department of Computer Science, Hunan University, Changsha, 410082, China
- Zhijian Xu
- CAS Key Laboratory of Receptor Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- WenTao Li
- Department of Computer Science, National University of Defense Technology, Changsha, 410073, China
- Xiaolan Xie
- College of Information Science and Engineering, Guilin University of Technology, Guilin, 541004, China
- Shaoliang Peng
- Department of Computer Science, Hunan University, Changsha, 410082, China
- Department of Computer Science, National University of Defense Technology, Changsha, 410073, China

211
Chen W, Chen G, Zhao L, Chen CYC. Predicting Drug-Target Interactions with Deep-Embedding Learning of Graphs and Sequences. J Phys Chem A 2021; 125:5633-5642. [PMID: 34142824 DOI: 10.1021/acs.jpca.1c02419] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/28/2022]
Abstract
Computational approaches for predicting drug-target interactions (DTIs) play an important role in drug discovery, since conventional screening experiments are time-consuming and expensive. In this study, we propose end-to-end representation learning that combines a graph neural network with an attention mechanism and an attentive bidirectional long short-term memory (BiLSTM) network to predict DTIs. For efficient training, we introduce a pretrained Bidirectional Encoder Representations from Transformers (BERT) method to extract substructure features from protein sequences and a local breadth-first search (BFS) to learn subgraph information from molecular graphs. Integrating both models, we developed a DTI prediction system. On unbalanced datasets, the proposed method improved AUC by 2.4% and recall by 9.4% compared with other methods. Extensive experiments showed that our model can screen potential drugs for specific proteins. Furthermore, visualizing the attention weights provides biological insight.
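A local breadth-first search on a molecular graph can be sketched as collecting every atom within a fixed number of bonds of a root atom. The adjacency-list format, the `radius` parameter, and the function name are assumptions; the abstract does not specify how Chen et al. bound or encode their subgraphs.

```python
from collections import deque


def bfs_subgraph(adjacency: dict[int, list[int]], root: int, radius: int) -> set[int]:
    """Return the atoms reachable from `root` within `radius` bonds (a local BFS neighbourhood)."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == radius:
            continue  # do not expand beyond the radius
        for nbr in adjacency.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return seen


if __name__ == "__main__":
    # Hypothetical heavy-atom chain 0 - 1 - 2 - 3 (bonds listed in both directions).
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(bfs_subgraph(chain, 1, 1))  # {0, 1, 2}
```

Running this from every atom yields one overlapping local subgraph per atom, similar in spirit to the atom environments that circular fingerprints enumerate.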
Affiliation(s)
- Wei Chen
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Guanxing Chen
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Lu Zhao
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510655, China
- Calvin Yu-Chian Chen
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan

212
Yang L, Yang G, Chen X, Yang Q, Yao X, Bing Z, Niu Y, Huang L, Yang L. Deep Scoring Neural Network Replacing the Scoring Function Components to Improve the Performance of Structure-Based Molecular Docking. ACS Chem Neurosci 2021; 12:2133-2142. [PMID: 34081851 DOI: 10.1021/acschemneuro.1c00110] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
Accurate prediction of protein-ligand interactions can greatly promote drug development. Recently, a number of deep-learning-based methods have been proposed to predict protein-ligand binding affinities. However, these methods extract the feature representations of proteins and ligands independently and ignore the relative spatial positions and interaction pairs between them. Here, we propose a deep-learning-based virtual screening method called Deep Scoring, which directly extracts relative position information and atomic attribute information of proteins and ligands from docking poses. We use two ResNets to extract the features of ligand atoms and protein residues, respectively, and generate an atom-residue interaction matrix to learn the underlying principles of protein-ligand interactions. A dual attention network (DAN) then generates attention for the two related entities (i.e., proteins and ligands) and weighs the contribution of each atom and residue to binding affinity prediction. As a result, Deep Scoring outperforms other structure-based deep learning methods in screening performance (area under the receiver operating characteristic curve (AUC) of 0.901 on an unbiased version of DUD-E), pose prediction (AUC of 0.935 on the PDBbind test set), and generalization ability (AUC of 0.803 on the ChEMBL data set). Finally, Deep Scoring was used to select novel ERK2 inhibitors, and biological experiments showed that two of the selected compounds (D264-0698 and D483-1785) have potential inhibitory activity against ERK2.
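The atom-residue interaction matrix can be illustrated with a simple geometric version computed from a docking pose. In this hypothetical sketch, each residue is represented by a single coordinate (for example its C-alpha atom), and entry (i, j) is the inverse distance between ligand atom i and residue j when they lie within a cutoff, and 0 otherwise. Deep Scoring builds its matrix from learned ResNet features rather than raw distances, so treat this only as an illustration of the matrix's shape and meaning; the function name, the inverse-distance weighting, and the 4 Å default cutoff are all assumptions.

```python
import math

Coord = tuple[float, float, float]


def interaction_matrix(ligand_atoms: list[Coord], residues: list[Coord], cutoff: float = 4.0) -> list[list[float]]:
    """Ligand-atom x residue matrix from a docking pose (assumed inverse-distance contact weighting).

    Entry (i, j) is 1/d if ligand atom i and residue j are within `cutoff`, else 0.
    Coincident points (d == 0) are skipped to avoid division by zero.
    """
    matrix = []
    for a in ligand_atoms:
        row = []
        for r in residues:
            d = math.dist(a, r)  # Euclidean distance
            row.append(1.0 / d if 0.0 < d <= cutoff else 0.0)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # hypothetical ligand atom coordinates
    pocket = [(2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # hypothetical residue coordinates
    print(interaction_matrix(ligand, pocket))     # [[0.5, 0.0], [1.0, 0.0]]
```

Rows index ligand atoms and columns index residues, so attention computed along each axis of such a matrix can weigh the contribution of individual atoms and residues, which is the role the abstract assigns to the dual attention network.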
Affiliation(s)
- Lijuan Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- School of Physics and Technology, Lanzhou University, Lanzhou 730000, China
- School of Physics, University of Chinese Academy of Science, Beijing 100049, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Guanghui Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Xiaolong Chen
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Qiong Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou 730000, China
- Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
- Yuzhen Niu
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, School of Life Sciences, Shandong University of Technology, Zibo 255049, China
- Liang Huang
- School of Physics and Technology, Lanzhou University, Lanzhou 730000, China
- Lei Yang
- Institute of Modern Physics, Chinese Academy of Science, Lanzhou 730000, China
- Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China

213
Huang K, Xiao C, Glass LM, Sun J. MolTrans: Molecular Interaction Transformer for drug-target interaction prediction. Bioinformatics 2021; 37:830-836. [PMID: 33070179 PMCID: PMC8098026 DOI: 10.1093/bioinformatics/btaa880] [Citation(s) in RCA: 149] [Impact Index Per Article: 49.7] [Received: 04/24/2020] [Revised: 08/23/2020] [Accepted: 10/07/2020] [Indexed: 01/02/2023]
Abstract
Motivation: Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming because it requires experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the following challenges remain open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain, and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. Results: We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via (i) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder that better extracts and captures the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluated MolTrans on real-world data and show that it improves DTI prediction performance compared with state-of-the-art baselines. Availability and implementation: The model scripts are available at https://github.com/kexinhuang12345/moltrans. Supplementary information: Supplementary data are available at Bioinformatics online.
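MolTrans mines frequent sub-structures before modeling interactions. A byte-pair-encoding (BPE) style procedure, which repeatedly merges the most frequent adjacent pair of tokens, gives a feel for this kind of sub-structural pattern mining on SMILES strings. The sketch is an assumption for illustration, not the MolTrans algorithm: the function name is hypothetical, it tokenizes character by character (so, for example, `Cl` is split into `C` and `l`), and it stops when no pair occurs at least twice.

```python
from collections import Counter


def learn_merges(corpus: list[str], num_merges: int) -> tuple[list[tuple[str, str]], list[list[str]]]:
    """Greedy BPE-style substructure mining on SMILES strings.

    Start from single characters and repeatedly merge the most frequent adjacent pair of
    tokens across the corpus. Returns the learned merges and the re-tokenized corpus.
    """
    tokens = [list(s) for s in corpus]
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for seq in tokens:
            pairs.update(zip(seq, seq[1:]))  # count adjacent token pairs
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # stop once no pair repeats
        merges.append((a, b))
        new_tokens = []
        for seq in tokens:
            merged, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    merged.append(a + b)  # replace the pair with one merged token
                    i += 2
                else:
                    merged.append(seq[i])
                    i += 1
            new_tokens.append(merged)
        tokens = new_tokens
    return merges, tokens


if __name__ == "__main__":
    merges, toks = learn_merges(["CCO", "CCN"], 1)
    print(merges)  # [('C', 'C')]
    print(toks)    # [['CC', 'O'], ['CC', 'N']]
```

Trained on a large unlabeled compound collection, the learned merges form a vocabulary of frequent sub-structures, which is the kind of unlabeled-data signal the abstract argues earlier DTI models ignored.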
Affiliation(s)
- Kexin Huang
- Health Data Science, Harvard University, Boston, MA 02120, USA
- Cao Xiao
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Lucas M Glass
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Jimeng Sun
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

214
Jeon J, Kang S, Kim HU. Predicting biochemical and physiological effects of natural products from molecular structures using machine learning. Nat Prod Rep 2021; 38:1954-1966. [PMID: 34047331 DOI: 10.1039/d1np00016k] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Abstract
Covering: 2016 to 2021. Discovery of novel natural products has been greatly facilitated by advances in genome sequencing, genome mining, and analytical techniques. As a result, the volume of natural product data has increased over the years, and these data have started to serve as ingredients for developing machine learning models. In the past few years, a number of machine learning models have been developed to examine various aspects of a molecule by effectively processing its molecular structure. Understanding the biological effects of natural products can benefit from such machine learning approaches. In this context, this Highlight reviews recent studies on machine learning models developed to infer various biological effects of molecules. Particular attention is paid to molecular featurization, the computational representation of a molecular structure, which is an essential step in developing a machine learning model. Technical challenges associated with using machine learning for natural products are also discussed.
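Molecular featurization turns a structure into a fixed-length vector. The toy sketch below hashes every character n-gram of a SMILES string into a bit vector. It is a deliberately simplified stand-in, assumed here for illustration: real fingerprints such as ECFP/Morgan hash atom environments on the molecular graph (typically via a cheminformatics toolkit), not substrings of the SMILES text. The function name and default sizes are hypothetical, and `zlib.crc32` is used so that bit positions are stable across runs.

```python
import zlib


def hashed_ngram_fingerprint(smiles: str, n_bits: int = 64, max_n: int = 3) -> list[int]:
    """Toy featurization: hash every character n-gram (n = 1..max_n) of a SMILES string into bits.

    zlib.crc32 is deterministic across runs, unlike Python's built-in hash() for strings,
    which is salted per process. This is NOT an ECFP/Morgan fingerprint.
    """
    bits = [0] * n_bits
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            gram = smiles[i:i + n]
            bits[zlib.crc32(gram.encode("utf-8")) % n_bits] = 1  # set the hashed bit
    return bits


if __name__ == "__main__":
    fp = hashed_ngram_fingerprint("CCO")  # ethanol
    print(sum(fp), "bits set out of", len(fp))
```

Folding features into a fixed number of bits gives every molecule the same input length regardless of size, at the cost of occasional hash collisions; that trade-off is shared by the real hashed fingerprints used in practice.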
Affiliation(s)
- Junhyeok Jeon
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Seongmo Kang
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; KAIST Institute for Artificial Intelligence, KAIST, Daejeon 34141, Republic of Korea; BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon 34141, Republic of Korea

215
Xiang Y, Tang YH, Liu H, Lin G, Sun H. Predicting Single-Substance Phase Diagrams: A Kernel Approach on Graph Representations of Molecules. J Phys Chem A 2021; 125:4488-4497. [PMID: 33999627 DOI: 10.1021/acs.jpca.1c02391] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]
Abstract
This work presents a Gaussian process regression (GPR) model on top of a novel graph representation of chemical molecules that predicts thermodynamic properties of pure substances in single, double, and triple phases. A transferable molecular graph representation is proposed as the input for a marginalized graph kernel, which is the major component of the covariance function in our GPR models. Radial basis function kernels of temperature and pressure are also incorporated into the covariance function when necessary. We predicted three types of representative properties of pure substances in single, double, and triple phases, i.e., critical temperature, vapor-liquid equilibrium (VLE) density, and pressure-temperature density. The accuracy of the models is nearly identical to the precision of the experimental measurements. Moreover, the reliability of our predictions can be quantified on a per-sample basis using the posterior uncertainty of the GPR model. We compare our model against Morgan fingerprints and a graph neural network to further demonstrate the advantage of the proposed method.
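One way to read the covariance function and the per-sample uncertainty described above is sketched below. The abstract says only that radial basis function kernels of temperature and pressure are incorporated "when necessary"; the product form, the length scales l_T and l_P, and the noise variance sigma_n^2 are assumptions. The predictive mean and variance are the standard Gaussian process regression formulas.

```latex
\begin{aligned}
k\big((G,T,P),(G',T',P')\big) &= k_{\mathrm{graph}}(G,G')\,
  \exp\!\left(-\frac{(T-T')^2}{2\ell_T^2}\right)
  \exp\!\left(-\frac{(P-P')^2}{2\ell_P^2}\right) \\
\mu_* &= \mathbf{k}_*^{\top}\,\big(K + \sigma_n^2 I\big)^{-1}\mathbf{y} \\
\sigma_*^2 &= k(x_*,x_*) - \mathbf{k}_*^{\top}\,\big(K + \sigma_n^2 I\big)^{-1}\mathbf{k}_*
\end{aligned}
```

Here K is the kernel matrix over the training inputs, k_* holds the covariances between a query input x_* and each training input, and y holds the training targets; sigma_*^2 is the kind of posterior variance that lets reliability be judged for each prediction individually.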
Affiliation(s)
- Yan Xiang
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yu-Hang Tang
- Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Hongyi Liu
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Guang Lin
- Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Huai Sun
- School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

216
Zhang J, Norinder U, Svensson F. Deep Learning-Based Conformal Prediction of Toxicity. J Chem Inf Model 2021; 61:2648-2657. [PMID: 34043352 DOI: 10.1021/acs.jcim.1c00208] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/29/2022]
Abstract
Predictive modeling of toxicity can help reduce risks in a range of applications and could potentially serve as the basis for regulatory decisions. However, the utility of these predictions can be limited if the associated uncertainty is not adequately quantified. With recent studies showing great promise for deep learning-based models in toxicity prediction as well, we investigate combining deep learning-based predictors with the conformal prediction framework to generate highly predictive models with well-defined uncertainties. We use a range of deep feedforward neural networks and graph neural networks in a conformal prediction setting and evaluate their performance on data from the Tox21 challenge. We also compare the results from the conformal predictors with those of the underlying machine learning models. The results indicate that highly predictive models can be obtained, resulting in very efficient conformal predictors even at high confidence levels. Taken together, our results highlight the utility of conformal predictors as a convenient way to deliver toxicity predictions with confidence, adding statistical guarantees on model performance as well as better predictions of the minority class compared with the underlying models.
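The abstract does not spell out the conformal setup. The sketch below assumes a class-conditional (Mondrian) inductive conformal predictor, which calibrates each class separately and is therefore often used with imbalanced classes, with nonconformity taken as one minus the underlying model's predicted probability for a candidate class. The function names and toy probabilities are hypothetical.

```python
def p_value(cal_scores, test_score):
    """Conformal p-value: share of calibration scores at least as nonconforming.

    The +1 terms count the test example itself.
    """
    at_least = sum(1 for s in cal_scores if s >= test_score)
    return (at_least + 1) / (len(cal_scores) + 1)


def conformal_predict(cal_probs, cal_labels, test_probs, epsilon):
    """Mondrian inductive conformal classifier (hypothetical helper).

    cal_probs / test_probs: per-example lists of class probabilities from an
    already trained model; nonconformity = 1 - probability of the candidate class.
    Returns one prediction set per test example at significance level epsilon.
    """
    n_classes = len(test_probs[0])
    # Calibration scores grouped by true class (the Mondrian taxonomy).
    cal_by_class = {c: [] for c in range(n_classes)}
    for probs, y in zip(cal_probs, cal_labels):
        cal_by_class[y].append(1.0 - probs[y])
    prediction_sets = []
    for probs in test_probs:
        region = {c for c in range(n_classes)
                  if p_value(cal_by_class[c], 1.0 - probs[c]) > epsilon}
        prediction_sets.append(region)
    return prediction_sets
```

At significance 0.3, a confident test prediction yields a single-label set and an ambiguous one can yield an empty set; at a lower significance (higher confidence), both labels may be included. Under exchangeability, the true label is excluded for at most about a fraction epsilon of examples within each class.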
Affiliation(s)
- Jin Zhang
- Department of Drug Metabolism and Pharmacokinetics, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
- Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, P.O. Box 7003, SE-164 07 Kista, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, P.O. Box 591, SE-751 24 Uppsala, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, SE-701 82 Örebro, Sweden
- Fredrik Svensson
- The Alzheimer's Research UK University College London Drug Discovery Institute, The Cruciform Building, Gower Street, London WC1E 6BT, U.K.

217
Iuchi H, Matsutani T, Yamada K, Iwano N, Sumi S, Hosoda S, Zhao S, Fukunaga T, Hamada M. Representation learning applications in biological sequence analysis. Comput Struct Biotechnol J 2021; 19:3198-3208. [PMID: 34141139 PMCID: PMC8190442 DOI: 10.1016/j.csbj.2021.05.039] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/26/2021] [Revised: 05/10/2021] [Accepted: 05/20/2021] [Indexed: 12/16/2022]
Abstract
Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze the substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this approach, biological sequences are regarded as sentences, while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding, the conversion of these words into vectors, is an essential step in NLP. Representation learning is an approach used for this transformation, and it can be applied to biological sequences. The vectorized biological sequences can then be used for function and structure estimation or as input to other probabilistic models. Considering the importance of, and growing trend toward, applying representation learning to biological research, in the present study we review the existing knowledge on representation learning for biological sequence analysis.
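The "sequences as sentences, k-mers as words" analogy above starts with a tokenization step. The minimal sketch below shows only that step: splitting a sequence into overlapping k-mers and mapping each distinct k-mer to an integer id that an embedding layer could look up. The function names are hypothetical, and the embedding model itself (for example a word2vec-style model) is outside the sketch.

```python
def kmer_tokens(seq, k, stride=1):
    """Split a DNA/RNA/protein sequence into k-mer 'words'."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(sequences, k):
    """Assign each distinct k-mer an integer id in order of first appearance."""
    vocab = {}
    for s in sequences:
        for word in kmer_tokens(s, k):
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(seq, vocab, k):
    # Integer ids like these are what an embedding layer would map to vectors.
    return [vocab[w] for w in kmer_tokens(seq, k)]
```

For example, `kmer_tokens("ATGCGA", 3)` gives `["ATG", "TGC", "GCG", "CGA"]`, while `stride=3` gives the non-overlapping `["ATG", "CGA"]`; overlapping k-mers keep more context per position at the cost of longer token sequences.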
Affiliation(s)
- Hitoshi Iuchi
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Taro Matsutani
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Keisuke Yamada
- School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Natsuki Iwano
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Shunsuke Sumi
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Department of Life Science Frontiers, Center for iPS Cell Research and Application, Kyoto University, Kyoto 606-8507, Japan
- Shion Hosoda
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Shitao Zhao
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo 169-0051, Japan
- Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0032, Japan
- Michiaki Hamada
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan

218
Abbasi K, Razzaghi P, Poso A, Ghanbari-Ara S, Masoudi-Nejad A. Deep Learning in Drug Target Interaction Prediction: Current and Future Perspectives. Curr Med Chem 2021; 28:2100-2113. [PMID: 32895036 DOI: 10.2174/0929867327666200907141016] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 05/14/2020] [Revised: 07/30/2020] [Accepted: 07/30/2020] [Indexed: 11/22/2022]
Abstract
Drug–target interaction (DTI) prediction plays a central role in drug discovery. Computational methods for DTI prediction have gained attention because carrying out in vitro and in vivo experiments on a large scale is costly and time-consuming. Machine learning methods, especially deep learning, are widely applied to DTI prediction. The main goal of this study is to provide a comprehensive overview of deep learning-based DTI prediction approaches. We investigate the existing approaches from multiple perspectives, examining which deep network architectures are used to extract features from drug compounds and protein sequences, and we analyze and compare the advantages and limitations of each architecture. We also examine how drug and protein descriptors are combined, and we review the datasets commonly used in DTI prediction. Finally, current challenges are discussed and a short outlook on deep learning in DTI prediction is given.
Affiliation(s)
- Karim Abbasi
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Antti Poso
- School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland
- Saber Ghanbari-Ara
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran

219
Shan W, Li X, Yao H, Lin K. Convolutional Neural Network-based Virtual Screening. Curr Med Chem 2021; 28:2033-2047. [PMID: 32452320 DOI: 10.2174/0929867327666200526142958] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Revised: 04/19/2020] [Accepted: 04/30/2020] [Indexed: 11/22/2022]
Abstract
Virtual screening is an important means of lead compound discovery, and the scoring function is key to selecting hit compounds. Many scoring functions are currently available; however, there is no all-purpose scoring function, because different scoring functions tend to give conflicting results. Recently, neural networks, especially convolutional neural networks (CNNs), have been increasingly applied in drug design, and most CNN-based virtual screening methods outperform traditional docking methods such as Dock and AutoDock. CNN-based virtual screening is expected to improve on the previous paradigm of overreliance on computational chemical screening. The powerful learning ability of neural networks provides a new way of evaluating compounds. We review the latest progress in CNN-based virtual screening and discuss future prospects.
Affiliation(s)
- Wenying Shan
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Xuanyi Li
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Hequan Yao
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Kejiang Lin
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China

220
Kim QH, Ko JH, Kim S, Park N, Jhe W. Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction. Bioinformatics 2021; 37:3428-3435. [PMID: 33978713 PMCID: PMC8545317 DOI: 10.1093/bioinformatics/btab346] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/16/2020] [Revised: 04/26/2021] [Accepted: 05/05/2021] [Indexed: 11/25/2022]
Abstract
Motivation: Characterizing drug–protein interactions (DPIs) is crucial to high-throughput screening for drug discovery. Deep learning-based approaches have attracted attention because they can predict DPIs without human trial and error. However, because data labeling requires significant resources, the available protein datasets are relatively small, which decreases model performance. Here, we propose two methods to construct a deep learning framework that performs well with a small labeled dataset. Results: First, we use transfer learning to encode protein sequences with a pretrained model, which learns general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to make the model robust by estimating the data uncertainty. Our resulting model performs better than previous baselines at predicting interactions between molecules and proteins. We also show that the uncertainty quantified by Bayesian inference is related to confidence and can be used for screening DPI data points. Availability and implementation: The code is available at https://github.com/QHwan/PretrainDPI. Supplementary information: Supplementary data are available at Bioinformatics online.
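The abstract states that the Bayesian neural network estimates data uncertainty and that the resulting uncertainty can be used to screen DPI data points. One common way to summarize a Bayesian classifier's output, assumed here rather than taken from the paper, is to average T sampled predictions and split the predictive variance into an aleatoric (data) part and an epistemic (model) part. The function name and inputs are hypothetical.

```python
def predictive_uncertainty(sampled_probs):
    """Summarize T stochastic forward passes for one binary DPI prediction.

    sampled_probs: interaction probabilities p_t, one per sampled weight set
    (e.g. Monte Carlo draws from a variational posterior).
    Returns (mean probability, aleatoric variance, epistemic variance).
    """
    t = len(sampled_probs)
    if t == 0:
        raise ValueError("need at least one sampled prediction")
    mean_p = sum(sampled_probs) / t
    # Expected Bernoulli variance: noise inherent in the data.
    aleatoric = sum(p * (1 - p) for p in sampled_probs) / t
    # Spread of the sampled probabilities: uncertainty about the weights.
    epistemic = sum((p - mean_p) ** 2 for p in sampled_probs) / t
    return mean_p, aleatoric, epistemic
```

The two parts always sum to mean_p * (1 - mean_p), the variance of a Bernoulli variable with the averaged probability. A screening rule could, for instance, flag data points whose aleatoric term is large; the threshold would be a further assumption.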
Affiliation(s)
- QHwan Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Joon-Hyuk Ko
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Sunghoon Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Nojun Park
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Wonho Jhe
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea

221
Yang Z, Zhong W, Zhao L, Chen CYC. ML-DTI: Mutual Learning Mechanism for Interpretable Drug-Target Interaction Prediction. J Phys Chem Lett 2021; 12:4247-4261. [PMID: 33904745 DOI: 10.1021/acs.jpclett.1c00867] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Indexed: 05/11/2023]
Abstract
Deep learning (DL) provides opportunities for identifying drug–target interactions (DTIs). The main challenge in applying DL is its lack of interpretability. In addition, most existing DL-based methods formulate the drug and target encoders as two independent modules without considering the relationship between them. In this study, we propose a mutual learning mechanism to bridge the gap between the two encoders. We formulate the DTI problem from a global perspective by inserting mutual learning layers between the two encoders. The mutual learning layers are implemented with multihead attention and position-aware attention. The neural attention mechanism also provides effective visualization, which makes the model easier to analyze. We evaluated our approach on three benchmark kinase datasets under different experimental settings and compared it with three baseline models. The four methods yielded similar results in the random split setting (training and test sets share drugs and targets), whereas the proposed method significantly improved predictive performance in the orphan-target and orphan-drug split settings (training and test sets share only drugs or only targets, respectively). These results demonstrate that the proposed method improves the generalization and interpretability of DTI modeling.
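ML-DTI's mutual learning layers use multihead and position-aware attention between the drug and target encoders. The sketch below shows only the core operation, a single head of scaled dot-product attention in which drug-side vectors query protein-side vectors; learned projections, multiple heads and the position-aware component are omitted, and the names and shapes are assumptions. The returned weights are the kind of matrix one would visualize to see which protein positions each drug token attends to.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention from drug tokens to protein tokens.

    queries: n_q x d drug-side vectors; keys: n_k x d and values: n_k x d_v
    protein-side vectors. Returns (outputs n_q x d_v, weights n_q x n_k).
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights
```

A multihead layer, as in the standard Transformer, runs several such heads on different learned linear projections of the inputs and concatenates their outputs.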
Affiliation(s)
- Ziduo Yang
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Weihe Zhong
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Lu Zhao
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510655, China
- Calvin Yu-Chian Chen
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan

222
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021; 22:6262238. [PMID: 33940598 DOI: 10.1093/bib/bbab109] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 02/10/2021] [Revised: 03/06/2021] [Accepted: 03/12/2021] [Indexed: 11/13/2022]
Abstract
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose MolGNet, a powerful GNN for modeling molecular graphs, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights to produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug–drug interaction and drug–target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Affiliation(s)
- Pengyong Li
- Department of Biomedical Engineering at Tsinghua University, China
- Jun Wang
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Yixuan Qiao
- Operations Research and Cybernetics at Beijing University of Technology, China
- Hao Chen
- Cybernetics at Beijing University of Technology, China
- Yihuan Yu
- Beijing University of Biomedical Engineering, China
- Xiaojun Yao
- Analytical Chemistry and Chemoinformatics at Lanzhou University, China
- Peng Gao
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Guotong Xie
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Sen Song
- Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China

223
Watanabe N, Ohnuki Y, Sakakibara Y. Deep learning integration of molecular and interactome data for protein-compound interaction prediction. J Cheminform 2021; 13:36. [PMID: 33933121 PMCID: PMC8088618 DOI: 10.1186/s13321-021-00513-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/12/2021] [Accepted: 04/21/2021] [Indexed: 11/26/2022]
Abstract
Motivation: Virtual screening, which can computationally predict the presence or absence of protein–compound interactions, has attracted attention as a large-scale, low-cost and fast method of searching for seed compounds. Existing machine learning methods for predicting protein–compound interactions fall largely into two groups: those based on molecular structure data and those based on network data. The former use information on proteins and compounds, such as amino acid sequences and chemical structures; the latter rely on interaction network data, such as protein–protein interactions and compound–compound interactions. However, few attempts have been made to combine molecular information with interaction networks. Results: We developed a deep learning-based method that integrates protein features, compound features and multiple types of interactome data to predict protein–compound interactions. We designed three benchmark datasets of differing difficulty and used them to evaluate the method. The performance evaluations show that our deep learning framework integrating molecular structure data and interactome data outperforms state-of-the-art machine learning methods on protein–compound interaction prediction tasks, and the improvement is statistically significant according to the Wilcoxon signed-rank test. This finding reveals that the multi-interactome data capture perspectives beyond amino acid sequence homology and chemical structure similarity, and that the two types of data synergistically improve prediction accuracy. Furthermore, experiments on the three benchmark datasets show that our method is more robust than existing methods in accurately predicting interactions for proteins and compounds unseen in the training samples.
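The abstract reports significance using the Wilcoxon signed-rank test. As a reminder of what that test compares, the sketch below computes the signed-rank statistics W+ and W- for paired scores of two methods (for example, one metric value per benchmark split; the pairing used in the paper is not stated here and is an assumption). It drops zero differences and gives tied absolute differences their average rank, the usual convention. Turning W into a p-value (exact or normal approximation) is left to a library such as SciPy's `scipy.stats.wilcoxon`; the helper name is hypothetical.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span (ranks are 1-based).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Grow j over the run of equal absolute differences starting at i.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

With x = [5, 7, 9, 4] and y = [3, 4, 9, 6], the differences are 2, 3, 0 and -2; the zero is dropped, the two differences of size 2 share rank 1.5, and the result is (4.5, 1.5). W+ and W- always sum to n(n + 1)/2 for the n non-zero pairs.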
Affiliation(s)
- Narumi Watanabe
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
- Yuuto Ohnuki
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
- Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan

224
CSConv2d: A 2-D Structural Convolution Neural Network with a Channel and Spatial Attention Mechanism for Protein-Ligand Binding Affinity Prediction. Biomolecules 2021; 11:biom11050643. [PMID: 33925310 PMCID: PMC8145762 DOI: 10.3390/biom11050643] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/20/2022]
Abstract
The binding affinity of small molecules to receptor proteins is essential to drug discovery and drug repositioning. Chemical methods are often time-consuming and costly, so computational models for predicting binding affinity are needed. In this study, we propose a novel deep learning method, CSConv2d, for predicting protein–ligand interactions. The proposed method builds on the DEEPScreen model and uses 2-D structural representations of compounds as input. Furthermore, a channel and spatial attention mechanism (CS) is added to the feature extraction. Experiments on ChEMBLv23 datasets show that CSConv2d predicts protein–ligand binding affinity better than the original DEEPScreen model and several state-of-the-art DTI (drug–target interaction) prediction methods, including DeepConv-DTI, CPI-Prediction, CPI-Prediction+CS, DeepGS and DeepGS+CS. In practice, docking of the protein (PDB ID: 5ceo) with the ligand (Chemical ID: 50D) and with a series of kinase inhibitors was performed to verify the robustness of the method.

225
Srinivas R, Verma N, Kraka E, Larson EC. Deep Learning-Based Ligand Design Using Shared Latent Implicit Fingerprints from Collaborative Filtering. J Chem Inf Model 2021; 61:2159-2174. [PMID: 33899481 DOI: 10.1021/acs.jcim.0c01355] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
In their previous work, Srinivas et al. [J. Cheminf. 2018, 10, 56] showed that implicit fingerprints capture ligands and proteins in a shared latent space, typically for virtual screening with collaborative filtering models applied to known bioactivity data. In this work, we extend these implicit fingerprints/descriptors using deep learning techniques to translate latent descriptors into discrete representations of molecules (SMILES), without explicitly optimizing for chemical properties. This allows new compounds to be designed from the latent representation of nearby proteins, thereby encoding drug-like properties, including binding affinities to known proteins. The implicit descriptor method does not require any fingerprint similarity search, which makes it free of any bias arising from the empirical nature of fingerprint models [Srinivas, R.; J. Cheminf. 2018, 10, 56]. We evaluate the properties of the potentially novel drugs generated by our approach using the physical properties of drug-like molecules and chemical complexity. Additionally, we assess the reliability of the predicted biological activity of the generated compounds by employing models of protein–ligand interaction, which help estimate the potential binding affinity of the designed compounds. We find that the generated compounds exhibit the properties of chemically feasible compounds and are predicted to be excellent binders to known proteins. Finally, we analyze the diversity of the generated compounds using the Tanimoto distance and conclude that they are widely diverse.
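The diversity analysis above relies on the Tanimoto distance. A minimal sketch follows, assuming binary fingerprints stored as sets of on-bit indices (how the fingerprints are computed, for example which bit length or radius, is not stated here); the function names are hypothetical. Diversity is summarized as the mean distance over all unordered pairs, which is one common choice rather than necessarily the paper's.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention adopted here for two all-zero fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def mean_pairwise_distance(fps):
    """Average Tanimoto distance (1 - similarity) over all unordered pairs."""
    if len(fps) < 2:
        raise ValueError("need at least two fingerprints")
    dists = [1.0 - tanimoto_similarity(fps[i], fps[j])
             for i in range(len(fps)) for j in range(i + 1, len(fps))]
    return sum(dists) / len(dists)
```

For instance, {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving similarity 0.5 and distance 0.5; a set of near-duplicates would push the mean distance toward 0.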
Affiliation(s)
- Raghuram Srinivas
- Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
- Niraj Verma
- Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
- Elfi Kraka
- Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
- Eric C Larson
- Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States

226
Multi-PLI: interpretable multi-task deep learning model for unifying protein-ligand interaction datasets. J Cheminform 2021; 13:30. [PMID: 33858485 PMCID: PMC8051026 DOI: 10.1186/s13321-021-00510-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/08/2021] [Accepted: 04/08/2021] [Indexed: 11/11/2022]
Abstract
The assessment of protein–ligand interactions is critical at the early stage of drug discovery, and computational approaches that efficiently predict such interactions facilitate drug development. Recently, deep learning methods, including structure- and sequence-based models, have achieved impressive performance on several datasets. However, their application still suffers from a generalizability issue caused by insufficient data, especially for structure-based models, and from a heterogeneity problem caused by different label measurements and varying proteins across datasets. Here, we present Multi-PLI, an interpretable multi-task model for evaluating protein–ligand interactions. By unifying different datasets, the model runs classification (binding or not) and regression (binding affinity) tasks concurrently. It outperforms traditional docking and machine learning methods on both binary classification and regression tasks and achieves results competitive with some structure-based deep learning methods, even with the same training set size. Furthermore, combined with the proposed occlusion algorithm, the model can identify the amino acids that are crucial for binding, thus providing a biological interpretation.
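Running classification and regression "concurrently by unifying different datasets" implies that each sample may carry a label for only one of the two tasks. One plausible way to train such a model, assumed here and not taken from the paper, is a joint loss that averages binary cross-entropy over samples with a binding label and squared error over samples with an affinity label, skipping missing labels. The function name and the `reg_weight` balance are hypothetical.

```python
import math


def multitask_loss(preds, labels, reg_weight=1.0, eps=1e-12):
    """Joint loss for a classification head (bind / not bind) and a regression head (affinity).

    preds:  list of (p_bind, affinity_pred) from the two heads.
    labels: list of (bind_label or None, affinity or None); None marks a task
            for which the sample's source dataset provides no label.
    """
    bce_terms, mse_terms = [], []
    for (p, a_pred), (y, a_true) in zip(preds, labels):
        if y is not None:
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            bce_terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
        if a_true is not None:
            mse_terms.append((a_pred - a_true) ** 2)
    bce = sum(bce_terms) / len(bce_terms) if bce_terms else 0.0
    mse = sum(mse_terms) / len(mse_terms) if mse_terms else 0.0
    return bce + reg_weight * mse
```

Masking rather than imputing missing labels keeps a classification-only dataset from pulling the affinity head toward arbitrary values, while both datasets still update the shared encoder.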

227
Liu X, Feng H, Wu J, Xia K. Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6219114. [PMID: 33837771 DOI: 10.1093/bib/bbab127] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/12/2021] [Revised: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 12/21/2022]
Abstract
Molecular descriptors are essential not only to quantitative structure–activity/property relationship (QSAR/QSPR) models but also to machine learning-based chemical and biological data analysis. In this paper, we propose, for the first time, molecular descriptors or fingerprints based on persistent spectral hypergraphs (PSH). Our PSH-based molecular descriptors characterize molecular structures and interactions and are combined with machine learning models, in particular gradient boosting trees (GBT), for protein–ligand binding affinity prediction. Unlike traditional molecular descriptors, which are usually based on molecular graph models, our approach uses a hypergraph-based topological representation to characterize protein–ligand interactions. Moreover, a filtration process is introduced to generate a series of nested hypergraphs at different scales. For each of these hypergraphs, the eigenspectrum can be obtained from the corresponding (Hodge) Laplacian matrix. PSH studies the persistence and variation of the eigenspectra of the nested hypergraphs during the filtration process. Molecular descriptors or fingerprints are then generated from persistent attributes, which are statistical or combinatorial functions of PSH, and combined with machine learning models, in particular GBT. We test our PSH-GBT model on the three most commonly used datasets, PDBbind-2007, PDBbind-2013 and PDBbind-2016. To the best of our knowledge, our results on all three datasets are better than those of all existing machine learning models that use traditional molecular descriptors.
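PSH builds hypergraphs and their Hodge Laplacians, which are beyond a short example. The sketch below illustrates only the filtration idea on an ordinary distance graph of atom coordinates, as a simplified stand-in rather than the PSH descriptor itself: at each radius it connects atoms lying within that distance and records two invariants of the graph Laplacian L = D - A that need no eigensolver, the multiplicity of the zero eigenvalue (equal to the number of connected components) and the sum of all eigenvalues (equal to the trace, i.e. twice the number of edges). Function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def laplacian(n, edges):
    """Unnormalized graph Laplacian L = D - A as a dense list of lists."""
    lap = [[0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1
        lap[j][i] -= 1
        lap[i][i] += 1
        lap[j][j] += 1
    return lap


def count_components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


def filtration_features(coords, radii):
    """For each radius r (ascending), connect atoms at distance <= r and record
    (multiplicity of Laplacian eigenvalue 0, sum of Laplacian eigenvalues).

    The zero-eigenvalue multiplicity equals the number of connected components,
    and the eigenvalue sum equals trace(L), i.e. twice the number of edges.
    """
    n = len(coords)
    features = []
    for r in sorted(radii):
        edges = [(i, j) for i, j in combinations(range(n), 2)
                 if math.dist(coords[i], coords[j]) <= r]
        lap = laplacian(n, edges)
        trace = sum(lap[i][i] for i in range(n))
        features.append((count_components(n, edges), trace))
    return features
```

For three collinear atoms at x = 0, 1 and 3, radii 0.5, 1.5 and 3.5 give (3, 0), (2, 2) and (1, 6): components merge and edges accumulate as the nested graphs grow. A full persistent spectral approach would summarize the whole eigenspectrum at each scale, which requires an eigensolver.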
Affiliation(s)
- Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371; Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071; Center for Topology and Geometry Based Technology, Hebei Normal University, Hebei, China, 050024
- Huitao Feng
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071; Mathematical Science Research Center, Chongqing University of Technology, Chongqing, China, 400054
- Jie Wu
- Center for Topology and Geometry Based Technology, Hebei Normal University, Hebei, China, 050024; School of Mathematical Sciences, Hebei Normal University, Hebei, China, 050024
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
228
Pande A, Manchanda M, Bhat HR, Bairy PS, Kumar N, Gahtori P. Molecular insights into a mechanism of resveratrol action using hybrid computational docking/CoMFA and machine learning approach. J Biomol Struct Dyn 2021; 40:8286-8300. [PMID: 33829956 DOI: 10.1080/07391102.2021.1910572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Resveratrol, a phytoalexin, remains a well-known anticancer drug candidate in the scientific literature. Earlier wet-lab experiments have reported multiple biological targets for it, for example, epidermal growth factors, the pro-apoptotic protein p53, sirtuins, the first apoptosis signal (Fas) receptor, mouse double minute 2 (MDM2) ubiquitin-protein ligase, the estrogen receptor and quinone reductase. However, notwithstanding some notable successes, identifying the appropriate resveratrol target(s) with physical methods has remained a major challenge, thereby limiting its translation into effective therapeutics. Computational insights are therefore needed to establish proof of concept for potential resveratrol target(s) with a minimal error rate, to narrow the search space, and to assess the resveratrol signaling pathway/mechanism more accurately from the outset. Herein, a brute-force technique combining receptor- and ligand-based virtual screening with classification-based machine learning reveals the precise mechanism of resveratrol action. Overall, MDM2 ubiquitin-protein ligase (4OGN.pdb) and co-crystallized quinone reductase 2 (4QOH.pdb) were found to be two suitable drug targets for resveratrol derivatives. Carotenoid cleavage oxygenase, together with the latter two, provided strong momentum for guiding the rational drug design of resveratrol derivatives. These molecular modeling insights should help make resveratrol lead optimization a more precise science.
Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Akshara Pande
- Department of Computer Science, Graphic Era Hill University, Dehradun, Uttarakhand, India
- Mahesh Manchanda
- Department of Computer Science & Engineering, Graphic Era Hill University, Dehradun, Uttarakhand, India
- Hans Raj Bhat
- Department of Pharmaceutical Sciences, Dibrugarh University, Dibrugarh, Assam, India
- Navin Kumar
- Department of Biotechnology, Graphic Era University, Dehradun, Uttarakhand, India
- Prashant Gahtori
- School of Pharmacy, Graphic Era Hill University, Dehradun, Uttarakhand, India
229
Wei L, Ye X, Xue Y, Sakurai T, Wei L. ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism. Brief Bioinform 2021; 22:6209691. [PMID: 33822870 DOI: 10.1093/bib/bbab041] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 12/17/2020] [Revised: 01/11/2021] [Accepted: 01/28/2021] [Indexed: 12/13/2022]
Abstract
MOTIVATION Peptides have recently emerged as promising therapeutic agents against various diseases. For both research and safety-regulation purposes, it is highly important to develop computational methods that accurately predict the potential toxicity of peptides among the vast number of candidate peptides. RESULTS In this study, we propose ATSE, a peptide toxicity predictor that exploits structural and evolutionary information using graph neural networks and an attention mechanism. More specifically, it consists of four modules: (i) a sequence processing module that converts peptide sequences to molecular graphs and evolutionary profiles, (ii) a feature extraction module designed to learn discriminative features from graph structural information and evolutionary information, (iii) an attention module employed to optimize the features and (iv) an output module that classifies a peptide as toxic or non-toxic using the optimized features from the attention module. CONCLUSION Comparative studies demonstrate that ATSE significantly outperforms all other competing methods. We found that structural information is complementary to evolutionary information and effectively improves predictive performance. Importantly, the data-driven features learned by ATSE can be interpreted and visualized, providing additional information for further analysis. Moreover, we present a user-friendly online platform that implements ATSE, available at http://server.malab.cn/ATSE. We expect it to be a powerful and useful tool for interested researchers.
Affiliation(s)
- Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan, 3058577
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan, 3058577
- Yuyang Xue
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan, 3058577
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan, 3058577
- Leyi Wei
- School of Software, Shandong University, Jinan, China
230
Wang S, Shan P, Zhao Y, Zuo L. GanDTI: A multi-task neural network for drug-target interaction prediction. Comput Biol Chem 2021; 92:107476. [PMID: 33799080 DOI: 10.1016/j.compbiolchem.2021.107476] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/22/2020] [Revised: 03/08/2021] [Accepted: 03/09/2021] [Indexed: 11/18/2022]
Abstract
Drug discovery processes require highly accurate drug-target interaction (DTI) prediction for virtual screening. Compared with traditional methods, deep learning methods require less time and domain expertise while achieving higher accuracy. However, there is still room to improve performance while simplifying model structures. Meanwhile, this field calls for multi-task models that can solve different tasks. Here we report GanDTI, an end-to-end deep learning model for both interaction classification and binding affinity prediction. The model takes compound graphs and protein sequences as input. It consists only of a graph neural network, an attention module and a multi-layer perceptron, yet it outperforms state-of-the-art methods for binding affinity and interaction classification on the DUD-E, human and BindingDB benchmark datasets. This demonstrates that our refined model is highly effective and efficient for DTI prediction and provides a new strategy for performance improvement.
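GanDTI is described as serving two tasks, interaction classification and binding affinity prediction. One common way to set up such a two-task model, shown here purely as an assumption and not as GanDTI's published objective, is to attach two linear heads to a shared drug-target embedding and sum a binary cross-entropy loss with a weighted mean squared error. In GanDTI the shared embedding would come from the graph neural network and attention module; here it is simply an input array, and every name below is hypothetical.

```python
import numpy as np

def multitask_heads(shared: np.ndarray, w_cls: np.ndarray, w_reg: np.ndarray):
    """Two task-specific linear heads on one shared (n_pairs, d) embedding.
    Returns (interaction probability, predicted affinity) for each pair."""
    logits = shared @ w_cls
    prob = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: interaction classification
    affinity = shared @ w_reg             # linear output: affinity regression
    return prob, affinity

def multitask_loss(prob, affinity, y_cls, y_aff, reg_weight=1.0, eps=1e-12):
    """Binary cross-entropy for classification plus
    reg_weight * mean squared error for regression."""
    prob = np.clip(prob, eps, 1 - eps)    # avoid log(0)
    bce = -np.mean(y_cls * np.log(prob) + (1 - y_cls) * np.log(1 - prob))
    mse = np.mean((affinity - y_aff) ** 2)
    return bce + reg_weight * mse
```

The `reg_weight` factor is the usual knob for balancing the two tasks; how (or whether) GanDTI balances them is not stated in the abstract.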
Affiliation(s)
- Shuyu Wang
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, 066001, PR China
- Peng Shan
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, 066001, PR China
- Yuliang Zhao
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, 066001, PR China
- Lei Zuo
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA, 24061, USA
231
Shabajee P, Gaudeau A, Legros C, Dorval T, Stéphan JP. [From high content screening to target deconvolution: New insights for phenotypic approaches]. Med Sci (Paris) 2021; 37:249-257. [PMID: 33739272 DOI: 10.1051/medsci/2021013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The advent of molecular biology and the completion of human genome sequencing prompted the pharmaceutical industry to progressively implement target-centric drug discovery strategies. However, concerns about research and development productivity over the last ten years, combined with technological developments in high-content screening, automation, image analysis and artificial intelligence, have triggered renewed interest in phenotypic drug discovery approaches. Target-centric and phenotypic approaches are increasingly considered complementary, which places target deconvolution on the critical path. This review analyzes the evolution of target-centric and phenotypic approaches, focusing specifically on high-content screening and the target deconvolution technologies currently available.
Affiliation(s)
- Preety Shabajee
- Pôle d'expertise Criblage pharmacologique, chimiothèque et biobanques, Institut de Recherches Servier, 125, Chemin de Ronde, 78290 Croissy-sur-Seine, France
- Albane Gaudeau
- Pôle d'expertise Criblage pharmacologique, chimiothèque et biobanques, Institut de Recherches Servier, 125, Chemin de Ronde, 78290 Croissy-sur-Seine, France
- Céline Legros
- Pôle d'expertise Criblage pharmacologique, chimiothèque et biobanques, Institut de Recherches Servier, 125, Chemin de Ronde, 78290 Croissy-sur-Seine, France
- Thierry Dorval
- Pôle d'expertise Criblage pharmacologique, chimiothèque et biobanques, Institut de Recherches Servier, 125, Chemin de Ronde, 78290 Croissy-sur-Seine, France
- Jean-Philippe Stéphan
- Pôle d'expertise Criblage pharmacologique, chimiothèque et biobanques, Institut de Recherches Servier, 125, Chemin de Ronde, 78290 Croissy-sur-Seine, France
232
Lim S, Lu Y, Cho CY, Sung I, Kim J, Kim Y, Park S, Kim S. A review on compound-protein interaction prediction methods: Data, format, representation and model. Comput Struct Biotechnol J 2021; 19:1541-1556. [PMID: 33841755 PMCID: PMC8008185 DOI: 10.1016/j.csbj.2021.03.004] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 08/31/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 01/27/2023]
Abstract
There has recently been rapid progress in computational methods for determining the protein targets of small-molecule drugs, referred to here as compound-protein interaction (CPI). In this review, we comprehensively cover topics related to the computational prediction of CPI. CPI data have grown significantly in both quantity and quality, and computational methods have become more powerful than ever for analyzing such complex data. Recent improvements in the quality of CPI prediction are therefore due to both sophisticated computational techniques and higher-quality information in the databases. The goal of this article is to review CPI-related topics, from data, formats and representations to computational models, so that researchers can take full advantage of these resources to develop novel prediction methods. Chemical compound and protein data from various resources are discussed in terms of data formats and encoding schemes. We group CPI prediction methods into five categories, ranging from traditional machine learning techniques to state-of-the-art deep learning techniques. In closing, we discuss emerging machine learning topics to help both experimental and computational scientists leverage current knowledge and strategies to develop more powerful and accurate CPI prediction methods.
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
- Yijingxiu Lu
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Chang Yun Cho
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Inyoung Sung
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Jungwoo Kim
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Youngkuk Kim
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Sungjoon Park
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University, Seoul, Republic of Korea
233
Abbasi K, Razzaghi P, Poso A, Amanlou M, Ghasemi JB, Masoudi-Nejad A. DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics 2021; 36:4633-4642. [PMID: 32462178 DOI: 10.1093/bioinformatics/btaa544] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Received: 11/30/2019] [Revised: 04/29/2020] [Accepted: 05/22/2020] [Indexed: 02/07/2023]
Abstract
MOTIVATION An essential part of drug discovery is the accurate prediction of the binding affinity of new compound-protein pairs. Most standard computational methods assume that the compounds or proteins in the test data are observed during the training phase. However, in real-world situations, the test and training data are sampled from different domains with different distributions. To cope with this challenge, we propose a deep learning-based approach that consists of three steps. In the first step, a training encoder network learns a novel representation of compounds and proteins. To this end, we combine convolutional layers and long short-term memory layers so that the occurrence patterns of local substructures along a protein sequence and a compound sequence are learned. In addition, to encode the interaction strength between protein and compound substructures, we propose a two-sided attention mechanism. In the second step, to deal with the different distributions of the training and test domains, a feature encoder network is learned for the test domain using an adversarial domain adaptation approach. In the third step, the learned test encoder network is applied to new compound-protein pairs to predict their binding affinity. RESULTS To evaluate the proposed approach, we applied it to the KIBA, Davis and BindingDB datasets. The results show that the proposed method learns a more reliable model for the test domain in more challenging situations. AVAILABILITY AND IMPLEMENTATION https://github.com/LBBSoft/DeepCDA.
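The two-sided attention step can be pictured with a sketch modeled on attentive pooling, which is an assumption and not necessarily DeepCDA's exact formulation: a learnable matrix `U` scores every protein substructure against every compound substructure, each side takes its strongest score per substructure, and a softmax turns those into attention weights that pool each side into one vector. The array shapes and names are hypothetical; the actual encoder outputs would come from the CNN and LSTM layers.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def two_sided_attention(prot: np.ndarray, comp: np.ndarray, U: np.ndarray):
    """prot: (n_p, d) protein substructure features; comp: (n_c, d) compound
    substructure features; U: (d, d) learnable interaction matrix.
    Returns pooled protein and compound vectors and both weight vectors."""
    scores = np.tanh(prot @ U @ comp.T)         # (n_p, n_c) pairwise interaction strengths
    prot_weights = softmax(scores.max(axis=1))  # best compound match per protein substructure
    comp_weights = softmax(scores.max(axis=0))  # best protein match per compound substructure
    prot_vec = prot_weights @ prot              # (d,) attention-pooled protein vector
    comp_vec = comp_weights @ comp              # (d,) attention-pooled compound vector
    return prot_vec, comp_vec, prot_weights, comp_weights
```

With `U` set to zeros every score is equal, so both sides fall back to plain mean pooling; a trained `U` shifts weight toward substructure pairs that interact strongly.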
Affiliation(s)
- Karim Abbasi
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 4513766731, Iran
- Antti Poso
- School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland
- Massoud Amanlou
- Department of Medicinal Chemistry, Drug Design and Development Research Center, Tehran University of Medical Sciences, Tehran 1416753955, Iran
- Jahan B Ghasemi
- Chemistry Department, Faculty of Sciences, University of Tehran, Tehran 1417614418, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
234
Graph neural networks for automated de novo drug design. Drug Discov Today 2021; 26:1382-1393. [PMID: 33609779 DOI: 10.1016/j.drudis.2021.02.011] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 11/26/2020] [Revised: 01/27/2021] [Accepted: 02/11/2021] [Indexed: 01/10/2023]
Abstract
The goal of de novo drug design is to create novel chemical entities with desired biological activities and pharmacokinetic (PK) properties. In recent years, with the development of artificial intelligence (AI) technologies, data-driven methods have rapidly gained popularity in this field. Among them, graph neural networks (GNNs), a type of neural network that operates directly on graph-structured data, have received extensive attention. In this review, we introduce the applications of GNNs in de novo drug design from three aspects: molecule scoring, molecule generation and optimization, and synthesis planning. Furthermore, we discuss the current challenges and future directions of GNNs in de novo drug design.
235
Karki N, Verma N, Trozzi F, Tao P, Kraka E, Zoltowski B. Predicting Potential SARS-COV-2 Drugs-In Depth Drug Database Screening Using Deep Neural Network Framework SSnet, Classical Virtual Screening and Docking. Int J Mol Sci 2021; 22:1573. [PMID: 33557253 PMCID: PMC7915186 DOI: 10.3390/ijms22041573] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/28/2020] [Revised: 01/24/2021] [Accepted: 01/29/2021] [Indexed: 12/14/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has altered life on a global scale. A concerted effort by research labs around the world has identified potential pharmaceutical treatments for COVID-19 among existing drugs and has produced multiple vaccines. During an urgent crisis, rapidly identifying potential new treatments requires global and cross-disciplinary cooperation, together with an enhanced open-access research model to distribute new ideas and leads. Herein, we apply a deep neural network-based drug screening method, validate it with a docking algorithm on approved drugs for drug repurposing, and extend the screen to a large library of 750,000 compounds for de novo drug discovery. The results of the large library screens are incorporated into an open-access web interface that allows researchers from diverse fields to target molecules of interest. Our combined approach enables both the identification of existing drugs that may be repurposed and the de novo design of ACE2-regulatory compounds. Through these efforts, we demonstrate the utility of SSnet, a new machine learning algorithm for drug discovery that can serve as a tool to triage large molecular libraries and identify classes of molecules with possible efficacy.
Affiliation(s)
- Brian Zoltowski
- Department of Chemistry, Southern Methodist University, Dallas, TX 75205, USA; (N.K.); (N.V.); (F.T.); (P.T.); (E.K.)
236
Yang Q, Ji H, Lu H, Zhang Z. Prediction of Liquid Chromatographic Retention Time with Graph Neural Networks to Assist in Small Molecule Identification. Anal Chem 2021; 93:2200-2206. [PMID: 33406817 DOI: 10.1021/acs.analchem.0c04071] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/15/2022]
Abstract
Predicted liquid chromatographic retention times (RTs) of small molecules are not yet accurate enough for wide adoption in structural identification. In this study, we used a graph neural network to predict retention time (GNN-RT) directly from the structures of small molecules, without requiring molecular descriptors. The prediction accuracy of GNN-RT was compared with random forests (RFs), Bayesian ridge regression, a convolutional neural network (CNN) and a deep-learning regression model (DLM) on the METLIN small molecule retention time (SMRT) dataset. GNN-RT achieved the highest prediction accuracy, with a mean relative error of 4.9% and a median relative error of 3.2%. Furthermore, the SMRT-trained GNN-RT model can easily be transferred to chromatographic systems of the same type. The predicted RT complements tandem mass spectra in structural identification and can be used to assist in the identification of compounds. The results indicate that GNN-RT is a promising method for predicting RTs in liquid chromatography and improving the accuracy of structural identification for small molecules.
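The accuracy figures above are summary statistics of per-compound relative error. The sketch below shows how such figures are typically computed, assuming relative error means |predicted - observed| / observed; the paper's exact definition is not restated in the abstract, and the function name is hypothetical.

```python
from statistics import mean, median

def retention_time_errors(predicted, observed):
    """Return (mean relative error, median relative error) as fractions,
    using |predicted - observed| / observed for each compound.
    Observed retention times must be nonzero."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    errors = [abs(p - o) / abs(o) for p, o in zip(predicted, observed)]
    return mean(errors), median(errors)
```

Because the ratio is unitless, the retention times can be in seconds or minutes as long as both lists use the same unit. Multiplying by 100 gives percentages comparable to the 4.9% and 3.2% reported above. The median is less sensitive than the mean to a few badly predicted compounds.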
Affiliation(s)
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongchao Ji
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
237
SSnet: A Deep Learning Approach for Protein-Ligand Interaction Prediction. Int J Mol Sci 2021; 22:ijms22031392. [PMID: 33573266 PMCID: PMC7869013 DOI: 10.3390/ijms22031392] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/28/2020] [Revised: 01/24/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022]
Abstract
Computational prediction of protein-ligand interaction (PLI) is an important step in the modern drug discovery pipeline, as it reduces the cost, time and resources required to screen novel therapeutics. Deep neural networks (DNNs) have recently shown excellent performance in PLI prediction. However, their performance depends heavily on the protein and ligand features used in the DNN model. Moreover, in current models, deciphering how protein features determine the underlying principles that govern PLI is not trivial. In this work, we developed a DNN framework named SSnet that uses secondary structure information of proteins, extracted as the curvature and torsion of the protein backbone, to predict PLI. We demonstrate the performance of SSnet by comparing it against a variety of currently popular machine learning (ML) and non-ML models using various metrics. We visualize the intermediate layers of SSnet to show a potential latent space for proteins, in particular to extract the structural elements of a protein that the model finds influential for ligand binding, which is one of the key features of SSnet. We observed that SSnet learns the locations in a protein where a ligand can bind, including binding sites, allosteric sites and cryptic sites, regardless of the conformation used. We further observed that SSnet is not biased toward any specific molecular interaction and extracts the protein fold information critical for PLI prediction. Our work forms an important gateway to the general exploration of secondary structure-based deep learning (DL), which is not confined to protein-ligand interactions and as such will have a large impact on protein research, while being readily accessible to de novo drug designers as a standalone package.
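Curvature and torsion of a backbone trace follow the standard Frenet formulas, kappa = |r' x r''| / |r'|^3 and tau = ((r' x r'') . r''') / |r' x r''|^2. The sketch below estimates them with central finite differences along an evenly sampled 3D curve. This is a generic differential-geometry illustration under that sampling assumption; it does not reproduce whatever smoothing or spline fitting of the C-alpha trace SSnet itself applies, and the function name is hypothetical.

```python
import numpy as np

def curvature_torsion(points: np.ndarray, spacing: float = 1.0):
    """Discrete curvature and torsion along a 3D curve of shape (n, 3),
    sampled at evenly spaced parameter values (e.g. a C-alpha trace).
    Undefined (division by zero) on perfectly straight stretches."""
    d1 = np.gradient(points, spacing, axis=0)  # r'
    d2 = np.gradient(d1, spacing, axis=0)      # r''
    d3 = np.gradient(d2, spacing, axis=0)      # r'''
    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=1)
    curvature = cross_norm / np.linalg.norm(d1, axis=1) ** 3
    torsion = np.einsum("ij,ij->i", cross, d3) / cross_norm ** 2
    return curvature, torsion
```

A quick sanity check uses a right-handed helix (a cos t, a sin t, b t). Its analytic values are kappa = a / (a^2 + b^2) and tau = b / (a^2 + b^2), which the interior points should reproduce closely when the sampling step is small. Edge points are less accurate because `np.gradient` switches to one-sided differences there.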
238
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioprocess Eng 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Because artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. AI is generally applied to fields with sufficient data, such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in AI-guided drug discovery, including target identification, hit identification, ADMET prediction, lead optimization and drug repositioning. The main data sources in each field are also summarized. In addition, we provide an in-depth analysis of the remaining challenges and limitations and propose promising future directions in each of these areas.
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
239
Han L, Xu D, Xi Z, Wu M, Nik Nabil WN, Zhang J, Sui H, Fu W, Zhou H, Lao Y, Xu G, Guo C, Xu H. The Natural Compound Oblongifolin C Exhibits Anticancer Activity by Inhibiting HSPA8 and Cathepsin B In Vitro. Front Pharmacol 2021; 11:564833. [PMID: 33390942 PMCID: PMC7773843 DOI: 10.3389/fphar.2020.564833] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/22/2020] [Accepted: 10/19/2020] [Indexed: 11/13/2022]
Abstract
PPAPs (polycyclic polyprenylated acylphloroglucinols) are a class of compounds with diverse bioactivities, including anticancer effects. Oblongifolin C (OC) is a PPAP isolated from the plant Garcinia yunnanensis Hu. We previously discovered that OC induces apoptosis, inhibits autophagic flux and attenuates metastasis in cancer cells. However, the protein targets and the detailed mechanism of action of OC remain unclear. To identify the protein targets of OC, a non-labeled protein fishing assay was performed, which showed that OC may interact with several proteins, including heat shock 70 kDa protein 8 (HSPA8). Expanding on our previous studies of the protein cathepsin B, the current study applied surface plasmon resonance (SPR) and isothermal titration calorimetry (ITC) to confirm the potential binding affinity between OC and these two protein targets. This study highlights the inhibitory effect of OC on HSPA8 in cancer cells under heat shock stress, specifically through inhibition of HSPA8 translocation. OC also enhanced the interaction among HSPA8, HSP90 and p53, upregulated the expression of p53 and significantly promoted apoptosis in cisplatin-treated cells. Additionally, flow cytometry showed that OC accelerated apoptosis in HSPA8-knockdown A549 cells, whereas overexpression of HSPA8 delayed OC-induced apoptosis. In summary, our results reveal that OC potentially interacts with HSPA8 and cathepsin B and inhibits HSPA8 nuclear translocation and cathepsin B activity, suggesting the potential of OC to be developed as an anticancer drug.
Affiliation(s)
- Li Han
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China; Department of Pharmacy, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Danqing Xu
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Zhichao Xi
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Man Wu
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Wan Najbah Nik Nabil
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Juan Zhang
- School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Hua Sui
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Wenwei Fu
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Hua Zhou
- Institute of Cardiovascular Disease of Integrated Traditional Chinese and Western Medicine, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Yuanzhi Lao
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Gang Xu
- State Key Laboratory of Phytochemistry and Plant Resources in West China and Yunnan Key Laboratory of National Medicinal Chemistry, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- Cheng Guo
- School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China; Department of Pharmacy, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Hongxi Xu
- Institute of Cardiovascular Disease of Integrated Traditional Chinese and Western Medicine, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
240
Mathews N, Tran T, Rekabdar B, Ekenna C. Predicting human–pathogen protein–protein interactions using Natural Language Processing methods. Inform Med Unlocked 2021. [DOI: 10.1016/j.imu.2021.100738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
241
AIM in Pharmacology and Drug Discovery. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
242
Shen C, Luo J, Ouyang W, Ding P, Chen X. IDDkin: Network-based influence deep diffusion model for enhancing prediction of kinase inhibitors. Bioinformatics 2020; 36:5481-5491. [PMID: 33367525 DOI: 10.1093/bioinformatics/btaa1058] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/09/2020] [Revised: 11/09/2020] [Accepted: 12/10/2020] [Indexed: 01/01/2023]
Abstract
MOTIVATION Protein kinases have been a focus of drug discovery research for many years because they play a causal role in many human diseases. Understanding the binding profile of kinase inhibitors is a prerequisite for drug discovery, and traditional methods of predicting kinase inhibitors are time-consuming and inefficient. Computational prediction methods offer a relatively low-cost, high-efficiency route to rapidly characterizing the binding profiles of kinase inhibitors. In particular, continuing improvements in network pharmacology provide unprecedented opportunities for drug discovery: network-based computational methods can aggregate useful information from heterogeneous sources and have become a new way to predict the binding profiles of kinase inhibitors. RESULTS In this study, we proposed a network-based influence deep diffusion model, named IDDkin, for enhancing the prediction of kinase inhibitors. IDDkin uses deep graph convolutional networks, graph attention networks and adaptive weighting methods to diffuse useful information across heterogeneous networks. The updated kinase and compound representations are then used to predict potential compound–kinase pairs. In our experiments, IDDkin outperforms the comparison methods, including a state-of-the-art kinase inhibitor prediction method and classic models widely used in relationship prediction. IDDkin also performs well in experiments designed to verify its generalizability and in case studies. Together, these results demonstrate the strong predictive ability of IDDkin for kinase inhibitors. AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/CS-BIO/IDDkin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
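The abstract names graph attention networks as one of IDDkin's building blocks but does not give their exact form. As a rough illustration only, the sketch below implements a generic single-head graph attention layer in the style of Veličković et al. using NumPy. The function names, the toy graph, the random weights and the LeakyReLU slope of 0.2 are assumptions made for this example; this is not IDDkin's implementation, which is in the authors' repository listed above.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    """LeakyReLU used to score attention logits (slope 0.2 is an assumed default)."""
    return np.where(x > 0, x, slope * x)


def gat_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray, a: np.ndarray):
    """Single-head graph attention layer (generic sketch, not IDDkin's code).

    adj: (n, n) 0/1 adjacency; self-loops are added so each node also attends to itself.
    h:   (n, f_in) node features.
    w:   (f_in, f_out) shared linear projection.
    a:   (2 * f_out,) attention vector.
    Returns the aggregated features (before any output nonlinearity) and the
    (n, n) attention matrix whose rows sum to 1.
    """
    n = adj.shape[0]
    z = h @ w
    f_out = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), computed by splitting a into two halves.
    logits = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    # Mask non-neighbors so they receive zero attention after the softmax.
    mask = (adj + np.eye(n)) > 0
    logits = np.where(mask, logits, -np.inf)
    # Row-wise softmax over each node's neighborhood (shifted for numerical stability).
    logits = logits - logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Hypothetical toy compound-kinase graph: node 0 is a compound linked to kinases 1 and 2.
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
out, alpha = gat_layer(adj, h, rng.normal(size=(4, 2)), rng.normal(size=4))
```

In a compound–kinase network, each node's updated representation would be an attention-weighted mix of its own projected features and those of its interaction partners; non-neighbors (here the two kinases relative to each other) get exactly zero weight.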
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
- Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, 421001, China
- Xiangtao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, China

243
Agyemang B, Wu WP, Addo D, Kpiebaareh MY, Nanor E, Roland Haruna C. Deep inverse reinforcement learning for structural evolution of small molecules. Brief Bioinform 2020; 22:6043289. [PMID: 33348357 DOI: 10.1093/bib/bbaa364] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/31/2020] [Revised: 09/25/2020] [Accepted: 11/10/2020] [Indexed: 11/14/2022]
Abstract
The size and quality of the chemical libraries used in the drug discovery pipeline are crucial for developing new drugs or repurposing existing ones. Existing techniques such as combinatorial organic synthesis and high-throughput screening usually make this process extraordinarily difficult and complicated, because the search space of synthetically feasible drugs is enormous. Although reinforcement learning has mostly been used in the literature to generate novel compounds, designing a reward function that succinctly represents the learning objective can prove daunting in certain complex domains. Generative adversarial network (GAN)-based methods also usually discard the discriminator after training and can be hard to train. In this study, we propose a framework for training a compound generator and learning a transferable reward function based on the maximum-entropy inverse reinforcement learning (IRL) paradigm. Our experiments show that IRL offers a rational alternative for generating chemical compounds in domains where reward-function engineering may be less appealing or impossible but data exhibiting the desired objective are readily available.

244
Yuan Y, Bar-Joseph Z. GCNG: graph convolutional networks for inferring gene interaction from spatial transcriptomics data. Genome Biol 2020; 21:300. [PMID: 33303016 PMCID: PMC7726911 DOI: 10.1186/s13059-020-02214-w] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 01/10/2020] [Accepted: 11/30/2020] [Indexed: 12/13/2022]
Abstract
Most methods for inferring gene–gene interactions from expression data focus on intracellular interactions. The availability of high-throughput spatial expression data opens the door to methods that can infer such interactions both within and between cells. To achieve this, we developed Graph Convolutional Neural networks for Genes (GCNG). GCNG encodes the spatial information as a graph and combines it with expression data using supervised training. GCNG improves upon prior methods used to analyze spatial transcriptomics data and can propose novel pairs of extracellular interacting genes. The output of GCNG can also be used for downstream analysis, including functional gene assignment. Supporting website with software and data: https://github.com/xiaoyeye/GCNG.
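The abstract says GCNG encodes spatial information as a graph and combines it with expression data. A minimal sketch of that idea, under stated assumptions: cells are connected when they lie within a hypothetical distance `radius`, the graph is normalized with the standard symmetric GCN normalization of Kipf and Welling, and the node features for one candidate gene pair are the two genes' expression values in each cell. The function names and toy data are made up for illustration; the published GCNG model may build and normalize its graph differently.

```python
import numpy as np


def spatial_adjacency(coords: np.ndarray, radius: float) -> np.ndarray:
    """Connect cells whose Euclidean distance is <= radius (no self-loops)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def normalized_propagation(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(prop: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate over spatial neighbors, project, apply ReLU."""
    return np.maximum(prop @ features @ weights, 0.0)


# Toy example: three nearby cells and one distant cell (coordinates are made up).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
# Columns: expression of a hypothetical ligand gene and receptor gene in each cell.
expr = np.array([[1.0, 0.5], [0.8, 0.0], [0.2, 0.9], [0.0, 1.0]])
prop = normalized_propagation(spatial_adjacency(coords, radius=1.5))
hidden = gcn_layer(prop, expr, np.eye(2))  # identity weights expose the aggregation
```

Because the distant cell has no spatial neighbors, its hidden features are just its own expression, while each of the three clustered cells averages over its neighborhood. A supervised classifier stacked on layers like this would then score whether the gene pair interacts.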
Affiliation(s)
- Ye Yuan
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA

245
Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020; 22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023]
Abstract
Identifying disease–gene associations through genome-wide association studies (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases requires statistical association analysis. Given the huge number of possible mutations, GWAS analysis is not only costly but also prone to a large number of false positives. Researchers therefore seek further evidence from different sources to cross-check their results. Computational approaches can supply such alternative, complementary and low-cost evidence of disease–gene association. Because molecular networks capture the complex interplay among molecules in disease, they have become one of the most extensively used data sources for disease–gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction, together with an empirical analysis of 14 state-of-the-art methods. First, we define the disease gene prediction task. Second, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features, and graph representation learning methods. Third, we empirically evaluate the performance of the selected methods across seven diseases and report distinctive findings about them. Finally, we highlight potential research directions for future studies on disease gene prediction.
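Random walk with restart (RWR) is one widely used example of the network diffusion family mentioned in this survey: probability mass starts on known disease genes, spreads along network edges, and at each step jumps back to the seed set with a fixed restart probability r, so genes close to the seeds score highest. The sketch below is a generic NumPy version under assumed defaults (r = 0.7, an undirected network, a made-up toy graph); it is not taken from the survey and is not one of its benchmarked implementations.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score genes by network proximity to known disease genes.

    Iterates p_{t+1} = (1 - r) W p_t + r p_0 until the L1 change drops below tol.
    adj:     symmetric (n, n) adjacency of a gene/protein network.
    seeds:   indices of known disease genes (uniform restart distribution p_0).
    restart: restart probability r (0.7 is an assumed default).
    """
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes: avoid division by zero
    w = adj / col_sums  # column-normalized transition matrix W
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path network 0-1-2-3 with gene 0 as the only known disease gene.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
scores = random_walk_with_restart(path, seeds=[0])
ranking = np.argsort(-scores)  # candidate genes ordered by diffusion score
```

On this path graph the scores fall off with distance from the seed gene, and sorting by score yields the candidate list. A larger restart probability keeps the diffusion more local to the seeds.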
Affiliation(s)
- Sezin Kircali Ata
- School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore
- Min Wu
- Institute for Infocomm Research (I2R), A*STAR, Singapore
- Yuan Fang
- School of Information Systems, Singapore Management University, Singapore
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Li Li
- Institute for Infocomm Research (I2R), A*STAR, Singapore (department head and principal scientist)

246
Huang K, Xiao C, Glass LM, Zitnik M, Sun J. SkipGNN: predicting molecular interactions with skip-graph networks. Sci Rep 2020; 10:21092. [PMID: 33273494 PMCID: PMC7713130 DOI: 10.1038/s41598-020-77766-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 06/17/2020] [Accepted: 11/17/2020] [Indexed: 11/17/2022]
Abstract
Molecular interaction networks are powerful resources for molecular discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced predictive performance, current graph neural network (GNN) methods are mainly optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved highly useful over the last decade across a variety of interaction networks. Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by aggregating information not only from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph. Experiments on four interaction networks, namely drug–drug, drug–target, protein–protein, and gene–disease interactions, show that SkipGNN achieves superior and robust performance. Furthermore, we show that, unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks.
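The skip-graph construction described in this abstract can be sketched in a few lines, assuming the skip graph links every pair of nodes joined by a path of length two, that is, the nonzero pattern of A², with self-loops removed. This is a minimal NumPy illustration under that assumption, not the authors' code, and it omits the iterative fusion with the original graph.

```python
import numpy as np


def skip_graph(adj: np.ndarray) -> np.ndarray:
    """Connect every pair of nodes joined by a path of length two.

    adj: symmetric, non-negative (n, n) adjacency of the interaction network.
    """
    two_hop = adj @ adj  # (A^2)[i, j] counts length-2 paths from i to j
    skip = (two_hop > 0).astype(adj.dtype)  # sign(A^2) for a non-negative A
    np.fill_diagonal(skip, 0)  # drop trivial i -> k -> i paths (assumption)
    return skip


# Toy path network 0-1-2-3: the skip graph links 0-2 and 1-3.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
skip = skip_graph(path)
```

In a drug–target network, two drugs that share a target are not adjacent in the original graph but become neighbors in the skip graph, which is the kind of second-order similarity the abstract calls skip similarity.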
Affiliation(s)
- Kexin Huang
- Health Data Science, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Cao Xiao
- Analytics Center of Excellence, IQVIA, Cambridge, MA, USA
- Lucas M Glass
- Analytics Center of Excellence, IQVIA, Cambridge, MA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA, USA
- Jimeng Sun
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

247
Prediction of energies for reaction intermediates and transition states on catalyst surfaces using graph-based machine learning models. Molecular Catalysis 2020. [DOI: 10.1016/j.mcat.2020.111266] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]

248
Li P, Li Y, Hsieh CY, Zhang S, Liu X, Liu H, Song S, Yao X. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief Bioinform 2020; 22:5955940. [PMID: 33147620 DOI: 10.1093/bib/bbaa266] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 08/04/2020] [Revised: 09/11/2020] [Accepted: 09/14/2020] [Indexed: 12/15/2022]
Abstract
MOTIVATION Computational methods accelerate drug discovery and play an important role in biomedical tasks such as molecular property prediction and compound–protein interaction (CPI) identification. A key challenge is learning useful molecular representations. In earlier years, molecular properties were mainly calculated by quantum mechanics or predicted by traditional machine learning methods; both approaches required expert knowledge and were often labor-intensive. Graph neural networks have since received significant attention because of their powerful ability to learn representations from graph data. Nevertheless, current graph-based methods have limitations that need to be addressed, such as large numbers of parameters and insufficient extraction of bond information. RESULTS In this study, we proposed a graph-based approach, named triplet message networks (TrimNet), that employs a novel triplet message mechanism to learn molecular representations efficiently. We show that TrimNet can accurately complete multiple molecular representation learning tasks, including prediction of quantum properties, bioactivity, physiology and CPIs, with a significant reduction in parameters. In the experiments, TrimNet outperforms the previous state-of-the-art method by a significant margin on various datasets. Besides requiring few parameters and achieving high prediction accuracy, TrimNet can focus on the atoms essential to the target properties, providing a clear interpretation of its predictions. These advantages establish TrimNet as a powerful and useful computational tool for the challenging problem of molecular representation learning. AVAILABILITY The quantum and drug datasets are available on the MoleculeNet website: http://moleculenet.ai. The source code is available on GitHub: https://github.com/yvquanli/trimnet. CONTACT xjyao@lzu.edu.cn, songsen@tsinghua.edu.cn.
Affiliation(s)
- Pengyong Li
- Department of Biomedical Engineering, Tsinghua University
- Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University

249
Liu GH, Zhang BW, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1966-1980. [PMID: 31107658 DOI: 10.1109/tcbb.2019.2917429] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/09/2023]
Abstract
Prediction of protein subcellular location has become an active research topic because it has proven useful both for understanding disease mechanisms and for novel drug design. With the rapid development of automated microscopic imaging technology in recent years, bioimage-based methods for classifying protein subcellular location have attracted considerable attention, because images can describe protein distribution intuitively and in detail. In the current study, a method for predicting protein subcellular location was proposed based on multi-view image features extracted from three different views: four texture features of the original image; global and local protein features extracted from the protein channel image after color segmentation; and global DNA features extracted from the DNA channel image. The extracted features were then combined to improve subcellular localization performance, and by comparing different feature combinations under the same classifier, the best ensemble features were identified. A classifier based on stacked autoencoders and random forest was also proposed, combining the deep network with traditional statistical classification methods to improve the prediction results. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method.

250
Russo DP, Yan X, Shende S, Huang H, Yan B, Zhu H. Virtual Molecular Projections and Convolutional Neural Networks for the End-to-End Modeling of Nanoparticle Activities and Properties. Anal Chem 2020; 92:13971-13979. [PMID: 32970421 DOI: 10.1021/acs.analchem.0c02878] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Abstract
Digitalizing complex nanostructures into data structures suitable for machine learning modeling without losing nanostructure information has been a major challenge. Deep learning frameworks, particularly convolutional neural networks (CNNs), are especially adept at handling multidimensional and complex inputs. In this study, CNNs were applied to model nanoparticle activities solely from nanostructures. The nanostructures were represented by virtual molecular projections, a multidimensional digitalization of nanostructures, which were used as input data to train the CNNs. To this end, 77 nanoparticles with various measured activities and/or physicochemical properties were used for modeling. The resulting CNN predictions correlate highly with the experimental results. Analysis of a trained CNN showed quantitatively that its neurons recognized distinct nanostructure features critical to activities and physicochemical properties. This "end-to-end" deep learning approach is well suited to digitalizing complex nanostructures for data-driven machine learning modeling and can be broadly applied to the rational design of nanoparticles with desired activities.
Affiliation(s)
- Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States
- Xiliang Yan
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States; Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Sunil Shende
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States; Department of Computer Science, Rutgers University, 227 Penn Street, Camden, New Jersey 08102, United States
- Bing Yan
- Institute of Environmental Research at Greater Bay, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China; School of Environmental Science and Engineering, Shandong University, Jinan 250100, China
- Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, 201 S Broadway, Camden, New Jersey 08103, United States; Department of Chemistry, Rutgers University, 315 Penn Street, Camden, New Jersey 08102, United States