151
Yang Q, Ji H, Fan X, Zhang Z, Lu H. Retention time prediction in hydrophilic interaction liquid chromatography with graph neural network and transfer learning. J Chromatogr A 2021; 1656:462536. [PMID: 34563892 DOI: 10.1016/j.chroma.2021.462536] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/01/2021] [Revised: 09/02/2021] [Accepted: 09/03/2021] [Indexed: 01/04/2023]
Abstract
The combination of retention time (RT), accurate mass and tandem mass spectra can improve the structural annotation in untargeted metabolomics. However, the incorporation of RT for metabolite identification has received less attention because of the limitation of available RT data, especially for hydrophilic interaction liquid chromatography (HILIC). Here, the Graph Neural Network-based Transfer Learning (GNN-TL) is proposed to train a model for HILIC RTs prediction. The graph neural network was pre-trained using an in silico HILIC RT dataset (pseudo-labeling dataset) with ∼306 K molecules. Then, the weights of dense layers in the pre-trained GNN (pre-GNN) model were fine-tuned by transfer learning using a small number of experimental HILIC RTs from the target chromatographic system. The GNN-TL outperformed the methods in Retip, including the Random Forest (RF), Bayesian-regularized neural network (BRNN), XGBoost, light gradient-boosting machine (LightGBM), and Keras. It achieved the lowest mean absolute error (MAE) of 38.6 s on the test set and 33.4 s on an additional test set. It has the best ability to generalize with a small performance difference between training, test, and additional test sets. Furthermore, the predicted RTs can filter out nearly 60% false positive candidates on average, which is valuable for the identification of compounds complementary to mass spectrometry.
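The two quantities reported in this abstract, the mean absolute error (MAE) of predicted retention times and the use of predicted RTs to discard unlikely candidates, can be illustrated with a minimal sketch. The function names, the candidate format, the numeric values, and the tolerance of twice the MAE are all assumptions made for illustration; the abstract does not state how the authors set their filtering window.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Mean absolute error between observed and predicted retention times (seconds)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def filter_candidates(observed_rt, candidates, tolerance_s):
    """Keep candidates whose predicted RT lies within tolerance_s of the observed RT.

    candidates is a list of (name, predicted_rt_seconds) pairs (hypothetical format).
    """
    return [name for name, rt in candidates if abs(rt - observed_rt) <= tolerance_s]


# Hypothetical values, not taken from the paper.
observed = [120.0, 300.0, 450.0]
predicted = [150.0, 280.0, 460.0]
mae = mean_absolute_error(observed, predicted)  # (30 + 20 + 10) / 3

# Assumed rule: reject candidates more than 2 * MAE away from the observed RT.
candidates = [("cand_a", 310.0), ("cand_b", 520.0), ("cand_c", 95.0)]
kept = filter_candidates(300.0, candidates, tolerance_s=2 * mae)
```

A wider tolerance keeps more true positives at the cost of removing fewer false candidates, so in practice the window would be tuned on the target chromatographic system.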
Affiliation(s)
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Hongchao Ji
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Xiaqiong Fan
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
152
Xuan P, Hu K, Cui H, Zhang T, Nakaguchi T. Learning multi-scale heterogeneous representations and global topology for drug-target interaction prediction. IEEE J Biomed Health Inform 2021; 26:1891-1902. [PMID: 34673498 DOI: 10.1109/jbhi.2021.3121798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Identification of drug-target interactions (DTIs) plays a critical role in drug discovery and repositioning. Deep integration of inter-connections and intra-similarities between heterogeneous multi-source data related to drugs and targets, however, is a challenging issue. We propose a DTI prediction model by learning from drug and protein related multi-scale attributes and global topology formed by heterogeneous connections. A drug-protein-disease heterogeneous network (RPD-Net) is firstly constructed to associate diverse similarities, interactions and associations across nodes. Secondly, we propose a multi-scale pairwise deep representation learning module consisting of a new embedding strategy to integrate diverse inter-relations and intra-relations, and dilation convolutions for multi-scale deep representation extraction. A global topology learning module is proposed which is composed of strategy based on non-negative matrix factorization (NMF) to extract topology from RPD-Net, and a new relational-level attention mechanism for discriminative topology embedding. Experimental results using public dataset demonstrate improved performance over state-of-the-art methods and contributions of our major innovations. Evaluation results by top k recall rates and case studies on five drugs further show the effectiveness in retrieving potential target candidates for drugs.
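The top-k recall used in this evaluation can be sketched as follows. This is a generic definition of recall at k, not code from the paper; the function name and the protein identifiers are hypothetical.

```python
def recall_at_k(ranked_targets, true_targets, k):
    """Fraction of known targets that appear among the k highest-ranked candidates."""
    true_set = set(true_targets)
    if not true_set:
        raise ValueError("true_targets must not be empty")
    top_k = set(ranked_targets[:k])
    return len(top_k & true_set) / len(true_set)


# Hypothetical ranking of candidate proteins for one drug, best first.
ranked = ["P1", "P2", "P3", "P4", "P5"]
known = {"P2", "P5"}

recall_top3 = recall_at_k(ranked, known, k=3)  # only P2 is in the top 3
recall_top5 = recall_at_k(ranked, known, k=5)  # both known targets are retrieved
```

Averaging this value over all drugs in a test set gives a single top-k recall figure for a model.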
153
Khan MKA, Akhtar S. Novel drug design and bioinformatics: an introduction. PHYSICAL SCIENCES REVIEWS 2021. [DOI: 10.1515/psr-2018-0158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In the current era of high-throughput technology, sequencing projects generate enormous amounts of biological data every day, and a staggering number of biological targets have been deciphered as a result. The discovery of new chemical entities and bioisosteres of relatively low molecular weight has been gaining momentum in the pharmacopoeia, alongside traditional combinatorial design, in which a chemical structure serves as an initial template for enhancing efficacy, pharmacokinetics, and selectivity. Once a compound is identified, it undergoes ADMET filtration to determine whether it has toxic or mutagenic properties. A compound with neither toxicity nor mutagenicity is considered a potential lead molecule. Understanding the mechanism of lead molecules with various biological targets is imperative to advance related functions for drug discovery and development. Although drug discovery is a tedious and costly process, taking around 10–15 years and costing around $4 billion, the cascaded approaches of bioinformatics and computational biology, namely structure-based drug design (SBDD) and ligand-based drug design (LBDD), which are used respectively when a 3D structure of the target biomacromolecule is or is not available, have made this process easier and more approachable. SBDD encompasses homology modelling, ligand docking, fragment-based drug design and molecular dynamics, while LBDD deals with pharmacophore mapping, QSAR, and similarity search. All the computational methods discussed herein, whether for target identification or novel ligand discovery, continuously evolve and facilitate cost-effective and reliable outcomes in an era of overwhelming data.
Affiliation(s)
- Mohammad Kalim Ahmad Khan
  - Department of Bioengineering, Faculty of Engineering, Integral University, Lucknow, Uttar Pradesh, 226026, India
- Salman Akhtar
  - Department of Bioengineering, Faculty of Engineering, Integral University, Lucknow, Uttar Pradesh, 226026, India
154
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152. [PMID: 34579788 PMCID: PMC8477474 DOI: 10.1186/s13073-021-00968-x] [Citation(s) in RCA: 318] [Impact Index Per Article: 79.5] [Received: 12/13/2020] [Accepted: 09/12/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.
Affiliation(s)
- Khoa A. Tran
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
  - School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Olga Kondrashova
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Andrew Bradley
  - Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000 Australia
- Elizabeth D. Williams
  - School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
  - Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102 Australia
- John V. Pearson
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Nicola Waddell
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
155
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.
Affiliation(s)
- Jintae Kim
  - KaiPharm Co., Ltd., Seoul 03759, Korea
- Sera Park
  - KaiPharm Co., Ltd., Seoul 03759, Korea
- Dongbo Min
  - Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
- Wankyu Kim
  - KaiPharm Co., Ltd., Seoul 03759, Korea
  - System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea
156
ALAKUŞ TB, TÜRKOĞLU İ. Comparison of the Performance of Protein Mapping Techniques in Cancer Diagnosis Using Deep Learning. FIRAT ÜNIVERSITESI MÜHENDISLIK BILIMLERI DERGISI 2021; 33:547-565. [DOI: 10.35234/fumbd.881228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Cancer is a heterogeneous disease that comprises many different subtypes and causes a large number of deaths worldwide. Early diagnosis and prognosis of a cancer type have become a necessity in cancer research because they can facilitate the subsequent clinical follow-up of patients. One of the most widely used methods for this purpose is histological examination. However, this method suffers from considerable inter-observer variability, which makes the examination process long and time-consuming. To overcome this disadvantage, researchers have turned to computation-based approaches and use protein-protein interactions, protein interaction networks, and molecular fingerprints to identify cancer-related proteins. Among these methods, various studies have shown that cancerous cells can also be detected from genomic information. It has been observed that specific cancer types can be identified from the sequences of cancer-related genes and that machine-learning-based approaches are effective in this process. In this study, a recurrent neural network architecture, one of the deep learning algorithms, was used to classify human bladder, colon, and prostate cancers according to their protein sequences. The study consists of four stages: obtaining the data, numerically encoding the protein sequences, developing the deep learning model, and comparing the performance of the protein mapping techniques. AESNN1, hydrophobicity, integer, Miyazawa energies, and random encoding methods were considered for numerically encoding the protein sequences. At the end of the study, the highest accuracy for bladder cancer, 87.15%, was obtained with the AESNN1 mapping method, while the highest accuracies for colon cancer and prostate cancer, 94.40% and 75.45% respectively, were obtained with the Miyazawa energies and random encoding protein mapping methods. This study shows that machine learning and protein mapping techniques are effective in identifying cancerous protein sequences.
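Of the protein mapping techniques listed in this abstract, integer encoding is the simplest to illustrate. The sketch below assigns each of the 20 standard amino acids an integer in alphabetical one-letter order and pads or truncates to a fixed length so that sequences can be fed to a recurrent network. The specific assignment, the padding value, and the function name are assumptions; the abstract does not give the mapping the authors used.

```python
# Assumption: 20 standard amino acids in alphabetical one-letter order, coded 1..20.
# The paper's actual integer assignment is not stated in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INTEGER_CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def integer_encode(sequence, length=None, pad_value=0):
    """Map a protein sequence to integers, optionally padding or truncating to length."""
    codes = [INTEGER_CODE[aa] for aa in sequence.upper()]  # KeyError on non-standard residues
    if length is not None:
        codes = codes[:length] + [pad_value] * max(0, length - len(codes))
    return codes


short = integer_encode("ACD")             # one integer per residue
padded = integer_encode("ACD", length=5)  # right-padded with pad_value
cut = integer_encode("MKV", length=2)     # truncated to the first two residues
```

The other mappings named in the abstract (AESNN1, hydrophobicity, Miyazawa energies) would replace the integer lookup with a table of real-valued descriptors per residue.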
157
Li Y, Qiao G, Wang K, Wang G. Drug-target interaction predication via multi-channel graph neural networks. Brief Bioinform 2021; 23:6363570. [PMID: 34661237 DOI: 10.1093/bib/bbab346] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 06/01/2021] [Revised: 07/21/2021] [Accepted: 08/12/2021] [Indexed: 12/15/2022] Open
Abstract
Drug-target interaction (DTI) is an important step in drug discovery. Although there are many methods for predicting drug targets, these methods have limitations in using discrete or manual feature representations. In recent years, deep learning methods have been used to predict DTIs to improve these defects. However, most of the existing deep learning methods lack the fusion of topological structure and semantic information in DPP representation learning process. Besides, when learning the DPP node representation in the DPP network, the different influences between neighboring nodes are ignored. In this paper, a new model DTI-MGNN based on multi-channel graph convolutional network and graph attention is proposed for DTI prediction. We use two independent graph attention networks to learn the different interactions between nodes for the topology graph and feature graph with different strengths. At the same time, we use a graph convolutional network with shared weight matrices to learn the common information of the two graphs. The DTI-MGNN model combines topological structure and semantic features to improve the representation learning ability of DPPs, and obtain the state-of-the-art results on public datasets. Specifically, DTI-MGNN has achieved a high accuracy in identifying DTIs (the area under the receiver operating characteristic curve is 0.9665).
Affiliation(s)
- Yang Li
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
- Guanyu Qiao
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
- Keqi Wang
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, 150004, Harbin, China
158
Prediction of Drug-Target Interactions by Combining Dual-Tree Complex Wavelet Transform with Ensemble Learning Method. Molecules 2021; 26:5359. [PMID: 34500792 PMCID: PMC8433937 DOI: 10.3390/molecules26175359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Revised: 08/27/2021] [Accepted: 08/30/2021] [Indexed: 11/17/2022] Open
Abstract
Identification of drug–target interactions (DTIs) is vital for drug discovery. However, traditional biological approaches have some unavoidable shortcomings, such as being time consuming and expensive. Therefore, there is an urgent need to develop novel and effective computational methods to predict DTIs in order to shorten the development cycles of new drugs. In this study, we present a novel computational approach to identify DTIs, which uses protein sequence information and the dual-tree complex wavelet transform (DTCWT). More specifically, a position-specific scoring matrix (PSSM) was performed on the target protein sequence to obtain its evolutionary information. Then, DTCWT was used to extract representative features from the PSSM, which were then combined with the drug fingerprint features to form the feature descriptors. Finally, these descriptors were sent to the Rotation Forest (RoF) model for classification. A 5-fold cross validation (CV) was adopted on four datasets (Enzyme, Ion Channel, GPCRs (G-protein-coupled receptors), and NRs (Nuclear Receptors)) to validate the proposed model; our method yielded high average accuracies of 89.21%, 85.49%, 81.02%, and 74.44%, respectively. To further verify the performance of our model, we compared the RoF classifier with two state-of-the-art algorithms: the support vector machine (SVM) and the k-nearest neighbor (KNN) classifier. We also compared it with some other published methods. Moreover, the prediction results for the independent dataset further indicated that our method is effective for predicting potential DTIs. Thus, we believe that our method is suitable for facilitating drug discovery and development.
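The evaluation protocol described here, concatenating protein-derived and drug-derived features into one descriptor and scoring a classifier with 5-fold cross validation, can be sketched in plain Python. Everything below is illustrative: the helper names are hypothetical, the feature values are made up, and a majority-class baseline stands in for the Rotation Forest classifier, which is not implemented here.

```python
import random


def build_descriptor(protein_features, drug_fingerprint):
    """Concatenate protein-derived features with drug fingerprint bits."""
    return list(protein_features) + list(drug_fingerprint)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k disjoint folds of shuffled samples."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validated_accuracy(features, labels, fit, predict, k=5):
    """Average test accuracy over k folds for any fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = predict(model, [features[i] for i in test])
        scores.append(sum(p == labels[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / k


# Placeholder classifier (assumption): always predicts the majority training label.
def fit_majority(X, y):
    return max(set(y), key=y.count)


def predict_majority(model, X):
    return [model] * len(X)


# Hypothetical data: 10 drug-target pairs, 8 interacting and 2 non-interacting.
features = [build_descriptor([0.1, 0.2], [1, 0]) for _ in range(10)]
labels = [1] * 8 + [0] * 2
acc = cross_validated_accuracy(features, labels, fit_majority, predict_majority, k=5)
```

Because the baseline ignores the features, its accuracy simply reflects the class balance, which is a useful floor when judging the accuracies reported for a real classifier.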
159
Lim G, Lim CJ, Lee JH, Lee BH, Ryu JY, Oh KS. Identification of new target proteins of a Urotensin-II receptor antagonist using transcriptome-based drug repositioning approach. Sci Rep 2021; 11:17138. [PMID: 34429474 PMCID: PMC8384862 DOI: 10.1038/s41598-021-96612-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Accepted: 08/11/2021] [Indexed: 12/20/2022] Open
Abstract
Drug repositioning research using transcriptome data has recently attracted attention. In this study, we attempted to identify new target proteins of the urotensin-II receptor antagonist, KR-37524 (4-(3-bromo-4-(piperidin-4-yloxy)benzyl)-N-(3-(dimethylamino)phenyl)piperazine-1-carboxamide dihydrochloride), using a transcriptome-based drug repositioning approach. To do this, we obtained KR-37524-induced gene expression profile changes in four cell lines (A375, A549, MCF7, and PC3), and compared them with the approved drug-induced gene expression profile changes available in the LINCS L1000 database to identify approved drugs with similar gene expression profile changes. Here, the similarity between the two gene expression profile changes was calculated using the connectivity score. We then selected proteins that are known targets of the top three approved drugs with the highest connectivity score in each cell line (12 drugs in total) as potential targets of KR-37524. Seven potential target proteins were experimentally confirmed using an in vitro binding assay. Through this analysis, we identified that neurologically regulated serotonin transporter proteins are new target proteins of KR-37524. These results indicate that the transcriptome-based drug repositioning approach can be used to identify new target proteins of a given compound, and we provide a standalone software developed in this study that will serve as a useful tool for drug repositioning.
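The selection step in this abstract, taking the top three approved drugs by connectivity score in each cell line and pooling their known targets, can be sketched as below. The connectivity scores are treated as already computed; the drug names, target names, scores, and function name are hypothetical, and only two of the four cell lines are shown.

```python
def candidate_targets(scores, known_targets, top_n=3):
    """Pick the top_n drugs per cell line by score and pool their known targets.

    scores: {cell_line: {drug: connectivity_score}}
    known_targets: {drug: set of target proteins}
    """
    selected = {}
    for cell_line, drug_scores in scores.items():
        ranked = sorted(drug_scores, key=drug_scores.get, reverse=True)
        selected[cell_line] = ranked[:top_n]
    targets = set()
    for drugs in selected.values():
        for drug in drugs:
            targets |= known_targets.get(drug, set())
    return selected, targets


# Hypothetical connectivity scores and target annotations.
scores = {
    "A375": {"drug_a": 0.9, "drug_b": 0.7, "drug_c": 0.4, "drug_d": 0.1},
    "MCF7": {"drug_a": 0.2, "drug_b": 0.8, "drug_e": 0.6, "drug_f": 0.5},
}
known_targets = {
    "drug_a": {"T1"},
    "drug_b": {"T2", "T3"},
    "drug_c": {"T1"},
    "drug_e": {"T4"},
    "drug_f": set(),
}
selected, targets = candidate_targets(scores, known_targets)
```

The pooled set is a list of hypotheses only; in the study, potential targets were then confirmed experimentally with an in vitro binding assay.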
Affiliation(s)
- Gyutae Lim
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon, 34114, Republic of Korea
- Chae Jo Lim
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon, 34114, Republic of Korea
- Jeong Hyun Lee
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon, 34114, Republic of Korea
- Byung Ho Lee
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon, 34114, Republic of Korea
- Jae Yong Ryu
  - Department of Biotechnology, Duksung Women's University, 33 Samyang-ro 144-gil, Dobong-gu, Seoul, 01369, Republic of Korea
- Kwang-Seok Oh
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, 141 Gajeong-ro, Yuseong-gu, Daejeon, 34114, Republic of Korea
  - Department of Medicinal and Pharmaceutical Chemistry, University of Science and Technology, 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
160
Recent Advances in In Silico Target Fishing. Molecules 2021; 26:5124. [PMID: 34500568 PMCID: PMC8433825 DOI: 10.3390/molecules26175124] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 07/30/2021] [Revised: 08/14/2021] [Accepted: 08/18/2021] [Indexed: 12/24/2022] Open
Abstract
In silico target fishing, whose aim is to identify possible protein targets for a query molecule, is an emerging approach used in drug discovery due to its wide variety of applications. This strategy allows the clarification of the mechanism of action and biological activities of compounds whose target is still unknown. Moreover, target fishing can be employed for the identification of off-targets of drug candidates, thus recognizing and preventing their possible adverse effects. For these reasons, target fishing has increasingly become a key approach for polypharmacology, drug repurposing, and the identification of new drug targets. While experimental target fishing can be lengthy and difficult to implement, due to the plethora of interactions that may occur for a single small molecule with different protein targets, an in silico approach can be quicker, less expensive, more efficient for specific protein structures, and thus easier to employ. Moreover, the possibility of using it in combination with docking and virtual screening studies, as well as the increasing number of web-based tools that have been recently developed, makes target fishing a more appealing method for drug discovery. It is especially worth underlining the increasing implementation of machine learning in this field, both as a main target fishing approach and as a further development of already applied strategies. This review reports on the main in silico target fishing strategies, belonging to both ligand-based and receptor-based approaches, developed and applied in recent years, with particular attention to the different web tools freely accessible to the scientific community for performing target fishing studies.
161
Liu G, Singha M, Pu L, Neupane P, Feinstein J, Wu HC, Ramanujam J, Brylinski M. GraphDTI: A robust deep learning predictor of drug-target interactions from multiple heterogeneous data. J Cheminform 2021; 13:58. [PMID: 34380569 PMCID: PMC8356453 DOI: 10.1186/s13321-021-00540-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/08/2021] [Accepted: 07/31/2021] [Indexed: 12/22/2022] Open
Abstract
Traditional techniques to identify macromolecular targets for drugs utilize solely the information on a query drug and a putative target. Nonetheless, the mechanisms of action of many drugs depend not only on their binding affinity toward a single protein, but also on the signal transduction through cascades of molecular interactions leading to certain phenotypes. Although using protein-protein interaction networks and drug-perturbed gene expression profiles can facilitate system-level investigations of drug-target interactions, utilizing such large and heterogeneous data poses notable challenges. To improve the state-of-the-art in drug target identification, we developed GraphDTI, a robust machine learning framework integrating the molecular-level information on drugs, proteins, and binding sites with the system-level information on gene expression and protein-protein interactions. In order to properly evaluate the performance of GraphDTI, we compiled a high-quality benchmarking dataset and devised a new cluster-based cross-validation protocol. Encouragingly, GraphDTI not only yields an AUC of 0.996 against the validation dataset, but it also generalizes well to unseen data with an AUC of 0.939, significantly outperforming other predictors. Finally, selected examples of identified drug-target interactions are validated against the biomedical literature. Numerous applications of GraphDTI include the investigation of drug polypharmacological effects, side effects through off-target binding, and repositioning opportunities.
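The general idea behind a cluster-based cross-validation split is that all members of a cluster of similar items stay in the same fold, so validation data are not near-duplicates of training data. The sketch below shows one simple way to do this, assigning whole clusters to folds greedily by size. It is an assumption-laden illustration, not the GraphDTI protocol, whose details are not given in the abstract; the cluster labels and function name are hypothetical.

```python
from collections import defaultdict


def cluster_folds(cluster_ids, k):
    """Assign whole clusters to k folds, largest cluster first, balancing fold sizes.

    cluster_ids[i] is the cluster label of sample i; returns a list of k index lists.
    """
    members = defaultdict(list)
    for i, cluster in enumerate(cluster_ids):
        members[cluster].append(i)
    folds = [[] for _ in range(k)]
    # Largest clusters first; ties broken by label so the result is deterministic.
    for cluster in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cluster])
    return folds


# Hypothetical cluster labels for eight samples (for example, clusters of similar drugs).
cluster_ids = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = cluster_folds(cluster_ids, k=2)
```

With this kind of split, a gap between validation and unseen-data performance, such as the two AUC values quoted above, is easier to interpret because similar items cannot leak across folds.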
Affiliation(s)
- Guannan Liu
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Manali Singha
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
- Limeng Pu
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Prasanga Neupane
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Joseph Feinstein
  - Department of Computer Science, Brown University, Providence, RI, 02902, USA
- Hsiao-Chun Wu
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- J Ramanujam
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
162
Zhang S, Wang J, Lin Z, Liang Y. Application of Machine Learning Techniques in Drug-target Interactions Prediction. Curr Pharm Des 2021; 27:2076-2087. [PMID: 33238865 DOI: 10.2174/1381612826666201125105730] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/09/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Drug-Target interactions are vital for drug design and drug repositioning. However, traditional lab experiments are both expensive and time-consuming. Various computational methods which applied machine learning techniques performed efficiently and effectively in the field. RESULTS The machine learning methods can be divided into three categories basically: Supervised methods, Semi-Supervised methods and Unsupervised methods. We reviewed recent representative methods applying machine learning techniques of each category in DTIs and summarized a brief list of databases frequently used in drug discovery. In addition, we compared the advantages and limitations of these methods in each category. CONCLUSION Every prediction model has both strengths and weaknesses and should be adopted in proper ways. Three major problems in DTIs prediction including the lack of nonreactive drug-target pairs data sets, over optimistic results due to the biases and the exploiting of regression models on DTIs prediction should be seriously considered.
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Jiesheng Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Zhenhui Lin
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Yunyun Liang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
163
Perpetuo L, Klein J, Ferreira R, Guedes S, Amado F, Leite-Moreira A, Silva AMS, Thongboonkerd V, Vitorino R. How can artificial intelligence be used for peptidomics? Expert Rev Proteomics 2021; 18:527-556. [PMID: 34343059 DOI: 10.1080/14789450.2021.1962303] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
Abstract
INTRODUCTION Peptidomics is an emerging field of omics sciences using advanced isolation, analysis, and computational techniques that enable qualitative and quantitative analyses of various peptides in biological samples. Peptides can act as useful biomarkers and as therapeutic molecules for diseases. AREAS COVERED The use of therapeutic peptides can be predicted quickly and efficiently using data-driven computational methods, particularly artificial intelligence (AI) approaches. Various AI approaches are useful for peptide-based drug discovery, such as support vector machine, random forest, extremely randomized trees, and other more recently developed deep learning methods. AI methods are relatively new to the development of peptide-based therapies, but these techniques have already become essential tools in protein science by dissecting novel therapeutic peptides and their functions (Figure 1). EXPERT OPINION Researchers have shown that AI models can facilitate the development of peptidomics and selective peptide therapies in the field of peptide science. Biopeptide prediction is important for the discovery and development of successful peptide-based drugs. Due to their ability to predict therapeutic roles based on sequence details, many AI-dependent prediction tools have been developed (Figure 1).
Affiliation(s)
- Luís Perpetuo
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
- Julie Klein
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, Université Toulouse III, Toulouse, France
- Rita Ferreira
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Sofia Guedes
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Francisco Amado
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Adelino Leite-Moreira
  - UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
- Artur M S Silva
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Visith Thongboonkerd
  - Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Rui Vitorino
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
  - UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
164
El-Behery H, Attia AF, El-Feshawy N, Torkey H. Efficient machine learning model for predicting drug-target interactions with case study for Covid-19. Comput Biol Chem 2021; 93:107536. [PMID: 34271420 PMCID: PMC8256690 DOI: 10.1016/j.compbiolchem.2021.107536] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/13/2020] [Revised: 06/23/2021] [Accepted: 06/24/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND: Discovering possible drug-target interactions (DTIs) is a decisive step in detecting the effects of drugs as well as in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, the accuracy of their prediction is still low, and protein sequences and structured data are rarely used together in the prediction process. METHODS: This paper presents a DTI prediction model that takes advantage of the special capacity of the structured form of proteins and drugs. Our model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. In a comparison of the proposed model with different existing methods under K-fold cross-validation, empirical results show that our model, based on ensemble learning algorithms for DTI prediction, provides more accurate results from both structure and feature data. RESULTS: The proposed model is applied to two datasets: Benchmark (feature only) datasets and DrugBank (structure data) datasets. Experimental results obtained by Light-Boost and ExtraTree using structure and feature data yield 98% accuracy and a 0.97 F-score, compared with 94% and 0.92 achieved by the existing methods. Moreover, our model can successfully predict more as yet undiscovered interactions, and hence can be used as a practical tool for drug repositioning. A case study is performed in which our prediction model is applied to the proteins that are known to be affected by coronaviruses in order to predict the possible interactions between these proteins and existing drugs. Our model is also applied to Covid-19-related drugs announced on DrugBank.
The results show that some drugs, such as DB00691 and DB05203, are predicted with 100% accuracy to interact with the ACE2 protein. This protein is a self-membrane protein that enables Covid-19 infection. Hence, our model can be used as an effective tool in drug repositioning to predict possible drug treatments for Covid-19.
Affiliation(s)
- Heba El-Behery
- Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr_El_Sheikh, Egypt.
- Abdel-Fattah Attia
- Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr_El_Sheikh, Egypt.
- Nawal El-Feshawy
- Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt.
- Hanaa Torkey
- Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt.
165
Wang S, Jiang M, Zhang S, Wang X, Yuan Q, Wei Z, Li Z. MCN-CPI: Multiscale Convolutional Network for Compound-Protein Interaction Prediction. Biomolecules 2021; 11:1119. [PMID: 34439785 PMCID: PMC8392217 DOI: 10.3390/biom11081119] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/05/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 01/09/2023]
Abstract
In the process of drug discovery, identifying the interaction between the protein and the novel compound plays an important role. With the development of technology, deep learning methods have shown excellent performance in various situations. However, the compound-protein interaction is complicated and the features extracted by most deep models are not comprehensive, which limits the performance to a certain extent. In this paper, we proposed a multiscale convolutional network that extracted the local and global features of the protein and the topological feature of the compound using different types of convolutional networks. The results showed that our model obtained the best performance compared with the existing deep learning methods.
Affiliation(s)
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
- Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Xiaofeng Wang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Qing Yuan
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhiqiang Wei
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
166
Hinnerichs T, Hoehndorf R. DTI-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug-target interactions. Bioinformatics 2021; 37:4835-4843. [PMID: 34320178 PMCID: PMC8665763 DOI: 10.1093/bioinformatics/btab548] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/28/2021] [Revised: 07/14/2021] [Accepted: 07/26/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION In silico drug-target interaction (DTI) prediction is important for drug discovery and drug repurposing. Approaches to predict DTIs can proceed indirectly, top-down, using phenotypic effects of drugs to identify potential drug targets, or they can be direct, bottom-up and use molecular information to directly predict binding affinities. Both approaches can be combined with information about interaction networks. RESULTS We developed DTI-Voodoo as a computational method that combines molecular features and ontology-encoded phenotypic effects of drugs with protein-protein interaction networks, and uses a graph convolutional neural network to predict DTIs. We demonstrate that drug effect features can exploit information in the interaction network whereas molecular features do not. DTI-Voodoo is designed to predict candidate drugs for a given protein; we use this formulation to show that common DTI datasets contain intrinsic biases with major effects on performance evaluation and comparison of DTI prediction methods. Using a modified evaluation scheme, we demonstrate that DTI-Voodoo improves significantly over state of the art DTI prediction methods. AVAILABILITY DTI-Voodoo source code and data necessary to reproduce results are freely available at https://github.com/THinnerichs/DTI-VOODOO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tilman Hinnerichs
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
167
Logistic matrix factorisation and generative adversarial neural network-based method for predicting drug-target interactions. Mol Divers 2021; 25:1497-1516. [PMID: 34297278 DOI: 10.1007/s11030-021-10273-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2021] [Accepted: 07/04/2021] [Indexed: 12/21/2022]
Abstract
Identifying drug-target protein association pairs is a prerequisite and a crucial task in drug discovery and development. Numerous computational models, based on different assumptions and algorithms, have been proposed as an alternative to the laborious, costly, and time-consuming traditional wet-lab methods. Most proposed methods focus on separated drug and target descriptors, calculated, respectively, from chemical structures and protein sequences, and fail to introduce and extract features where the interaction information is embedded. In this paper, we propose a new three-step method based on matrix factorisation and generative adversarial network (GAN) for drug-target interaction prediction. Firstly, the matrix factorisation technique is used to capture and extract the joint interaction feature, for both drugs and targets, from the drug-target interaction matrix. Then, a GAN is introduced for data augmentation. It generates a fake positive sample similar to the real positive sample (known interactions) in order to balance the samples, allow the exploitation of the entire negative sample, and increase the data size for an accurate prediction. Finally, a fully connected four-layer neural network is built for classification. Experimental results illustrate a higher prediction performance of the proposed method compared to shallow classifiers and to state-of-the-art methods with an accuracy higher than 97%. Moreover, the data generation effect is confirmed by evaluating the proposed method with and without the generation step. These results demonstrated the efficiency of the latent interaction features and data generation on predicting new drugs or repurposing existing drugs.
Graphical abstract: Overview of the WGANMF-DTI workflow for the drug-target interaction prediction task.
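To make the first step of this abstract concrete, the following is a minimal sketch of generic logistic matrix factorisation on a binary drug-target matrix. It is not the authors' implementation: the function names (`fit_lmf`, `predict`, `log_likelihood`), the rank, learning rate, regularisation strength, and the toy matrix are all assumptions chosen for illustration, and the GAN augmentation and four-layer classifier are not reproduced.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(U, V, i, j):
    """Predicted interaction probability for drug i and target j."""
    return sigmoid(sum(a * b for a, b in zip(U[i], V[j])))


def log_likelihood(Y, U, V):
    """Bernoulli log-likelihood of the binary matrix Y under sigmoid(U V^T)."""
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            # Clip to keep the logarithm finite for saturated predictions.
            p = min(max(predict(U, V, i, j), 1e-12), 1.0 - 1e-12)
            total += math.log(p) if y else math.log(1.0 - p)
    return total


def fit_lmf(Y, rank=2, steps=300, lr=0.1, reg=0.01, seed=0):
    """Fit latent drug factors U and target factors V by gradient ascent.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_drugs, n_targets = len(Y), len(Y[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_targets)]
    for _ in range(steps):
        # Start each gradient from the L2 penalty term.
        grad_U = [[-reg * u for u in row] for row in U]
        grad_V = [[-reg * v for v in row] for row in V]
        for i in range(n_drugs):
            for j in range(n_targets):
                err = Y[i][j] - predict(U, V, i, j)
                for k in range(rank):
                    grad_U[i][k] += err * V[j][k]
                    grad_V[j][k] += err * U[i][k]
        for i in range(n_drugs):
            for k in range(rank):
                U[i][k] += lr * grad_U[i][k]
        for j in range(n_targets):
            for k in range(rank):
                V[j][k] += lr * grad_V[j][k]
    return U, V


# Toy interaction matrix (rows: drugs, columns: targets); purely hypothetical.
Y = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [0, 0, 1]]
U, V = fit_lmf(Y)
print(round(predict(U, V, 0, 0), 3), round(predict(U, V, 0, 2), 3))
```

In a pipeline like the one described, the learned rows of `U` and `V` would serve as the latent interaction features passed to later steps; how the paper concatenates or transforms them is not stated in the abstract.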
168
Piroozmand F, Mohammadipanah F, Sajedi H. Spectrum of deep learning algorithms in drug discovery. Chem Biol Drug Des 2021; 96:886-901. [PMID: 33058458 DOI: 10.1111/cbdd.13674] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/31/2019] [Revised: 02/11/2020] [Accepted: 02/19/2020] [Indexed: 12/16/2022]
Abstract
Deep learning (DL) algorithms are a subset of machine learning algorithms that aim to model complex mappings between a set of elements and their classes. In parallel with advances in revealing the molecular bases of diseases, notable efforts have been made to apply DL in data/library management, reaction optimization, differentiating uncertainties, molecule construction, creating metrics from qualitative results, and prediction of structures or interactions. From source identification to lead discovery and medicinal chemistry of the drug candidate, drug delivery, and modification, the challenges can be subjected to artificial intelligence algorithms to aid in the generation and interpretation of data. Both discovery and design approaches demand automation, large-scale data management, and data fusion as high-throughput methods advance. The application of DL can accelerate the exploration of drug mechanisms, the finding of novel indications for existing drugs (drug repositioning), drug development, and preclinical and clinical studies. The impact of DL on the workflow of drug discovery and design, and on their complementary tools, is highlighted in this review. Additionally, the types of DL algorithms used for this purpose and their pros and cons, along with the dominant directions of future research, are presented.
Affiliation(s)
- Firoozeh Piroozmand
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Fatemeh Mohammadipanah
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Hedieh Sajedi
- Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
169
Yu Z, Lu J, Jin Y, Yang Y. KenDTI: An Ensemble Model for Predicting Drug-Target Interaction by Integrating Multi-Source Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1305-1314. [PMID: 33877984 DOI: 10.1109/tcbb.2021.3074401] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/12/2023]
Abstract
The identification of drug-target interactions (DTIs) is an essential step in the process of drug discovery. As experimental validation suffers from high cost and a low success rate, various computational models have been exploited to infer potential DTIs. The performance of DTI prediction depends heavily on the features extracted from drugs and target proteins. The existing predictors vary in input information and each has its own advantages. Therefore, combining the advantages of individual models and generating high-quality representations for drug-target pairs are effective ways to improve the performance of DTI prediction. In this study, we exploit both biochemical characteristics of drugs via network integration and molecular sequences via word embeddings, and then develop an ensemble model, KenDTI, based on two types of methods, i.e., network-based and classification-based. We assess the performance of KenDTI on two large-scale datasets. The experimental results show that KenDTI outperforms the state-of-the-art DTI predictors by a large margin. Moreover, KenDTI is robust against missing data in input networks and a lack of prior knowledge. It is able to make predictions for drug-candidate chemical compounds with scarce information.
170
Pliakos K, Vens C, Tsoumakas G. Predicting Drug-Target Interactions With Multi-Label Classification and Label Partitioning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1596-1607. [PMID: 31689203 DOI: 10.1109/tcbb.2019.2951378] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 06/10/2023]
Abstract
Identifying drug-target interactions is crucial for drug discovery. Despite modern technologies used in drug screening, experimental identification of drug-target interactions is an extremely demanding task. Predicting drug-target interactions in silico can thereby facilitate drug discovery as well as drug repositioning. Various machine learning models have been developed over the years to predict such interactions. Multi-output learning models in particular have drawn the attention of the scientific community due to their high predictive performance and computational efficiency. These models are based on the assumption that all the labels are correlated with each other. However, this assumption is too optimistic. Here, we address drug-target interaction prediction as a multi-label classification task that is combined with label partitioning. We show that building multi-output learning models over groups (clusters) of labels often leads to superior results. The performed experiments confirm the efficiency of the proposed framework.
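The label-partitioning idea in this abstract can be illustrated with a small sketch. The abstract does not say how labels are clustered, so everything below is an assumption for illustration: targets (label columns) are grouped greedily by Jaccard similarity of their interaction profiles, and the label matrix is then split so that a separate multi-output model could be trained per group. The names `jaccard`, `partition_labels`, and `split_label_matrix`, and the 0.5 threshold, are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary label columns."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def partition_labels(Y, threshold=0.5):
    """Greedily group label columns.

    A label joins the first group whose seed (first) column is at least
    `threshold` similar to it; otherwise it starts a new group.
    """
    columns = list(zip(*Y))
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if jaccard(columns[group[0]], col) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups


def split_label_matrix(Y, groups):
    """Return one label sub-matrix per group, ready for a per-group learner."""
    return [[[row[j] for j in group] for row in Y] for group in groups]


# Toy label matrix (rows: drugs, columns: targets); purely hypothetical.
Y = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
groups = partition_labels(Y)
print(groups)
for sub in split_label_matrix(Y, groups):
    print(sub)
```

Each sub-matrix would then be handed to whatever multi-output learner is in use, so that only labels judged to be related share a model.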
171
Zhou D, Peng S, Wei DQ, Zhong W, Dou Y, Xie X. LUNAR: Drug Screening for Novel Coronavirus Based on Representation Learning Graph Convolutional Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1290-1298. [PMID: 34081583 PMCID: PMC8769035 DOI: 10.1109/tcbb.2021.3085972] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/29/2020] [Revised: 04/23/2021] [Accepted: 05/30/2021] [Indexed: 06/12/2023]
Abstract
An outbreak of COVID-19 that began in late 2019 was caused by a novel coronavirus (SARS-CoV-2). It has become a global pandemic. As of June 9, 2020, it has infected nearly 7 million people and killed more than 400,000, but there is no specific drug. Therefore, there is an urgent need to find or develop more drugs to suppress the virus. Here, we propose a new nonlinear end-to-end model called LUNAR. It uses graph convolutional neural networks to automatically learn the neighborhood information of complex heterogeneous relational networks and combines the attention mechanism to reflect the importance of the sum of different types of neighborhood information to obtain the representation characteristics of each node. Finally, through the topology reconstruction process, the feature representations of drugs and targets are forcibly extracted to match the observed network as much as possible. Through this reconstruction process, we obtain the strength of the relationship between different nodes and predict drug candidates that may affect the treatment of COVID-19 based on the known targets of COVID-19. These selected candidate drugs can be used as a reference for experimental scientists and accelerate drug development. LUNAR integrates various topological structure information in heterogeneous networks and combines attention mechanisms to reflect the importance of neighborhood information of different types of nodes, improving the interpretability of the model. The area under the curve (AUC) of the model is 0.949 and the area under the precision-recall curve (AUPR) is 0.866 using 10-fold cross-validation. These two performance indexes show that the model has superior predictive performance. In addition, some of the drugs screened out by our model have appeared in clinical studies, which further illustrates the effectiveness of the model.
Affiliation(s)
- Deshan Zhou
- College of Computer Science, Hunan University, Changsha, Hunan 410082, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering & National Supercomputing Centre in Changsha, Hunan University, Changsha, Hunan 410082, China
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410082, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, China
- Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
- Wu Zhong
- National Engineering Research Center for the Emergency Drug, Beijing Institute of Pharmacology and Toxicology, Beijing 100850, China
- Yutao Dou
- School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
- Xiaolan Xie
- School of Information Science and Engineering, Guilin University of Technology, Guilin City, Guangxi 541004, China
172
Zhou D, Xu Z, Li W, Xie X, Peng S. MultiDTI: Drug-target interaction prediction based on multi-modal representation learning to bridge the gap between new chemical entities and known heterogeneous network. Bioinformatics 2021; 37:4485-4492. [PMID: 34180970 DOI: 10.1093/bioinformatics/btab473] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/24/2021] [Revised: 05/27/2021] [Accepted: 06/27/2021] [Indexed: 12/14/2022]
Abstract
MOTIVATION: Predicting new drug-target interactions (DTIs) is an important step in new drug development, in understanding drug side effects, and in drug repositioning. Heterogeneous data sources can provide comprehensive information and different perspectives for drug-target interaction prediction. Thus, there have been many computational methods relying on heterogeneous networks. Most of them use graph-related algorithms to characterize nodes in heterogeneous networks for predicting new DTIs. However, these methods can only make predictions within known heterogeneous network datasets and cannot support the prediction of new chemical entities outside the heterogeneous network, which hinders further drug discovery and development. RESULTS: To solve this problem, we proposed a multi-modal DTI prediction model named 'MultiDTI', which uses our proposed joint learning framework based on heterogeneous networks. It combines the interaction or association information of the heterogeneous network and the drug/target sequence information, and maps the drugs, targets, side effects and disease nodes in the heterogeneous network into a common space. In this way, 'MultiDTI' can map a new chemical entity to this learned common space based on the chemical structure of the new entity. That is, it bridges the gap between new chemical entities and the known heterogeneous network. Our model has strong predictive performance: the area under the receiver operating characteristic curve (AUC) of the model is 0.961 and the area under the precision-recall curve (AUPRC) is 0.947 with 10-fold cross-validation. In addition, some predicted new DTIs have been confirmed by the ChEMBL database. Our results indicate that 'MultiDTI' is a powerful and practical tool for predicting new DTIs, which can promote the development of drug discovery or drug repositioning. AVAILABILITY: Python codes and dataset are available at https://github.com/Deshan-Zhou/MultiDTI/.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Deshan Zhou
- Department of Computer Science, Hunan University, Changsha, 410082, China
- Zhijian Xu
- CAS Key Laboratory of Receptor Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- WenTao Li
- Department of Computer Science, National University of Defense Technology, Changsha, 410073, China
- Xiaolan Xie
- College of Information Science and Engineering, Guilin University of Technology, Guilin, 541004, China
- Shaoliang Peng
- Department of Computer Science, Hunan University, Changsha, 410082, China; Department of Computer Science, National University of Defense Technology, Changsha, 410073, China
173
Zhang HW, Lv C, Zhang LJ, Guo X, Shen YW, Nagle DG, Zhou YD, Liu SH, Zhang WD, Luan X. Application of omics- and multi-omics-based techniques for natural product target discovery. Biomed Pharmacother 2021; 141:111833. [PMID: 34175822 DOI: 10.1016/j.biopha.2021.111833] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 04/21/2021] [Revised: 06/07/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023]
Abstract
Natural products continue to be an unparalleled source of pharmacologically active lead compounds because of their unprecedented structures and unique biological activities. Natural product target discovery is a vital component of natural product-based medicine translation and development and is required to understand and potentially reduce mechanisms that may be associated with off-target side effects and toxicity. Omics-based techniques, including genomics, transcriptomics, proteomics, metabolomics, and bioinformatics, have become recognized as effective tools needed to construct innovative strategies to discover natural product targets. Although considerable progress has been made, the successful discovery of natural product targets remains a challenging time-consuming process that has come to increasingly rely on the effective integration of multi-omics-based technologies to create emerging panomics (a.k.a., integrative omics, pan-omics, multiomics)-based strategies. This review summarizes a series of successful studies regarding the application of integrative omics-based methods in natural product target discovery. The advantages and disadvantages of each technique are discussed, with a particular focus on the systematic integration of multi-omics strategies. Further, emerging micro-scale single-cell-based techniques are introduced, especially to deal with minute natural product samples.
Affiliation(s)
- Hong-Wei Zhang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Chao Lv
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Li-Jun Zhang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Xin Guo
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Yi-Wen Shen
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Dale G Nagle
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China; Department of BioMolecular Sciences and Research Institute of Pharmaceutical Sciences, School of Pharmacy, University of Mississippi, University-1848, MS 38677-1848, USA
- Yu-Dong Zhou
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China; Department of Chemistry and Biochemistry, University of Mississippi, University, MS 38677, USA
- San-Hong Liu
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.
- Wei-Dong Zhang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China; School of Pharmacy, Second Military Medical University, Shanghai 200433, China.
- Xin Luan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.
174
Predicting Drug-Target Interactions Based on the Ensemble Models of Multiple Feature Pairs. Int J Mol Sci 2021; 22:ijms22126598. [PMID: 34202954 PMCID: PMC8234024 DOI: 10.3390/ijms22126598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2021] [Revised: 06/09/2021] [Accepted: 06/16/2021] [Indexed: 11/30/2022]
Abstract
Background: The prediction of drug–target interactions (DTIs) is of great significance in drug development. Traditional experimental methods are time-consuming and expensive. Machine learning can reduce the cost of prediction but is limited by the characteristics of imbalanced datasets and by problems of essential feature selection. Methods: A prediction method based on the Ensemble model of Multiple Feature Pairs (Ensemble-MFP) is introduced. Firstly, three negative sets are generated according to the Euclidean distance of three feature pairs. Then, the negative samples of the validation set/test set are randomly selected from the union set of the three negative sets in the validation set/test set. At the same time, the ensemble model with weight is optimized and applied to the test set. Results: The area under the receiver operating characteristic curve (area under ROC, AUC) in three out of four sub-datasets in gold standard datasets was more than 94.0% in the prediction of new drugs. The effectiveness of the proposed method is also shown by comparison with state-of-the-art methods and by demonstration of predicted drug–target pairs. Conclusion: The Ensemble-MFP can weigh the existing feature pairs and has a good prediction effect for general prediction on new drugs.
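The negative-set construction mentioned in the Methods can be sketched as follows. The abstract does not state how the Euclidean distance is used, so this sketch assumes one common heuristic: an unlabeled drug-target pair counts as a more reliable negative the farther it lies from every known positive pair. One candidate set is built per feature view and the sets are then unioned. The function names, the value of `k`, and the toy feature vectors are hypothetical.

```python
import math


def select_negatives(positives, unlabeled, k):
    """Indices of the k unlabeled pairs farthest from their nearest positive."""
    def gap(vec):
        # Distance from an unlabeled pair to its closest known positive pair.
        return min(math.dist(vec, p) for p in positives)

    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: gap(unlabeled[i]),
                    reverse=True)
    return set(ranked[:k])


def union_of_negative_sets(views, k):
    """Union the per-view negative sets.

    `views` is a list of (positives, unlabeled) tuples, one per feature pair,
    with the unlabeled pairs listed in the same order in every view.
    """
    candidates = set()
    for positives, unlabeled in views:
        candidates |= select_negatives(positives, unlabeled, k)
    return candidates


# Two toy feature views of the same four unlabeled pairs; purely hypothetical.
view_a = ([(0.0, 0.0), (1.0, 0.0)],
          [(0.5, 0.1), (5.0, 5.0), (0.0, 4.0), (9.0, 9.0)])
view_b = ([(1.0, 1.0)],
          [(1.0, 2.0), (1.5, 1.0), (8.0, 1.0), (2.0, 2.0)])
print(sorted(union_of_negative_sets([view_a, view_b], k=2)))  # prints [1, 2, 3]
```

Validation and test negatives would then be drawn at random from the unioned indices; the paper uses three feature pairs where this sketch shows two.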
175
Bai Q, Ma J, Liu S, Xu T, Banegas-Luna AJ, Pérez-Sánchez H, Tian Y, Huang J, Liu H, Yao X. WADDAICA: A webserver for aiding protein drug design by artificial intelligence and classical algorithm. Comput Struct Biotechnol J 2021; 19:3573-3579. [PMID: 34194678 PMCID: PMC8234348 DOI: 10.1016/j.csbj.2021.06.017] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/31/2021] [Revised: 06/05/2021] [Accepted: 06/12/2021] [Indexed: 10/25/2022]
Abstract
Artificial intelligence can train deep learning models for drug design on known drug data, while classical algorithms can design drugs through established and predefined procedures. Both deep learning and classical algorithms have their merits for drug design. Here, the webserver WADDAICA is built to exploit the advantages of both deep learning models and classical algorithms for drug design. WADDAICA mainly contains two modules. In the first module, WADDAICA provides deep learning models for scaffold hopping of compounds to modify or design novel drugs. The deep learning model used in WADDAICA shows good scoring power based on the PDBbind database. In the second module, WADDAICA supplies functions for modifying or designing novel drugs by classical algorithms. WADDAICA shows better Pearson and Spearman correlations of binding affinity than Autodock Vina, which is considered to have the best scoring power. In addition, WADDAICA supplies a friendly and convenient web interface for users to submit drug design jobs. We believe that WADDAICA is a useful and effective tool to help researchers modify or design novel drugs by deep learning models and classical algorithms. WADDAICA is free and accessible at https://bqflab.github.io or https://heisenberg.ucam.edu:5000.
Affiliation(s)
- Qifeng Bai
- Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, P. R. China
- Jian Ma
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu 730000, P. R. China
- Shuo Liu
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu 730000, P. R. China
- Antonio Jesús Banegas-Luna
- Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia, Spain
- Horacio Pérez-Sánchez
- Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia, Spain
- Yanan Tian
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu 730000, P. R. China
- Huanxiang Liu
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu 730000, P. R. China
- Xiaojun Yao
- Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, P. R. China
176
Applications of artificial intelligence to drug design and discovery in the big data era: a comprehensive review. Mol Divers 2021; 25:1643-1664. [PMID: 34110579 DOI: 10.1007/s11030-021-10237-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/18/2021] [Accepted: 05/26/2021] [Indexed: 10/21/2022]
Abstract
Artificial intelligence (AI) renders cutting-edge applications in diverse sectors of society. Due to substantial progress in high-performance computing, the development of superior algorithms, and the accumulation of huge biological and chemical data, computer-assisted drug design technology is playing a key role in drug discovery with its advantages of high efficiency, fast speed, and low cost. Over recent years, due to continuous progress in machine learning (ML) algorithms, AI has been extensively employed in various drug discovery stages. Very recently, drug design and discovery have entered the big data era. ML algorithms have progressively developed into a deep learning technique with potent generalization capability and more effectual big data handling, which further promotes the integration of AI technology and computer-assisted drug discovery technology, hence accelerating the design and discovery of the newest drugs. This review mainly summarizes the application progression of AI technology in the drug discovery process, and explores and compares its advantages over conventional methods. The challenges and limitations of AI in drug design and discovery have also been discussed.
|
177
|
Huang K, Xiao C, Glass LM, Sun J. MolTrans: Molecular Interaction Transformer for drug-target interaction prediction. Bioinformatics 2021; 37:830-836. [PMID: 33070179 PMCID: PMC8098026 DOI: 10.1093/bioinformatics/btaa880] [Citation(s) in RCA: 168] [Impact Index Per Article: 42.0] [Received: 04/24/2020] [Revised: 08/23/2020] [Accepted: 10/07/2020] [Indexed: 01/02/2023] Open
Abstract
Motivation: Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need of experimental search over large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, thus produce results that are less accurate and difficult to explain and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. Results: We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) knowledge inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show it improved DTI prediction performance compared to state-of-the-art baselines. Availability and implementation: The model scripts are available at https://github.com/kexinhuang12345/moltrans. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kexin Huang
- Health Data Science, Harvard University, Boston, MA 02120, USA
- Cao Xiao
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Lucas M Glass
- Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA
- Jimeng Sun
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
178
|
Cichońska A, Ravikumar B, Allaway RJ, Wan F, Park S, Isayev O, Li S, Mason M, Lamb A, Tanoli Z, Jeon M, Kim S, Popova M, Capuzzi S, Zeng J, Dang K, Koytiger G, Kang J, Wells CI, Willson TM, Oprea TI, Schlessinger A, Drewry DH, Stolovitzky G, Wennerberg K, Guinney J, Aittokallio T. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat Commun 2021; 12:3307. [PMID: 34083538 PMCID: PMC8175708 DOI: 10.1038/s41467-021-23165-1] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 06/20/2020] [Accepted: 04/15/2021] [Indexed: 12/31/2022] Open
Abstract
Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.
Affiliation(s)
- Anna Cichońska
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland
- Department of Computing, University of Turku, Turku, Finland
- Balaguru Ravikumar
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Fangping Wan
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Sungjoon Park
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Michael Mason
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Andrew Lamb
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Minji Jeon
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Mariya Popova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Stephen Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kristen Dang
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Carrow I Wells
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Timothy M Willson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Tudor I Oprea
- Translational Informatics Division and Comprehensive Cancer Center, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David H Drewry
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Krister Wennerberg
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Justin Guinney
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
|
179
|
Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021; 22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023] Open
Abstract
The new advances in deep learning methods have influenced many aspects of scientific research, including the study of the protein system. The prediction of proteins' 3D structural components is now heavily dependent on machine learning techniques that interpret how protein sequences and their homology govern the inter-residue contacts and structural organization. Especially, methods employing deep neural networks have had a significant impact on recent CASP13 and CASP14 competition. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify unknown protein structures and functions to be discovered and help guide drug-target interactions. Although significant problems still need to be addressed, we expect these techniques in the near future to play crucial roles in protein structural bioinformatics as well as in drug discovery.
Affiliation(s)
- Donghyuk Suh
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Jai Woo Lee
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Sun Choi
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Korea
|
180
|
Abbasi K, Razzaghi P, Poso A, Ghanbari-Ara S, Masoudi-Nejad A. Deep Learning in Drug Target Interaction Prediction: Current and Future Perspectives. Curr Med Chem 2021; 28:2100-2113. [PMID: 32895036 DOI: 10.2174/0929867327666200907141016] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 05/14/2020] [Revised: 07/30/2020] [Accepted: 07/30/2020] [Indexed: 11/22/2022]
Abstract
Drug-target Interactions (DTIs) prediction plays a central role in drug discovery. Computational methods in DTIs prediction have gained more attention because carrying out in vitro and in vivo experiments on a large scale is costly and time-consuming. Machine learning methods, especially deep learning, are widely applied to DTIs prediction. In this study, the main goal is to provide a comprehensive overview of deep learning-based DTIs prediction approaches. Here, we investigate the existing approaches from multiple perspectives. We explore these approaches to find out which deep network architectures are utilized to extract features from drug compound and protein sequences. Also, the advantages and limitations of each architecture are analyzed and compared. Moreover, we explore the process of how to combine descriptors for drug and protein features. Likewise, a list of datasets that are commonly used in DTIs prediction is investigated. Finally, current challenges are discussed and a short future outlook of deep learning in DTI prediction is given.
Affiliation(s)
- Karim Abbasi
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Antti Poso
- School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland
- Saber Ghanbari-Ara
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
|
181
|
Kim QH, Ko JH, Kim S, Park N, Jhe W. Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction. Bioinformatics 2021; 37:3428-3435. [PMID: 33978713 PMCID: PMC8545317 DOI: 10.1093/bioinformatics/btab346] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/16/2020] [Revised: 04/26/2021] [Accepted: 05/05/2021] [Indexed: 11/25/2022] Open
Abstract
Motivation: Characterizing drug–protein interactions (DPIs) is crucial to the high-throughput screening for drug discovery. The deep learning-based approaches have attracted attention because they can predict DPIs without human trial and error. However, because data labeling requires significant resources, the available protein data size is relatively small, which consequently decreases model performance. Here, we propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset. Results: First, we use transfer learning in encoding protein sequences with a pretrained model, which trains general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to make a robust model by estimating the data uncertainty. Our resulting model performs better than the previous baselines at predicting interactions between molecules and proteins. We also show that the quantified uncertainty from the Bayesian inference is related to confidence and can be used for screening DPI data points. Availability and implementation: The code is available at https://github.com/QHwan/PretrainDPI. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- QHwan Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Joon-Hyuk Ko
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Sunghoon Kim
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Nojun Park
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
- Wonho Jhe
- Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea
|
182
|
Zhao Y, Tian S, Yu L, Zhang Z, Zhang W. Analysis and Classification of Hepatitis Infections Using Raman Spectroscopy and Multiscale Convolutional Neural Networks. Journal of Applied Spectroscopy 2021; 88:441-451. [PMID: 33972806 PMCID: PMC8099702 DOI: 10.1007/s10812-021-01192-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 05/14/2023]
Abstract
Hepatitis infections represent a major health concern worldwide. Numerous computer-aided approaches have been devised for the early detection of hepatitis. In this study, we propose a method for the analysis and classification of cases of hepatitis-B virus (HBV), hepatitis-C virus (HCV), and healthy subjects using Raman spectroscopy and a multiscale convolutional neural network (MSCNN). In particular, serum samples of HBV-infected patients (435 cases), HCV-infected patients (374 cases), and healthy persons (499 cases) are analyzed via Raman spectroscopy. The differences between Raman peaks in the measured serum spectra indicate specific biomolecular differences among the three classes. The dimensionality of the spectral data is reduced through principal component analysis. Subsequently, features are extracted, and then feature normalization is applied. Next, the extracted features are used to train different classifiers, namely MSCNN, a single-scale convolutional neural network, and other traditional classifiers. Among these classifiers, the MSCNN model achieved the best outcomes with a precision of 98.89%, sensitivity of 97.44%, specificity of 94.54%, and accuracy of 94.92%. Overall, the results demonstrate that Raman spectral analysis and MSCNN can be effectively utilized for rapid screening of hepatitis B and C cases.
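The abstract above reports precision, sensitivity, specificity, and accuracy for a three-class problem. As a minimal sketch of how such figures are typically derived, the code below computes one-vs-rest metrics for a single class from a confusion matrix. The abstract does not say how the authors aggregated their metrics across classes, so the one-vs-rest treatment is an assumption, and the function name and the matrix values are hypothetical.

```python
def one_vs_rest_metrics(confusion, k):
    """Metrics for class k, treating every other class as negative.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # true k, predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    # No zero-division guards: this sketch assumes every denominator is nonzero.
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }


if __name__ == "__main__":
    # Illustrative counts only; rows are true classes, columns are predictions.
    confusion = [
        [8, 1, 1],
        [2, 7, 1],
        [0, 2, 8],
    ]
    print(one_vs_rest_metrics(confusion, 0))
```

Sensitivity and specificity are reported together because a classifier can score well on one while failing on the other when the classes are imbalanced.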
Affiliation(s)
- Y. Zhao
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000 China
- Sh. Tian
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000 China
- L. Yu
- College of Software Engineering at Xin Jiang University, Urumqi, 830000 China
- Zh. Zhang
- The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830000 China
- W. Zhang
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000 China
|
183
|
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinform 2021; 22:6262238. [PMID: 33940598 DOI: 10.1093/bib/bbab109] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/10/2021] [Revised: 03/06/2021] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural network (GNN) has emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel molecular pre-training graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we proposed a powerful GNN for modelling molecular graph named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph-level. After pre-training on 11 million unlabeled molecules, we revealed that MolGNet can capture valuable chemical insights to produce interpretable representation. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Affiliation(s)
- Pengyong Li
- Department of Biomedical Engineering at Tsinghua University, China
- Jun Wang
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Yixuan Qiao
- Operations Research and Cybernetics at Beijing University of Technology, China
- Hao Chen
- Cybernetics at Beijing University of Technology, China
- Yihuan Yu
- Beijing University of Biomedical Engineering, China
- Xiaojun Yao
- Analytical Chemistry and Chemoinformatics at Lanzhou University, China
- Peng Gao
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Guotong Xie
- Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China
- Sen Song
- Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China
|
184
|
Zhao BW, You ZH, Hu L, Guo ZH, Wang L, Chen ZH, Wong L. A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning. Cancers (Basel) 2021; 13:2111. [PMID: 33925568 PMCID: PMC8123765 DOI: 10.3390/cancers13092111] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/27/2021] [Revised: 04/20/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022] Open
Abstract
Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, the computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of feature are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained area under the receiver operating characteristic curve (AUROC) of 0.9455 and area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
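The abstract above says that a graph convolutional network aggregates first-order neighbor information. The sketch below illustrates that aggregation step alone, using the symmetric normalization commonly associated with GCNs. It is not the LGDTI implementation: the learned weight matrix and nonlinearity of a real GCN layer are omitted, the DeepWalk embedding and random forest stages are not shown, and the function name and toy graph are hypothetical.

```python
from math import sqrt


def gcn_aggregate(adjacency, features):
    """One step of symmetric-normalized first-order neighbor aggregation.

    Computes D^(-1/2) (A + I) D^(-1/2) X, where A is the adjacency matrix,
    I adds self-loops, D is the degree matrix of A + I, and X holds one
    feature row per node.
    """
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    dim = len(features[0])
    aggregated = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if a_hat[i][j]:
                weight = a_hat[i][j] / sqrt(degree[i] * degree[j])
                for d in range(dim):
                    row[d] += weight * features[j][d]
        aggregated.append(row)
    return aggregated


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 with one scalar feature per node.
    adjacency = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
    features = [[1.0], [2.0], [3.0]]
    print(gcn_aggregate(adjacency, features))
```

Each node's output mixes only its own features with those of its direct neighbors, which is why the abstract pairs this step with a random-walk embedding to capture higher-order structure.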
Affiliation(s)
- Bo-Wei Zhao
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhu-Hong You
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Lun Hu
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhen-Hao Guo
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Lei Wang
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhan-Heng Chen
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Leon Wong
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
|
185
|
Cai T, Lim H, Abbu KA, Qiu Y, Nussinov R, Xie L. MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization. J Chem Inf Model 2021; 61:1570-1582. [PMID: 33757283 PMCID: PMC8154251 DOI: 10.1021/acs.jcim.0c01285] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/04/2020] [Indexed: 01/14/2023]
Abstract
Small molecules play a critical role in modulating biological systems. Knowledge of chemical-protein interactions helps address fundamental and practical questions in biology and medicine. However, with the rapid emergence of newly sequenced genes, the endogenous or surrogate ligands of a vast number of proteins remain unknown. Homology modeling and machine learning are two major methods for assigning new ligands to a protein but mostly fail when sequence homology between an unannotated protein and those with known functions or structures is low. In this study, we develop a new deep learning framework to predict chemical binding to evolutionarily divergent unannotated proteins, whose ligands cannot be reliably predicted by existing methods. By incorporating evolutionary information into self-supervised learning of unlabeled protein sequences, we develop a novel method, distilled sequence alignment embedding (DISAE), for the protein sequence representation. DISAE can utilize all protein sequences and their multiple sequence alignment (MSA) to capture functional relationships between proteins without the knowledge of their structure and function. Following DISAE pretraining, we devise a module-based fine-tuning strategy for the supervised learning of chemical-protein interactions. In the benchmark studies, DISAE significantly improves the generalizability of machine learning models and outperforms the state-of-the-art methods by a large margin. Comprehensive ablation studies suggest that the use of MSA, sequence distillation, and triplet pretraining critically contributes to the success of DISAE. The interpretability analysis of DISAE suggests that it learns biologically meaningful information. We further use DISAE to assign ligands to human orphan G-protein coupled receptors (GPCRs) and to cluster the human GPCRome by integrating their phylogenetic and ligand relationships. The promising results of DISAE open an avenue for exploring the chemical landscape of entire sequenced genomes.
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Hansaim Lim
- Ph.D. Program in Biochemistry, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Kyra Alyssa Abbu
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
- Yue Qiu
- Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Ph.D. Program in Biochemistry, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
- Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, New York 10021, United States
|
186
|
Sajadi SZ, Zare Chahooki MA, Gharaghani S, Abbasi K. AutoDTI++: deep unsupervised learning for DTI prediction by autoencoders. BMC Bioinformatics 2021; 22:204. [PMID: 33879050 PMCID: PMC8056558 DOI: 10.1186/s12859-021-04127-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/17/2021] [Accepted: 04/09/2021] [Indexed: 12/12/2022] Open
Abstract
Background: Drug-target interaction (DTI) plays a vital role in drug discovery. Identifying drug-target interactions through wet-lab experiments is costly, laborious, and time-consuming. Therefore, computational prediction of drug-target interactions is an essential task in the drug discovery process. Meanwhile, computational methods can reduce search space by proposing potential drugs already validated on wet-lab experiments. Recently, deep learning-based methods in drug-target interaction prediction have received more attention. Traditionally, DTI prediction methods' performance heavily depends on additional information, such as protein sequence and molecular structure of the drug, as well as deep supervised learning. Results: This paper proposes a method based on deep unsupervised learning for drug-target interaction prediction called AutoDTI++. The proposed method includes three steps. The first step is to pre-process the interaction matrix. Since the interaction matrix is sparse, we solved the sparsity of the interaction matrix with drug fingerprints. Then, in the second step, the AutoDTI approach is introduced. In the third step, we post-process the output of the AutoDTI model. Conclusions: Experimental results have shown that we were able to improve the prediction performance. To this end, the proposed method has been compared to other algorithms using the same reference datasets. Five repetitions of tenfold cross-validation on gold standard datasets (Nuclear Receptors, GPCRs, Ion channels, and Enzymes) show that the proposed method achieves good performance with high accuracy.
Affiliation(s)
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Karim Abbasi
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
|
187
|
Zeng Y, Chen X, Luo Y, Li X, Peng D. Deep drug-target binding affinity prediction with multiple attention blocks. Brief Bioinform 2021; 22:6231754. [PMID: 33866349 PMCID: PMC8083346 DOI: 10.1093/bib/bbab117] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/16/2021] [Revised: 02/12/2021] [Accepted: 03/13/2021] [Indexed: 11/23/2022] Open
Abstract
Drug-target interaction (DTI) prediction has drawn increasing interest due to its substantial position in the drug discovery process. Many studies have introduced computational models that treat DTI prediction as a regression task and directly predict the binding affinity of drug-target pairs. However, existing studies (i) ignore the essential correlations between atoms when encoding drug compounds and (ii) model the interaction of drug-target pairs simply by concatenation. Based on those observations, in this study, we propose an end-to-end model with multiple attention blocks to predict the binding affinity scores of drug-target pairs. Our proposed model can (i) encode the correlations between atoms with a relation-aware self-attention block and (ii) model the interaction of drug representations and target representations with a multi-head attention block. Experimental results of DTI prediction on two benchmark datasets show that our approach outperforms existing methods, benefiting from the correlation information encoded by the relation-aware self-attention block and the interaction information extracted by the multi-head attention block. Moreover, we conduct experiments on the effect of the max relative position length and find that the best value is $k \in \{3, 5\}$. Furthermore, we apply our model to predict the binding affinity of Corona Virus Disease 2019 (COVID-19)-related genome sequences and 3137 FDA-approved drugs.
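The role of the max relative position length k can be illustrated with a small sketch. It assumes the clipping scheme commonly used with relative position representations in self-attention, where the offset between two sequence positions is clipped to [-k, k] and shifted to index one of 2k + 1 learned embeddings. The abstract does not confirm that the authors use exactly this scheme, and the function name `clipped_relative_positions` is hypothetical.

```python
def clipped_relative_positions(seq_len, k):
    """Return a seq_len x seq_len table of embedding indices in [0, 2k].

    Entry [i][j] is the offset j - i, clipped to [-k, k] and shifted by k,
    so that offsets beyond the window share the same boundary index.
    """
    return [
        [max(-k, min(k, j - i)) + k for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy example: a sequence of 6 atom tokens with window k = 3 (assumed values).
index_table = clipped_relative_positions(6, 3)
distinct_indices = {idx for row in index_table for idx in row}
```

With k = 3, every pair of positions more than three steps apart maps to the same boundary embedding, which is why the choice of k controls how much positional detail the attention block can distinguish.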
Affiliation(s)
- Yuni Zeng
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China
- Xiangru Chen
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China
- Yujie Luo
- Shenzhen Peng Cheng Laboratory, Shenzhen, 518052, China
- Xuedong Li
- Chengdu Sobey Digital Technology Co., Ltd, Chengdu, 610041, China
- Dezhong Peng
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China
|
188
|
Yang S, Zhu F, Ling X, Liu Q, Zhao P. Intelligent Health Care: Applications of Deep Learning in Computational Medicine. Front Genet 2021; 12:607471. [PMID: 33912213 PMCID: PMC8075004 DOI: 10.3389/fgene.2021.607471] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/17/2020] [Accepted: 03/05/2021] [Indexed: 12/24/2022] Open
Abstract
With the progress of medical technology, the biomedical field has entered the era of big data, and computational medicine, driven by artificial intelligence technology, has emerged on this basis. The effective information contained in these large biomedical datasets needs to be extracted to promote the development of precision medicine. Traditionally, machine learning methods have been used to mine biomedical data for features; these methods generally rely on feature engineering and expert domain knowledge, requiring tremendous time and human resources. In contrast, deep learning, a cutting-edge branch of machine learning, can automatically learn complex and robust features from raw data without feature engineering. This review examines applications of deep learning in medical imaging, electronic health records, genomics, and drug development, and suggests that deep learning has a clear advantage in making full use of biomedical data and improving the level of medical care. Deep learning plays an increasingly important role in medicine and health care and has broad prospects for application. However, problems and challenges remain for deep learning in computational medicine, including insufficient data, interpretability, data privacy, and heterogeneity. Analysis and discussion of these problems provide a reference for improving the application of deep learning in medicine and health care.
Affiliation(s)
- Sijie Yang
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Xinghong Ling
- School of Computer Science and Technology, Soochow University, Suzhou, China
- WenZheng College of Soochow University, Suzhou, China
- Quan Liu
- School of Computer Science and Technology, Soochow University, Suzhou, China
- Peiyao Zhao
- School of Computer Science and Technology, Soochow University, Suzhou, China
|
189
|
Gupta P, Mohanty D. SMMPPI: a machine learning-based approach for prediction of modulators of protein-protein interactions and its application for identification of novel inhibitors for RBD:hACE2 interactions in SARS-CoV-2. Brief Bioinform 2021; 22:6220172. [PMID: 33839740 PMCID: PMC8083326 DOI: 10.1093/bib/bbab111] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/04/2020] [Revised: 02/18/2021] [Accepted: 03/12/2021] [Indexed: 11/30/2022] Open
Abstract
Small molecule modulators of protein–protein interactions (PPIs) are being pursued as novel anticancer, antiviral and antimicrobial drug candidates. We have utilized a large data set of experimentally validated PPI modulators and developed machine learning classifiers for prediction of new small molecule modulators of PPIs. Our analysis reveals that, using a random forest (RF) classifier, general PPI modulators independent of PPI family can be predicted with a ROC-AUC higher than 0.9 when training and test sets are generated by random split. The performance of the classifier on data sets very different from those used in training has also been estimated by using different state-of-the-art protocols for removing various types of bias in the division of data into training and test sets. The family-specific PPIM predictors developed in this work for 11 clinically important PPI families also have prediction accuracies above 90% in the majority of cases. All these ML-based predictors have been implemented in freely available software named SMMPPI for prediction of small molecule modulators for clinically relevant PPIs such as RBD:hACE2, Bromodomain_Histone, BCL2-Like_BAX/BAK, LEDGF_IN, LFA_ICAM, MDM2-Like_P53, RAS_SOS1, XIAP_Smac, WDR5_MLL1, KEAP1_NRF2 and CD4_gp120. We have identified novel chemical scaffolds as inhibitors of the RBD_hACE2 PPI involved in host cell entry of SARS-CoV-2. Docking studies for some of the compounds reveal that they can inhibit the RBD_hACE2 interaction by high-affinity binding to interaction hotspots on RBD. Some of these new scaffolds have also been found in recently reported SARS-CoV-2 viral growth inhibitors; however, it is not known whether these molecules inhibit the entry phase.
Affiliation(s)
- Debasisa Mohanty
- Bioinformatics & Computational Biology research group at NII, New Delhi 110067, India
|
190
|
Liu Z, Chen Q, Lan W, Pan H, Hao X, Pan S. GADTI: Graph Autoencoder Approach for DTI Prediction From Heterogeneous Network. Front Genet 2021; 12:650821. [PMID: 33912218 PMCID: PMC8072283 DOI: 10.3389/fgene.2021.650821] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/08/2021] [Accepted: 03/12/2021] [Indexed: 12/26/2022] Open
Abstract
Identifying drug–target interactions (DTIs) is the basis for drug development. However, discovering drug–target interactions through biochemical experiments has low coverage and high costs. Many computational methods have been developed to predict potential drug–target interactions from known ones, but the accuracy of these methods still needs to be improved. In this article, a graph autoencoder approach for DTI prediction (GADTI) is proposed to discover potential interactions between drugs and targets using a heterogeneous network, which integrates diverse drug-related and target-related datasets. Its encoder consists of two components: a graph convolutional network (GCN) and a random walk with restart (RWR). The decoder is DistMult, a matrix factorization model, which uses embedding vectors from the encoder to discover potential DTIs. The combination of GCN and RWR can provide nodes with more information through a larger neighborhood, and it can also avoid the over-smoothing and computational complexity caused by multi-layer message passing. Based on 10-fold cross-validation, we conduct three experiments in different scenarios. The results show that GADTI is superior to the baseline methods in both the area under the receiver operating characteristic curve and the area under the precision–recall curve. In addition, based on the latest DrugBank dataset (V5.1.8), the case study shows that 54.8% of newly approved DTIs are predicted by GADTI.
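Two of the building blocks named in this abstract, random walk with restart and the DistMult scoring function, can be sketched in their textbook forms. This is a minimal illustration in plain Python, not the GADTI implementation: the toy graph, the restart probability, and all function names are assumptions, and the abstract does not say how the RWR output is combined with the GCN.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix to sum to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-12, max_iter=1000):
    """Iterate p <- (1 - c) * W p + c * e until the L1 change is below tol."""
    n = len(adj)
    w = column_normalize(adj)
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * e[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def distmult_score(head_vec, relation_diag, tail_vec):
    """DistMult score: sum_k head[k] * relation[k] * tail[k]."""
    return sum(h * r * t for h, r, t in zip(head_vec, relation_diag, tail_vec))


# Toy path graph 0 - 1 - 2 - 3 (hypothetical nodes), walk seeded at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proximity = random_walk_with_restart(path_graph, seed=0)

# Toy embeddings (hypothetical values) scored with a diagonal relation vector.
score = distmult_score([1.0, 2.0], [0.5, 1.0], [2.0, 3.0])
```

The RWR vector assigns nonzero proximity to nodes several hops from the seed in a single propagation scheme, which matches the abstract's point about reaching a larger neighborhood without stacking many message-passing layers.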
Affiliation(s)
- Zhixian Liu
- School of Medical, Guangxi University, Nanning, China; School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, China
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Haiming Pan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Xinkun Hao
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Shirui Pan
- Department of Data Science and AI, Monash University, Melbourne, VIC, Australia
|
191
|
Zhi HY, Zhao L, Lee CC, Chen CYC. A Novel Graph Neural Network Methodology to Investigate Dihydroorotate Dehydrogenase Inhibitors in Small Cell Lung Cancer. Biomolecules 2021; 11:biom11030477. [PMID: 33806898 PMCID: PMC8005042 DOI: 10.3390/biom11030477] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/29/2021] [Revised: 02/26/2021] [Accepted: 03/16/2021] [Indexed: 12/17/2022] Open
Abstract
Small cell lung cancer (SCLC) is a particularly aggressive tumor subtype, and dihydroorotate dehydrogenase (DHODH) has been demonstrated to be a therapeutic target for SCLC. Network pharmacology analysis and virtual screening were used to identify related proteins and to investigate candidates with high docking capacity to multiple targets. Graph neural networks (GNNs) and machine learning were used to build reliable predictive models. We proposed a novel concept of multi-GNNs and built three multi-GNN models called GIAN, GIAT, and SGCA, which achieved satisfactory results on our dataset of 532 molecules, with all R^2 values greater than 0.92 on the training set and higher than 0.8 on the test set. Compared with the machine learning algorithms random forest (RF) and support vector regression (SVR), multi-GNNs had a better modeling effect and higher precision. Furthermore, a long 300 ns molecular dynamics simulation verified the stability of the protein–ligand complexes. The results showed that ZINC8577218, ZINC95618747, and ZINC4261765 might be potent inhibitors of DHODH. Multi-GNNs show great performance in practice, making them a promising field for future research. We therefore suggest that this novel concept of multi-GNNs is a promising protocol for drug discovery.
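The R^2 values quoted in this abstract refer to the coefficient of determination. As a reminder of what the thresholds 0.92 and 0.8 measure, here is a minimal sketch using the standard definition R^2 = 1 - SS_res / SS_tot. The toy values are invented for illustration and are not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted activity values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
fit_quality = r_squared(measured, predicted)
```

Here SS_res is 0.10 and SS_tot is 5.0, so the toy fit scores 0.98. A gap between training and test R^2, as reported in the abstract, is the usual sign of some overfitting.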
Affiliation(s)
- Hong-Yi Zhi
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Lu Zhao
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510655, China
- Cheng-Chun Lee
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Calvin Yu-Chian Chen
- Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
|
192
|
Trapotsi MA, Mervin LH, Afzal AM, Sturm N, Engkvist O, Barrett IP, Bender A. Comparison of Chemical Structure and Cell Morphology Information for Multitask Bioactivity Predictions. J Chem Inf Model 2021; 61:1444-1456. [PMID: 33661004 DOI: 10.1021/acs.jcim.0c00864] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
The understanding of the mechanism-of-action (MoA) of compounds and the prediction of potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison was performed using bioactivity data from the ExCAPE database, image data (in the form of CellProfiler features) from the Cell Painting data set (the largest publicly available data set of cell images with ∼30,000 compound perturbations), and extended connectivity fingerprints (ECFPs) using the multitask Bayesian matrix factorization (BMF) approach Macau. We found that the BMF Macau and random forest (RF) performance were overall similar when ECFPs were used as compound descriptors. However, BMF Macau outperformed RF in 159 out of 224 targets (71%) when image data were used as compound information. Using BMF Macau, 100 (corresponding to about 45%) and 90 (about 40%) of the 224 targets were predicted with high predictive performance (AUC > 0.8) with ECFP data and image data as side information, respectively. There were targets better predicted by image data as side information, such as β-catenin, and others better predicted by fingerprint-based side information, such as proteins belonging to the G-protein-Coupled Receptor 1 family, which could be rationalized from the underlying data distributions in each descriptor domain. In conclusion, both cell morphology changes and chemical structure information contain information about compound bioactivity, which is also partially complementary, and can hence contribute to in silico MoA analysis.
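The per-target criterion AUC > 0.8 used in this abstract can be made concrete with a small sketch. It computes ROC-AUC from its pairwise definition: the fraction of (active, inactive) pairs in which the active compound receives the higher score, with ties counted as one half. The labels, scores, and function name are hypothetical, and this is not the evaluation code of the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for one target: 1 = active, 0 = inactive.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
toy_auc = roc_auc(toy_labels, toy_scores)
high_performance = toy_auc > 0.8
```

In this toy case 7 of the 9 active-inactive pairs are ranked correctly, giving an AUC of about 0.78, so the target would fall just short of the high-performance threshold.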
Affiliation(s)
- Maria-Anna Trapotsi
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- Lewis H Mervin
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, U.K
- Avid M Afzal
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, U.K
- Noé Sturm
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg SE-43183, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg SE-43183, Sweden
- Ian P Barrett
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, U.K
- Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
|
193
|
Shim J, Hong ZY, Sohn I, Hwang C. Prediction of drug-target binding affinity using similarity-based convolutional neural network. Sci Rep 2021; 11:4416. [PMID: 33627791 PMCID: PMC7904939 DOI: 10.1038/s41598-021-83679-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 10/20/2020] [Accepted: 01/18/2021] [Indexed: 12/02/2022] Open
Abstract
Identifying novel drug–target interactions (DTIs) plays an important role in drug discovery. Most of the computational methods developed for predicting DTIs use binary classification, whose goal is to determine whether or not a drug–target (DT) pair interacts. However, it is more meaningful, but also more challenging, to predict the binding affinity that describes the strength of the interaction between a DT pair. If the binding affinity is not sufficiently large, the drug may not be useful. Therefore, methods for predicting DT binding affinities are very valuable. The increase in novel public affinity data available in DT-related databases enables advanced deep learning techniques to be used to predict binding affinities. In this paper, we propose a similarity-based model that applies a 2-dimensional (2D) convolutional neural network (CNN) to the outer products between column vectors of two similarity matrices for the drugs and targets to predict DT binding affinities. To the best of our knowledge, this is the first application of a 2D CNN in similarity-based DT binding affinity prediction. The validation results on multiple public datasets show that the proposed model is an effective approach for DT binding affinity prediction and can be quite helpful in the drug development process.
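The input construction described in this abstract, an outer product between one column of the drug similarity matrix and one column of the target similarity matrix, can be sketched as follows. The similarity values and function names are hypothetical, and the CNN that would consume the resulting 2D map is omitted; this only illustrates the shape of the feature map for one drug–target pair.

```python
def column(matrix, j):
    """Extract column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def outer_product(u, v):
    """Outer product u v^T as a len(u) x len(v) nested list."""
    return [[a * b for b in v] for a in u]


def pair_feature_map(drug_sim, target_sim, drug_idx, target_idx):
    """2D input for one drug-target pair: outer product of similarity columns."""
    return outer_product(column(drug_sim, drug_idx), column(target_sim, target_idx))


# Hypothetical similarity matrices: 3 drugs and 2 targets.
drug_similarity = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
target_similarity = [
    [1.0, 0.3],
    [0.3, 1.0],
]
feature_map = pair_feature_map(drug_similarity, target_similarity, 0, 1)
```

Each entry of the map multiplies the query drug's similarity to one known drug by the query target's similarity to one known target, so the map has one row per drug and one column per target and can be treated like a single-channel image.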
Affiliation(s)
- Jooyong Shim
- Department of Statistics, Institute of Statistical Information, Inje University, Gimhae, Gyeongsangnamdo, South Korea
- Changha Hwang
- Department of Applied Statistics, Dankook University, Yongin, Gyeonggido, 16890, South Korea.
|
194
|
Abbasi K, Razzaghi P, Poso A, Amanlou M, Ghasemi JB, Masoudi-Nejad A. DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics 2021; 36:4633-4642. [PMID: 32462178 DOI: 10.1093/bioinformatics/btaa544] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 11/30/2019] [Revised: 04/29/2020] [Accepted: 05/22/2020] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION An essential part of drug discovery is the accurate prediction of the binding affinity of new compound-protein pairs. Most of the standard computational methods assume that compounds or proteins of the test data are observed during the training phase. However, in real-world situations, the test and training data are sampled from different domains with different distributions. To cope with this challenge, we propose a deep learning-based approach that consists of three steps. In the first step, the training encoder network learns a novel representation of compounds and proteins. To this end, we combine convolutional layers and long short-term memory layers so that the occurrence patterns of local substructures through a protein and a compound sequence are learned. Also, to encode the interaction strength of the protein and compound substructures, we propose a two-sided attention mechanism. In the second step, to deal with the different distributions of the training and test domains, a feature encoder network is learned for the test domain by utilizing an adversarial domain adaptation approach. In the third step, the learned test encoder network is applied to new compound-protein pairs to predict their binding affinity. RESULTS To evaluate the proposed approach, we applied it to the KIBA, Davis and BindingDB datasets. The results show that the proposed method learns a more reliable model for the test domain in more challenging situations. AVAILABILITY AND IMPLEMENTATION https://github.com/LBBSoft/DeepCDA.
Affiliation(s)
- Karim Abbasi
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 4513766731, Iran
- Antti Poso
- School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland
- Massoud Amanlou
- Department of Medicinal Chemistry, Drug Design and Development Research Center, Tehran University of Medical Sciences, Tehran 1416753955, Iran
- Jahan B Ghasemi
- Chemistry Department, Faculty of Sciences, University of Tehran, Tehran 1417614418, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
|
195
|
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241 DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
Therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on the therapeutic protein targets, while the knowledge of the off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods utilize to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact and relationships between these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit and often combine three types of similarities that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
|
196
|
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. BIOTECHNOL BIOPROC E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data, such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized. In addition, an in-depth analysis of the remaining challenges and limitations is provided, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
|
197
|
Hudson IL. Data Integration Using Advances in Machine Learning in Drug Discovery and Molecular Biology. Methods Mol Biol 2021; 2190:167-184. [PMID: 32804365 DOI: 10.1007/978-1-0716-0826-5_7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
Abstract
While the term artificial intelligence and the concept of deep learning are not new, recent advances in high-performance computing, the availability of large annotated data sets required for training, and novel frameworks for implementing deep neural networks have led to an unprecedented acceleration of the field of molecular (network) biology and pharmacogenomics. The need to align biological data to innovative machine learning has stimulated developments in both data integration (fusion) and knowledge representation, in the form of heterogeneous, multiplex, and biological networks or graphs. In this chapter we briefly introduce several popular neural network architectures used in deep learning, namely, the fully connected deep neural network, recurrent neural network, convolutional neural network, and the autoencoder. Deep learning predictors, classifiers, and generators utilized in modern feature extraction may well assist interpretability and thus imbue AI tools with increased explication, potentially adding insights and advancements in novel chemistry and biology discovery. The capability of learning representations from structures directly without using any predefined structure descriptor is an important feature distinguishing deep learning from other machine learning methods and makes the traditional feature selection and reduction procedures unnecessary. In this chapter we briefly show how these technologies are applied for data integration (fusion) and analysis in drug discovery research, covering these areas: (1) application of convolutional neural networks to predict ligand-protein interactions; (2) application of deep learning in compound property and activity prediction; (3) de novo design through deep learning.
We also: (1) discuss some aspects of future development of deep learning in drug discovery/chemistry; (2) provide references to published information; (3) provide recently advocated recommendations on using artificial intelligence and deep learning in -omics research and drug discovery.
Affiliation(s)
- Irene Lena Hudson
- Mathematical Sciences, School of Science, RMIT University, Melbourne, VIC, Australia.
|
198
|
Patra J, Singh D, Jain S, Mahindroo N. Application of Docking for Lead Optimization. MOLECULAR DOCKING FOR COMPUTER-AIDED DRUG DESIGN 2021:271-294. [DOI: 10.1016/b978-0-12-822312-3.00012-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/06/2025]
|
199
|
Chang S, Chen JY, Chuang YJ, Chen BS. Systems Approach to Pathogenic Mechanism of Type 2 Diabetes and Drug Discovery Design Based on Deep Learning and Drug Design Specifications. Int J Mol Sci 2020; 22:ijms22010166. [PMID: 33375269 PMCID: PMC7795239 DOI: 10.3390/ijms22010166] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/07/2020] [Revised: 12/21/2020] [Accepted: 12/21/2020] [Indexed: 12/16/2022] Open
Abstract
In this study, we proposed a systems biology approach to investigate the pathogenic mechanism for identifying significant biomarkers as drug targets and a systematic drug discovery strategy to design a potential multiple-molecule targeting drug for type 2 diabetes (T2D) treatment. We first integrated databases to construct the genome-wide genetic and epigenetic networks (GWGENs), which consist of protein–protein interaction networks (PPINs) and gene regulatory networks (GRNs) for T2D and non-T2D (healthy), respectively. Second, the relevant “real GWGENs” were identified by system identification and system order detection methods performed on the T2D and non-T2D RNA-seq data. To simplify network analysis, principal network projection (PNP) was then used to extract core GWGENs from the real GWGENs. Then, with the help of KEGG pathway annotation, core signaling pathways were constructed to identify significant biomarkers. Furthermore, to discover potential drugs for the selected pathogenic biomarkers (i.e., drug targets) from the core signaling pathways, we not only trained a deep neural network (DNN)-based drug–target interaction (DTI) model to predict candidate drugs' binding with the identified biomarkers but also considered a set of design specifications, including drug regulation ability, toxicity, sensitivity, and side effects, to sieve out promising drugs suitable for T2D.
Collapse
Affiliation(s)
- Shen Chang
- Laboratory of Automatic Control, Signal Processing and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
- Jian-You Chen
- Laboratory of Automatic Control, Signal Processing and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
- Yung-Jen Chuang
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan
- Bor-Sen Chen
- Laboratory of Automatic Control, Signal Processing and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
200
Wang MWH, Goodman JM, Allen TEH. Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models. Chem Res Toxicol 2020; 34:217-239. [PMID: 33356168 DOI: 10.1021/acs.chemrestox.0c00316] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 02/07/2023]
Abstract
In recent times, machine learning has become increasingly prominent in predictive toxicology as the field has shifted from in vivo studies toward in silico studies. Currently, in vitro methods are being used together with other computational methods such as quantitative structure-activity relationship modeling and absorption, distribution, metabolism, and excretion calculations. An overview of machine learning and its applications in predictive toxicology is presented here, including support vector machines (SVMs), random forest (RF) and decision trees (DTs), neural networks, regression models, naïve Bayes, k-nearest neighbors, and ensemble learning. The recent successes of these machine learning methods in predictive toxicology are summarized, and a comparison of some models used in predictive toxicology is presented. In predictive toxicology, SVMs, RF, and DTs are the dominant machine learning methods due to the characteristics of the data available. Lastly, this review describes the current challenges facing the use of machine learning in predictive toxicology and offers insights into the possible areas of improvement in the field.
Affiliation(s)
- Marcus W H Wang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Jonathan M Goodman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Timothy E H Allen
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom; MRC Toxicology Unit, University of Cambridge, Hodgkin Building, Lancaster Road, Leicester LE1 7HB, United Kingdom