1
Choi D, Park S. Improving binding affinity prediction by emphasizing local features of drug and protein. Comput Biol Chem 2024; 115:108310. [PMID: 39674048 DOI: 10.1016/j.compbiolchem.2024.108310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 10/10/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Binding affinity prediction is a fundamental task in drug discovery. Despite considerable effort to improve its accuracy, prior work has considered only macro-level features that represent the overall architecture of a drug and a target protein, so features from the local structure of the drug and the protein tend to be lost. In this paper, we propose a deep learning model that comprehensively extracts the local features of both a drug and a target protein for accurate binding affinity prediction. The proposed model consists of two components, Multi-Stream CNN and Multi-Stream GCN, which capture micro-level characteristics (local features) from subsequences of a target protein sequence and from subgraphs of a drug molecule, respectively. Because each component has multiple streams with different numbers of layers, it can compute and preserve the local features through a stream consisting of a single layer. Our evaluation on two popular datasets, Davis and KIBA, demonstrates that the proposed model outperforms all baseline models that use global features, implying that local features play a significant role in binding affinity prediction.
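To illustrate the multi-stream idea described in this abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a toy numeric encoding of a sequence, a single hand-picked kernel shared across layers, ReLU activations, and global max pooling; the names `conv1d`, `stream`, `receptive_field`, and `multi_stream_features` are hypothetical. It shows that a one-layer stream sees only short windows (local features), while deeper streams see progressively wider context.

```python
from typing import List, Sequence


def conv1d(seq: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid 1D cross-correlation followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(seq) - k + 1):
        s = sum(seq[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def stream(seq: Sequence[float], kernel: Sequence[float], depth: int) -> List[float]:
    """Apply `depth` stacked convolution layers (toy: same kernel in each layer)."""
    x = list(seq)
    for _ in range(depth):
        x = conv1d(x, kernel)
    return x


def receptive_field(kernel_size: int, depth: int) -> int:
    """Number of input positions seen by one output of a stack of stride-1 layers."""
    return depth * (kernel_size - 1) + 1


def multi_stream_features(
    seq: Sequence[float], kernel: Sequence[float], depths: Sequence[int] = (1, 2, 3)
) -> List[float]:
    """Run one stream per depth, max-pool each, and concatenate the results."""
    return [max(stream(seq, kernel, d)) for d in depths]


# Hypothetical toy input: a numeric encoding of an 8-residue subsequence.
toy_sequence = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
toy_kernel = [1.0, 1.0, 1.0]
features = multi_stream_features(toy_sequence, toy_kernel)
```

With a kernel of width 3, the three streams cover 3, 5, and 7 input positions, so the single-layer stream keeps the most local view while the deeper streams aggregate wider context.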
Affiliation(s)
- Daejin Choi
  - Department of Computer Science and Engineering, Incheon National University, Incheon, Republic of Korea.
- Sangjun Park
  - Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea.

2
Matboli M, Al-Amodi HS, Khaled A, Khaled R, Ali M, Kamel HFM, Hamid MSAEL, ELsawi HA, Habib EK, Youssef I. Integrating molecular, biochemical, and immunohistochemical features as predictors of hepatocellular carcinoma drug response using machine-learning algorithms. Front Mol Biosci 2024; 11:1430794. [PMID: 39479501 PMCID: PMC11521808 DOI: 10.3389/fmolb.2024.1430794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Accepted: 09/27/2024] [Indexed: 11/02/2024] Open
Abstract
Introduction: Liver cancer, particularly hepatocellular carcinoma (HCC), remains a significant global health concern due to its high prevalence and heterogeneous nature. Despite the existence of approved drugs for HCC treatment, the scarcity of predictive biomarkers limits their effective use. Integrating diverse data types could revolutionize drug response prediction and ultimately enable personalized HCC management. Method: In this study, we developed multiple supervised machine learning models to predict treatment response. These models used classifiers such as logistic regression (LR), k-nearest neighbors (kNN), neural networks (NN), support vector machines (SVM), and random forests (RF), applied to a comprehensive set of molecular, biochemical, and immunohistochemical features as targets of three drugs: Pantoprazole, Cyanidin 3-glycoside (Cyan), and Hesperidin. Performance metrics for the complete and reduced models were reported, including accuracy, precision, recall (sensitivity), specificity, and the Matthews correlation coefficient (MCC). Results and Discussion: Notably, the NN achieved the best prediction accuracy: the combined model using molecular and biochemical features reached an accuracy of 0.9693 ± 0.0105 and an average area under the ROC curve (AUC) of 0.94 ± 0.06 over three cross-validation iterations. We also identified seven molecular features, seven biochemical features, and one immunohistochemistry feature as promising biomarkers of treatment response. This comprehensive method has the potential to significantly advance personalized HCC therapy by allowing more precise drug response estimation and assisting in the identification of effective treatment strategies.
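The performance metrics named in this abstract have standard definitions in terms of confusion-matrix counts. The sketch below computes them in plain Python for a binary responder/non-responder setting; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(num: float, den: float) -> float:
        # Return 0.0 when the denominator is zero (metric undefined for that case).
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when the two response classes are imbalanced.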
Affiliation(s)
- Marwa Matboli
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Faculty of Oral and Dental Medicine, Misr International University (MIU), Cairo, Egypt
- Hiba S. Al-Amodi
  - Biochemistry Department, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Abdelrahman Khaled
  - Bioinformatics Group, Center of Informatics Sciences (CIS), School of Information Technology and Computer Sciences, Nile University, Giza, Egypt
- Radwa Khaled
  - Biotechnology/Biomolecular Chemistry Department, Faculty of Science, Cairo University, Giza, Egypt
- Marwa Ali
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Hala F. M. Kamel
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Biochemistry Department, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Hind A. ELsawi
  - Department of Internal Medicine, Badr University in Cairo, Badr, Egypt
- Eman K. Habib
  - Department of Anatomy and Cell Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Department of Anatomy and Cell Biology, Faculty of Medicine, Galala University, Suez, Egypt
- Ibrahim Youssef
  - Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt

3
Huang Z, Fan Z, Shen S, Wu M, Deng L. MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning. Bioinformatics 2024; 40:ii190-ii197. [PMID: 39230706 PMCID: PMC11373324 DOI: 10.1093/bioinformatics/btae386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: Effective molecular representation is critical in drug development. The complex nature of molecules demands comprehensive multi-view representations that consider 1D, 2D, and 3D aspects to capture diverse perspectives. Obtaining representations that encompass these varied structures is crucial for a holistic understanding of molecules in drug-related contexts. RESULTS: In this study, we introduce a multi-view contrastive learning framework for molecular representation, denoted MolMVC. First, we use a Transformer encoder to capture 1D sequence information and a Graph Transformer to encode the 2D and 3D structural details of molecules. Our approach incorporates a novel attention-guided augmentation scheme that leverages prior knowledge to create positive samples tailored to different molecular data views. To align multi-view molecular positive samples effectively in latent space, we introduce an adaptive multi-view contrastive loss (AMCLoss). In particular, we calculate AMCLoss at various levels within the model to capture the hierarchical nature of the molecular information. Finally, we pre-train the encoders by minimizing AMCLoss to obtain the molecular representation, which can be used for various downstream tasks. In our experiments, we evaluate MolMVC on multiple tasks, including molecular property prediction (MPP), drug-target binding affinity (DTA) prediction, and cancer drug response (CDR) prediction. The results demonstrate that the molecular representation learned by MolMVC can enhance predictive accuracy on these tasks and also reduce computational costs. Furthermore, we showcase MolMVC's efficacy in drug repositioning across a spectrum of drug-related applications. AVAILABILITY AND IMPLEMENTATION: The code and pre-trained model are publicly available at https://github.com/Hhhzj-7/MolMVC.
Affiliation(s)
- Zhijian Huang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ziyu Fan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Siyuan Shen
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Wu
  - Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China

4
Theisen R, Wang T, Ravikumar B, Rahman R, Cichońska A. Leveraging multiple data types for improved compound-kinase bioactivity prediction. Nat Commun 2024; 15:7596. [PMID: 39217147 PMCID: PMC11365929 DOI: 10.1038/s41467-024-52055-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024] Open
Abstract
Machine learning provides efficient ways to map compound-kinase interactions. However, diverse bioactivity data types, including single-dose and multi-dose-response assay results, present challenges. Traditional models utilize only multi-dose data, overlooking information contained in single-dose measurements. Here, we propose a machine learning methodology for compound-kinase activity prediction that leverages both single-dose and dose-response data. We demonstrate that our two-stage approach yields accurate activity predictions and significantly improves model performance compared to training solely on dose-response labels. This superior performance is consistent across five diverse machine learning methods. Using the best performing model, we carried out extensive experimental profiling on a total of 347 selected compound-kinase pairs, achieving a high hit rate of 40% and a negative predictive value of 78%. We show that these rates can be improved further by incorporating model uncertainty estimates into the compound selection process. By integrating multiple activity data types, we demonstrate that our approach holds promise for facilitating the development of training activity datasets in a more efficient and cost-effective way.
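The hit rate and negative predictive value quoted in this abstract can be read, under their usual definitions, as the fraction of predicted-active pairs confirmed active and the fraction of predicted-inactive pairs confirmed inactive. The sketch below computes both in plain Python and shows one simple way an uncertainty estimate could be used to filter candidates before testing. The data, the threshold, and the names `hit_rate`, `negative_predictive_value`, and `keep_confident` are hypothetical; the study's actual selection procedure is not described here.

```python
from typing import List, Tuple

# Each record: (predicted_active, model_uncertainty, measured_active).
Pair = Tuple[bool, float, bool]


def hit_rate(pairs: List[Pair]) -> float:
    """Fraction of predicted-active pairs that were measured active."""
    tested = [p for p in pairs if p[0]]
    if not tested:
        return 0.0
    return sum(1 for p in tested if p[2]) / len(tested)


def negative_predictive_value(pairs: List[Pair]) -> float:
    """Fraction of predicted-inactive pairs that were measured inactive."""
    tested = [p for p in pairs if not p[0]]
    if not tested:
        return 0.0
    return sum(1 for p in tested if not p[2]) / len(tested)


def keep_confident(pairs: List[Pair], max_uncertainty: float) -> List[Pair]:
    """Keep only pairs whose model uncertainty is at or below the threshold."""
    return [p for p in pairs if p[1] <= max_uncertainty]


# Hypothetical compound-kinase pairs for illustration only.
PAIRS: List[Pair] = [
    (True, 0.1, True),
    (True, 0.2, True),
    (True, 0.8, False),
    (True, 0.9, False),
    (False, 0.1, False),
    (False, 0.3, False),
    (False, 0.7, True),
    (False, 0.2, False),
]

confident = keep_confident(PAIRS, max_uncertainty=0.5)
```

In this toy data the errors are concentrated among high-uncertainty predictions, so filtering raises both rates; whether that holds in practice depends on how well calibrated the uncertainty estimates are.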
Affiliation(s)
- Ryan Theisen
  - Harmonic Discovery Inc., New York City, NY, USA.

5
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundamental Research 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time consuming, and modern drug discovery increasingly relies on computational methods to reduce the time and cost of the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, deep learning methods have performed particularly well in drug virtual screening. Researchers now face the questions of how to summarize existing deep learning work in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in this area. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, a large number of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model on the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China

6
Liu Y, Xing L, Zhang L, Cai H, Guo M. GEFormerDTA: drug target affinity prediction based on transformer graph for early fusion. Sci Rep 2024; 14:7416. [PMID: 38548825 PMCID: PMC10979032 DOI: 10.1038/s41598-024-57879-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024] Open
Abstract
Predicting the interaction affinity between drugs and target proteins is crucial for rapid and accurate drug discovery and repositioning. More accurate prediction of drug-target affinity (DTA) has therefore become a key area of research in drug discovery and drug repositioning. However, traditional experimental methods have disadvantages such as long operation cycles, high manpower requirements, and high economic costs, making it difficult to predict specific interactions between drugs and target proteins quickly and accurately. Some methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring graph information such as bond encoding, degree centrality encoding, and spatial encoding of drug molecule graphs, as well as structural information of proteins such as secondary structure and accessible surface area. Moreover, previous methods learned feature representations from protein sequences alone, neglecting the completeness of information. To address the completeness of drug and protein structure information, we propose a Transformer graph-based early fusion approach for drug-target affinity prediction (GEFormerDTA). Our method reduces prediction errors caused by insufficient feature learning. Experimental results on the Davis and KIBA datasets showed better prediction of drug-target affinity than existing affinity prediction methods.
Affiliation(s)
- Youzhi Liu
  - Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
- Linlin Xing
  - Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China.
- Longbo Zhang
  - Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
- Hongzhen Cai
  - Department of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo, 255000, China
- Maozu Guo
  - Department of Electrical and Information Engineering, Beijing University of Architecture, Beijing, 102616, China

7
Gao M, Jiang S, Ding W, Xu T, Lyu Z. Learning long- and short-term dependencies for improving drug-target binding affinity prediction using transformer and edge contraction pooling. J Bioinform Comput Biol 2024; 22:2350030. [PMID: 38567388 DOI: 10.1142/s0219720023500300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The accurate identification of drug-target affinity (DTA) is crucial for advancements in drug discovery and development. Many deep learning-based approaches have been devised to predict drug-target binding affinity accurately, exhibiting notable improvements in performance. However, existing prediction methods often fall short of capturing the global features of proteins. In this study, we propose a novel model called ETransDTA, designed for predicting drug-target binding affinity. ETransDTA combines convolutional layers and a transformer, allowing the simultaneous extraction of both global and local features of target proteins. Additionally, we integrate a new graph pooling mechanism into the topology adaptive graph convolutional network (TAGCN) to enhance its capacity for learning feature representations of chemical compounds. The proposed ETransDTA model was evaluated on the Davis and Kinase Inhibitor BioActivity (KIBA) datasets, consistently outperforming the baseline methods. On the KIBA dataset, our model achieves the lowest mean square error (MSE) of 0.125, a 0.6% reduction compared to the lowest-performing baseline method. Furthermore, the incorporation of queries, keys and values produced by the stacked convolutional neural network (CNN) enables our model to better integrate the local and global context of protein representation, leading to further improvements in the accuracy of DTA prediction.
Affiliation(s)
- Min Gao
  - College of Information Science and Engineering, Hunan Normal University, Changsha, P. R. China
- Shaohua Jiang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, P. R. China
- Weibin Ding
  - College of Information Science and Engineering, Hunan Normal University, Changsha, P. R. China
- Ting Xu
  - College of Information Science and Engineering, Hunan Normal University, Changsha, P. R. China
- Zhijian Lyu
  - College of Information Science and Engineering, Hunan Normal University, Changsha, P. R. China

8
Wu Z, Wu Y, Zhu C, Wu X, Zhai S, Wang X, Su Z, Duan H. Efficient Computational Framework for Target-Specific Active Peptide Discovery: A Case Study on IL-17C Targeting Cyclic Peptides. J Chem Inf Model 2023; 63:7655-7668. [PMID: 38049371 DOI: 10.1021/acs.jcim.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/06/2023]
Abstract
The development of potentially active peptides for specific targets is critical for the growth of the modern pharmaceutical industry. In this study, we present an efficient computational framework for the discovery of active peptides against a specific pharmacological target, which combines a conditional variational autoencoder (CVAE) with a classifier named TCPP that is based on a Transformer and a convolutional neural network. In our example scenario, we constructed an active cyclic peptide library targeting interleukin-17C (IL-17C) through a library-based in vitro selection strategy. The CVAE model is trained on the preprocessed peptide data sets to generate potentially active peptides, and TCPP further screens the generated peptides. Ultimately, six candidate peptides predicted by the model were synthesized and assayed for activity, and four of them exhibited promising binding affinity to IL-17C. Our study provides a one-stop shop for target-specific active peptide discovery, which is expected to accelerate peptide drug development.
Affiliation(s)
- Zhipeng Wu
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Yejian Wu
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Cheng Zhu
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Xinyi Wu
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Silong Zhai
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Xinqiao Wang
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Zhihao Su
  - Artificial Intelligence Aided Drug Discovery Institute, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

9
Zhai H, Hou H, Luo J, Liu X, Wu Z, Wang J. DGDTA: dynamic graph attention network for predicting drug-target binding affinity. BMC Bioinformatics 2023; 24:367. [PMID: 37777712 PMCID: PMC10543834 DOI: 10.1186/s12859-023-05497-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Accepted: 09/23/2023] [Indexed: 10/02/2023] Open
Abstract
BACKGROUND: Obtaining accurate drug-target binding affinity (DTA) information is significant for drug discovery and drug repositioning. Although some methods have been proposed for predicting DTA, the features of proteins and drugs still need further analysis. Recently, deep learning has been successfully used in many fields. Hence, designing a more effective deep learning method for predicting DTA remains attractive. RESULTS: This paper proposes dynamic graph DTA (DGDTA), which uses a dynamic graph attention network combined with a bidirectional long short-term memory (Bi-LSTM) network to predict DTA. DGDTA takes as input a drug compound, represented by its simplified molecular input line entry system (SMILES) string, and a protein amino acid sequence. First, each drug is treated as a graph of atoms and edges, and dynamic attention scores are used to determine which atoms and edges in the drug are most important for predicting DTA. Then, Bi-LSTM is used to better extract contextual features from protein amino acid sequences. Finally, the drug and protein feature vectors are combined, and the DTA is predicted by a fully connected layer. The source code is available from GitHub at https://github.com/luojunwei/DGDTA. CONCLUSIONS: The experimental results show that DGDTA can predict DTA more accurately than some other methods.
Affiliation(s)
- Haixia Zhai
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Hongli Hou
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Junwei Luo
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
- Xiaoyan Liu
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Zhengjiang Wu
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Junfeng Wang
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China

10
Rønneberg L, Kirk PDW, Zucknick M. Dose-response prediction for in-vitro drug combination datasets: a probabilistic approach. BMC Bioinformatics 2023; 24:161. [PMID: 37085771 PMCID: PMC10120211 DOI: 10.1186/s12859-023-05256-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 03/28/2023] [Indexed: 04/23/2023] Open
Abstract
In this paper we propose PIICM, a probabilistic framework for dose-response prediction in high-throughput drug combination datasets. PIICM utilizes a permutation invariant version of the intrinsic co-regionalization model for multi-output Gaussian process regression, to predict dose-response surfaces in untested drug combination experiments. Coupled with an observation model that incorporates experimental uncertainty, PIICM is able to learn from noisily observed cell-viability measurements in settings where the underlying dose-response experiments are of varying quality, utilize different experimental designs, and the resulting training dataset is sparsely observed. We show that the model can accurately predict dose-response in held out experiments, and the resulting function captures relevant features indicating synergistic interaction between drugs.
Affiliation(s)
- Leiv Rønneberg
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Paul D W Kirk
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
  - Ovarian Cancer Programme, Cancer Research UK Cambridge Centre, Cambridge, UK
- Manuela Zucknick
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway.

11
Abbasi Mesrabadi H, Faez K, Pirgazi J. Drug-target interaction prediction based on protein features, using wrapper feature selection. Sci Rep 2023; 13:3594. [PMID: 36869062 PMCID: PMC9984486 DOI: 10.1038/s41598-023-30026-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2022] [Accepted: 02/14/2023] [Indexed: 03/05/2023] Open
Abstract
Drug-target interaction prediction is a vital stage in drug development, and many methods have been proposed for it. Experimental methods that identify these relationships on the basis of clinical remedies are time-consuming, costly, laborious, and complex, which introduces many challenges. Computational methods form a newer group of approaches. New computational methods that are more accurate can be preferable to experimental methods in terms of total cost and time. In this paper, we propose a new computational model to predict drug-target interaction (DTI) that consists of three phases: feature extraction, feature selection, and classification. In the feature extraction phase, different features such as EAAC and PSSM are extracted from protein sequences, and fingerprint features are extracted from drugs. These extracted features are then combined. In the next step, because of the large amount of extracted data, a wrapper feature selection method named IWSSR is applied. The selected features are then given to a rotation forest classifier for more efficient prediction. The innovation of our work is that we extract different features and then select among them using IWSSR. The accuracy of the rotation forest classifier under tenfold cross-validation on the gold standard datasets (enzymes, ion channels, G-protein-coupled receptors, nuclear receptors) is 98.12, 98.07, 96.82, and 95.64, respectively. The experimental results indicate that the proposed model achieves an acceptable rate in DTI prediction and is comparable with the methods proposed in other papers.
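The tenfold evaluation mentioned in this abstract is assumed here to mean standard 10-fold cross-validation. The following plain-Python sketch shows how such splits can be generated and how a per-fold accuracy can be averaged. It is an illustration only: the names `k_fold_indices` and `mean_cv_accuracy`, the shuffling seed, and the `fit_predict` callback are hypothetical, and no feature extraction, IWSSR selection, or rotation forest is implemented.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slices give k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test


def mean_cv_accuracy(
    labels: Sequence[int],
    fit_predict: Callable[[List[int], List[int]], List[int]],
    k: int = 10,
) -> float:
    """Average accuracy over k folds.

    `fit_predict(train, test)` is a caller-supplied (hypothetical) function that
    trains on the `train` indices and returns one predicted label per `test` index.
    """
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        predicted = fit_predict(train, test)
        correct = sum(1 for idx, p in zip(test, predicted) if labels[idx] == p)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each sample appears in exactly one test fold, so every drug-target pair contributes once to the reported accuracy. Any feature selection step would need to be fitted inside each training fold to avoid leaking test information.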
Affiliation(s)
- Hengame Abbasi Mesrabadi
  - Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
- Karim Faez
  - Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran.
- Jamshid Pirgazi
  - Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran

12
Bae H, Nam H. GraphATT-DTA: Attention-Based Novel Representation of Interaction to Predict Drug-Target Binding Affinity. Biomedicines 2022; 11:biomedicines11010067. [PMID: 36672575 PMCID: PMC9855982 DOI: 10.3390/biomedicines11010067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/06/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022] Open
Abstract
Drug-target binding affinity (DTA) prediction is an essential step in drug discovery. Drug-target protein binding occurs at specific regions of the protein and drug rather than across the entire protein and drug. However, existing deep-learning DTA prediction methods do not consider the interactions between drug substructures and protein sub-sequences. This work proposes GraphATT-DTA, a DTA prediction model that constructs the essential regions for determining interaction affinity between compounds and proteins, modeled with an attention mechanism for interpretability. The model considers local-to-global interactions between compound and protein through the attention mechanism. As a result, GraphATT-DTA shows improved DTA prediction performance and interpretability compared with state-of-the-art models. The model is trained and evaluated on the Davis dataset, a human kinase dataset; external evaluation is performed on an independently proposed human kinase dataset from BindingDB.
Affiliation(s)
- Haelee Bae
  - AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
- Hojung Nam
  - AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
  - Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea

13
Liu M, Shen X, Pan W. Deep reinforcement learning for personalized treatment recommendation. Stat Med 2022; 41:4034-4056. [PMID: 35716038 PMCID: PMC9427729 DOI: 10.1002/sim.9491] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/14/2021] [Revised: 05/22/2022] [Accepted: 05/25/2022] [Indexed: 12/12/2022]
Abstract
In precision medicine, the ultimate goal is to recommend the most effective treatment to an individual patient based on patient-specific molecular and clinical profiles, possibly high-dimensional. To advance cancer treatment, large-scale screenings of cancer cell lines against chemical compounds have been performed to help better understand the relationship between genomic features and drug response; existing machine learning approaches use exclusively supervised learning, including penalized regression and recommender systems. However, it would be more efficient to apply reinforcement learning to sequentially learn as data accrue, including selecting the most promising therapy for a patient given individual molecular and clinical features and then collecting and learning from the corresponding data. In this article, we propose a novel personalized ranking system called Proximal Policy Optimization Ranking (PPORank), which ranks the drugs based on their predicted effects per cell line (or patient) in the framework of deep reinforcement learning (DRL). Modeled as a Markov decision process, the proposed method learns to recommend the most suitable drugs sequentially and continuously over time. As a proof-of-concept, we conduct experiments on two large-scale cancer cell line data sets in addition to simulated data. The results demonstrate that the proposed DRL-based PPORank outperforms the state-of-the-art competitors based on supervised learning. Taken together, we conclude that novel methods in the framework of DRL have great potential for precision medicine and should be further studied.
Affiliation(s)
- Mingyang Liu
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
| |
|
14
|
Pantelidis P, Spartalis M, Zakynthinos G, Anastasiou A, Goliopoulou A, Oikonomou E, Iliopoulos DC, Siasos G. Artificial Intelligence: The new "fuel" to accelerate pharmaceutical development. Curr Pharm Des 2022; 28:2127-2128. [PMID: 35909280 DOI: 10.2174/1381612828666220729101103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 06/27/2022] [Indexed: 12/07/2022]
Affiliation(s)
- Panteleimon Pantelidis
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| | - Michael Spartalis
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| | - George Zakynthinos
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| | - Artemis Anastasiou
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| | - Athina Goliopoulou
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| | - Evangelos Oikonomou
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| | - Dimitrios C Iliopoulos
- Laboratory of Experimental Surgery and Surgical Research 'N. S. Christeas', National and Kapodistrian University of Athens, Medical School, Athens, Greece
| | - Gerasimos Siasos
- 3rd Department of Cardiology, Sotiria Thoracic Diseases General Hospital, National and Kapodistrian University of Athens, Athens, Greece
| |
|
15
|
Luo H, Xiang Y, Fang X, Lin W, Wang F, Wu H, Wang H. BatchDTA: implicit batch alignment enhances deep learning-based drug-target affinity estimation. Brief Bioinform 2022; 23:6632927. [PMID: 35794723 DOI: 10.1093/bib/bbac260] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/14/2022] [Revised: 05/23/2022] [Accepted: 06/03/2022] [Indexed: 11/14/2022] Open
Abstract
Candidate compounds with high binding affinities toward a target protein are likely to be developed as drugs. Deep neural networks (DNNs) have attracted increasing attention for drug-target affinity (DTA) estimation owing to their efficiency. However, the negative impact of batch effects caused by measure metrics, system technologies and other assay information is seldom discussed when training a DNN model for DTA. Suffering from the data deviation caused by batch effects, the DNN models can only be trained on a small amount of 'clean' data. Thus, it is challenging for them to provide precise and consistent estimations. We design a batch-sensitive training framework, namely BatchDTA, to train the DNN models. BatchDTA implicitly aligns multiple batches toward the same protein through learning the orders of candidate compounds with respect to the batches, alleviating the impact of the batch effects on the DNN models. Extensive experiments demonstrate that BatchDTA helps four mainstream DNN models improve their accuracy and robustness on multiple DTA datasets (BindingDB, Davis and KIBA). The average concordance index of the DNN models achieves a relative improvement of 4.0%. The case study reveals that BatchDTA can successfully learn the ranking orders of the compounds from multiple batches. In addition, BatchDTA can also be applied to the fused data collected from multiple sources to achieve further improvement.
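The concordance index reported in this abstract is a ranking metric. The sketch below shows one common way to compute it for affinity predictions, assuming the usual pairwise definition in which ties in the prediction count as half-correct. The function name `concordance_index` is illustrative and is not taken from the BatchDTA code.

```python
import numpy as np


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair (i, j) is comparable when y_true[i] > y_true[j].
    Tied predictions contribute 0.5 (an assumption of this sketch).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Pairwise differences: entry [i, j] compares sample i against sample j.
    diff_true = y_true[:, None] - y_true[None, :]
    diff_pred = y_pred[:, None] - y_pred[None, :]

    comparable = diff_true > 0
    n_pairs = comparable.sum()
    if n_pairs == 0:
        raise ValueError("no comparable pairs: all true values are equal")

    # 1 for a correctly ordered pair, 0.5 for a tied prediction, 0 otherwise.
    pair_score = (diff_pred > 0) + 0.5 * (diff_pred == 0)
    return float(pair_score[comparable].sum() / n_pairs)
```

Because only the order of predictions matters, a model can score well on this metric even when its predicted values are shifted or rescaled, which is why it suits settings where measurements from different batches are not directly comparable.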
Affiliation(s)
- Hongyu Luo
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
| | - Yingfei Xiang
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
| | - Xiaomin Fang
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
| | - Wei Lin
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
| | - Fan Wang
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
| | - Hua Wu
- Baidu Inc., 100000, Beijing, China
| | | |
|
16
|
Paltun BG, Kaski S, Mamitsuka H. DIVERSE: Bayesian Data IntegratiVE Learning for Precise Drug ResponSE Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2197-2207. [PMID: 33705322 DOI: 10.1109/tcbb.2021.3065535] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Detecting predictive biomarkers from multi-omics data is important for precision medicine, to improve diagnostics of complex diseases and for better treatments. This needs substantial experimental efforts that are made difficult by the heterogeneity of cell lines and huge cost. An effective solution is to build a computational model over the diverse omics data, including genomic, molecular, and environmental information. However, choosing informative and reliable data sources from among the different types of data is a challenging problem. We propose DIVERSE, a framework of Bayesian importance-weighted tri- and bi-matrix factorization (DIVERSE3 or DIVERSE2) to predict drug responses from data of cell lines, drugs, and gene interactions. DIVERSE integrates the data sources systematically, in a step-wise manner, examining the importance of each added data set in turn. More specifically, we sequentially integrate five different data sets, which have not all been combined in earlier bioinformatic methods for predicting drug responses. Empirical experiments show that DIVERSE clearly outperformed five other methods including three state-of-the-art approaches, under cross-validation, particularly in out-of-matrix prediction, which is closer to the setting of real use cases and more challenging than simpler in-matrix prediction. Additionally, case studies for discovering new drugs further confirmed the performance advantage of DIVERSE.
|
17
|
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. Appl Intell 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
|
18
|
Zhang L, Yuan Y, Yu J, Liu H. SEMCM: A Self-Expressive Matrix Completion Model for Anti-cancer Drug Sensitivity Prediction. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220302123118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Genomic data sets generated by several recent large scale high-throughput screening efforts pose a thorny computational challenge for anticancer drug sensitivity prediction.
Objective:
We aimed to design an algorithm model that would predict missing elements in incomplete matrices and could be applicable to drug response prediction programs.
Method:
We developed a novel self-expressive matrix completion model to improve the predictive performance of drug response prediction problems. The model is based on the idea of subspace clustering and as a convex problem, it can be solved by alternating direction method of multipliers. The original incomplete matrix can be filled through model training and parameters updated iteratively.
Results:
We applied SEMCM to Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets to predict unknown response values. A large number of experiments have proved that the algorithm has good prediction results and stability, which are better than several existing advanced drug sensitivity prediction and matrix completion algorithms. Without modeling mutation information, SEMCM could correctly predict cell line-drug associations for mutated cell lines and wild cell lines. SEMCM can also be used for drug repositioning. The newly predicted drug responses of GDSC dataset suggest that BL-41 was highly sensitive to Bortezomib. Moreover, the sensitivity of A172 and NCI-H1437 to Paclitaxel was roughly the same.
Conclusion:
We report an efficient anticancer drug sensitivity prediction algorithm which is open-source and can predict the unknown responses of cancer cell lines to drugs. Experimental results prove that our method can not only improve the prediction accuracy but also can be applied to drug repositioning.
Affiliation(s)
- Lin Zhang
- Engineering Research Center of Intelligent Control for Underground
Space, Ministry of Education,
- China University of Mining and Technology, Xuzhou 221116, China
| | - Yuwei Yuan
- Engineering Research Center of Intelligent Control for Underground
Space, Ministry of Education,
- China University of Mining and Technology, Xuzhou 221116, China
| | - Jian Yu
- Engineering Research Center of Intelligent Control for Underground
Space, Ministry of Education,
- China University of Mining and Technology, Xuzhou 221116, China
| | - Hui Liu
- Engineering Research Center of Intelligent Control for Underground
Space, Ministry of Education,
- China University of Mining and Technology, Xuzhou 221116, China
| |
|
19
|
Nguyen TM, Nguyen T, Le TM, Tran T. GEFA: Early Fusion Approach in Drug-Target Affinity Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:718-728. [PMID: 34197324 DOI: 10.1109/tcbb.2021.3094217] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 06/13/2023]
Abstract
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied to the drug-target affinity (DTA) problem. However, previous deep learning-based methods ignore modeling the direct interactions between drug and protein residues. This would lead to inaccurate learning of target representation which may change due to the drug binding effects. In addition, previous DTA methods learn protein representation solely based on a small number of protein sequences in DTA datasets while neglecting the use of proteins outside of the DTA datasets. We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with attention mechanism to address the changes in target representation because of the binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of residues-drug complex. The resulting model is an expressive deep nested graph neural network. We also use pre-trained protein representation powered by the recent effort of learning contextualized protein representation. The experiments are conducted under different settings to evaluate scenarios such as novel drugs or targets. The results demonstrate the effectiveness of the pre-trained protein embedding and the advantages of our GEFA in modeling the nested graph for drug-target interaction.
|
20
|
Abstract
Multi-omics data analysis is an important aspect of cancer molecular biology studies and has led to ground-breaking discoveries. Many efforts have been made to develop machine learning methods that automatically integrate omics data. Here, we review machine learning tools categorized as either general-purpose or task-specific, covering both supervised and unsupervised learning for integrative analysis of multi-omics data. We benchmark the performance of five machine learning approaches using data from the Cancer Cell Line Encyclopedia, reporting accuracy on cancer type classification and mean absolute error on drug response prediction, and evaluating runtime efficiency. This review provides recommendations to researchers regarding suitable machine learning method selection for their specific applications. It should also promote the development of novel machine learning methodologies for data integration, which will be essential for drug discovery, clinical trial design, and personalized treatments.
Affiliation(s)
- Zhaoxiang Cai
- ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
| | - Rebecca C. Poulos
- ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
| | - Jia Liu
- ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Faculty of Medicine, Western Sydney University, Campbelltown, NSW, Australia
| | - Qing Zhong
- ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
| |
|
21
|
Viljanen M, Airola A, Pahikkala T. Generalized vec trick for fast learning of pairwise kernel models. Mach Learn 2022. [DOI: 10.1007/s10994-021-06127-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects. Specifically, we consider the standard, symmetric and anti-symmetric Kronecker product kernels, metric-learning, Cartesian, ranking, as well as linear, polynomial and Gaussian kernels. Recently, a $O(nm+nq)$ time generalized vec trick algorithm, where $n$, $m$, and $q$ denote the number of pairs, drugs and targets, was introduced for training kernel methods with the Kronecker product kernel. This was a significant improvement over previous $O(n^2)$ training methods, since in most real-world applications $m, q \ll n$. In this work we show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of generalized vec trick for speeding up their computation. In the experiments, we demonstrate how the introduced approach allows scaling pairwise kernels to much larger data sets than previously feasible, and provide an extensive comparison of the kernels on a number of biological interaction prediction tasks.
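The complexity claim in this abstract can be illustrated with a short sketch. It assumes the Kronecker product kernel over n labeled pairs is K[i, j] = Kd[d_i, d_j] * Kt[t_i, t_j], where Kd is an m-by-m drug kernel and Kt is a q-by-q target kernel. The function names and the index-array layout are choices made for this illustration, not the authors' implementation. The product K v is computed in two passes costing O(nm) and O(nq), without ever forming the n-by-n matrix K.

```python
import numpy as np


def kron_kernel_matvec(Kd, Kt, d_idx, t_idx, v):
    """Return K @ v for K[i, j] = Kd[d_i, d_j] * Kt[t_i, t_j], without building K."""
    Kd = np.asarray(Kd, dtype=float)   # (m, m) drug kernel
    Kt = np.asarray(Kt, dtype=float)   # (q, q) target kernel
    d_idx = np.asarray(d_idx)          # (n,) drug index of each pair
    t_idx = np.asarray(t_idx)          # (n,) target index of each pair
    v = np.asarray(v, dtype=float)     # (n,) vector to multiply
    m, q = Kd.shape[0], Kt.shape[0]

    # Pass 1, O(n*m): G[b, a] = sum over pairs j with target b of Kd[a, d_j] * v_j.
    G = np.zeros((q, m))
    np.add.at(G, t_idx, Kd[:, d_idx].T * v[:, None])

    # Pass 2, O(n*q): u_i = sum over targets b of Kt[t_i, b] * G[b, d_i].
    return np.sum(Kt[t_idx, :] * G[:, d_idx].T, axis=1)


def naive_kron_kernel_matvec(Kd, Kt, d_idx, t_idx, v):
    """Reference O(n^2) version that materializes the full pairwise kernel."""
    K = Kd[np.ix_(d_idx, d_idx)] * Kt[np.ix_(t_idx, t_idx)]
    return K @ v
```

Iterative solvers such as conjugate gradient only need this matrix-vector product, so replacing the naive version with the two-pass one is what removes the quadratic dependence on the number of pairs.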
|
22
|
Chen Y, Zhang L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Brief Bioinform 2021; 23:6370847. [PMID: 34529029 DOI: 10.1093/bib/bbab378] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/11/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 12/24/2022] Open
Abstract
The drug response prediction problem arises from personalized medicine and drug discovery. Deep neural networks have been applied to the multi-omics data available for over 1000 cancer cell lines and tissues for better drug response prediction. We summarize and examine state-of-the-art deep learning methods that have been published recently. Although significant progress has been made in the deep learning approach to drug response prediction, deep learning methods show their weakness for predicting the response of a drug that does not appear in the training dataset. In particular, all five evaluated deep learning methods performed worse than the similarity-regularized matrix factorization (SRMF) method in our drug blind test. We outline the challenges in applying the deep learning approach to drug response prediction and suggest unique opportunities for deep learning integrated with established bioinformatics analyses to overcome some of these challenges.
Affiliation(s)
- Yurui Chen
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| | - Louxin Zhang
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| |
|
23
|
Tanoli Z, Aldahdooh J, Alam F, Wang Y, Seemab U, Fratelli M, Pavlis P, Hajduch M, Bietrix F, Gribbon P, Zaliani A, Hall MD, Shen M, Brimacombe K, Kulesskiy E, Saarela J, Wennerberg K, Vähä-Koskela M, Tang J. Minimal information for chemosensitivity assays (MICHA): a next-generation pipeline to enable the FAIRification of drug screening experiments. Brief Bioinform 2021; 23:6361039. [PMID: 34472587 PMCID: PMC8769689 DOI: 10.1093/bib/bbab350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Revised: 08/03/2021] [Accepted: 08/02/2021] [Indexed: 12/29/2022] Open
Abstract
Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely attributed to uncharacterized variation in the experimental materials and protocols. We report here the launching of Minimal Information for Chemosensitivity Assays (MICHA), accessed via https://micha-protocol.org. Distinguished from existing efforts that are often lacking support from data integration tools, MICHA can automatically extract publicly available information to facilitate the assay annotation including: 1) compounds, 2) samples, 3) reagents and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotation including chemical structures, targets and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references can be greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principle (Findable, Accessible, Interoperable and Reusable) of drug screening studies. To consolidate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies as well as six recently conducted COVID-19 studies. With the MICHA web server and database, we envisage a wider adoption of a community-driven effort to improve the open access of drug sensitivity assays.
Affiliation(s)
- Ziaurrehman Tanoli
- Research Program in Systems Oncology, Faculty of medicine, University of Helsinki, Finland
| | - Jehad Aldahdooh
- Research Program in Systems Oncology, Faculty of medicine, University of Helsinki, Finland
| | - Farhan Alam
- Research Program in Systems Oncology, Faculty of medicine, University of Helsinki, Finland
| | - Yinyin Wang
- Research Program in Systems Oncology, Faculty of medicine, University of Helsinki, Finland
| | - Umair Seemab
- Research Program in Systems Oncology, Faculty of medicine, University of Helsinki, Finland
| | | | - Petr Pavlis
- Institute of Molecular and Translational Medicine, Czech
| | - Marian Hajduch
- Institute of Molecular and Translational Medicine, Czech
| | | | - Philip Gribbon
- Fraunhofer Institute for Molecular Biology and Applied Ecology, Germany
| | - Andrea Zaliani
- Fraunhofer Institute for Molecular Biology and Applied Ecology, Germany
| | - Matthew D Hall
- National Center for Advancing Translational Sciences, USA
| | - Min Shen
- National Center for Advancing Translational Sciences, USA
| | | | - Evgeny Kulesskiy
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
| | - Jani Saarela
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
| | - Krister Wennerberg
- Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Denmark
| | | | - Jing Tang
- Research Program in Systems Oncology, Faculty of medicine, University of Helsinki, Finland
| |
|
24
|
Zhang S, Jiang M, Wang S, Wang X, Wei Z, Li Z. SAG-DTA: Prediction of Drug-Target Affinity Using Self-Attention Graph Network. Int J Mol Sci 2021; 22:ijms22168993. [PMID: 34445696 PMCID: PMC8396496 DOI: 10.3390/ijms22168993] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/07/2021] [Revised: 08/14/2021] [Accepted: 08/17/2021] [Indexed: 11/16/2022] Open
Abstract
The prediction of drug–target affinity (DTA) is a crucial step for drug screening and discovery. In this study, a new graph-based prediction model named SAG-DTA (self-attention graph drug–target affinity) was implemented. Unlike previous graph-based methods, the proposed model utilized self-attention mechanisms on the drug molecular graph to obtain effective representations of drugs for DTA prediction. Features of each atom node in the molecular graph were weighted using an attention score before being aggregated as molecule representation. Various self-attention scoring methods were compared in this study. In addition, two pooling architectures, namely, global and hierarchical architectures, were presented and evaluated on benchmark datasets. Results of comparative experiments on both regression and binary classification tasks showed that SAG-DTA was superior to previous sequence-based or other graph-based methods and exhibited good generalization ability.
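The idea of weighting atom features by an attention score before aggregation can be sketched as follows. This is a deliberately simplified illustration, not the SAG-DTA architecture: it assumes a single learnable scoring vector and a softmax over atoms, whereas the paper compares several scoring methods and pooling architectures. The name `attention_readout` and its parameters are hypothetical.

```python
import numpy as np


def attention_readout(node_features, score_vector):
    """Aggregate atom features into one molecule vector using attention weights.

    node_features: (num_atoms, dim) array, one row per atom.
    score_vector:  (dim,) array standing in for learned scoring parameters
                   (an assumption of this sketch).
    Returns (molecule_vector, weights).
    """
    x = np.asarray(node_features, dtype=float)
    w = np.asarray(score_vector, dtype=float)

    # One scalar score per atom, shifted for numerical stability.
    logits = x @ w
    logits = logits - logits.max()

    # Softmax turns the scores into weights that sum to 1 over the atoms.
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Weighted sum of atom features gives the molecule representation.
    return weights @ x, weights
```

With an all-zero scoring vector every atom receives the same weight and the readout reduces to mean pooling, which makes plain averaging a natural baseline for checking whether learned attention helps.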
Affiliation(s)
- Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China; (S.Z.); (Z.W.)
| | - Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China;
| | - Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China;
| | | | - Zhiqiang Wei
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China; (S.Z.); (Z.W.)
| | - Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
- Correspondence: Tel./Fax: +86-532-85953086
| |
|
25
|
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 331] [Impact Index Per Article: 82.8] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Devesh Srivastava
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Swati Tiwari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
| | - Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India.
| |
|
26
|
Feng F, Shen B, Mou X, Li Y, Li H. Large-scale pharmacogenomic studies and drug response prediction for personalized cancer medicine. J Genet Genomics 2021; 48:540-551. [PMID: 34023295 DOI: 10.1016/j.jgg.2021.03.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/22/2021] [Revised: 03/26/2021] [Accepted: 03/28/2021] [Indexed: 12/26/2022]
Abstract
The response rate of most anti-cancer drugs is limited because of the high heterogeneity of cancer and the complex mechanism of drug action. Personalized treatment that stratifies patients into subgroups using molecular biomarkers is promising to improve clinical benefit. With the accumulation of preclinical models and advances in computational approaches of drug response prediction, pharmacogenomics has made great success over the last 20 years and is increasingly used in the clinical practice of personalized cancer medicine. In this article, we first summarize FDA-approved pharmacogenomic biomarkers and large-scale pharmacogenomic studies of preclinical cancer models such as patient-derived cell lines, organoids, and xenografts. Furthermore, we comprehensively review the recent developments of computational methods in drug response prediction, covering network, machine learning, and deep learning technologies and strategies to evaluate immunotherapy response. In the end, we discuss challenges and propose possible solutions for further improvement.
Affiliation(s)
- Fangyoumin Feng
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Bihan Shen
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Xiaoqin Mou
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yixue Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 330106, China
| | - Hong Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| |
|
27
|
Tanoli Z, Aldahdooh J, Alam F, Wang Y, Seemab U, Fratelli M, Pavlis P, Hajduch M, Bietrix F, Gribbon P, Zaliani A, Hall MD, Shen M, Brimacombe K, Kulesskiy E, Saarela J, Wennerberg K, Vähä-Koskela M, Tang J. Minimal information for Chemosensitivity assays (MICHA): A next-generation pipeline to enable the FAIRification of drug screening experiments. bioRxiv 2021:2020.12.03.409409. [PMID: 33300000 PMCID: PMC7724669 DOI: 10.1101/2020.12.03.409409] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely attributed to uncharacterized variation in the experimental materials and protocols. We report here the launching of MICHA (Minimal Information for Chemosensitivity Assays), accessed via https://micha-protocol.org. Distinguished from existing efforts that are often lacking support from data integration tools, MICHA can automatically extract publicly available information to facilitate the assay annotation including: 1) compounds, 2) samples, 3) reagents, and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotation including chemical structures, targets, and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references can be greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principle (Findable, Accessible, Interoperable and Reusable) of drug screening studies. To consolidate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies, as well as six recently conducted COVID-19 studies. With the MICHA webserver and database, we envisage a wider adoption of a community-driven effort to improve the open access of drug sensitivity assays.
Affiliation(s)
- Ziaurrehman Tanoli
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Jehad Aldahdooh
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Farhan Alam
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Yinyin Wang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Umair Seemab
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Petr Pavlis
  - Institute of Molecular and Translational Medicine, Czech
- Marian Hajduch
  - Institute of Molecular and Translational Medicine, Czech
- Philip Gribbon
  - Fraunhofer Institute for Translational Medicine and Pharmacology, Hamburg, Germany
- Andrea Zaliani
  - Fraunhofer Institute for Translational Medicine and Pharmacology, Hamburg, Germany
- Min Shen
  - National Center for Advancing Translational Sciences, U.S.A
- Evgeny Kulesskiy
  - Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Jani Saarela
  - Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Krister Wennerberg
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Denmark
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
|
28
|
Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 2021; 37:1140-1147. [PMID: 33119053 DOI: 10.1093/bioinformatics/btaa921] [Citation(s) in RCA: 336] [Impact Index Per Article: 84.0] [Received: 07/06/2020] [Revised: 10/01/2020] [Accepted: 10/15/2020] [Indexed: 12/21/2022] Open
Abstract
SUMMARY: The development of new drugs is costly, time consuming and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. In order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug-target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent the drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug-target affinity. We show that graph neural networks not only predict drug-target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug-target binding affinity prediction, and that representing drugs as graphs can lead to further improvements. AVAILABILITY OF IMPLEMENTATION: The proposed models are implemented in Python. Related data, pre-trained models and source code are publicly available at https://github.com/thinng/GraphDTA. All scripts and data needed to reproduce the post hoc statistical analysis are available from https://doi.org/10.5281/zenodo.3603523. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thin Nguyen
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
- Hang Le
  - Faculty of Information Technology, Nha Trang University, Nha Trang, Khanh Hoa, Viet Nam
- Thomas P Quinn
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
- Tri Nguyen
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
- Thuc Duy Le
  - School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA, 5095, Australia
- Svetha Venkatesh
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
|
29
|
Majumdar A, Liu Y, Lu Y, Wu S, Cheng L. kESVR: An Ensemble Model for Drug Response Prediction in Precision Medicine Using Cancer Cell Lines Gene Expression. Genes (Basel) 2021; 12:genes12060844. [PMID: 34070793 PMCID: PMC8229729 DOI: 10.3390/genes12060844] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/07/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 12/02/2022] Open
Abstract
Background: Cancer cell lines are frequently used in research as in-vitro tumor models. Genomic data and large-scale drug screening have accelerated the right drug selection for cancer patients. Accuracy in drug response prediction is crucial for success. Due to data-type diversity and big data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of the high-dimensional cancer multi-omics data to predict drug response in precision medicine. Method: A novel k-means Ensemble Support Vector Regression (kESVR) model is developed to predict drug response values for a single patient based on cell-line gene expression data. kESVR is a blend of supervised and unsupervised learning methods and is entirely data driven. It utilizes embedded clustering (Principal Component Analysis and k-means clustering) and local regression (Support Vector Regression) to predict drug response and obtain the global pattern while overcoming missing data and outlier noise. Results: We compared the efficiency and accuracy of kESVR to 4 standard machine learning regression models: (1) simple linear regression, (2) support vector regression, (3) random forest (quantile regression forest) and (4) back propagation neural network. In our results, which are based on drug response across 610 cancer cells from the Cancer Cell Line Encyclopedia (CCLE) and the Cancer Therapeutics Response Portal (CTRP v2), kESVR proved to have the highest accuracy (smallest mean squared error (MSE) measure). We next compared kESVR with 17 existing drug response prediction models based on a varied range of methods such as regression, Bayesian inference, matrix factorization and deep learning. After ranking the 18 models by prediction accuracy, kESVR ranks first (best performing) in the majority (74%) of cases. In the remaining 26% of cases, kESVR still ranked in the top five performing models.
Conclusion: In this paper we introduce a novel model (kESVR) for drug response prediction using high-dimensional cell-line gene expression data. This model outperforms existing prediction models in terms of prediction accuracy and speed and overcomes overfitting. It can be used in the future to develop a robust drug response prediction system for cancer patients using cancer cell-line guidance and multi-omics data.
Affiliation(s)
- Abhishek Majumdar
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yueze Liu
  - The Grainger College of Engineering, The University of Illinois Urbana-Champaign, Urbana and Champaign, Champaign, IL 61801, USA
- Yaoqin Lu
  - Department of Occupational and Environmental Health, School of Public Health, XinJiang Medical University, Urumqi 830011, China
- Shaofeng Wu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Lijun Cheng
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
|
30
|
Tanoli Z, Vähä-Koskela M, Aittokallio T. Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin Drug Discov 2021; 16:977-989. [PMID: 33543671 DOI: 10.1080/17460441.2021.1883585] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.
Affiliation(s)
- Ziaurrehman Tanoli
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Markus Vähä-Koskela
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
  - Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
  - Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
|
31
|
Wani N, Raza K. MKL-GRNI: A parallel multiple kernel learning approach for supervised inference of large-scale gene regulatory networks. PeerJ Comput Sci 2021; 7:e363. [PMID: 33817013 PMCID: PMC7924726 DOI: 10.7717/peerj-cs.363] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/19/2020] [Accepted: 12/29/2020] [Indexed: 06/12/2023]
Abstract
High throughput multi-omics data generation coupled with heterogeneous genomic data fusion are defining new ways to build computational inference models. These models are scalable and can support very large genome sizes with the added advantage of exploiting additional biological knowledge from the integration framework. However, the limitation with such an arrangement is the huge computational cost involved when learning from very large datasets in a sequential execution environment. To overcome this issue, we present a multiple kernel learning (MKL) based gene regulatory network (GRN) inference approach wherein multiple heterogeneous datasets are fused using MKL paradigm. We formulate the GRN learning problem as a supervised classification problem, whereby genes regulated by a specific transcription factor are separated from other non-regulated genes. A parallel execution architecture is devised to learn a large scale GRN by decomposing the initial classification problem into a number of subproblems that run as multiple processes on a multi-processor machine. We evaluate the approach in terms of increased speedup and inference potential using genomic data from Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The results thus obtained demonstrate that the proposed method exhibits better classification accuracy and enhanced speedup compared to other state-of-the-art methods while learning large scale GRNs from multiple and heterogeneous datasets.
Affiliation(s)
- Nisar Wani
  - Govt. Degree College Baramulla, Jammu & Kashmir, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
|
32
|
Ding Y, Tang J, Guo F. The Computational Models of Drug-target Interaction Prediction. Protein Pept Lett 2020; 27:348-358. [PMID: 30968771 DOI: 10.2174/0929866526666190410124110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/04/2019] [Revised: 02/22/2019] [Accepted: 04/02/2019] [Indexed: 12/19/2022]
Abstract
The identification of Drug-Target Interactions (DTIs) is an important process in drug discovery and medical research. However, traditional experimental methods for DTI identification are still time consuming, extremely expensive and challenging. In the past ten years, various computational methods have been developed to identify potential DTIs. In this paper, the identification methods of DTIs are summarized. In addition, several state-of-the-art computational methods are introduced, including network-based and machine learning-based methods. In particular, machine learning-based methods, including supervised and semi-supervised models, differ essentially in how they handle negative samples. Although these computational models have achieved significant improvements in the identification of DTIs, network-based and machine learning-based methods each have their own disadvantages. These computational methods are evaluated on four benchmark data sets via values of Area Under the Precision Recall curve (AUPR).
Affiliation(s)
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
33
|
Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun 2020; 11:6136. [PMID: 33262326 PMCID: PMC7708835 DOI: 10.1038/s41467-020-19950-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 04/16/2020] [Accepted: 11/05/2020] [Indexed: 12/12/2022] Open
Abstract
We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorization machines. The approach enables comboFM to leverage information from previous experiments performed on similar drugs and cells when predicting responses of new combinations in so far untested cells; thereby, it achieves highly accurate predictions despite sparsely populated data tensors. We demonstrate high predictive performance of comboFM in various prediction scenarios using data from cancer cell line pharmacogenomic screens. Subsequent experimental validation of a set of previously untested drug combinations further supports the practical and robust applicability of comboFM. For instance, we confirm a novel synergy between anaplastic lymphoma kinase (ALK) inhibitor crizotinib and proteasome inhibitor bortezomib in lymphoma cells. Overall, our results demonstrate that comboFM provides an effective means for systematic pre-screening of drug combinations to support precision oncology applications. Combinatorial treatments have become a standard of care for various complex diseases including cancers. Here, the authors show that combinatorial responses of two anticancer drugs can be accurately predicted using factorization machines trained on large-scale pharmacogenomic data for guiding precision oncology studies.
|
34
|
Huang LC, Yeung W, Wang Y, Cheng H, Venkat A, Li S, Ma P, Rasheed K, Kannan N. Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model for protein kinase inhibitor response prediction. BMC Bioinformatics 2020; 21:520. [PMID: 33183223 PMCID: PMC7664030 DOI: 10.1186/s12859-020-03842-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/09/2020] [Accepted: 10/27/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. RESULTS: In this study, we propose a multi-component Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response ([Formula: see text]) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein-protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed [Formula: see text] values in cell-based assays. CONCLUSIONS: By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.
Affiliation(s)
- Liang-Chin Huang
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Wayland Yeung
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Ye Wang
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Huimin Cheng
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Aarya Venkat
  - Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
- Sheng Li
  - Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
- Ping Ma
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Khaled Rasheed
  - Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
  - Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
|
35
|
Koras K, Juraeva D, Kreis J, Mazur J, Staub E, Szczurek E. Feature selection strategies for drug sensitivity prediction. Sci Rep 2020; 10:9377. [PMID: 32523056 PMCID: PMC7287073 DOI: 10.1038/s41598-020-65927-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/13/2019] [Accepted: 05/06/2020] [Indexed: 12/16/2022] Open
Abstract
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with the best performing example of Dabrafenib. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are indicative for therapy design.
Affiliation(s)
- Krzysztof Koras
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Dilafruz Juraeva
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Julian Kreis
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Johanna Mazur
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Eike Staub
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Ewa Szczurek
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
|
36
|
Chen J, Zhang L. A survey and systematic assessment of computational methods for drug response prediction. Brief Bioinform 2020; 22:232-246. [PMID: 31927568 DOI: 10.1093/bib/bbz164] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 12/19/2022] Open
Abstract
Drug response prediction arises from both basic and clinical research of personalized therapy, as well as drug discovery for cancers. With gene expression profiles and other omics data being available for over 1000 cancer cell lines and tissues, different machine learning approaches have been applied to drug response prediction. These methods appear in a body of literature and have been evaluated on different datasets with only one or two accuracy metrics. We systematically assess 17 representative methods for drug response prediction, which have been developed in the past 5 years, on four large public datasets in nine metrics. This study provides insights and lessons for future research into drug response prediction.
|
37
|
Güvenç Paltun B, Mamitsuka H, Kaski S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief Bioinform 2019; 22:346-359. [PMID: 31838491 PMCID: PMC7820853 DOI: 10.1093/bib/bbz153] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 07/09/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/17/2022] Open
Abstract
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, by categorizing methods as matrix factorization-based, kernel-based and network-based methods. We also present a short description of relevant databases used as a benchmark in drug response prediction analyses, followed by providing a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources on drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi
Affiliation(s)
- Betül Güvenç Paltun
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Samuel Kaski
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
|
38
|
Ali M, Aittokallio T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 2018; 11:31-39. [PMID: 30097794 PMCID: PMC6381361 DOI: 10.1007/s12551-018-0446-z] [Citation(s) in RCA: 105] [Impact Index Per Article: 15.0] [Received: 04/17/2018] [Accepted: 07/22/2018] [Indexed: 02/07/2023] Open
Abstract
In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input "big data" require the application of dedicated data preprocessing pipelines that often lead to some loss of information and compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles along with knowledge about cancer pathways targeted by anti-cancer compounds when predicting their phenotypic responses.
Affiliation(s)
- Mehreen Ali
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland
  - Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland
  - Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
  - Department of Mathematics and Statistics, University of Turku, FI-20014, Turku, Finland
|