1
|
Pang W, Chen M, Qin Y. Prediction of anticancer drug sensitivity using an interpretable model guided by deep learning. BMC Bioinformatics 2024; 25:182. [PMID: 38724920 PMCID: PMC11080240 DOI: 10.1186/s12859-024-05669-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 01/22/2024] [Indexed: 05/13/2024] Open
Abstract
BACKGROUND The prediction of drug sensitivity plays a crucial role in improving the therapeutic effect of drugs. However, testing the effectiveness of drugs is challenging due to the complex mechanism of drug reactions and the lack of interpretability in most machine learning and deep learning methods. Therefore, it is imperative to establish an interpretable model that receives various cell line and drug feature data to learn drug response mechanisms and achieve stable predictions between available datasets. RESULTS This study proposes a new and interpretable deep learning model, DrugGene, which integrates gene expression, gene mutation, gene copy number variation of cancer cells, and chemical characteristics of anticancer drugs to predict their sensitivity. This model comprises two different branches of neural networks, where the first involves a hierarchical structure of biological subsystems that uses the biological processes of human cells to form a visual neural network (VNN) and an interpretable deep neural network for human cancer cells. DrugGene receives genotype input from the cell line and detects changes in the subsystem states. We also employ a traditional artificial neural network (ANN) to capture the chemical structural features of drugs. DrugGene generates final drug response predictions by combining VNN and ANN and integrating their outputs into a fully connected layer. The experimental results using drug sensitivity data extracted from the Cancer Drug Sensitivity Genome Database and the Cancer Treatment Response Portal v2 reveal that the proposed model is better than existing prediction methods. Therefore, our model achieves higher accuracy, learns the reaction mechanisms between anticancer drugs and cell lines from various features, and interprets the model's predicted results. CONCLUSIONS Our method utilizes biological pathways to construct neural networks, which can use genotypes to monitor changes in the state of network subsystems, thereby interpreting the prediction results in the model and achieving satisfactory prediction accuracy. This will help explore new directions in cancer treatment. More available code resources can be downloaded for free from GitHub ( https://github.com/pangweixiong/DrugGene ).
Collapse
Affiliation(s)
- Weixiong Pang
- College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
- Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
| | - Ming Chen
- College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
- Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
| | - Yufang Qin
- College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China.
- Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China.
| |
Collapse
|
2
|
Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through the DTA prediction. This shift has moved the prediction of DTA from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), and recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representation of models, neural network frameworks, valuation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of the application of DL frameworks for DTA prediction in the field of drug discovery are discussed.
Collapse
Affiliation(s)
- Hao Zhang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Xiaoqian Liu
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Wenya Cheng
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Tianshi Wang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Yuanyuan Chen
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China.
| |
Collapse
|
3
|
Dawood M, Vu QD, Young LS, Branson K, Jones L, Rajpoot N, Minhas FUAA. Cancer drug sensitivity prediction from routine histology images. NPJ Precis Oncol 2024; 8:5. [PMID: 38184744 PMCID: PMC10771481 DOI: 10.1038/s41698-023-00491-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Accepted: 12/08/2023] [Indexed: 01/08/2024] Open
Abstract
Drug sensitivity prediction models can aid in personalising cancer therapy, biomarker discovery, and drug design. Such models require survival data from randomised controlled trials which can be time consuming and expensive. In this proof-of-concept study, we demonstrate for the first time that deep learning can link histological patterns in whole slide images (WSIs) of Haematoxylin & Eosin (H&E) stained breast cancer sections with drug sensitivities inferred from cell lines. We employ patient-wise drug sensitivities imputed from gene expression-based mapping of drug effects on cancer cell lines to train a deep learning model that predicts patients' sensitivity to multiple drugs from WSIs. We show that it is possible to use routine WSIs to predict the drug sensitivity profile of a cancer patient for a number of approved and experimental drugs. We also show that the proposed approach can identify cellular and histological patterns associated with drug sensitivity profiles of cancer patients.
Collapse
Affiliation(s)
- Muhammad Dawood
- Tissue Image Analytics Centre, University of Warwick, Coventry, UK.
| | - Quoc Dang Vu
- Tissue Image Analytics Centre, University of Warwick, Coventry, UK
| | - Lawrence S Young
- Warwick Medical School, University of Warwick, Coventry, UK
- Cancer Research Centre, University of Warwick, Coventry, UK
| | - Kim Branson
- Artificial Intelligence & Machine Learning, GlaxoSmithKline, San Francisco, CA, USA
| | - Louise Jones
- Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Nasir Rajpoot
- Tissue Image Analytics Centre, University of Warwick, Coventry, UK
- Cancer Research Centre, University of Warwick, Coventry, UK
- The Alan Turing Institute, London, UK
| | - Fayyaz Ul Amir Afsar Minhas
- Tissue Image Analytics Centre, University of Warwick, Coventry, UK
- Cancer Research Centre, University of Warwick, Coventry, UK
| |
Collapse
|
4
|
Sarhan MO, Haffez H, Elsayed NA, El-Haggar RS, Zaghary WA. New phenothiazine conjugates as apoptosis inducing agents: Design, synthesis, In-vitro anti-cancer screening and 131I-radiolabeling for in-vivo evaluation. Bioorg Chem 2023; 141:106924. [PMID: 37871390 DOI: 10.1016/j.bioorg.2023.106924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 10/06/2023] [Accepted: 10/16/2023] [Indexed: 10/25/2023]
Abstract
Phenothiazines (PTZs) are a group of compounds characterized by the presence of the 10H-dibenzo-[b,e]-1,4-thiazine system. PTZs used in clinics as antipsychotic drugs with other diverse biological activities. The current aim of the study is to investigate and understand the effect of potent PTZs compounds using a group of In-vitro and In-vivo assays. A total of seventeen novel phenothiazine derivatives have been designed, synthesized, and evaluated primarily in-vitro for their ability to inhibit proliferation activity against NCI-60 cancer cell lines, including several multi-drug resistant (MDR) tumor cell lines. Almost all compounds were active and displayed promising cellular activities with GI50 values in the sub-micromolar range. Four of the most promising derivatives (4b, 4h, 4g and 6e) have been further tested against two selected sensitive cancer cell lines (colon cancer; HCT-116 and breast cancer; MDA-MB231). The apoptosis assay showed that all the selected compounds were able to induce early apoptosis and compound 6e was able to induce additional cellular necrosis. Cell cycle assay showed all selected compounds were able to induce cell cycle arrest at sub-molecular phase of G0-G1 with compound 6e induced cell cycle arrest at G2M in HCT-116 cells. Accordingly, the apoptotic effect of the selected compounds was extensively investigated on genetic level and Casp-3, Casp-9 and Bax gene were up-regulated with down-regulation of Bcl-2 gene suggesting the activation of both intrinsic and extrinsic pathways. In-vivo evaluation of the antitumor activity of compound 4b in solid tumor bearing mice showed promising therapeutic effect with manifestation of dose and time dependent toxic effects at higher doses. For better evaluation of the degree of localization of 4b, its 131I-congener (131I-4b) was injected intravenously in Ehrlich solid tumor bearing mice that showed good localization at tumor site with rapid distribution and clearance from the blood. In-silico study suggested NADPH oxidases (NOXs) as potential molecular target. The compounds introduced in the current study work provided a cutting-edge phenothiazine hybrid scaffold with promising anti-proliferation action that may suggest their anti-cancer activity.
Collapse
Affiliation(s)
- Mona O Sarhan
- Labelled Compounds Department, Hot Lab Centre, Egyptian Atomic Energy Authority, Egypt
| | - Hesham Haffez
- Biochemistry and Molecular Biology Department, Faculty of Pharmacy, Helwan University, 11795 Cairo, Egypt; Center of Scientific Excellence "Helwan Structural Biology Research, (HSBR)", Helwan University, 11795 Cairo, Egypt.
| | - Nosaiba A Elsayed
- Pharmaceutical Chemistry Department, Faculty of Pharmacy, Helwan University, 11795 Cairo, Egypt
| | - Radwan S El-Haggar
- Pharmaceutical Chemistry Department, Faculty of Pharmacy, Helwan University, 11795 Cairo, Egypt
| | - Wafaa A Zaghary
- Pharmaceutical Chemistry Department, Faculty of Pharmacy, Helwan University, 11795 Cairo, Egypt.
| |
Collapse
|
5
|
Shahzad M, Tahir MA, Alhussein M, Mobin A, Shams Malick RA, Anwar MS. NeuPD-A Neural Network-Based Approach to Predict Antineoplastic Drug Response. Diagnostics (Basel) 2023; 13:2043. [PMID: 37370938 DOI: 10.3390/diagnostics13122043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/01/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023] Open
Abstract
With the beginning of the high-throughput screening, in silico-based drug response analysis has opened lots of research avenues in the field of personalized medicine. For a decade, many different predicting techniques have been recommended for the antineoplastic (anti-cancer) drug response, but still, there is a need for improvements in drug sensitivity prediction. The intent of this research study is to propose a framework, namely NeuPD, to validate the potential anti-cancer drugs against a panel of cancer cell lines in publicly available datasets. The datasets used in this work are Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE). As not all drugs are effective on cancer cell lines, we have worked on 10 essential drugs from the GDSC dataset that have achieved the best modeling results in previous studies. We also extracted 1610 essential oncogene expressions from 983 cell lines from the same dataset. Whereas, from the CCLE dataset, 16,383 gene expressions from 1037 cell lines and 24 drugs have been used in our experiments. For dimensionality reduction, Pearson correlation is applied to best fit the model. We integrate the genomic features of cell lines and drugs' fingerprints to fit the neural network model. For evaluation of the proposed NeuPD framework, we have used repeated K-fold cross-validation with 5 times repeats where K = 10 to demonstrate the performance in terms of root mean square error (RMSE) and coefficient determination (R2). The results obtained on the GDSC dataset that were measured using these cost functions show that our proposed NeuPD framework has outperformed existing approaches with an RMSE of 0.490 and R2 of 0.929.
Collapse
Affiliation(s)
- Muhammad Shahzad
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
| | - Muhammad Atif Tahir
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
| | - Musaed Alhussein
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
| | - Ansharah Mobin
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
| | - Rauf Ahmed Shams Malick
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
| | - Muhammad Shahid Anwar
- Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea
| |
Collapse
|
6
|
Hernández-Hernández S, Ballester PJ. On the Best Way to Cluster NCI-60 Molecules. Biomolecules 2023; 13:biom13030498. [PMID: 36979433 PMCID: PMC10046274 DOI: 10.3390/biom13030498] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Revised: 03/02/2023] [Accepted: 03/06/2023] [Indexed: 03/30/2023] Open
Abstract
Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those using clustering of molecules in terms of their chemical structures. However, the poor clustering of compounds will compromise such validation, especially on test molecules dissimilar to those in the training set. This study aims at finding the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor-Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims at measuring the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show variation in clustering quality from method to method. The clusters obtained by the hierarchical and Taylor-Butina methods are more computationally expensive to use in cross-validation strategies, and both cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and therefore we recommend it to analyze this highly valuable dataset.
Collapse
Affiliation(s)
- Saiveth Hernández-Hernández
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli-Calmettes, Aix-Marseille Université UM105, CNRS UMR7258), 13009 Marseille, France
| | - Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
| |
Collapse
|
7
|
Cui T, Wang Z, Gu H, Qin P, Wang J. Gamma distribution based predicting model for breast cancer drug response based on multi-layer feature selection. Front Genet 2023; 14:1095976. [PMID: 36816042 PMCID: PMC9932661 DOI: 10.3389/fgene.2023.1095976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2022] [Accepted: 01/23/2023] [Indexed: 02/05/2023] Open
Abstract
In the pursuit of precision medicine for cancer, a promising step is to predict drug response based on data mining, which can provide clinical decision support for cancer patients. Although some machine learning methods for predicting drug response from genomic data already exist, most of them focus on point prediction, which cannot reveal the distribution of predicted results. In this paper, we propose a three-layer feature selection combined with a gamma distribution based GLM and a two-layer feature selection combined with an ANN. The two regression methods are applied to the Encyclopedia of Cancer Cell Lines (CCLE) and the Cancer Drug Sensitivity Genomics (GDSC) datasets. Using ten-fold cross-validation, our methods achieve higher accuracy on anticancer drug response prediction compared to existing methods, with an R 2 and RMSE of 0.87 and 0.53, respectively. Through data validation, the significance of assessing the reliability of predictions by predicting confidence intervals and its role in personalized medicine are illustrated. The correlation analysis of the genes selected from the three layers of features also shows the effectiveness of our proposed methods.
Collapse
Affiliation(s)
- Tongtong Cui
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
| | - Zeyuan Wang
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
| | - Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
| | - Pan Qin
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China,*Correspondence: Jia Wang, ; Pan Qin,
| | - Jia Wang
- Department of Breast Surgery, Second Hospital of Dalian Medical University, Dalian, Liaoning, China,*Correspondence: Jia Wang, ; Pan Qin,
| |
Collapse
|
8
|
Artificial intelligence in cancer research and precision medicine: Applications, limitations and priorities to drive transformation in the delivery of equitable and unbiased care. Cancer Treat Rev 2023; 112:102498. [PMID: 36527795 DOI: 10.1016/j.ctrv.2022.102498] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 12/03/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022]
Abstract
Artificial intelligence (AI) has experienced explosive growth in oncology and related specialties in recent years. The improved expertise in data capture, the increased capacity for data aggregation and analytic power, along with decreasing costs of genome sequencing and related biologic "omics", set the foundation and need for novel tools that can meaningfully process these data from multiple sources and of varying types. These advances provide value across biomedical discovery, diagnosis, prognosis, treatment, and prevention, in a multimodal fashion. However, while big data and AI tools have already revolutionized many fields, medicine has partially lagged due to its complexity and multi-dimensionality, leading to technical challenges in developing and validating solutions that generalize to diverse populations. Indeed, inner biases and miseducation of algorithms, in view of their implementation in daily clinical practice, are increasingly relevant concerns; critically, it is possible for AI to mirror the unconscious biases of the humans who generated these algorithms. Therefore, to avoid worsening existing health disparities, it is critical to employ a thoughtful, transparent, and inclusive approach that involves addressing bias in algorithm design and implementation along the cancer care continuum. In this review, a broad landscape of major applications of AI in cancer care is provided, with a focus on cancer research and precision medicine. Major challenges posed by the implementation of AI in the clinical setting will be discussed. Potentially feasible solutions for mitigating bias are provided, in the light of promoting cancer health equity.
Collapse
|
9
|
Drug Response Prediction Based on 1D Convolutional Neural Network and Attention Mechanism. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:8671348. [PMID: 36164615 PMCID: PMC9509240 DOI: 10.1155/2022/8671348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Accepted: 08/18/2022] [Indexed: 11/30/2022]
Abstract
There are multiple methods based on gene expression, copy number variation, and methylation biomarkers for screening drug response have been developed. On the other hand, many machine learning algorithms have been applied in recent years to predict drug response, such as neural networks and random forests for the discovery of genomic markers of drug sensitivity for individual drugs in cancer cell lines. In this paper, we propose a drug response prediction algorithm based on 1D convolutional neural networks with attention mechanism and combined with pathway networks, which combines the individual histological data affecting drug response and considers the topological nature of the pathways to find the subpathways highly correlated with drug response and use this as a feature to predict drug response by training using convolutional neural networks. Thus, the output values will represent the probability of occurrence of each of these two categories. In this experiment, using five-fold cross-validation, the identification accuracy reached an average of 84.6%, which is 4.5% higher than the direct random forest approach for drug prediction with an AUC value. This proves that the use of the one-dimensional1D convolutional neural network with attention mechanism to predict the response of low-grade glioma patients and drugs has better prediction results.
Collapse
|
10
|
Lin YF, Liu JJ, Chang YJ, Yu CS, Yi W, Lane HY, Lu CH. Predicting Anticancer Drug Resistance Mediated by Mutations. Pharmaceuticals (Basel) 2022; 15:ph15020136. [PMID: 35215249 PMCID: PMC8878306 DOI: 10.3390/ph15020136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 01/16/2022] [Accepted: 01/21/2022] [Indexed: 02/01/2023] Open
Abstract
Cancer drug resistance presents a challenge for precision medicine. Drug-resistant mutations are always emerging. In this study, we explored the relationship between drug-resistant mutations and drug resistance from the perspective of protein structure. By combining data from previously identified drug-resistant mutations and information of protein structure and function, we used machine learning-based methods to build models to predict cancer drug resistance mutations. The performance of our combined model achieved an accuracy of 86%, a Matthews correlation coefficient score of 0.57, and an F1 score of 0.66. We have constructed a fast, reliable method that predicts and investigates cancer drug resistance in a protein structure. Nonetheless, more information is needed concerning drug resistance and, in particular, clarification is needed about the relationships between the drug and the drug resistance mutations in proteins. Highly accurate predictions regarding drug resistance mutations can be helpful for developing new strategies with personalized cancer treatments. Our novel concept, which combines protein structure information, has the potential to elucidate physiological mechanisms of cancer drug resistance.
Collapse
Affiliation(s)
- Yu-Feng Lin
- Department of Medical Laboratory Science and Biotechnology, Asia University, Taichung 41354, Taiwan; (Y.-F.L.); (W.Y.)
| | - Jia-Jun Liu
- The Ph.D. Program of Biotechnology and Biomedical Industry, China Medical University, Taichung 40402, Taiwan; (J.-J.L.); (Y.-J.C.)
| | - Yu-Jen Chang
- The Ph.D. Program of Biotechnology and Biomedical Industry, China Medical University, Taichung 40402, Taiwan; (J.-J.L.); (Y.-J.C.)
| | - Chin-Sheng Yu
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40201, Taiwan;
| | - Wei Yi
- Department of Medical Laboratory Science and Biotechnology, Asia University, Taichung 41354, Taiwan; (Y.-F.L.); (W.Y.)
| | - Hsien-Yuan Lane
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan;
- Department of Psychiatry, China Medical University Hospital, Taichung 40402, Taiwan
- Brain Disease Research Center, China Medical University Hospital, Taichung 40402, Taiwan
| | - Chih-Hao Lu
- The Ph.D. Program of Biotechnology and Biomedical Industry, China Medical University, Taichung 40402, Taiwan; (J.-J.L.); (Y.-J.C.)
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan;
- Department of Medical Laboratory Science and Biotechnology, China Medical University, Taichung 40402, Taiwan
- Correspondence:
| |
Collapse
|
11
|
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Collapse
Affiliation(s)
- Christophe Muller
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | - Obdulia Rabal
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
| | | |
Collapse
|
12
|
Nguyen LC, Naulaerts S, Bruna A, Ghislat G, Ballester PJ. Predicting Cancer Drug Response In Vivo by Learning an Optimal Feature Selection of Tumour Molecular Profiles. Biomedicines 2021; 9:biomedicines9101319. [PMID: 34680436 PMCID: PMC8533095 DOI: 10.3390/biomedicines9101319] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 12/17/2022] Open
Abstract
(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms with this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have much fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.
Collapse
Affiliation(s)
- Linh C. Nguyen
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France;
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université UM105, F-13009 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Department of Life Sciences, University of Science and Technology of Hanoi, Vietnam Academy of Science and Technology, Hanoi 100803, Vietnam
| | - Stefan Naulaerts
- Ludwig Institute for Cancer Research, 1200 Brussels, Belgium;
- Duve Institute, UCLouvain, 1200 Brussels, Belgium
| | | | - Ghita Ghislat
- Centre d’Immunologie de Marseille-Luminy, INSERM U1104, CNRS UMR7280, F-13009 Marseille, France;
| | - Pedro J. Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France;
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université UM105, F-13009 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Correspondence: ; Tel.: + 33-(0)4-8697-7201
| |
Collapse
|
13
|
Wang D, Yu J, Chen L, Li X, Jiang H, Chen K, Zheng M, Luo X. A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling. J Cheminform 2021; 13:69. [PMID: 34544485 PMCID: PMC8454160 DOI: 10.1186/s13321-021-00551-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 09/05/2021] [Indexed: 11/24/2022] Open
Abstract
Reliable uncertainty quantification for statistical models is crucial in various downstream applications, especially for drug design and discovery where mistakes may incur a large amount of cost. This topic has therefore absorbed much attention and a plethora of methods have been proposed over the past years. The approaches that have been reported so far can be mainly categorized into two classes: distance-based approaches and Bayesian approaches. Although these methods have been widely used in many scenarios and shown promising performance with their distinct superiorities, being overconfident on out-of-distribution examples still poses challenges for the deployment of these techniques in real-world applications. In this study we investigated a number of consensus strategies in order to combine both distance-based and Bayesian approaches together with post-hoc calibration for improved uncertainty quantification in QSAR (Quantitative Structure-Activity Relationship) regression modeling. We employed a set of criteria to quantitatively assess the ranking and calibration ability of these models. Experiments based on 24 bioactivity datasets were designed to make critical comparison between the model we proposed and other well-studied baseline models. Our findings indicate that the hybrid framework proposed by us can robustly enhance the model ability of ranking absolute errors. Together with post-hoc calibration on the validation set, we show that well-calibrated uncertainty quantification results can be obtained in domain shift settings. The complementarity between different methods is also conceptually analyzed.
Collapse
Affiliation(s)
- Dingyan Wang
- Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Jie Yu
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Lifan Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xutong Li
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hualiang Jiang
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Kaixian Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Mingyue Zheng
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
| | - Xiaomin Luo
- Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China.
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
| |
Collapse
|
14
|
Miranda SP, Baião FA, Fleck JL, Piccolo SR. Predicting drug sensitivity of cancer cells based on DNA methylation levels. PLoS One 2021; 16:e0238757. [PMID: 34506489 PMCID: PMC8432830 DOI: 10.1371/journal.pone.0238757] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2020] [Accepted: 06/28/2021] [Indexed: 01/22/2023] Open
Abstract
Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide, DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
Collapse
Affiliation(s)
- Sofia P. Miranda
- Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Fernanda A. Baião
- Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Julia L. Fleck
- Mines Saint-Etienne, Univ Clermont Auvergne, CNRS, UMR 6158 LIMOS, Centre CIS, Saint-Etienne, France
| | - Stephen R. Piccolo
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| |
Collapse
|
15
|
Rafique R, Islam SR, Kazi JU. Machine learning in the prediction of cancer therapy. Comput Struct Biotechnol J 2021; 19:4003-4017. [PMID: 34377366 PMCID: PMC8321893 DOI: 10.1016/j.csbj.2021.07.003] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Revised: 07/06/2021] [Accepted: 07/07/2021] [Indexed: 12/15/2022] Open
Abstract
Resistance to therapy remains a major cause of cancer treatment failures, resulting in many cancer-related deaths. Resistance can occur at any time during the treatment, even at the beginning. The current treatment plan is dependent mainly on cancer subtypes and the presence of genetic mutations. Evidently, the presence of a genetic mutation does not always predict the therapeutic response and can vary for different cancer subtypes. Therefore, there is an unmet need for predictive models to match a cancer patient with a specific drug or drug combination. Recent advancements in predictive models using artificial intelligence have shown great promise in preclinical settings. However, despite massive improvements in computational power, building clinically useable models remains challenging due to a lack of clinically meaningful pharmacogenomic data. In this review, we provide an overview of recent advancements in therapeutic response prediction using machine learning, which is the most widely used branch of artificial intelligence. We describe the basics of machine learning algorithms, illustrate their use, and highlight the current challenges in therapy response prediction for clinical practice.
Collapse
Affiliation(s)
| | - S.M. Riazul Islam
- Department of Computer Science and Engineering, Sejong University, Seoul, South Korea
| | - Julhash U. Kazi
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Corresponding author at: Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Medicon village Building 404:C3, Scheelevägen 8, 22363 Lund, Sweden.
| |
Collapse
|
16
|
Al-Jarf R, de Sá AGC, Pires DEV, Ascher DB. pdCSM-cancer: Using Graph-Based Signatures to Identify Small Molecules with Anticancer Properties. J Chem Inf Model 2021; 61:3314-3322. [PMID: 34213323 PMCID: PMC8317153 DOI: 10.1021/acs.jcim.1c00168] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
![]()
The development of
new, effective, and safe drugs to treat cancer
remains a challenging and time-consuming task due to limited hit rates,
restraining subsequent development efforts. Despite the impressive
progress of quantitative structure–activity relationship and
machine learning-based models that have been developed to predict
molecule pharmacodynamics and bioactivity, they have had mixed success
at identifying compounds with anticancer properties against multiple
cell lines. Here, we have developed a novel predictive tool, pdCSM-cancer,
which uses a graph-based signature representation of the chemical
structure of a small molecule in order to accurately predict molecules
likely to be active against one or multiple cancer cell lines. pdCSM-cancer
represents the most comprehensive anticancer bioactivity prediction
platform developed till date, comprising trained and validated models
on experimental data of the growth inhibition concentration (GI50%)
effects, including over 18,000 compounds, on 9 tumor types and 74
distinct cancer cell lines. Across 10-fold cross-validation, it achieved
Pearson’s correlation coefficients of up to 0.74 and comparable
performance of up to 0.67 across independent, non-redundant blind
tests. Leveraging the insights from these cell line-specific models,
we developed a generic predictive model to identify molecules active
in at least 60 cell lines. Our final model achieved an area under
the receiver operating characteristic curve (AUC) of up to 0.94 on
10-fold cross-validation and up to 0.94 on independent non-redundant
blind tests, outperforming alternative approaches. We believe that
our predictive tool will provide a valuable resource to optimizing
and enriching screening libraries for the identification of effective
and safe anticancer molecules. To provide a simple and integrated
platform to rapidly screen for potential biologically active molecules
with favorable anticancer properties, we made pdCSM-cancer freely
available online at http://biosig.unimelb.edu.au/pdcsm_cancer.
Collapse
Affiliation(s)
- Raghad Al-Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
| | - Alex G C de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
| | - David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia.,Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom
| |
Collapse
|
17
|
Machine learning analyses of antibody somatic mutations predict immunoglobulin light chain toxicity. Nat Commun 2021; 12:3532. [PMID: 34112780 PMCID: PMC8192768 DOI: 10.1038/s41467-021-23880-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Accepted: 05/23/2021] [Indexed: 02/05/2023] Open
Abstract
In systemic light chain amyloidosis (AL), pathogenic monoclonal immunoglobulin light chains (LC) form toxic aggregates and amyloid fibrils in target organs. Prompt diagnosis is crucial to avoid permanent organ damage, but delayed diagnosis is common because symptoms usually appear only after strong organ involvement. Here we present LICTOR, a machine learning approach predicting LC toxicity in AL, based on the distribution of somatic mutations acquired during clonal selection. LICTOR achieves a specificity and a sensitivity of 0.82 and 0.76, respectively, with an area under the receiver operating characteristic curve (AUC) of 0.87. Tested on an independent set of 12 LCs sequences with known clinical phenotypes, LICTOR achieves a prediction accuracy of 83%. Furthermore, we are able to abolish the toxic phenotype of an LC by in silico reverting two germline-specific somatic mutations identified by LICTOR, and by experimentally assessing the loss of in vivo toxicity in a Caenorhabditis elegans model. Therefore, LICTOR represents a promising strategy for AL diagnosis and reducing high mortality rates in AL.
Collapse
|
18
|
Park S, Soh J, Lee H. Super.FELT: supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data. BMC Bioinformatics 2021; 22:269. [PMID: 34034645 PMCID: PMC8152321 DOI: 10.1186/s12859-021-04146-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Accepted: 04/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Predicting the drug response of a patient is important for precision oncology. In recent studies, multi-omics data have been used to improve the prediction accuracy of drug response. Although multi-omics data are good resources for drug response prediction, the large dimension of data tends to hinder performance improvement. In this study, we aimed to develop a new method, which can effectively reduce the large dimension of data, based on the supervised deep learning model for predicting drug response. RESULTS We proposed a novel method called Supervised Feature Extraction Learning using Triplet loss (Super.FELT) for drug response prediction. Super.FELT consists of three stages, namely, feature selection, feature encoding using a supervised method, and binary classification of drug response (sensitive or resistant). We used multi-omics data including mutation, copy number aberration, and gene expression, and these were obtained from cell lines [Genomics of Drug Sensitivity in Cancer (GDSC), Cancer Cell Line Encyclopedia (CCLE), and Cancer Therapeutics Response Portal (CTRP)], patient-derived tumor xenografts (PDX), and The Cancer Genome Atlas (TCGA). GDSC was used for training and cross-validation tests, and CCLE, CTRP, PDX, and TCGA were used for external validation. We performed ablation studies for the three stages and verified that the use of multi-omics data guarantees better performance of drug response prediction. Our results verified that Super.FELT outperformed the other methods at external validation on PDX and TCGA and was good at cross-validation on GDSC and external validation on CCLE and CTRP. In addition, through our experiments, we confirmed that using multi-omics data is useful for external non-cell line data. CONCLUSION By separating the three stages, Super.FELT achieved better performance than the other methods. Through our results, we found that it is important to train encoders and a classifier independently, especially for external test on PDX and TCGA. Moreover, although gene expression is the most powerful data on cell line data, multi-omics promises better performance for external validation on non-cell line data than gene expression data. Source codes of Super.FELT are available at https://github.com/DMCB-GIST/Super.FELT .
Collapse
Affiliation(s)
- Sejin Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Jihee Soh
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea. .,Graduate School of Artificial Intelligence, Gwangju Institute of Science and Technology, Gwangju, South Korea.
| |
Collapse
|
19
|
Comandante-Lou N, Fallahi-Sichani M. Models of Cancer Drug Discovery and Response to Therapy. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11356-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
|
20
|
Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, Ma J, Ideker T. Predicting Drug Response and Synergy Using a Deep Learning Model of Human Cancer Cells. Cancer Cell 2020; 38:672-684.e6. [PMID: 33096023 PMCID: PMC7737474 DOI: 10.1016/j.ccell.2020.09.014] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/08/2020] [Revised: 08/07/2020] [Accepted: 09/22/2020] [Indexed: 12/16/2022]
Abstract
Most drugs entering clinical trials fail, often related to an incomplete understanding of the mechanisms governing drug response. Machine learning techniques hold immense promise for better drug response predictions, but most have not reached clinical practice due to their lack of interpretability and their focus on monotherapies. We address these challenges by developing DrugCell, an interpretable deep learning model of human cancer cells trained on the responses of 1,235 tumor cell lines to 684 drugs. Tumor genotypes induce states in cellular subsystems that are integrated with drug structure to predict response to therapy and, simultaneously, learn biological mechanisms underlying the drug response. DrugCell predictions are accurate in cell lines and also stratify clinical outcomes. Analysis of DrugCell mechanisms leads directly to the design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. DrugCell provides a blueprint for constructing interpretable models for predictive medicine.
Collapse
Affiliation(s)
- Brent M Kuenzi
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jisoo Park
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Samson H Fong
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Kyle S Sanchez
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - John Lee
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jason F Kreisberg
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jianzhu Ma
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.
| |
Collapse
|
21
|
Kleandrova VV, Scotti MT, Scotti L, Nayarisseri A, Speck-Planche A. Cell-based multi-target QSAR model for design of virtual versatile inhibitors of liver cancer cell lines. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2020; 31:815-836. [PMID: 32967475 DOI: 10.1080/1062936x.2020.1818617] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Accepted: 08/31/2020] [Indexed: 06/11/2023]
Abstract
Liver cancers are one of the leading fatal diseases among malignant neoplasms. Current chemotherapeutic treatments used to fight these illnesses have become less efficient in terms of both efficacy and safety. Therefore, there is a great need of search for new anti-liver cancer agents and this can be accelerated by using computer-aided drug discovery approaches. In this work, we report the development of the first cell-based multi-target model based on quantitative structure-activity relationships (CBMT-QSAR) for the design and prediction of chemicals as anticancer agents against 17 liver cancer cell lines. While having a good quality and predictive power (accuracy higher than 80%) in the training and test sets, respectively, the CBMT-QSAR model was employed as a tool to directly extract suitable fragments from the physicochemical and structural interpretations of the molecular descriptors. Some of these desirable fragments were assembled, leading to the virtual design of eight molecules with drug-like properties, with six of them being predicted as versatile anticancer agents against the 17 liver cancer cell lines reported here.
Collapse
Affiliation(s)
- V V Kleandrova
- Laboratory of Fundamental and Applied Research of Quality and Technology of Food Production, Moscow State University of Food Production , Moscow, Russian Federation
| | - M T Scotti
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba , João Pessoa, Brazil
| | - L Scotti
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba , João Pessoa, Brazil
| | - A Nayarisseri
- In Silico Research Laboratory, Eminent Biosciences , Indore, Madhya Pradesh, India
| | - A Speck-Planche
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba , João Pessoa, Brazil
| |
Collapse
|
22
|
Li Z, Lam YW, Liu Q, Lau AYK, Yu Au-Yeung H, Chan RHM. Machine Learning-Driven Drug Discovery: Prediction of Structure-Cytotoxicity Correlation Leads to Identification of Potential Anti-Leukemia Compounds. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020; 2020:5464-5467. [PMID: 33019216 DOI: 10.1109/embc44109.2020.9175850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
In vitro cytotoxicity screening is a crucial step of anticancer drug discovery. The application of deep learning methodology is gaining increasing attentions in processing drug screening data and studying anticancer mechanisms of chemical compounds. In this work, we explored the utilization of convolutional neural network in modeling the anticancer efficacy of small molecules. In particular, we presented a VGG19 model trained on 2D structural formulae to predict the growth-inhibitory effects of compounds against leukemia cell line CCRF-CEM, without any use of chemical descriptors. The model achieved a normalized RMSE of 15.76% on predicting growth inhibition and a Pearson Correlation Coefficient of 0.72 between predicted and experimental data, demonstrating a strong predictive power in this task. Furthermore, we implemented the Layer-wise Relevance Propagation technique to interpret the network and visualize the chemical groups predicted by the model that contribute to toxicity with human-readable representations.Clinical relevance-This work predicts the cytotoxicity of chemical compounds against human leukemic lymphoblast CCRF-CEM cell lines on a continuous scale, which only requires 2D images of the structural formulae of the compounds as inputs. Knowledge in the structure-toxicity relationship of small molecules will potentially increase the hit rate of primary drug screening assays.
Collapse
|
23
|
Nakano T, Takeda S, Brown JB. Active learning effectively identifies a minimal set of maximally informative and asymptotically performant cytotoxic structure-activity patterns in NCI-60 cell lines. RSC Med Chem 2020; 11:1075-1087. [PMID: 33479700 PMCID: PMC7513593 DOI: 10.1039/d0md00110d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2020] [Accepted: 06/30/2020] [Indexed: 11/21/2022] Open
Abstract
The NCI-60 cancer cell line screening panel has provided insights for development of subtype-specific chemical therapies and repurposing. By extracting chemical structure and cytotoxicity patterns, virtual screening potentially complements the availability of high-throughput assay platforms and improves bioactive compound discovery rates by computational prefiltering of candidate compound libraries. Many groups report high prediction performances in computational models of NCI-60 data when using cross-validation or similar techniques, yet prospective therapy development in novel cancers may have little to no such data and further may not have the resources to perform hit identification using large compound libraries. In contrast to bulk screening and analysis, the active learning methodology has demonstrated how to identify compounds for screening in small batches and update computational models iteratively, leading to predictive models with a minimum number of compounds, and importantly clarifying data volumes at which limits in predictive ability are achieved. Here, in replicate per-cell line experiments using 50% of data (∼20 000 compounds) as the external prediction target, predictive limits are reproducibly demonstrated at the stage of systematic selection of 10-30% of the incorporable half. The pattern was consistent across all 60 cell lines. Limits of predictability are found to be correlated to the doubling times of cell lines and the number of cellular response discontinuities (activity cliffs) present per cell line. Organization into chemical scaffolds delineated degrees of predictive challenge. These results provide key insights for strategies in developing new inhibitors in existing cell lines or for future automated therapy selection in personalized oncotherapy.
Collapse
Affiliation(s)
- Takumi Nakano
- Kyoto University Graduate School of Medicine , Department of Molecular Biosciences , Life Science Informatics Research Unit , Konoemachi Yoshida Sakyo , Kyoto 606-8501 , Japan .
| | - Shunichi Takeda
- Kyoto University Graduate School of Medicine , Department of Radiation Genetics , Konoemachi Yoshida Sakyo , Kyoto 606-8501 , Japan
| | - J B Brown
- Kyoto University Graduate School of Medicine , Department of Molecular Biosciences , Life Science Informatics Research Unit , Konoemachi Yoshida Sakyo , Kyoto 606-8501 , Japan .
| |
Collapse
|
24
|
Liu C, Wei D, Xiang J, Ren F, Huang L, Lang J, Tian G, Li Y, Yang J. An Improved Anticancer Drug-Response Prediction Based on an Ensemble Method Integrating Matrix Completion and Ridge Regression. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 21:676-686. [PMID: 32759058 PMCID: PMC7403773 DOI: 10.1016/j.omtn.2020.07.003] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Revised: 06/10/2020] [Accepted: 07/06/2020] [Indexed: 12/16/2022]
Abstract
In this study, we proposed an ensemble learning method, simultaneously integrating a low-rank matrix completion model and a ridge regression model to predict anticancer drug response on cancer cell lines. The model was applied to two benchmark datasets, including the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC). As previous studies suggest, the dual-layer integrated cell line-drug network model was one of the best models by far and outperformed most state-of-the-art models. Thus, we performed a head-to-head comparison between the dual-layer integrated cell line-drug network model and our model by a 10-fold crossvalidation study. For the CCLE dataset, our model has a higher Pearson correlation coefficient between predicted and observed drug responses than that of the dual-layer integrated cell line-drug network model in 18 out of 23 drugs. For the GDSC dataset, our model is better in 26 out of 28 drugs in the phosphatidylinositol 3-kinase (PI3K) pathway and 26 out of 30 drugs in the extracellular signal-regulated kinase (ERK) signaling pathway, respectively. Based on the prediction results, we carried out two types of case studies, which further verified the effectiveness of the proposed model on the drug-response prediction. In addition, our model is more biologically interpretable than the compared method, since it explicitly outputs the genes involved in the prediction, which are enriched in functions, like transcription, Src homology 2/3 (SH2/3) domain, cell cycle, ATP binding, and zinc finger.
Collapse
Affiliation(s)
- Chuanying Liu
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China
| | - Dong Wei
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China
| | - Ju Xiang
- College of Information Engineering, Changsha Medical University, Changsha, Hunan 410219, China; School of Information Science and Engineering, Central South University, Changsha 410083, China
| | - Fuquan Ren
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China
| | - Li Huang
- Tianhang Experiment School, Hangzhou, Zhejiang 310004, China
| | - Jidong Lang
- Geneis Beijing Co., Ltd., Beijing 100102, China
| | - Geng Tian
- Geneis Beijing Co., Ltd., Beijing 100102, China
| | - Yushuang Li
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China.
| | - Jialiang Yang
- College of Information Engineering, Changsha Medical University, Changsha, Hunan 410219, China; Geneis Beijing Co., Ltd., Beijing 100102, China.
| |
Collapse
|
25
|
Cortés-Ciriano I, Škuta C, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction. J Cheminform 2020; 12:41. [PMID: 33431016 PMCID: PMC7339533 DOI: 10.1186/s13321-020-00444-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 01/22/2023] Open
Abstract
Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using Ki, Kd, IC50 and EC50 data from ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP we assembled IC50 data from ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and 25 diverse protein target data sets. This study complements part 1 where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Despite being inherently noisy, we show that using QAFFP as descriptors leads to errors in prediction on the test set in the ~ 0.65-0.95 pIC50 units range, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02-0.08 pIC50 units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression .
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK. .,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK.
| | - Ctibor Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague, Czech Republic
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Daniel Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague, Czech Republic.,CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
| |
Collapse
|
26
|
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 2020; 12:39. [PMID: 33431038 PMCID: PMC7260783 DOI: 10.1186/s13321-020-00443-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 02/11/2023] Open
Abstract
An affinity fingerprint is the vector consisting of compound’s affinity or potency against the reference panel of protein targets. Here, we present the QAFFP fingerprint, 440 elements long in silico QSAR-based affinity fingerprint, components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024 bits long Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to that of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping as it is able to retrieve 1146 out of existing 1749 scaffolds, while the Morgan2 fingerprint reveals only 864 scaffolds.![]()
Collapse
Affiliation(s)
- C Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
| | - I Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - W Dehaen
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic.,CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
| | - P Kříž
- Department of Mathematics, Faculty of Chemical Engineering, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
| | - G J P van Westen
- Computational Drug Discovery, Drug Discovery and Safety, LACDR, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - I V Tetko
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH) and BIGCHEM GmbH, Ingolstaedter Landstrasse 1, 85764, Neuherberg, Germany
| | - A Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - D Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic. .,CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic.
| |
Collapse
|
27
|
Wang H, Xi J, Wang M, Li A. Dual-Layer Strengthened Collaborative Topic Regression Modeling for Predicting Drug Sensitivity. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:587-598. [PMID: 30106738 DOI: 10.1109/tcbb.2018.2864739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
An effective way to facilitate the development of modern oncology precision medicine is the systematical analysis of the known drug sensitivities that have emerged in recent years. Meanwhile, the screening of drug response in cancer cell lines provides an estimable genomic and pharmacological data towards high accuracy prediction. Existing works primarily utilize genomic or functional genomic features to classify or regress the drug response. Here in this work, by the migration and extension of the conventional merchandise recommendation methods, we introduce an innovation model on accurate drug sensitivity prediction by using dual-layer strengthened collaborative topic regression (DS-CTR), which incorporates not only the graphic model to jointly learn drugs and cell lines feature from pharmacogenomics data but also drug and cell line similarity network model to strengthen the correlation of the prediction results. Using Genomics of Drug Sensitivity in Cancer project (GDSC) as benchmark datasets, the 5-fold cross-validation experiment demonstrates that DS-CTR model significantly improves drug response prediction performance compared with four categories of state-of-the-art algorithms as for both Receiver Operator Curve (ROC) and the Area Under Receiver Operator Curve (AUC). By uncovering the unknown cell-drug associations with advanced literature evidences, our novel model DS-CTR is validated and supported. The model also provides the possibility to make the discovery of new anti-cancer therapeutics in the preclinical trials cheaper and faster.
Collapse
|
28
|
Zakharov AV, Zhao T, Nguyen DT, Peryea T, Sheils T, Yasgar A, Huang R, Southall N, Simeonov A. Novel Consensus Architecture To Improve Performance of Large-Scale Multitask Deep Learning QSAR Models. J Chem Inf Model 2019; 59:4613-4624. [PMID: 31584270 DOI: 10.1021/acs.jcim.9b00526] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
Advances in the development of high-throughput screening and automated chemistry have rapidly accelerated the production of chemical and biological data, much of them freely accessible through literature aggregator services such as ChEMBL and PubChem. Here, we explore how to use this comprehensive mapping of chemical biology space to support the development of large-scale quantitative structure-activity relationship (QSAR) models. We propose a new deep learning consensus architecture (DLCA) that combines consensus and multitask deep learning approaches together to generate large-scale QSAR models. This method improves knowledge transfer across different target/assays while also integrating contributions from models based on different descriptors. The proposed approach was validated and compared with proteochemometrics, multitask deep learning, and Random Forest methods paired with various descriptors types. DLCA models demonstrated improved prediction accuracy for both regression and classification tasks. The best models together with their modeling sets are provided through publicly available web services at https://predictor.ncats.io .
Collapse
Affiliation(s)
- Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Tongan Zhao
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Tyler Peryea
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Timothy Sheils
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Adam Yasgar
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Ruili Huang
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Noel Southall
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS) , National Institutes of Health , 9800 Medical Center Drive , Rockville , Maryland 20850 , United States
| |
Collapse
|
29
|
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022] Open
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis have helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the more successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predict drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to few hundreds genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms which are not necessarily restricted to the drug known targets.
Collapse
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Gerardo Pepe
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Marco Pietrosanto
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | - Giulio Galvan
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Leonardo Galli
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Antonio Palmeri
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Celgene Institute for Translational Research Europe, Sevilla, Spain
| | - Marco Sciandrone
- Department of Information Engineering, University of Florence, Florence, Italy
| | - Fabrizio Ferrè
- Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
| | - Gabriele Ausiello
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
| | | |
Collapse
|
30
|
Tejera E, Carrera I, Jimenes-Vargas K, Armijos-Jaramillo V, Sánchez-Rodríguez A, Cruz-Monteagudo M, Perez-Castillo Y. Cell fishing: A similarity based approach and machine learning strategy for multiple cell lines-compound sensitivity prediction. PLoS One 2019; 14:e0223276. [PMID: 31589649 PMCID: PMC6779297 DOI: 10.1371/journal.pone.0223276] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2019] [Accepted: 09/17/2019] [Indexed: 12/21/2022] Open
Abstract
The prediction of cell-lines sensitivity to a given set of compounds is a very important factor in the optimization of in-vitro assays. To date, the most common prediction strategies are based upon machine learning or other quantitative structure-activity relationships (QSAR) based approaches. In the present research, we propose and discuss a straightforward strategy not based on any learning modelling but exclusively relying upon the chemical similarity of a query compound to reference compounds with annotated activity against cell lines. We also compare the performance of the proposed method to machine learning predictions on the same problem. A curated database of compounds-cell lines associations derived from ChemBL version 22 was created for algorithm construction and cross-validation. Validation was done using 10-fold cross-validation and testing the models on new data obtained from ChemBL version 25. In terms of accuracy, both methods perform similarly with values around 0.65 across 750 cell lines in 10-fold cross-validation experiments. By combining both methods it is possible to achieve 66% of correct classification rate in more than 26000 newly reported interactions comprising 11000 new compounds. A Web Service implementing the described approaches (both similarity and machine learning based models) is freely available at: http://bioquimio.udla.edu.ec/cellfishing.
Collapse
Affiliation(s)
- E. Tejera
- Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
| | - I. Carrera
- Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador
- Departamento de Ciências de Computadores, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Karina Jimenes-Vargas
- Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador
| | - V. Armijos-Jaramillo
- Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
| | - A. Sánchez-Rodríguez
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- Universidad Técnica Particular de Loja, Loja, Ecuador
| | - M. Cruz-Monteagudo
- Center for Computational Science (CCS), University of Miami (UM), Miami, FL, United States of America
- West Coast University, Miami, Florida, United States of America
| | - Y. Perez-Castillo
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, Ecuador
| |
Collapse
|
31
|
Cortés-Ciriano I, Bender A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J Chem Inf Model 2019; 59:3330-3339. [PMID: 31241929 DOI: 10.1021/acs.jcim.9b00297] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions are generated. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for Deep Neural Networks in drug discovery and beyond.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| |
Collapse
|
32
|
Cortés-Ciriano I, Bender A. KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminform 2019; 11:41. [PMID: 31218493 PMCID: PMC6582521 DOI: 10.1186/s13321-019-0364-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Accepted: 06/09/2019] [Indexed: 02/08/2023] Open
Abstract
The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekulé structure representations alone by extending existing architectures (AlexNet, DenseNet-201, ResNet152 and VGG-19), which were pretrained on unrelated image data sets. We show that the predictive power of the generated models, which just require standard 2D compound representations as input, is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that by simply averaging the output of the RF models and ConvNets we obtain significantly lower errors in prediction for multiple data sets, although the effect size is small, than those obtained with either model alone, indicating that the features extracted by the convolutional layers of the ConvNets provide complementary predictive signal to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images permit to model COX isoform selectivity on a continuous scale with errors in prediction comparable to the uncertainty of the data. Overall, in this work we present a set of ConvNet architectures for the prediction of compound activity from their Kekulé structure representations with state-of-the-art performance, that require no generation of compound descriptors or use of sophisticated image processing techniques. The code needed to reproduce the results presented in this study and all the data sets are provided at https://github.com/isidroc/kekulescope .
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| |
Collapse
|
33
|
Xu X, Gu H, Wang Y, Wang J, Qin P. Autoencoder Based Feature Selection Method for Classification of Anticancer Drug Response. Front Genet 2019; 10:233. [PMID: 30972101 PMCID: PMC6445890 DOI: 10.3389/fgene.2019.00233] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2018] [Accepted: 03/04/2019] [Indexed: 12/14/2022] Open
Abstract
Anticancer drug responses can be varied for individual patients. This difference is mainly caused by genetic reasons, like mutations and RNA expression. Thus, these genetic features are often used to construct classification models to predict the drug response. This research focuses on the feature selection issue for the classification models. Because of the vast dimensions of the feature space for predicting drug response, the autoencoder network was first built, and a subset of inputs with the important contribution was selected. Then by using the Boruta algorithm, a further small set of features was determined for the random forest, which was used to predict drug response. Two datasets, GDSC and CCLE, were used to illustrate the efficiency of the proposed method.
Collapse
Affiliation(s)
- Xiaolu Xu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
| | - Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
| | - Yang Wang
- Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
| | - Jia Wang
- Department of Breast Surgery, Institute of Breast Disease, Second Hospital of Dalian Medical University, Dalian, China
| | - Pan Qin
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
| |
Collapse
|
34
|
Ali M, Khan SA, Wennerberg K, Aittokallio T. Global proteomics profiling improves drug sensitivity prediction: results from a multi-omics, pan-cancer modeling approach. Bioinformatics 2019; 34:1353-1362. [PMID: 29186355 PMCID: PMC5905617 DOI: 10.1093/bioinformatics/btx766] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2017] [Accepted: 11/27/2017] [Indexed: 12/13/2022] Open
Abstract
Motivation Proteomics profiling is increasingly being used for molecular stratification of cancer patients and cell-line panels. However, systematic assessment of the predictive power of large-scale proteomic technologies across various drug classes and cancer types is currently lacking. To that end, we carried out the first pan-cancer, multi-omics comparative analysis of the relative performance of two proteomic technologies, targeted reverse phase protein array (RPPA) and global mass spectrometry (MS), in terms of their accuracy for predicting the sensitivity of cancer cells to both cytotoxic chemotherapeutics and molecularly targeted anticancer compounds. Results Our results in two cell-line panels demonstrate how MS profiling improves drug response predictions beyond that of the RPPA or the other omics profiles when used alone. However, frequent missing MS data values complicate its use in predictive modeling and required additional filtering, such as focusing on completely measured or known oncoproteins, to obtain maximal predictive performance. Rather strikingly, the two proteomics profiles provided complementary predictive signal both for the cytotoxic and targeted compounds. Further, information about the cellular-abundance of primary target proteins was found critical for predicting the response of targeted compounds, although the non-target features also contributed significantly to the predictive power. The clinical relevance of the selected protein markers was confirmed in cancer patient data. These results provide novel insights into the relative performance and optimal use of the widely applied proteomic technologies, MS and RPPA, which should prove useful in translational applications, such as defining the best combination of omics technologies and marker panels for understanding and predicting drug sensitivities in cancer patients. Availability and implementation Processed datasets, R as well as Matlab implementations of the methods are available at https://github.com/mehr-een/bemkl-rbps. Contact mehreen.ali@helsinki.fi or tero.aittokallio@fimm.fi. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Mehreen Ali
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00290 Helsinki, Finland.,Helsinki Institute for Information Technology (HIIT), Aalto University, 02150 Espoo, Finland
| | - Suleiman A Khan
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00290 Helsinki, Finland.,Helsinki Institute for Information Technology (HIIT), Aalto University, 02150 Espoo, Finland
| | - Krister Wennerberg
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00290 Helsinki, Finland
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00290 Helsinki, Finland.,Helsinki Institute for Information Technology (HIIT), Aalto University, 02150 Espoo, Finland.,Department of Mathematics and Statistics, University of Turku, 20014 Turku, Finland
| |
Collapse
|
35
|
Cortés-Ciriano I, Bender A. Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks. J Chem Inf Model 2018; 59:1269-1281. [DOI: 10.1021/acs.jcim.8b00542] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
Collapse
|
36
|
Liu H, Zhao Y, Zhang L, Chen X. Anti-cancer Drug Response Prediction Using Neighbor-Based Collaborative Filtering with Global Effect Removal. MOLECULAR THERAPY. NUCLEIC ACIDS 2018; 13:303-311. [PMID: 30321817 PMCID: PMC6197792 DOI: 10.1016/j.omtn.2018.09.011] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/22/2018] [Revised: 09/17/2018] [Accepted: 09/18/2018] [Indexed: 02/06/2023]
Abstract
Patients of the same cancer may differ in their responses to a specific medical therapy. Identification of predictive molecular features for drug sensitivity holds the key in the era of precision medicine. Human cell lines have harbored most of the same genetic changes found in patients’ tumors and thus are widely used in the research of drug response. In this work, we formulated drug-response prediction as a recommender system problem and then adopted a neighbor-based collaborative filtering with global effect removal (NCFGER) method to estimate anti-cancer drug responses of cell lines by integrating cell-line similarity networks and drug similarity networks based on the fact that similar cell lines and similar drugs exhibit similar responses. Specifically, we removed the global effect in the available responses and shrunk the similarity score for each cell line pair as well as each drug pair. We then used the K most similar neighbors (hybrid of cell-line-oriented and drug-oriented) in the available responses to predict the unknown ones. Through 10-fold cross-validation, this approach was shown to reach accurate and reproducible outcomes of drug sensitivity. We also discussed the biological outcomes based on the newly predicted response values.
Collapse
Affiliation(s)
- Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China.
| |
Collapse
|
37
|
Zhang L, Chen X, Guan NN, Liu H, Li JQ. A Hybrid Interpolation Weighted Collaborative Filtering Method for Anti-cancer Drug Response Prediction. Front Pharmacol 2018; 9:1017. [PMID: 30258362 PMCID: PMC6143790 DOI: 10.3389/fphar.2018.01017] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2018] [Accepted: 08/22/2018] [Indexed: 12/16/2022] Open
Abstract
Individualized therapies ask for the most effective regimen for each patient, while the patients' response may differ from each other. However, it is impossible to clinically evaluate each patient's response due to the large population. Human cell lines have harbored most of the same genetic changes found in patients' tumors, thus are widely used to help understand initial responses of drugs. Based on the more credible assumption that similar cell lines and similar drugs exhibit similar responses, we formulated drug response prediction as a recommender system problem, and then adopted a hybrid interpolation weighted collaborative filtering (HIWCF) method to predict anti-cancer drug responses of cell lines by incorporating cell line similarity and drug similarity shown from gene expression profiles, drug chemical structure as well as drug response similarity. Specifically, we estimated the baseline based on the available responses and shrunk the similarity score for each cell line pair as well as each drug pair. The similarity scores were then shrunk and weighted by the correlation coefficients drawn from the know response between each pair. Before used to find the K most similar neighbors for further prediction, they went through the case amplification strategy to emphasize high similarity and neglect low similarity. In the last step for prediction, cell line-oriented and drug-oriented collaborative filtering models were carried out, and the average of predicted values from both models was used as the final predicted sensitivity. Through 10-fold cross validation, this approach was shown to reach accurate and reproducible outcome for those missing drug sensitivities. We also found that the drug response similarity between cell lines or drugs may play important role in the prediction. Finally, we discussed the biological outcomes based on the newly predicted response values in GDSC dataset.
Collapse
Affiliation(s)
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Na-Na Guan
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| |
Collapse
|
38
|
Cortés-Ciriano I, Firth NC, Bender A, Watson O. Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening. J Chem Inf Model 2018; 58:2000-2014. [PMID: 30130102 DOI: 10.1021/acs.jcim.8b00376] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in the common scenario in early stage drug discovery where lots of inactive data but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range) has not been thoroughly examined yet. To this aim, we have designed an iterative virtual screening strategy which was evaluated on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds to identify a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require two times as many iterations as random selection depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. Overall, this study provides an approach for iterative screening where only inactive data are present in early stages of drug discovery in order to discover highly potent compounds and the best experimental set up in which to do so.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Nicholas C Firth
- Centre for Medical Image Computing, Department of Computer Science , UCL , London WC1E 6BT , United Kingdom.,Evariste Technologies Ltd , Goring on Thames RG8 9AL , United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Oliver Watson
- Evariste Technologies Ltd , Goring on Thames RG8 9AL , United Kingdom
| |
Collapse
|
39
|
Giblin KA, Hughes SJ, Boyd H, Hansson P, Bender A. Prospectively Validated Proteochemometric Models for the Prediction of Small-Molecule Binding to Bromodomain Proteins. J Chem Inf Model 2018; 58:1870-1888. [DOI: 10.1021/acs.jcim.8b00400] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Kathryn A. Giblin
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Samantha J. Hughes
- Computational Chemistry, Oncology, IMED Biotech Unit, AstraZeneca, Cambridge CB10 1XL, U.K
| | - Helen Boyd
- Discovery Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg 431 50 SE, Sweden
| | - Pia Hansson
- Discovery Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg 431 50 SE, Sweden
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
Collapse
|
40
|
Ali M, Aittokallio T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 2018; 11:31-39. [PMID: 30097794 PMCID: PMC6381361 DOI: 10.1007/s12551-018-0446-z] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Accepted: 07/22/2018] [Indexed: 02/07/2023] Open
Abstract
In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input "big data" require the application of dedicated data preprocessing pipelines that often lead to some loss of information and compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles along with knowledge about cancer pathways targeted by anti-cancer compounds when predicting their phenotypic responses.
Collapse
Affiliation(s)
- Mehreen Ali
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland.,Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland. .,Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland. .,Department of Mathematics and Statistics, University of Turku, FI-20014, Turku, Finland.
| |
Collapse
|
41
|
Kalamara A, Tobalina L, Saez-Rodriguez J. How to find the right drug for each patient? Advances and challenges in pharmacogenomics. CURRENT OPINION IN SYSTEMS BIOLOGY 2018; 10:53-62. [PMID: 31763498 PMCID: PMC6855262 DOI: 10.1016/j.coisb.2018.07.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Cancer is a highly heterogeneous disease with complex underlying biology. For these reasons, effective cancer treatment is still a challenge. Nowadays, it is clear that a cancer therapy that fits all the cases cannot be found, and as a result the design of therapies tailored to the patient's molecular characteristics is needed. Pharmacogenomics aims to study the relationship between an individual's genotype and drug response. Scientists use different biological models, ranging from cell lines to mouse models, as proxies for patients for preclinical and translational studies. The rapid development of "-omics" technologies is increasing the amount of features that can be measured in these models, expanding the possibilities of finding predictive biomarkers of drug response. Finding these relationships requires diverse computational approaches ranging from machine learning to dynamic modeling. Despite major advances, we are still far from being able to precisely predict drug efficacy in cancer models, let alone directly on patients. We believe that the new experimental techniques and computational approaches covered in this review will bring us closer to this goal.
Collapse
Affiliation(s)
- Angeliki Kalamara
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
| | - Luis Tobalina
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
| | - Julio Saez-Rodriguez
- RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Heidelberg University, Faculty of Medicine, Institute of Computational Biomedicine, Heidelberg, Germany
| |
Collapse
|
42
|
Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018; 58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge CB4 0WS, U.K
| | - Natalia Aniceto
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
| | - Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124, Uppsala Sweden
| | - Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
Collapse
|
43
|
Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O. A confidence predictor for logD using conformal regression and a support-vector machine. J Cheminform 2018; 10:17. [PMID: 29616425 PMCID: PMC5882484 DOI: 10.1186/s13321-018-0271-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2017] [Accepted: 03/25/2018] [Indexed: 02/03/2023] Open
Abstract
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\hbox {Q}^{2}=0.973$$\end{document}Q2=0.973 and with the best performing nonconformity measure having median prediction interval of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\pm ~0.39$$\end{document}±0.39 log units at 80% confidence and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\pm ~0.60$$\end{document}±0.60 log units at 90% confidence. The model is available as an online service via an OpenAPI interface, a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as an URI resolver service.![]()
Collapse
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Staffan Arvidsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Arvid Berg
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden.
| |
Collapse
|
44
|
Dang CC, Peón A, Ballester PJ. Unearthing new genomic markers of drug response by improved measurement of discriminative power. BMC Med Genomics 2018; 11:10. [PMID: 29409485 PMCID: PMC5801688 DOI: 10.1186/s12920-018-0336-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2016] [Accepted: 01/29/2018] [Indexed: 12/29/2022] Open
Abstract
Background Oncology drugs are only effective in a small proportion of cancer patients. Our current ability to identify these responsive patients before treatment is still poor in most cases. Thus, there is a pressing need to discover response markers for marketed and research oncology drugs. Screening these drugs against a large panel of cancer cell lines has led to the discovery of new genomic markers of in vitro drug response. However, while the identification of such markers among thousands of candidate drug-gene associations in the data is error-prone, an appraisal of the effectiveness of such detection task is currently lacking. Methods Here we present a new non-parametric method to measuring the discriminative power of a drug-gene association. Unlike parametric statistical tests, the adopted non-parametric test has the advantage of not making strong assumptions about the data distorting the identification of genomic markers. Furthermore, we introduce a new benchmark to further validate these markers in vitro using more recent data not used to identify the markers. Results The application of this new methodology has led to the identification of 128 new genomic markers distributed across 61% of the analysed drugs, including 5 drugs without previously known markers, which were missed by the MANOVA test initially applied to analyse data from the Genomics of Drug Sensitivity in Cancer consortium. Conclusions Discovering markers using more than one statistical test and testing them on independent data is unusual. We found this helpful to discard statistically significant drug-gene associations that were actually spurious correlations. This approach also revealed new, independently validated, in vitro markers of drug response such as Temsirolimus-CDKN2A (resistance) and Gemcitabine-EWS_FLI1 (sensitivity). Electronic supplementary material The online version of this article (10.1186/s12920-018-0336-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Cuong C Dang
- Cancer Research Center of Marseille, INSERM U1068, F-13009, Marseille, France.,Institut Paoli-Calmettes, F-13009, Marseille, France.,Aix-Marseille Université, F-13284, Marseille, France.,CNRS UMR7258, F-13009, Marseille, France
| | - Antonio Peón
- Cancer Research Center of Marseille, INSERM U1068, F-13009, Marseille, France.,Institut Paoli-Calmettes, F-13009, Marseille, France.,Aix-Marseille Université, F-13284, Marseille, France.,CNRS UMR7258, F-13009, Marseille, France
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009, Marseille, France. .,Institut Paoli-Calmettes, F-13009, Marseille, France. .,Aix-Marseille Université, F-13284, Marseille, France. .,CNRS UMR7258, F-13009, Marseille, France.
| |
Collapse
|
45
|
Azuaje F. Computational models for predicting drug responses in cancer research. Brief Bioinform 2017; 18:820-829. [PMID: 27444372 PMCID: PMC5862310 DOI: 10.1093/bib/bbw065] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2016] [Indexed: 02/06/2023] Open
Abstract
The computational prediction of drug responses based on the analysis of multiple types of genome-wide molecular data is vital for accomplishing the promise of precision medicine in oncology. This will benefit cancer patients by matching their tumor characteristics to the most effective therapy available. As larger and more diverse layers of patient-related data become available, further demands for new bioinformatics approaches and expertise will arise. This article reviews key strategies, resources and techniques for the prediction of drug sensitivity in cell lines and patient-derived samples. It discusses major advances and challenges associated with the different model development steps. This review highlights major trends in this area, and will assist researchers in the assessment of recent progress and in the selection of approaches to emerging applications in oncology.
Collapse
Affiliation(s)
- Francisco Azuaje
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Luxembourg Institute of Health (LIH), Luxembourg, Luxembourg
- Corresponding author: Francisco Azuaje, NorLux Neuro-Oncology Laboratory, Department of Oncology, Luxembourg Institute of Health (LIH), Luxembourg L-1526, Luxembourg. Tel.: +352-26970875; Fax: +352-26970396; E-mail:
| |
Collapse
|
46
|
Naulaerts S, Dang CC, Ballester PJ. Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours. Oncotarget 2017; 8:97025-97040. [PMID: 29228590 PMCID: PMC5722542 DOI: 10.18632/oncotarget.20923] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Accepted: 08/14/2017] [Indexed: 02/07/2023] Open
Abstract
Cancer drug therapies are only effective in a small proportion of patients. To make things worse, our ability to identify these responsive patients before administering a treatment is generally very limited. The recent arrival of large-scale pharmacogenomic data sets, which measure the sensitivity of molecularly profiled cancer cell lines to a panel of drugs, has boosted research on the discovery of drug sensitivity markers. However, no systematic comparison of widely-used single-gene markers with multi-gene machine-learning markers exploiting genomic data has been so far conducted. We therefore assessed the performance offered by these two types of models in discriminating between sensitive and resistant cell lines to a given drug. This was carried out for each of 127 considered drugs using genomic data characterising the cell lines. We found that the proportion of cell lines predicted to be sensitive that are actually sensitive (precision) varies strongly with the drug and type of model used. Furthermore, the proportion of sensitive cell lines that are correctly predicted as sensitive (recall) of the best single-gene marker was lower than that of the multi-gene marker in 118 of the 127 tested drugs. We conclude that single-gene markers are only able to identify those drug-sensitive cell lines with the considered actionable mutation, unlike multi-gene markers that can in principle combine multiple gene mutations to identify additional sensitive cell lines. We also found that cell line sensitivities to some drugs (e.g. Temsirolimus, 17-AAG or Methotrexate) are better predicted by these machine-learning models.
Collapse
Affiliation(s)
- Stefan Naulaerts
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France.,Institut Paoli-Calmettes, Marseille, France.,Aix-Marseille Université, Marseille, France.,CNRS UMR7258, Marseille, France
| | - Cuong C Dang
- Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
| | - Pedro J Ballester
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France.,Institut Paoli-Calmettes, Marseille, France.,Aix-Marseille Université, Marseille, France.,CNRS UMR7258, Marseille, France
| |
Collapse
|
47
|
Wang L, Li X, Zhang L, Gao Q. Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC Cancer 2017; 17:513. [PMID: 28768489 PMCID: PMC5541434 DOI: 10.1186/s12885-017-3500-5] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2017] [Accepted: 07/24/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Human cancer cell lines are used in research to study the biology of cancer and to test cancer treatments. Recently there are already some large panels of several hundred human cancer cell lines which are characterized with genomic and pharmacological data. The ability to predict drug responses using these pharmacogenomics data can facilitate the development of precision cancer medicines. Although several methods have been developed to address the drug response prediction, there are many challenges in obtaining accurate prediction. METHODS Based on the fact that similar cell lines and similar drugs exhibit similar drug responses, we adopted a similarity-regularized matrix factorization (SRMF) method to predict anticancer drug responses of cell lines using chemical structures of drugs and baseline gene expression levels in cell lines. Specifically, chemical structural similarity of drugs and gene expression profile similarity of cell lines were considered as regularization terms, which were incorporated to the drug response matrix factorization model. RESULTS We first demonstrated the effectiveness of SRMF using a set of simulation data and compared it with two typical similarity-based methods. Furthermore, we applied it to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets, and performance of SRMF exceeds three state-of-the-art methods. We also applied SRMF to estimate the missing drug response values in the GDSC dataset. Even though SRMF does not specifically model mutation information, it could correctly predict drug-cancer gene associations that are consistent with existing data, and identify novel drug-cancer gene associations that are not found in existing data as well. SRMF can also aid in drug repositioning. The newly predicted drug responses of GDSC dataset suggest that mTOR inhibitor rapamycin was sensitive to non-small cell lung cancer (NSCLC), and expression of AK1RC3 and HINT1 may be adjunct markers of cell line sensitivity to rapamycin. CONCLUSIONS Our analysis showed that the proposed data integration method is able to improve the accuracy of prediction of anticancer drug responses in cell lines, and can identify consistent and novel drug-cancer gene associations compared to existing data as well as aid in drug repositioning.
Collapse
Affiliation(s)
- Lin Wang
- School of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin, 300457, China.
| | - Xiaozhong Li
- School of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin, 300457, China
| | - Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore, 119076, Singapore
| | - Qiang Gao
- Key Lab of Industrial Fermentation Microbiology, Ministry of Education & Tianjin City, College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
| |
Collapse
|
48
|
Ammad-ud-din M, Khan SA, Wennerberg K, Aittokallio T. Systematic identification of feature combinations for predicting drug response with Bayesian multi-view multi-task linear regression. Bioinformatics 2017; 33:i359-i368. [PMID: 28881998 PMCID: PMC5870540 DOI: 10.1093/bioinformatics/btx266] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION A prime challenge in precision cancer medicine is to identify genomic and molecular features that are predictive of drug treatment responses in cancer cells. Although there are several computational models for accurate drug response prediction, these often lack the ability to infer which feature combinations are the most predictive, particularly for high-dimensional molecular datasets. As increasing amounts of diverse genome-wide data sources are becoming available, there is a need to build new computational models that can effectively combine these data sources and identify maximally predictive feature combinations. RESULTS We present a novel approach that leverages on systematic integration of data sources to identify response predictive features of multiple drugs. To solve the modeling task we implement a Bayesian linear regression method. To further improve the usefulness of the proposed model, we exploit the known human cancer kinome for identifying biologically relevant feature combinations. In case studies with a synthetic dataset and two publicly available cancer cell line datasets, we demonstrate the improved accuracy of our method compared to the widely used approaches in drug response analysis. As key examples, our model identifies meaningful combinations of features for the well known EGFR, ALK, PLK and PDGFR inhibitors. AVAILABILITY AND IMPLEMENTATION The source code of the method is available at https://github.com/suleimank/mvlr . CONTACT muhammad.ammad-ud-din@helsinki.fi or suleiman.khan@helsinki.fi. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Muhammad Ammad-ud-din
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
| | - Suleiman A Khan
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
| | - Krister Wennerberg
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
| |
Collapse
|
49
|
Peón A, Naulaerts S, Ballester PJ. Predicting the Reliability of Drug-target Interaction Predictions with Maximum Coverage of Target Space. Sci Rep 2017; 7:3820. [PMID: 28630414 PMCID: PMC5476590 DOI: 10.1038/s41598-017-04264-w] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2017] [Accepted: 05/26/2017] [Indexed: 02/05/2023] Open
Abstract
Many computational methods to predict the macromolecular targets of small organic molecules have been presented to date. Despite progress, target prediction methods still have important limitations. For example, the most accurate methods implicitly restrict their predictions to a relatively small number of targets, are not systematically validated on drugs (whose targets are harder to predict than those of non-drug molecules) and often lack a reliability score associated with each predicted target. Here we present a systematic validation of ligand-centric target prediction methods on a set of clinical drugs. These methods exploit a knowledge-base covering 887,435 known ligand-target associations between 504,755 molecules and 4,167 targets. Based on this dataset, we provide a new estimate of the polypharmacology of drugs, which on average have 11.5 targets below IC50 10 µM. The average performance achieved across clinical drugs is remarkable (0.348 precision and 0.423 recall, with large drug-dependent variability), especially given the unusually large coverage of the target space. Furthermore, we show how a sparse ligand-target bioactivity matrix to retrospectively validate target prediction methods could underestimate prospective performance. Lastly, we present and validate a first-in-kind score capable of accurately predicting the reliability of target predictions.
Collapse
Affiliation(s)
- Antonio Peón
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
- CNRS, UMR7258, Marseille, F-13009, France
- Institut Paoli-Calmettes, Marseille, F-13009, France
- Aix-Marseille University, UM 105, F-13284, Marseille, France
| | - Stefan Naulaerts
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
- CNRS, UMR7258, Marseille, F-13009, France
- Institut Paoli-Calmettes, Marseille, F-13009, France
- Aix-Marseille University, UM 105, F-13284, Marseille, France
| | - Pedro J Ballester
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France.
- CNRS, UMR7258, Marseille, F-13009, France.
- Institut Paoli-Calmettes, Marseille, F-13009, France.
- Aix-Marseille University, UM 105, F-13284, Marseille, France.
| |
Collapse
|
50
|
Svensson F, Norinder U, Bender A. Modelling compound cytotoxicity using conformal prediction and PubChem HTS data. Toxicol Res (Camb) 2017; 6:73-80. [PMID: 30090478 PMCID: PMC6061930 DOI: 10.1039/c6tx00252h] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2016] [Accepted: 10/28/2016] [Indexed: 12/28/2022] Open
Abstract
The assessment of compound cytotoxicity is an important part of the drug discovery process. Accurate predictions of cytotoxicity have the potential to expedite decision making and save considerable time and effort. In this work we apply class conditional conformal prediction to model the cytotoxicity of compounds based on 16 high throughput cytotoxicity assays from PubChem. The data span 16 cell lines and comprise more than 440 000 unique compounds. The data sets are heavily imbalanced with only 0.8% of the tested compounds being cytotoxic. We trained one classification model for each cell line and validated the performance with respect to validity and accuracy. The generated models deliver high quality predictions for both toxic and non-toxic compounds despite the imbalance between the two classes. On external data collected from the same assay provider as one of the investigated cell lines the model had a sensitivity of 74% and a specificity of 65% at the 80% confidence level among the compounds assigned to a single class. Compared to previous approaches for large scale cytotoxicity modelling, this represents a balanced performance in the prediction of the toxic and non-toxic classes. The conformal prediction framework also allows the modeller to control the error frequency of the predictions, allowing predictions of cytotoxicity outcomes with confidence.
Collapse
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics , Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , UK .
| | - Ulf Norinder
- Swedish Toxicology Sciences Research Center , SE-151 36 Södertälje , Sweden
- Dept. Computer and Systems Sciences , Stockholm Univ. , Box 7003 , SE-164 07 Kista , Sweden
| | - Andreas Bender
- Centre for Molecular Informatics , Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , UK .
| |
Collapse
|