1
Vishwakarma S, Hernandez-Hernandez S, Ballester PJ. Graph neural networks are promising for phenotypic virtual screening on cancer cell lines. Biol Methods Protoc 2024; 9:bpae065. [PMID: 39502795 PMCID: PMC11537795 DOI: 10.1093/biomethods/bpae065] [Received: 07/03/2024] [Revised: 08/20/2024] [Accepted: 09/02/2024] [Indexed: 11/08/2024]
Abstract
Artificial intelligence is increasingly driving early drug design, offering novel approaches to virtual screening (VS). Phenotypic virtual screening (PVS) aims to predict how cancer cell lines respond to different compounds by focusing on observable characteristics rather than specific molecular targets. Some studies have suggested that deep learning may not be the best approach for PVS. However, these studies are limited by the small number of molecules tested, by not employing suitable performance metrics, and by not using dissimilar-molecules splits, which better mimic the challenging chemical diversity of real-world screening libraries. Here we prepared 60 datasets, each containing approximately 30 000-50 000 molecules tested for their growth inhibitory activities on one of the NCI-60 cancer cell lines. We conducted multiple performance evaluations of each of five machine learning algorithms for PVS on these 60 problem instances. To provide an even more comprehensive evaluation, we used two model validation types: the random split and the dissimilar-molecules split. Overall, about 14 440 training runs across datasets were carried out per algorithm. The models were primarily evaluated using hit rate, a more suitable metric in VS contexts. The results show that all models are more challenged by test molecules that are substantially different from those in the training data. In both validation types, the D-MPNN algorithm, a graph-based deep neural network, was found to be the most suitable for building predictive models for this PVS problem.
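In virtual screening, hit rate is commonly taken to be the fraction of true actives among the top-k molecules ranked by predicted score. The following minimal Python sketch assumes binary activity labels and a cutoff `k` chosen by the user. The study's exact hit-rate definition and its activity threshold are not given in the abstract, so both are assumptions here.

```python
from typing import Sequence


def hit_rate_at_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Fraction of actives among the k molecules with the highest predicted scores.

    Assumes higher scores mean 'more likely active'. Python's sort is stable,
    so tied scores keep their input order.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of molecules")
    # Rank molecule indices by predicted score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:k] if is_active[i])
    return hits / k


if __name__ == "__main__":
    # Toy data: six molecules, three of them active.
    scores = [0.9, 0.1, 0.8, 0.4, 0.3, 0.7]
    actives = [True, False, False, False, True, True]
    print(hit_rate_at_k(scores, actives, k=3))  # 2 of the top 3 are active
```

Compare a model's hit rate with the base rate of actives in the test set, which is the expected hit rate of random selection. That comparison makes it clear whether the ranking adds value for a screening campaign.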
Affiliation(s)
- Sachin Vishwakarma
- Evotec SAS (France), Toulouse, France
- Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, United Kingdom
2
Wang Z, Gu H, Qin P, Wang J. Single nucleotide and copy number variants of cancer driver genes inform drug response in multiple cancers. PLoS One 2024; 19:e0306343. [PMID: 39083502 PMCID: PMC11290640 DOI: 10.1371/journal.pone.0306343] [Received: 03/20/2024] [Accepted: 06/14/2024] [Indexed: 08/02/2024]
Abstract
Due to the heterogeneity of cancer, precision medicine has been a major challenge for cancer treatment. Determining medication regimens based on patient genotypes has become a research hotspot in cancer genomics. In this study, we aim to identify key biomarkers for targeted therapies based on single nucleotide variants (SNVs) and copy number variants (CNVs) of genes. The experiments are carried out on 7 cancers in the Cancer Cell Line Encyclopedia (CCLE) dataset. Because driver genes are highly mutable and therefore yield many mutated samples, the effect of data sparsity can be largely eliminated. We therefore focus on discovering the relationship between driver mutation patterns and three measures of drug response: area under the curve (AUC), half maximal effective concentration (EC50), and log2-fold change (LFC). First, multiple statistical methods are applied to assess the significance of differences in drug response between sample groups. Next, for each driver gene, we analyze the extent to which its mutations can affect drug response. Based on the results of multiple hypothesis tests and correlation analyses, our main findings include the validation of several known drug response biomarkers, such as BRAF, NRAS, MAP2K1, MAP2K2, and CDKN2A, as well as genes with strong potential for inferring drug response. Notably, we identify a list of genes, including SALL4, B2M, BAP1, CCDC6, ERBB4, FOXA1, GRIN2A, and PTPRT, whose impact on drug response spans multiple cancers and which should be prioritized as key biomarkers for targeted therapies. Furthermore, based on the statistical p-values and correlation coefficients, we construct gene-drug sensitivity maps for cancer drug recommendation. In this work, we show that driver mutation patterns could be used to tailor therapeutics for precision medicine.
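The abstract does not name the statistical tests it uses. One generic way to test whether drug response differs between mutated and wild-type samples is a two-sided permutation test on the difference in mean AUC. This is an illustrative choice and not necessarily one of the authors' methods. The AUC values below are invented toy numbers.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test(mutant: Sequence[float], wild_type: Sequence[float],
                     n_permutations: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in group means.

    Returns (observed mean(mutant) - mean(wild_type), permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(mutant) - mean(wild_type)
    pooled = list(mutant) + list(wild_type)
    n_mut = len(mutant)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_mut]) - mean(pooled[n_mut:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical AUC values for samples with and without a driver mutation.
    mutant_auc = [0.42, 0.38, 0.45, 0.40, 0.36]
    wild_type_auc = [0.61, 0.58, 0.66, 0.63, 0.59, 0.64]
    print(permutation_test(mutant_auc, wild_type_auc))
```

When this is run for every driver gene and drug, the resulting p-values would need a multiple-testing correction (for example Benjamini-Hochberg) before genes are called biomarkers.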
Affiliation(s)
- Zeyuan Wang
- School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Hong Gu
- School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Pan Qin
- School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Jia Wang
- Department of Breast Surgery, Institute of Breast Disease, Second Hospital of Dalian Medical University, Dalian, Liaoning, China
3
Liu H, Wang F, Yu J, Pan Y, Gong C, Zhang L, Zhang L. DBDNMF: A Dual Branch Deep Neural Matrix Factorization method for drug response prediction. PLoS Comput Biol 2024; 20:e1012012. [PMID: 38574114 PMCID: PMC11020650 DOI: 10.1371/journal.pcbi.1012012] [Received: 07/08/2023] [Revised: 04/16/2024] [Accepted: 03/19/2024] [Indexed: 04/06/2024]
Abstract
Predicting the anti-cancer response of cell lines to drugs is urgently needed for individualized decision-making in the era of precision medicine. Measuring these responses with wet-lab experiments is time-consuming and expensive, which makes it almost impossible to apply at scale. Computational models that can precisely predict the responses between drugs and cell lines could provide a credible reference for further research. Existing response prediction methods based on matrix factorization or neural networks have shown that both linear and nonlinear latent characteristics are applicable and effective for precise prediction of drug responses. However, most of them consider only linear or only nonlinear relationships. Herein, we propose a Dual Branch Deep Neural Matrix Factorization (DBDNMF) method to address this issue. DBDNMF learns the latent representation of drugs and cell lines through flexible inputs and reconstructs the partially observed matrix through a series of hidden neural network layers. Experimental results on the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets show that its prediction accuracy exceeds that of state-of-the-art drug response prediction algorithms, demonstrating its reliability and stability. Hierarchical clustering shows that drugs with similar response levels tend to target similar signaling pathways and that cell lines from the same tissue subtype tend to share the same response pattern, which is consistent with previously published studies.
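DBDNMF's dual-branch network is not reproduced here. As a point of reference, the purely linear idea it extends can be sketched in pure Python: classic matrix factorization, which completes a partially observed drug × cell-line response matrix and is trained by stochastic gradient descent on the observed entries only. The rank, learning rate, regularization strength and toy matrix are all assumptions.

```python
import random
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def factorize(R: Matrix, rank: int = 2, lr: float = 0.05, reg: float = 0.01,
              epochs: int = 2000, seed: int = 0):
    """Learn drug factors P and cell-line factors Q so that R[i][j] ≈ P[i]·Q[j].

    Only observed entries (not None) contribute to the squared-error loss.
    """
    rng = random.Random(seed)
    n_drugs, n_cells = len(R), len(R[0])
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cells)]
    observed = [(i, j) for i in range(n_drugs) for j in range(n_cells)
                if R[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = R[i][j] - predict(P, Q, i, j)
            for f in range(rank):
                p_if, q_jf = P[i][f], Q[j][f]
                # Gradient step on squared error with L2 regularization.
                P[i][f] += lr * (err * q_jf - reg * p_if)
                Q[j][f] += lr * (err * p_if - reg * q_jf)
    return P, Q


def predict(P, Q, i: int, j: int) -> float:
    """Linear interaction between drug i and cell line j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


if __name__ == "__main__":
    # Hypothetical response matrix (rows: drugs, columns: cell lines).
    R = [[0.20, 0.30, None],
         [0.32, None, 0.72],
         [None, 0.60, 0.90]]
    P, Q = factorize(R)
    # The model's imputation for an unobserved drug-cell line pair.
    print(round(predict(P, Q, 0, 2), 2))
```

The inner product captures only linear latent interactions. Per the abstract, DBDNMF's motivation is to combine this kind of linear signal with nonlinear relationships learned by hidden neural network layers.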
Affiliation(s)
- Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Feng Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Jian Yu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Yong Pan
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Chaoju Gong
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Department of Ophthalmology, Xuzhou First People’s Hospital, Xuzhou, Jiangsu, China
- Liang Zhang
- Department of Gastrointestinal Surgery, Xuzhou Central Hospital, Xuzhou, Jiangsu, China
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
4
Sharma R, Saghapour E, Chen JY. An NLP-based technique to extract meaningful features from drug SMILES. iScience 2024; 27:109127. [PMID: 38455979 PMCID: PMC10918220 DOI: 10.1016/j.isci.2024.109127] [Received: 02/15/2023] [Revised: 09/30/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024]
Abstract
Natural language processing (NLP) is a well-established field of machine learning (ML) for developing language models that capture the sequence of words in a sentence. Similarly, drug molecule structures can be represented as sequences using the SMILES notation. However, unlike in natural language text, special characters in drug SMILES have specific meanings and cannot be ignored. We introduce a novel NLP-based method that extracts interpretable sequences and essential features from drug SMILES notation using N-grams. Our method compares these features to Morgan fingerprint bit-vectors using UMAP-based embedding, and we validate its effectiveness through two personalized drug screening (PSD) case studies. Our NLP-based features are sparse and, when combined with gene expression and disease phenotype features, produce better ML models for PSD. This approach provides a new way to analyze drug molecule structures represented in SMILES notation, which can help accelerate drug discovery efforts. We have also made our method accessible through a Python library.
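One way to extract N-gram features from SMILES without discarding special characters is sketched below. It first tokenizes with a regular expression that treats bracket atoms, two-letter halogens, bonds, branches and ring-closure digits as single tokens, and then counts N-grams of consecutive tokens. The regex is a commonly used SMILES tokenization pattern. It is an assumption here, not the authors' Python library, whose API the abstract does not describe.

```python
import re
from collections import Counter
from typing import List

# Bracket atoms, two-letter halogens, organic-subset and aromatic atoms,
# bonds, branches, dots, chirality marks and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse to silently drop characters that the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def ngram_counts(smiles: str, n: int) -> Counter:
    """Count N-grams of consecutive SMILES tokens (not raw characters)."""
    tokens = tokenize(smiles)
    return Counter("".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    print(tokenize("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
    print(ngram_counts("CC(=O)O", 2))
```

Tokenizing before building N-grams keeps units such as "Cl" or "[nH]" intact. A naive character split would break them into fragments that carry no chemical meaning.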
Affiliation(s)
- Rahul Sharma
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Ehsan Saghapour
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Jake Y. Chen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
5
Srivastava S, Jain P. Computational Approaches: A New Frontier in Cancer Research. Comb Chem High Throughput Screen 2024; 27:1861-1876. [PMID: 38031782 DOI: 10.2174/0113862073265604231106112203] [Received: 06/30/2023] [Revised: 09/08/2023] [Accepted: 09/21/2023] [Indexed: 12/01/2023]
Abstract
Cancer is a broad category of disease that can start in virtually any organ or tissue of the body when abnormal cells grow uncontrollably and invade surrounding organs. According to recent statistics, cancer caused about 10 million deaths worldwide in 2020, accounting for one out of every six deaths. The conventional approach to anti-cancer research is highly time-consuming and expensive, and its outcomes are not particularly encouraging. Computational techniques have been employed in anti-cancer research to advance our understanding. In recent years, the rapid development of computational tools for novel drug discovery, drug design, genetic studies, genome characterization, cancer imaging and detection, radiotherapy, cancer metabolomics, and novel therapeutic approaches has had a significant impact on anticancer research. In this paper, we examine the various subfields of contemporary computational techniques, including molecular docking, artificial intelligence, bioinformatics, virtual screening, and QSAR, and their applications in the study of cancer.
Affiliation(s)
- Shubham Srivastava
- Department of Pharmacy, IIMT College of Pharmacy, Uttar Pradesh, 201310, India
- Pushpendra Jain
- Department of Pharmacy, IIMT College of Pharmacy, Uttar Pradesh, 201310, India
6
Yang Y, Li P. GPDRP: a multimodal framework for drug response prediction with graph transformer. BMC Bioinformatics 2023; 24:484. [PMID: 38105227 PMCID: PMC10726525 DOI: 10.1186/s12859-023-05618-0] [Received: 09/13/2023] [Accepted: 12/13/2023] [Indexed: 12/19/2023]
Abstract
BACKGROUND In the field of computational personalized medicine, drug response prediction (DRP) is a critical issue. However, existing studies often characterize drugs as strings, a representation that does not align with the natural description of molecules. Additionally, they ignore the combinatorial effects specific to gene pathways. RESULTS In this study, we propose GPDRP (drug Graph and gene Pathway based Drug Response Prediction), a new multimodal deep learning model for predicting drug responses based on drug molecular graphs and gene pathway activity. In GPDRP, drugs are represented by molecular graphs, while cell lines are described by gene pathway activity scores. The model learns these two types of data separately, using Graph Neural Networks (GNN) with Graph Transformers for the drug graphs and deep neural networks for the pathway scores. Predictions are then made through fully connected layers. CONCLUSIONS Our results indicate that the Graph Transformer-based model delivers superior performance. We apply GPDRP to bulk RNA-sequencing data from hundreds of cancer cell lines, and it outperforms some recently published models. Furthermore, the generalizability and applicability of GPDRP are demonstrated through its predictions on unknown drug-cell line pairs and xenografts. The results also underscore the interpretability gained by incorporating gene pathways.
Affiliation(s)
- Yingke Yang
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Peiluan Li
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Longmen Laboratory, Luoyang, 471003, China
7
Shah OS, Chen F, Wedn A, Kashiparekh A, Knapick B, Chen J, Savariau L, Clifford B, Hooda J, Christgen M, Xavier J, Oesterreich S, Lee AV. Multi-omic characterization of ILC and ILC-like cell lines as part of ILC cell line encyclopedia (ICLE) defines new models to study potential biomarkers and explore therapeutic opportunities. bioRxiv 2023:2023.09.26.559548. [PMID: 37808708 PMCID: PMC10557671 DOI: 10.1101/2023.09.26.559548] [Indexed: 10/10/2023]
Abstract
Invasive lobular carcinoma (ILC), the most common histological "special type" of breast cancer (BC), accounts for ∼10-15% of all BC diagnoses and is characterized by unique features such as E-cadherin loss/deficiency, lower grade, hormone receptor positivity, larger diffuse tumors, and specific metastatic patterns. Although ILC is acknowledged as a disease with distinct biology that necessitates specialized, precision medicine treatments, further exploration of its molecular alterations aimed at discovering new treatments has been hindered by the scarcity of well-characterized cell line models for studying this disease. To address this, we generated the ILC Cell Line Encyclopedia (ICLE), providing a comprehensive multi-omic characterization of ILC and ILC-like cell lines. Using consensus multi-omic subtyping, we confirmed the luminal status of previously established ILC cell lines and uncovered additional ILC/ILC-like cell lines with luminal features for modeling ILC disease. Furthermore, most of these luminal ILC/ILC-like cell lines also showed RNA and copy number similarity to ILC patient tumors. Similarly, ILC/ILC-like cell lines retained molecular alterations in key ILC genes at frequencies similar to those in both primary and metastatic ILC tumors. Importantly, ILC/ILC-like cell lines recapitulated the CDH1 alteration landscape of ILC patient tumors, including enrichment of truncating mutations in, and biallelic inactivation of, the CDH1 gene. Using whole-genome optical mapping, we uncovered novel genomic rearrangements, including novel structural variations in CDH1 and functional gene fusions, and characterized breast cancer-specific patterns of chromothripsis in chromosomes 8, 11 and 17. In addition, we systematically analyzed aberrant DNA methylation (DNAm) events, and integrative analysis with RNA expression revealed epigenetic activation of TFAP2B, an emerging biomarker that is preferentially expressed in lobular disease.
Finally, towards the goal of identifying novel druggable vulnerabilities in ILC, we analyzed publicly available RNAi loss-of-function breast cancer cell line datasets and revealed numerous putative vulnerabilities in cytoskeletal components, focal adhesion and the PI3K/AKT pathway in ILC/ILC-like vs NST (no special type) cell lines. In summary, we addressed the lack of suitable models to study E-cadherin deficient breast cancers by first collecting both established and putative ILC models, then characterizing them comprehensively to show their molecular similarity to patient tumors, uncovering their novel multi-omic features, and highlighting putative novel druggable vulnerabilities. We not only expand the array of suitable E-cadherin deficient cell lines available for modelling human ILC disease but also employ them to study epigenetic activation of a putative lobular biomarker and to identify potential druggable vulnerabilities for this disease, towards enabling precision medicine research for human ILC.
8
Viljanen M, Minnema J, Wassenaar PNH, Rorije E, Peijnenburg W. What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques. SAR QSAR Environ Res 2023; 34:765-788. [PMID: 37670728 DOI: 10.1080/1062936x.2023.2254225] [Received: 04/21/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are one of the tools capable of resolving this apparent contradiction, as they allow toxicity patterns to be generalized across chemicals and species. However, despite the availability of large public toxicity datasets, the data are highly sparse, which complicates model development. The aim of this study is to provide insights into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50 values, based on experimental LC50 data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX database. Several well-known ML techniques were evaluated, and a new ML model inspired by recommender systems was developed. This new model is a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performance of the developed models in two validation settings: 1) predicting unseen chemical-species pairs, and 2) predicting unseen chemicals. The results of this study show that ML models can accurately predict LC50 values in both validation settings. Moreover, we show that the novel factorization machine approach can match well-tuned, complex ML approaches.
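A second-order factorization machine scores a feature vector x as ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. Each feature i has a low-rank latent vector v_i. When x one-hot encodes one species and one chemical, the pairwise term reduces to the dot product of those two latent vectors. That is how the model can score species-chemical pairs never observed together. The minimal sketch below covers only the prediction step. The parameters are made up rather than trained, and treating the output as a (for example log-scale) LC50 is an assumption.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Second-order factorization machine prediction in O(k * n) time.

    Uses the identity sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V) -> float:
    """Direct O(k * n^2) version, used here only to check the fast one."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return total


if __name__ == "__main__":
    # Hypothetical layout: features 0-1 are species, features 2-4 are chemicals.
    w0 = 1.0
    w = [0.2, -0.1, 0.3, 0.5, -0.4]
    V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [0.2, 0.2], [-0.3, 0.1]]
    x = [1, 0, 0, 1, 0]  # species 0 paired with chemical 1 (feature 3)
    # w0 + w[0] + w[3] + <V[0], V[3]> = 1.0 + 0.2 + 0.5 + 0.06
    print(fm_predict(x, w0, w, V))
```

Training would fit w0, w and V to the observed LC50 entries, typically with regularization. The latent vectors of a species and a chemical that never co-occur are still learned from their other pairings.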
Affiliation(s)
- M Viljanen
- Department of Statistics, Data Science and Modelling, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- J Minnema
- Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- P N H Wassenaar
- Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- E Rorije
- Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- W Peijnenburg
- Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
9
Kaushik AC, Zhao Z. Machine learning-driven exploration of drug therapies for triple-negative breast cancer treatment. Front Mol Biosci 2023; 10:1215204. [PMID: 37602329 PMCID: PMC10436744 DOI: 10.3389/fmolb.2023.1215204] [Received: 05/01/2023] [Accepted: 07/21/2023] [Indexed: 08/22/2023]
Abstract
Breast cancer is the second leading cause of cancer death in women. It is highly heterogeneous in nature: tumors have different morphologies, and there is heterogeneity even among people who have the same type of tumor. Several staging and classification systems have been developed because of the variability among types of breast cancer. Owing to this high heterogeneity, personalized treatment has become a new strategy. Triple-negative breast cancer (TNBC) accounts for ∼10%-15% of all breast cancers. TNBC refers to the subtype of breast cancer in which cells lack estrogen receptors (ER), progesterone receptors (PR), and human epidermal growth factor receptor 2 (HER2). Tumors in TNBC have a diverse set of genetic markers and prognostic indicators. We scanned the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) databases for potential drugs using human breast cancer cell lines and drug sensitivity data. Three different machine-learning approaches were used to evaluate the prediction of six effective drugs against the TNBC cell lines. The top biomarkers were then shortlisted on the basis of their involvement in breast cancer and further tested for radiation resistance using data from the Cleveland database. Panobinostat, PLX4720, Lapatinib, Nilotinib, Selumetinib, and Tanespimycin were found to be six effective drugs against the TNBC cell lines. We could identify potential derivatives that may be used against approved drugs. Only one biomarker (SETD7) was sensitive to all six drugs on the shortlist, while two others (SRARP and YIPF5) were sensitive to both radiation and drugs. Furthermore, we did not find any radioresistance markers for TNBC. The proposed biomarkers and drug sensitivity analysis will provide potential candidates for future clinical investigation.
Affiliation(s)
- Aman Chandra Kaushik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
- MD Anderson Cancer Center, UTHealth Graduate School of Biomedical Sciences, Houston, TX, United States
10
Hao Y, Romano JD, Moore JH. Knowledge graph aids comprehensive explanation of drug and chemical toxicity. CPT Pharmacometrics Syst Pharmacol 2023; 12:1072-1079. [PMID: 37475158 PMCID: PMC10431039 DOI: 10.1002/psp4.12975] [Received: 01/11/2023] [Revised: 04/04/2023] [Accepted: 04/06/2023] [Indexed: 07/22/2023]
Abstract
In computational toxicology, predicting complex endpoints has always been challenging, as they often involve multiple distinct mechanisms. State-of-the-art models are limited either by low accuracy or by a lack of interpretability due to their black-box nature. Here, we introduce AIDTox, an interpretable deep learning model that incorporates curated knowledge of chemical-gene connections, gene-pathway annotations, and pathway hierarchy. AIDTox accurately predicts cytotoxicity outcomes in HepG2 and HEK293 cells. It also provides comprehensive explanations of cytotoxicity covering multiple aspects of drug activity, including target interaction, metabolism, and elimination. In summary, AIDTox provides a computational framework for unveiling cellular mechanisms for complex toxicity endpoints.
Affiliation(s)
- Yun Hao
- Genomics and Computational Biology (GCB) Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Joseph D. Romano
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Center of Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
11
Mehmood A, Nawab S, Jin Y, Hassan H, Kaushik AC, Wei DQ. Ranking Breast Cancer Drugs and Biomarkers Identification Using Machine Learning and Pharmacogenomics. ACS Pharmacol Transl Sci 2023; 6:399-409. [PMID: 36926455 PMCID: PMC10012252 DOI: 10.1021/acsptsci.2c00212] [Received: 11/03/2022] [Indexed: 02/26/2023]
Abstract
Breast cancer is one of the major causes of death in women worldwide. It is a diverse illness with substantial intersubject heterogeneity, even among individuals with the same type of tumor, and customized therapy has become increasingly important in this area. Because of the clinical and physical variability of different kinds of breast cancer, multiple staging and classification systems have been developed. As a result, these tumors exhibit a wide range of gene expression and prognostic indicators. To date, no comprehensive investigation of model training procedures on data from numerous cell line screens has been conducted together with radiation data. We used human breast cancer cell lines and drug sensitivity information from the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) databases to screen for potential drugs. The results were further validated with three machine learning approaches: Elastic Net, LASSO, and Ridge. Next, we selected top-ranked biomarkers based on their role in breast cancer and further tested their resistance to radiation using data from the Cleveland database. We identified six drugs (Palbociclib, Panobinostat, PD-0325901, PLX4720, Selumetinib, and Tanespimycin) that perform significantly well on breast cancer cell lines. In addition, five biomarkers (TNFSF15, DCAF6, KDM6A, PHETA2, and IFNGR1) are sensitive to all six shortlisted drugs and show sensitivity to radiation. The proposed biomarkers and drug sensitivity analysis are helpful in translational cancer studies and provide valuable insights for clinical trial design.
Affiliation(s)
- Aamir Mehmood
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Sadia Nawab
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, P.R. China
- Yifan Jin
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Hesham Hassan
- Department of Pathology, College of Medicine, King Khalid University, Abha 61421, Saudi Arabia
- Department of Pathology, Faculty of Medicine, Assiut University, Assiut 71515, Egypt
- Aman Chandra Kaushik
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan 473006, P.R. China
- Peng Cheng National Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong 518055, P.R. China
12
Abbasi Mesrabadi H, Faez K, Pirgazi J. Drug-target interaction prediction based on protein features, using wrapper feature selection. Sci Rep 2023; 13:3594. [PMID: 36869062 PMCID: PMC9984486 DOI: 10.1038/s41598-023-30026-y] [Received: 11/19/2022] [Accepted: 02/14/2023] [Indexed: 03/05/2023]
Abstract
Drug-target interaction prediction is a vital stage in drug development, and many methods have been proposed for it. Experimental methods that identify these relationships on the basis of clinical remedies are time-consuming, costly, laborious, and complex, and they pose many challenges. Computational methods are a newer alternative, and more accurate computational methods can be preferable to experimental ones in terms of total cost and time. In this paper, we propose a new computational model to predict drug-target interaction (DTI) that consists of three phases: feature extraction, feature selection, and classification. In the feature extraction phase, different features, such as EAAC and PSSM, are extracted from protein sequences, and fingerprint features are extracted from drugs. These extracted features are then combined. Next, because of the large volume of extracted data, a wrapper feature selection method named IWSSR is applied. The selected features are then passed to a rotation forest classifier for more efficient prediction. The novelty of our work lies in extracting diverse features and then selecting among them with IWSSR. Under tenfold cross-validation on the gold-standard datasets (enzymes, ion channels, G-protein-coupled receptors, nuclear receptors), the rotation forest classifier achieves accuracies of 98.12%, 98.07%, 96.82%, and 95.64%, respectively. The experimental results indicate that the proposed model achieves an acceptable rate in DTI prediction and is comparable to methods proposed in other papers.
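IWSSR (incremental wrapper subset selection with replacement) is not reproduced here. To illustrate the wrapper idea it builds on, the sketch below uses a simpler greedy sequential forward selection. In a wrapper method, candidate feature subsets are scored by training and evaluating the classifier itself. The `evaluate` callback stands in for a score such as the tenfold cross-validated accuracy of a rotation forest. That callback, the feature names and the toy scorer are all hypothetical.

```python
from typing import Callable, FrozenSet, Optional, Sequence, Tuple


def forward_wrapper_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[FrozenSet[str], float]:
    """Greedy sequential forward selection driven by a model-based score.

    At each step, add the single feature whose inclusion most improves
    `evaluate`; stop when no candidate improves the score or the size
    limit is reached.
    """
    selected: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    limit = len(features) if max_features is None else min(max_features, len(features))
    while len(selected) < limit:
        # Score every one-feature extension of the current subset with the model.
        scored = [(evaluate(selected | {f}), f) for f in features if f not in selected]
        score, feature = max(scored)
        if score <= best_score:
            break  # no remaining feature improves the wrapper score
        selected, best_score = selected | {feature}, score
    return selected, best_score


if __name__ == "__main__":
    # Hypothetical additive scorer standing in for cross-validated accuracy.
    weights = {"EAAC": 0.3, "PSSM": 0.4, "fingerprint": 0.2, "noise": -0.1}
    toy_evaluate = lambda subset: sum(weights[f] for f in subset)
    print(forward_wrapper_selection(list(weights), toy_evaluate))
```

Wrapper methods call the model once per candidate subset, which is costly with high-dimensional protein and fingerprint features. Incremental variants such as IWSS and IWSSR are designed partly to reduce that cost.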
Affiliation(s)
- Hengame Abbasi Mesrabadi
- Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
- Karim Faez
- Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
- Jamshid Pirgazi
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
| |
Collapse
|
13
|
Cui T, Wang Z, Gu H, Qin P, Wang J. Gamma distribution based predicting model for breast cancer drug response based on multi-layer feature selection. Front Genet 2023; 14:1095976. [PMID: 36816042 PMCID: PMC9932661 DOI: 10.3389/fgene.2023.1095976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/23/2023] [Indexed: 02/05/2023]
Abstract
In the pursuit of precision medicine for cancer, a promising step is to predict drug response through data mining, which can provide clinical decision support for cancer patients. Although some machine learning methods for predicting drug response from genomic data already exist, most of them focus on point prediction, which cannot reveal the distribution of predicted results. In this paper, we propose a three-layer feature selection combined with a gamma-distribution-based generalized linear model (GLM), and a two-layer feature selection combined with an artificial neural network (ANN). The two regression methods are applied to the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets. Under ten-fold cross-validation, our methods achieve higher accuracy on anticancer drug response prediction than existing methods, with an R² of 0.87 and an RMSE of 0.53. Data validation illustrates the value of assessing prediction reliability through predicted confidence intervals, and its role in personalized medicine. Correlation analysis of the genes selected by the three feature-selection layers further supports the effectiveness of the proposed methods.
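For reference, a gamma-distributed response model of the kind that supports interval rather than point predictions can be written as follows. This sketch assumes the common log link and a single shared shape parameter $k$; the abstract specifies neither, so both are assumptions, and the interval shown ignores uncertainty in the fitted coefficients $\hat{\boldsymbol{\beta}}$ and $k$.

```latex
\begin{aligned}
y_i &\sim \operatorname{Gamma}(k,\ \theta_i), \qquad \theta_i = \mu_i / k, \qquad \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta},\\
\mathbb{E}[y_i] &= k\,\theta_i = \mu_i, \qquad \operatorname{Var}(y_i) = k\,\theta_i^{2} = \mu_i^{2}/k,\\
\mathrm{PI}_{1-\alpha}(y_i) &= \bigl[\, Q(\alpha/2;\ k,\ \hat{\mu}_i/k),\ Q(1-\alpha/2;\ k,\ \hat{\mu}_i/k) \,\bigr]
\end{aligned}
```

Here $Q(p; k, \theta)$ is the gamma quantile function with shape $k$ and scale $\theta$. Because the gamma model has a constant coefficient of variation $1/\sqrt{k}$, the interval widens as the predicted mean $\hat{\mu}_i$ grows.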
Affiliation(s)
- Tongtong Cui
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Zeyuan Wang
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Pan Qin
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China. *Correspondence: Jia Wang, Pan Qin
- Jia Wang
- Department of Breast Surgery, Second Hospital of Dalian Medical University, Dalian, Liaoning, China. *Correspondence: Jia Wang, Pan Qin
14
Liu XY, Mei XY. Prediction of drug sensitivity based on multi-omics data using deep learning and similarity network fusion approaches. Front Bioeng Biotechnol 2023; 11:1156372. [PMID: 37139048 PMCID: PMC10150883 DOI: 10.3389/fbioe.2023.1156372] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/01/2023] [Accepted: 03/31/2023] [Indexed: 05/05/2023]
Abstract
With the rapid development of multi-omics technologies and the accumulation of large-scale biological datasets, many studies have sought a more comprehensive understanding of human diseases and drug sensitivity across multiple biomolecules, such as DNA, RNA, proteins, and metabolites. Single-omics data make it difficult to analyze complex disease pathology and drug pharmacology systematically and comprehensively. Approaches based on molecularly targeted therapy face challenges such as insufficient ability to label target genes and the absence of clear targets for non-specific chemotherapeutic drugs. Consequently, the integrated analysis of multi-omics data has become a new direction for exploring the mechanisms of disease and drugs. However, existing drug sensitivity prediction models based on multi-omics data still suffer from overfitting, lack of interpretability, and difficulty integrating heterogeneous data, and their prediction accuracy needs improvement. In this paper, we propose a novel drug sensitivity prediction (NDSP) model based on deep learning and similarity network fusion, which extracts drug targets from each omics data type using an improved sparse principal component analysis (SPCA) method and constructs sample similarity networks from the resulting sparse feature matrices. The fused similarity networks are then fed into a deep neural network for training, which greatly reduces data dimensionality and the risk of overfitting. We use three omics data types (RNA sequencing, copy number aberration, and methylation) and select 35 drugs from Genomics of Drug Sensitivity in Cancer (GDSC) for experiments, including Food and Drug Administration (FDA)-approved targeted drugs, FDA-unapproved targeted drugs, and non-specific therapies.
Compared with current deep learning methods, the proposed method extracts highly interpretable biological features and achieves highly accurate sensitivity prediction for both targeted and non-specific cancer drugs, which benefits the development of precision oncology beyond targeted therapy.
Affiliation(s)
- Xiao-Ying Liu
- Guangdong Polytechnic of Science and Technology, Zhuhai, China
- *Correspondence: Xiao-Ying Liu
- Xin-Yue Mei
- Institute of Systems Engineering, Macau University of Science and Technology, Taipa, China
15
Chen S, Yang Y, Zhou H, Sun Q, Su R. DNN-PNN: A parallel deep neural network model to improve anticancer drug sensitivity. Methods 2023; 209:1-9. [PMID: 36410694 DOI: 10.1016/j.ymeth.2022.11.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/17/2022] [Revised: 10/11/2022] [Accepted: 11/14/2022] [Indexed: 11/19/2022]
Abstract
With the rapid development of deep learning techniques and large-scale genomics databases, applying deep learning to anticancer drug sensitivity prediction holds great potential and can effectively improve the efficiency and accuracy of therapeutic biomarker identification. In this study, we propose DNN-PNN, a parallel deep learning framework that integrates rich, heterogeneous information from gene expression and pharmaceutical chemical structure data. DNN-PNN introduces a new, more effective drug data representation strategy in which correlations between features are represented by products, alleviating the limitations of high-dimensional discrete data in deep learning. The framework is also optimized to reduce the time complexity of the model. We conducted extensive experiments on the CCLE datasets, comparing DNN-PNN with its variant DNN-FM (representing traditional feature correlation models), with its components DNN and PNN alone, and with common machine learning models. DNN-PNN not only achieves high prediction accuracy but also shows significant advantages in stability and convergence speed.
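The idea of representing feature correlations by products can be sketched with the inner-product layer of a product-based neural network (PNN). This is a minimal illustration assuming the inner-product variant; whether DNN-PNN uses inner or outer products, and how its fields are embedded, is not stated above, and the embeddings below are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def inner_product_layer(field_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Pairwise inner products between field embeddings.

    Each output element encodes the interaction between one pair of
    feature fields, so n fields yield n * (n - 1) / 2 product features.
    """
    return [
        sum(a * b for a, b in zip(u, v))
        for u, v in combinations(field_embeddings, 2)
    ]


# Three hypothetical 2-dimensional field embeddings (e.g. drug substructure
# fields). In a PNN, these product signals are combined with linear signals
# from the embeddings before the fully connected hidden layers.
print(inner_product_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))  # [0.0, 1.0, 1.0]
```

Explicit pairwise products give the downstream layers interaction terms directly, rather than leaving them to be learned from sparse, high-dimensional one-hot inputs.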
Affiliation(s)
- Siqi Chen
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
- Yang Yang
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Haoran Zhou
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Qisong Sun
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
16
Xie M, Lei X, Zhong J, Ouyang J, Li G. Drug response prediction using graph representation learning and Laplacian feature selection. BMC Bioinformatics 2022; 23:532. [PMID: 36494630 PMCID: PMC9733001 DOI: 10.1186/s12859-022-05080-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND Knowing how a patient responds to drugs is essential for making personalized medicine practical. Since current clinical drug response experiments are time-consuming and expensive, using human genomic information and drug molecular characteristics to predict drug responses is urgently needed. Although a variety of computational drug response prediction methods have been proposed, their effectiveness is still unsatisfactory. RESULTS In this study, we propose a method called LGRDRP (Learning Graph Representation for Drug Response Prediction) to predict cell line-drug responses. First, LGRDRP constructs a heterogeneous network integrating multiple kinds of information: cell line miRNA expression profiles, drug chemical structure similarity, gene-gene interactions, cell line-gene interactions, and known cell line-drug responses. Then, for each cell line, learning graph representation and Laplacian feature selection are combined to obtain network topology features related to that cell line: the graph representation learning method learns network topology structure features, and Laplacian feature selection then selects the most important of them. Finally, LGRDRP trains an SVM model to predict drug responses based on the selected features of the known cell line-drug responses. Our five-fold cross-validation results show that LGRDRP is significantly superior to state-of-the-art methods in terms of the average area under the receiver operating characteristic curve, the average area under the precision-recall curve, and the recall rate of the top-k predicted sensitive cell lines. CONCLUSIONS Our results demonstrate that using multiple types of information about cell lines and drugs, the learning graph representation method, and Laplacian feature selection all help improve drug response prediction performance.
We believe that such an approach could easily be extended to similar problems such as miRNA-disease relationship inference.
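Laplacian feature selection commonly ranks features by the Laplacian Score of He, Cai and Niyogi, which favours features that vary smoothly over a sample graph. A minimal pure-Python sketch, assuming a precomputed symmetric affinity matrix over samples; how LGRDRP builds that graph from its learned topology features is not reproduced here, and the toy data are hypothetical.

```python
from typing import List, Sequence


def laplacian_scores(
    X: Sequence[Sequence[float]], S: Sequence[Sequence[float]]
) -> List[float]:
    """Laplacian Score of each feature (column of X) given a symmetric
    affinity matrix S over the samples (rows of X). Lower scores mark
    features that better preserve the local graph structure.

    With D = diag(row sums of S) and L = D - S, each feature f is centred
    as g = f - (f^T D 1 / 1^T D 1) 1 and scored as (g^T L g) / (g^T D g).
    """
    n = len(X)
    degree = [sum(row) for row in S]
    total_degree = sum(degree)
    scores: List[float] = []
    for j in range(len(X[0])):
        f = [X[i][j] for i in range(n)]
        weighted_mean = sum(f[i] * degree[i] for i in range(n)) / total_degree
        g = [v - weighted_mean for v in f]
        # g^T L g equals 1/2 * sum_ik S_ik (g_i - g_k)^2 for symmetric S.
        numerator = 0.5 * sum(
            S[i][k] * (g[i] - g[k]) ** 2 for i in range(n) for k in range(n)
        )
        denominator = sum(degree[i] * g[i] ** 2 for i in range(n))
        scores.append(numerator / denominator if denominator > 0 else float("inf"))
    return scores


# Samples 0-1 and 2-3 are neighbours in the (hypothetical) graph.
S = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
# Feature 0 follows the graph clusters; feature 1 cuts across them.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(laplacian_scores(X, S))  # [0.0, 2.0]
```

Selecting features then amounts to keeping the lowest-scoring columns, here feature 0.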
Affiliation(s)
- Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China; Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, China
- Xiaowen Lei
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jianchen Zhong
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jianxing Ouyang
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Guijing Li
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
17
Hao Y, Romano JD, Moore JH. Knowledge-guided deep learning models of drug toxicity improve interpretation. Patterns (N Y) 2022; 3:100565. [PMID: 36124309 PMCID: PMC9481960 DOI: 10.1016/j.patter.2022.100565] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/11/2022] [Revised: 05/16/2022] [Accepted: 07/12/2022] [Indexed: 12/04/2022]
Abstract
In drug development, a major reason for attrition is the lack of understanding of cellular mechanisms governing drug toxicity. The black-box nature of conventional classification models has limited their utility in identifying toxicity pathways. Here we developed DTox (deep learning for toxicology), an interpretation framework for knowledge-guided neural networks, which can predict compound response to toxicity assays and infer toxicity pathways of individual compounds. We demonstrate that DTox can achieve the same level of predictive performance as conventional models with a significant improvement in interpretability. Using DTox, we were able to rediscover mechanisms of transcription activation by three nuclear receptors, recapitulate cellular activities induced by aromatase inhibitors and pregnane X receptor (PXR) agonists, and differentiate distinctive mechanisms leading to HepG2 cytotoxicity. Virtual screening by DTox revealed that compounds with predicted cytotoxicity are at higher risk for clinical hepatic phenotypes. In summary, DTox provides a framework for deciphering cellular mechanisms of toxicity in silico.
Affiliation(s)
- Yun Hao
- Genomics and Computational Biology (GCB) Graduate Program, University of Pennsylvania, Philadelphia, PA, USA
- Joseph D. Romano
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Center of Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, PA, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
18
Paltun BG, Kaski S, Mamitsuka H. DIVERSE: Bayesian Data IntegratiVE Learning for Precise Drug ResponSE Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2197-2207. [PMID: 33705322 DOI: 10.1109/tcbb.2021.3065535] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
Detecting predictive biomarkers from multi-omics data is important for precision medicine, both to improve the diagnosis of complex diseases and to enable better treatments. This requires substantial experimental effort, which is made difficult by the heterogeneity of cell lines and high cost. An effective solution is to build a computational model over the diverse omics data, including genomic, molecular, and environmental information. However, choosing informative and reliable data sources from among the different types of data is a challenging problem. We propose DIVERSE, a framework of Bayesian importance-weighted tri- and bi-matrix factorization (DIVERSE3 or DIVERSE2) to predict drug responses from data on cell lines, drugs, and gene interactions. DIVERSE integrates the data sources systematically, in a step-wise manner, examining the importance of each added data set in turn. More specifically, we sequentially integrate five different data sets, which have not all been combined in earlier bioinformatic methods for predicting drug responses. Empirical experiments show that DIVERSE clearly outperforms five other methods, including three state-of-the-art approaches, under cross-validation, particularly in out-of-matrix prediction, which is closer to real use cases and more challenging than simpler in-matrix prediction. Additionally, case studies for discovering new drugs further confirm the performance advantage of DIVERSE.
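The difference between in-matrix and out-of-matrix prediction can be made concrete with a small sketch, assuming responses are stored as (cell line, drug, value) triples. The function names and toy data are hypothetical, and DIVERSE's own cross-validation protocol may differ in detail.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical entry format: (cell line index, drug index, response value).
Entry = Tuple[int, int, float]


def in_matrix_split(
    entries: Sequence[Entry], test_fraction: float, seed: int = 0
) -> Tuple[List[Entry], List[Entry]]:
    """Hold out randomly chosen individual (cell line, drug) entries.

    A test cell line will usually still have other responses in the
    training set, so the model has already seen that row of the matrix.
    """
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def out_of_matrix_split(
    entries: Sequence[Entry], test_cell_lines: Sequence[int]
) -> Tuple[List[Entry], List[Entry]]:
    """Hold out entire cell lines (rows): no response of a test cell line
    is available during training, as for a newly profiled sample."""
    held_out = set(test_cell_lines)
    train = [e for e in entries if e[0] not in held_out]
    test = [e for e in entries if e[0] in held_out]
    return train, test


# Toy 4 x 3 response matrix (4 cell lines, 3 drugs) with made-up values.
entries = [(c, d, float(c + d)) for c in range(4) for d in range(3)]
train, test = out_of_matrix_split(entries, [3])
print(len(train), len(test))  # 9 3
```

In the out-of-matrix case, the model can only reach the held-out row through side information (omics, similarities), which is why it is the harder and more realistic setting.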
19
Automatic identification of drug sensitivity of cancer cell with novel regression-based ensemble convolution neural network model. Soft Comput 2022. [DOI: 10.1007/s00500-022-07098-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]
20
Shahzad M, Tahir MA, Khan MA, Jiang R, Malick RAS. EBSRMF: Ensemble based similarity-regularized matrix factorization to predict anticancer drug responses. J Intell Fuzzy Syst 2022. [DOI: 10.3233/jifs-212867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Predicting drug sensitivity across a panel of cancer cell lines with computational approaches has been a challenge for two decades. With the emergence of high-throughput screening technologies, drug sensitivity data for thousands of compounds and panels of cancer cell lines are publicly available in various pharmacogenomics databases. Analyzing these data is crucial for improving cancer treatment and developing new anticancer drugs. In this work, we propose EBSRMF (Ensemble Based Similarity-Regularized Matrix Factorization), a bagging-based framework to improve drug sensitivity prediction on Cancer Cell Line Encyclopedia (CCLE) data. Based on the observation that similar drugs and similar cell lines exhibit similar drug responses, we investigate cell line and drug similarity matrices based on gene expression profiles and chemical structure, respectively. The half-maximal inhibitory concentration (IC50) is used as the drug sensitivity outcome. To improve the generalization ability of the model, a homogeneous bagging ensemble is also investigated, in which multiple SRMF models are trained on N subsets of the input data and their outputs are averaged to produce the final prediction. Experiments are conducted on two benchmark datasets, CCLE and GDSC. The proposed model is compared with state-of-the-art models using multiple evaluation metrics, including root mean square error (RMSE) and Pearson correlation coefficient (PCC). The proposed model is promising and achieves better performance on the CCLE dataset than existing approaches.
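The bagging-and-averaging scheme can be sketched as follows, assuming responses are stored as (cell line, drug, IC50) triples and that each subset is a bootstrap sample of the observed entries. The base learner here is a deliberately simple per-drug mean, a hypothetical stand-in for SRMF; all names are illustrative.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Entry = Tuple[int, int, float]  # (cell line, drug, IC50), hypothetical format
Predictor = Callable[[int, int], float]


def fit_drug_mean(entries: Sequence[Entry]) -> Predictor:
    """Stand-in base learner (not SRMF): predict the mean response of each
    drug, falling back to the overall mean for drugs absent from the sample."""
    by_drug: Dict[int, List[float]] = {}
    for _, drug, y in entries:
        by_drug.setdefault(drug, []).append(y)
    overall = mean(y for _, _, y in entries)
    drug_means = {d: mean(ys) for d, ys in by_drug.items()}
    return lambda cell, drug: drug_means.get(drug, overall)


def fit_bagged(
    entries: Sequence[Entry],
    fit: Callable[[Sequence[Entry]], Predictor],
    n_models: int,
    seed: int = 0,
) -> Predictor:
    """Train n_models base learners on bootstrap samples of the observed
    entries and average their predictions."""
    rng = random.Random(seed)
    models = [
        fit([rng.choice(entries) for _ in range(len(entries))])
        for _ in range(n_models)
    ]
    return lambda cell, drug: mean(m(cell, drug) for m in models)


# Toy observed entries (made-up values) for two cell lines and two drugs.
entries = [(0, 0, 1.0), (1, 0, 3.0), (0, 1, 5.0), (1, 1, 7.0)]
predict = fit_bagged(entries, fit_drug_mean, n_models=10)
print(predict(2, 0))  # averaged prediction for an unseen cell line on drug 0
```

Averaging over bootstrap-trained models mainly reduces variance, which is the generalization benefit the ensemble targets; swapping `fit_drug_mean` for a real matrix factorization model leaves `fit_bagged` unchanged.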
Affiliation(s)
- Muhammad Shahzad
- School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi Campus, Pakistan
- M. Atif Tahir
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi Campus, Pakistan
- M. Atta Khan
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi Campus, Pakistan
- Richard Jiang
- School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
- Rauf Ahmed Shams Malick
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi Campus, Pakistan
21
Jiang L, Jiang C, Yu X, Fu R, Jin S, Liu X. DeepTTA: a transformer-based model for predicting cancer drug response. Brief Bioinform 2022; 23:6554594. [PMID: 35348595 DOI: 10.1093/bib/bbac100] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 11/23/2021] [Revised: 02/08/2022] [Accepted: 02/27/2022] [Indexed: 12/27/2022]
Abstract
Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anti-cancer activity is generally validated by in vitro cellular experiments. Accurate prediction of cancer drug response is therefore a critical and challenging task for anti-cancer drug design and precision medicine. With the development of pharmacogenomics, combining efficient drug feature extraction methods with omics data has made it possible to use computational models to assist drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anti-cancer drug responses. Specifically, DeepTTA uses transcriptomic gene expression data and chemical substructures of drugs for drug response prediction. Compared with existing methods, DeepTTA achieved better root mean square error, Pearson correlation coefficient, and Spearman's rank correlation coefficient on multiple test sets. Moreover, we found that the anti-cancer drugs bortezomib and dactinomycin provide a potential therapeutic option with multiple clinical indications. Given its performance, DeepTTA is expected to be an effective method for cancer drug design.
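The three regression metrics named above are standard and can be computed without any library. A minimal pure-Python sketch; ties in Spearman's coefficient receive average ranks, which is one common convention, and the toy values are made up.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred))  # 0.5
```

The toy example shows why the metrics are reported together: the single outlying prediction raises RMSE and lowers Pearson correlation, but Spearman's coefficient stays at 1 because the predicted ordering is still perfect.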
Affiliation(s)
- Likun Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Changzhi Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xinyu Yu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Rao Fu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Shuting Jin
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Xiangrong Liu
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
22
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. Appl Intell 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
23
Zhang L, Yuan Y, Yu J, Liu H. SEMCM: A Self-Expressive Matrix Completion Model for Anti-cancer Drug Sensitivity Prediction. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220302123118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Genomic data sets generated by several recent large-scale high-throughput screening efforts pose a thorny computational challenge for anticancer drug sensitivity prediction.
Objective:
We aimed to design an algorithmic model that predicts missing elements in incomplete matrices and is applicable to drug response prediction.
Method:
We developed a novel self-expressive matrix completion model to improve predictive performance on drug response prediction problems. The model is based on the idea of subspace clustering and, as a convex problem, can be solved by the alternating direction method of multipliers (ADMM). The original incomplete matrix is filled in through model training, with parameters updated iteratively.
Results:
We applied SEMCM to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets to predict unknown response values. Extensive experiments show that the algorithm achieves good prediction performance and stability, outperforming several existing advanced drug sensitivity prediction and matrix completion algorithms. Without modeling mutation information, SEMCM correctly predicted cell line-drug associations for both mutated and wild-type cell lines. SEMCM can also be used for drug repositioning. Newly predicted drug responses in the GDSC dataset suggest that BL-41 is highly sensitive to bortezomib, and that A172 and NCI-H1437 have roughly the same sensitivity to paclitaxel.
Conclusion:
We report an efficient, open-source anticancer drug sensitivity prediction algorithm that can predict the unknown responses of cancer cell lines to drugs. Experimental results show that our method not only improves prediction accuracy but can also be applied to drug repositioning.
Affiliation(s)
- Lin Zhang
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- Yuwei Yuan
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- Jian Yu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- Hui Liu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
24
Prasse P, Iversen P, Lienhard M, Thedinga K, Bauer C, Herwig R, Scheffer T. Matching anticancer compounds and tumor cell lines by neural networks with ranking loss. NAR Genom Bioinform 2022; 4:lqab128. [PMID: 35047818 PMCID: PMC8759564 DOI: 10.1093/nargab/lqab128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Revised: 12/03/2021] [Accepted: 12/29/2021] [Indexed: 12/24/2022]
Abstract
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are likely to achieve the highest efficacy for a cancer cell line at hand at a therapeutic dose. State-of-the-art drug sensitivity models use regression techniques to predict the inhibitory concentration of a drug for a tumor cell line. This regression objective is not directly aligned with either of these principal goals of drug sensitivity models. We argue that drug sensitivity modeling should be seen as a ranking problem with an optimization criterion that quantifies a drug's inhibitory capacity for the cancer cell line at hand relative to its toxicity for healthy cells. We derive an extension to the well-established drug sensitivity regression model PaccMann that employs a ranking loss and focuses on the ratio of inhibitory concentration and therapeutic dosage range. We find that the ranking extension significantly enhances the model's capability to identify the most effective anticancer drugs for unseen tumor cell profiles based on in vitro data.
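The general idea of replacing a regression objective with a ranking objective can be illustrated with a standard pairwise hinge loss over the drugs tested on one cell line. This is a generic sketch, not the specific ranking loss used in the PaccMann extension described above; the per-drug relevance values are assumed to be given (for example, a hypothetical sensitivity score normalised by therapeutic range).

```python
from itertools import combinations
from typing import Sequence


def pairwise_hinge_ranking_loss(
    scores: Sequence[float], relevance: Sequence[float], margin: float = 1.0
) -> float:
    """Average hinge loss over drug pairs with different relevance.

    For each pair where drug `hi` should rank above drug `lo`, the model is
    penalised unless scores[hi] exceeds scores[lo] by at least `margin`.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if relevance[i] == relevance[j]:
            continue  # no preferred order between tied drugs
        hi, lo = (i, j) if relevance[i] > relevance[j] else (j, i)
        total += max(0.0, margin - (scores[hi] - scores[lo]))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Hypothetical relevance for three drugs on one cell line; higher means the
# drug should be ranked higher.
relevance = [2.0, 1.0, 0.0]
print(pairwise_hinge_ranking_loss([3.0, 1.0, 0.0], relevance))  # 0.0
print(pairwise_hinge_ranking_loss([0.0, 1.0, 3.0], relevance))  # 3.0
```

Unlike a squared-error loss on inhibitory concentrations, this objective is zero whenever the predicted order is correct with sufficient margin, regardless of the absolute predicted values.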
Affiliation(s)
- Paul Prasse
- To whom correspondence should be addressed. Tel: +49 331 977 3829
- Matthias Lienhard
- Dep. Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Kristina Thedinga
- Dep. Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Ralf Herwig
- Dep. Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Tobias Scheffer
- University of Potsdam, Department of Computer Science, Potsdam, Germany
25
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand and predict drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer-term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that covers more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods and their application to drug response prediction. We also provide systematic overviews of the data sources commonly used to train and evaluate these methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
26
Hao Y, Moore JH. TargetTox: A Feature Selection Pipeline for Identifying Predictive Targets Associated with Drug Toxicity. J Chem Inf Model 2021; 61:5386-5394. [PMID: 34757743 DOI: 10.1021/acs.jcim.1c00733] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
In silico assessment of drug toxicity is becoming a critical step in drug development. Conventional ligand-based models are limited by low accuracy and lack of interpretability, and they often fail to explain the cellular mechanisms underlying structure-toxicity associations. We addressed these limitations by incorporating the target profile as an intermediate connecting structure to toxicity. To accommodate the high-dimensional feature space, we developed a pipeline named TargetTox that can identify a subset of predictive features. We applied TargetTox to study 569 targets and 815 adverse events. The features identified by TargetTox comprise less than 10% of the original feature space; nevertheless, they accurately predicted binding outcomes for 377 targets and toxicity outcomes for 36 adverse events. We demonstrated that predictive targets tend to be differentially expressed in the tissue of toxicity. We also rediscovered key cellular functions associated with cardiotoxicity from the predictive targets, as well as markers of skin and liver diseases. Furthermore, we found evidence supporting diagnostic and therapeutic applications of some predictive targets in hepatotoxicity and nephrotoxicity. Our findings highlight the critical role of predictive targets in the cellular mechanisms leading to toxicity. Overall, our study improved the interpretability of toxicity prediction without sacrificing accuracy, and the pipeline may benefit future studies of high-dimensional data sets.
Affiliation(s)
- Yun Hao
- Genomics and Computational Biology (GCB) Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Jason H Moore
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
27
Wang S, Li J, Wang Y. WMMDCA: Prediction of Drug Responses by Weight-Based Modular Mapping in Cancer Cell Lines. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2733-2740. [PMID: 32142453 DOI: 10.1109/tcbb.2020.2976997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Because experimental verification in clinical trials consumes considerable cost and time, predicting drug responses with computational models has become an important challenge. Existing drug response data across diverse cell lines make it possible to predict potential sensitive associations. Here, we propose a weight-based modular mapping method, named WMMDCA, to predict drug-cell line associations. The method fully considers the effects of drugs' chemical structural features and adds modular information to the network projection. Leave-one-out cross-validation was used to evaluate the predictive ability of WMMDCA, which showed the best performance among several state-of-the-art methods, not only on the whole dataset but also on the major tissue types of cell lines. Literature support for highly ranked potential associations was found manually, demonstrating the effectiveness of WMMDCA for drug response prediction.
28
Zuo Z, Wang P, Chen X, Tian L, Ge H, Qian D. SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures. BMC Bioinformatics 2021; 22:434. [PMID: 34507532 PMCID: PMC8434731 DOI: 10.1186/s12859-021-04352-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Accepted: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Background: One of the major challenges in precision medicine is accurately predicting individual patients' responses to drugs. A great number of computational methods have been developed to predict compound activity using genomic profiles or chemical structures, but combining genetic mutation, gene expression, and cheminformatics in one machine learning model remains underexplored. Results: Here we present a novel deep learning model that integrates gene expression, genetic mutation, and the chemical structure of compounds in a multi-task convolutional architecture. We applied our model to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. We selected relevant cancer-related genes based on an oncology genetics database and the L1000 landmark genes, and used their expression and mutations as genomic features in model training. We obtained the cheminformatics features for compounds from PubChem or ChEMBL. We find that combining gene expression, genetic mutation, and cheminformatics features greatly enhances predictive performance. Conclusion: We implemented an extended graph neural network for molecular graphs and a convolutional neural network for gene features. With multi-tasking and self-attention functions to monitor the similarity between compounds, our model outperforms recently published methods using the same training and testing datasets. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04352-9.
Affiliation(s)
- Zhaorui Zuo
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
- Penglei Wang
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
- Xiaowei Chen
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Li Tian
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Hui Ge
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
- Dahong Qian
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
29
Piyawajanusorn C, Nguyen LC, Ghislat G, Ballester PJ. A gentle introduction to understanding preclinical data for cancer pharmaco-omic modeling. Brief Bioinform 2021; 22:6343527. [PMID: 34368843 DOI: 10.1093/bib/bbab312] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Revised: 06/25/2021] [Accepted: 07/20/2021] [Indexed: 12/16/2022]
Abstract
A central goal of precision oncology is to administer an optimal drug treatment to each cancer patient. A common preclinical approach to this problem has been to characterize patients' tumors at the molecular and drug response levels and to use the resulting datasets for predictive in silico modeling (mostly with machine learning). Understanding how and why the different variants of these datasets are generated is an important component of this process. This review provides such an introduction, aimed at scientists with little previous exposure to this research area.
Affiliation(s)
- Chayanit Piyawajanusorn
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Faculty of Medicine and Public Health, HRH Princess Chulabhorn College of Medical Science, Chulabhorn Royal Academy, Bangkok, Thailand
- Linh C Nguyen
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Department of Life Sciences, University of Science and Technology of Hanoi, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Ghita Ghislat
- U1104, CNRS UMR7280, Centre d'Immunologie de Marseille-Luminy, Inserm, Marseille, France
- Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
30
Rafique R, Islam SR, Kazi JU. Machine learning in the prediction of cancer therapy. Comput Struct Biotechnol J 2021; 19:4003-4017. [PMID: 34377366 PMCID: PMC8321893 DOI: 10.1016/j.csbj.2021.07.003] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 03/14/2021] [Revised: 07/06/2021] [Accepted: 07/07/2021] [Indexed: 12/15/2022]
Abstract
Resistance to therapy remains a major cause of cancer treatment failure, resulting in many cancer-related deaths. Resistance can occur at any time during treatment, even at the beginning. Current treatment plans depend mainly on cancer subtypes and the presence of genetic mutations. However, the presence of a genetic mutation does not always predict the therapeutic response, which can vary across cancer subtypes. Therefore, there is an unmet need for predictive models that match a cancer patient with a specific drug or drug combination. Recent advances in predictive models using artificial intelligence have shown great promise in preclinical settings. However, despite massive improvements in computational power, building clinically usable models remains challenging due to a lack of clinically meaningful pharmacogenomic data. In this review, we provide an overview of recent advances in therapeutic response prediction using machine learning, the most widely used branch of artificial intelligence. We describe the basics of machine learning algorithms, illustrate their use, and highlight the current challenges in therapy response prediction for clinical practice.
Affiliation(s)
- S.M. Riazul Islam
- Department of Computer Science and Engineering, Sejong University, Seoul, South Korea
- Julhash U. Kazi
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Corresponding author at: Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Medicon village Building 404:C3, Scheelevägen 8, 22363 Lund, Sweden.
31
Li M, Wang Y, Zheng R, Shi X, Li Y, Wu FX, Wang J. DeepDSC: A Deep Learning Method to Predict Drug Sensitivity of Cancer Cell Lines. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:575-582. [PMID: 31150344 DOI: 10.1109/tcbb.2019.2919581] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 06/09/2023]
Abstract
High-throughput screening technologies have provided a large amount of drug sensitivity data for a panel of cancer cell lines and hundreds of compounds. Computational approaches to analyzing these data can benefit anticancer therapeutics by identifying molecular genomic determinants of drug sensitivity and developing new anticancer drugs. In this study, we developed a deep learning architecture to improve the performance of drug sensitivity prediction based on these data. We integrated both genomic features of cell lines and chemical information of compounds to predict the half maximal inhibitory concentration (IC50) on the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC) datasets using a deep neural network, which we call DeepDSC. Specifically, we first applied a stacked deep autoencoder to extract genomic features of cell lines from gene expression data, and then combined the compounds' chemical features with these genomic features to predict the final response data. We conducted 10-fold cross-validation to assess the performance of our deep model in terms of root-mean-square error (RMSE) and coefficient of determination (R²). We show that our model outperforms previous approaches, with an RMSE of 0.23 and an R² of 0.78 on the CCLE dataset, and an RMSE of 0.52 and an R² of 0.78 on the GDSC dataset. Moreover, to demonstrate the predictive ability of our models on novel cell lines or novel compounds, we held out, respectively, the cell lines originating from the same tissue and each compound as test sets, using the rest as training sets. The performance was comparable to that of other methods.
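The RMSE and R² figures quoted in this abstract follow standard definitions. As a minimal sketch (not the DeepDSC authors' evaluation code), both metrics can be computed in plain Python as below; the observed and predicted values are hypothetical toy numbers, not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted responses."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical observed vs. predicted values (e.g. log-scaled IC50)
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, predicted))       # 0.5
print(r_squared(observed, predicted))  # approximately 0.8
```

R² can be negative when predictions are worse than simply predicting the mean response, which is one reason the two metrics are usually reported together.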
32
Lloyd JP, Soellner MB, Merajver SD, Li JZ. Impact of between-tissue differences on pan-cancer predictions of drug sensitivity. PLoS Comput Biol 2021; 17:e1008720. [PMID: 33630864 PMCID: PMC7906305 DOI: 10.1371/journal.pcbi.1008720] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/19/2020] [Accepted: 01/18/2021] [Indexed: 11/24/2022]
Abstract
Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics contribute to pan-cancer predictions. It is also unknown whether the performance of pan-cancer models varies by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines, each with MEK inhibitor (MEKi) response and mRNA expression, point mutation, and copy number variation data, and found that, while tissue-level drug responses are accurately predicted (between-tissue ρ = 0.88–0.98), only 5 of 10 cancer types showed successful within-tissue prediction performance (within-tissue ρ = 0.11–0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions: excluding between-tissue signals decreases Spearman's ρ from a range of 0.43–0.62 to 0.30–0.51. In practice, a joint analysis of multiple cancer types usually has a larger sample size, and hence greater power, than an analysis of a single cancer type; we observe that the higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to this sample size advantage. The success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group, individual variation is essential for robust inference.
One of the central goals of precision oncology is to tailor the treatment of individual tumors to their molecular characteristics. While drug response predictions have traditionally been sought within each cancer type, it has long been hoped that jointly considering diverse cancer types would yield more robust predictions. Although such pan-cancer approaches have improved in recent years, it remains unclear whether between-tissue differences contribute to the reported pan-cancer prediction performance. This concern stems from the observation that, when cancer types differ in both molecular features and drug response, strong predictive information can come mainly from differences among tissue types. Our study finds that both between- and within-cancer type signals contribute substantially to pan-cancer drug response prediction models, and that about half of the cancer types examined are poorly predicted despite strong overall performance across all cancer types. We also find that pan-cancer prediction models perform similarly to or better than cancer type-specific models, and in many cases the advantage of pan-cancer models is due to the larger number of samples available for pan-cancer analysis. Our results highlight tissue of origin as a key consideration for pan-cancer drug response prediction models and recommend cancer type-specific considerations when translating pan-cancer prediction models for clinical use.
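The between-tissue versus within-tissue distinction can be illustrated with a toy calculation. The sketch below assumes each cell line has a tissue label, an observed response and a predicted response; the helper names (`average_ranks`, `spearman`, `between_and_within`) and the data are hypothetical, and the code only computes Spearman's ρ on tissue-level means and inside each tissue, a simplification rather than the authors' analysis.

```python
from collections import defaultdict

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def between_and_within(tissues, observed, predicted):
    """Return (rho on tissue-mean responses, {tissue: within-tissue rho})."""
    groups = defaultdict(list)
    for t, o, p in zip(tissues, observed, predicted):
        groups[t].append((o, p))
    mean_obs = [sum(o for o, _ in g) / len(g) for g in groups.values()]
    mean_pred = [sum(p for _, p in g) / len(g) for g in groups.values()]
    between = spearman(mean_obs, mean_pred)
    # Within-tissue rho is only computed for tissues with more than 2 cell lines.
    within = {t: spearman([o for o, _ in g], [p for _, p in g])
              for t, g in groups.items() if len(g) > 2}
    return between, within

# Hypothetical data: 3 tissues, predictions inverted inside each tissue
tissues = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
observed = [1, 2, 3, 11, 12, 13, 21, 22, 23]
predicted = [3, 2, 1, 13, 12, 11, 23, 22, 21]
print(spearman(observed, predicted))                    # 0.8 (pooled)
print(between_and_within(tissues, observed, predicted))
# (1.0, {'A': -1.0, 'B': -1.0, 'C': -1.0})
```

On these toy values the pooled ρ looks strong and the between-tissue ρ is perfect, yet every within-tissue ρ is -1.0: the pattern that motivates evaluating pan-cancer models inside each cancer type as well.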
Affiliation(s)
- John P Lloyd
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, United States of America
- Matthew B Soellner
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, United States of America
- Sofia D Merajver
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, United States of America
- Jun Z Li
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, United States of America
33
Sharma A, Rani R. Ensembled machine learning framework for drug sensitivity prediction. IET Syst Biol 2020; 14:39-46. [PMID: 31931480 DOI: 10.1049/iet-syb.2018.5094] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/20/2022]
Abstract
Drug sensitivity prediction is one of the critical tasks in drug design and discovery. Recently, several online databases and consortia have provided open access to pharmacogenomic data. These databases have helped in developing computational approaches for drug sensitivity prediction. Cancer is a complex disease in which patients with the same tumour type respond heterogeneously to the same drug therapy. Several methods have been proposed in the literature to predict drug sensitivity; however, they are not yet effective enough. The present study proposes an ensemble learning framework for drug-response prediction using a modified rotation forest. The proposed framework is further compared with three state-of-the-art algorithms and two baseline methods using the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) drug screens. The authors have also predicted missing drug response values in the data set using the proposed approach. The proposed approach outperforms its counterparts even though gene mutation data are not incorporated in its design. Average mean squared errors of 3.14 and 0.404 are achieved on the GDSC and CCLE drug screens, respectively. The obtained results show that the proposed framework has considerable potential to improve anti-cancer drug response prediction.
34
Kleandrova VV, Scotti MT, Scotti L, Nayarisseri A, Speck-Planche A. Cell-based multi-target QSAR model for design of virtual versatile inhibitors of liver cancer cell lines. SAR QSAR Environ Res 2020; 31:815-836. [PMID: 32967475 DOI: 10.1080/1062936x.2020.1818617] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/14/2020] [Accepted: 08/31/2020] [Indexed: 06/11/2023]
Abstract
Liver cancers are among the most lethal malignant neoplasms. Current chemotherapeutic treatments for these illnesses have become less efficient in terms of both efficacy and safety. Therefore, there is a great need to search for new anti-liver cancer agents, and this search can be accelerated by computer-aided drug discovery approaches. In this work, we report the development of the first cell-based multi-target model based on quantitative structure-activity relationships (CBMT-QSAR) for the design and prediction of chemicals as anticancer agents against 17 liver cancer cell lines. With good quality and predictive power (accuracy higher than 80%) in the training and test sets, respectively, the CBMT-QSAR model was employed as a tool to directly extract suitable fragments from the physicochemical and structural interpretations of the molecular descriptors. Some of these desirable fragments were assembled, leading to the virtual design of eight molecules with drug-like properties, six of which were predicted to be versatile anticancer agents against the 17 liver cancer cell lines reported here.
Affiliation(s)
- V V Kleandrova
- Laboratory of Fundamental and Applied Research of Quality and Technology of Food Production, Moscow State University of Food Production, Moscow, Russian Federation
- M T Scotti
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
- L Scotti
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
- A Nayarisseri
- In Silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
- A Speck-Planche
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
35
Naulaerts S, Menden MP, Ballester PJ. Concise Polygenic Models for Cancer-Specific Identification of Drug-Sensitive Tumors from Their Multi-Omics Profiles. Biomolecules 2020; 10:E963. [PMID: 32604779 PMCID: PMC7356608 DOI: 10.3390/biom10060963] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2020] [Revised: 06/20/2020] [Accepted: 06/22/2020] [Indexed: 12/15/2022]
Abstract
In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). One way to generate predictive models for the remaining cases is to use suitable machine learning algorithms that have not yet been applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost, an algorithm that is advantageous for these high-dimensional problems, integrated with a stringent feature selection approach. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations (CNA), gene expression (GEX), and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type yielded models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines, where GEX and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed its selected features. For each model, some of the selected genes had already been individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.
Affiliation(s)
- Stefan Naulaerts
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Ludwig Institute for Cancer Research, de Duve Institute, Université catholique de Louvain, 1200 Brussels, Belgium
- Michael P. Menden
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Department of Biology, Ludwig-Maximilians University Munich, 82152 Planegg-Martinsried, Germany
- German Centre for Diabetes Research (DZD e.V.), 85764 Neuherberg, Germany
- Pedro J. Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
36
Wang H, Xi J, Wang M, Li A. Dual-Layer Strengthened Collaborative Topic Regression Modeling for Predicting Drug Sensitivity. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:587-598. [PMID: 30106738 DOI: 10.1109/tcbb.2018.2864739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
An effective way to facilitate the development of modern precision oncology is the systematic analysis of the known drug sensitivities that have emerged in recent years. Meanwhile, drug response screening in cancer cell lines provides valuable genomic and pharmacological data for high-accuracy prediction. Existing works primarily use genomic or functional genomic features to classify or regress drug response. In this work, by adapting and extending conventional product recommendation methods, we introduce a novel model for accurate drug sensitivity prediction, dual-layer strengthened collaborative topic regression (DS-CTR), which incorporates not only a graphical model to jointly learn drug and cell line features from pharmacogenomics data but also drug and cell line similarity network models to strengthen the correlation of the prediction results. Using the Genomics of Drug Sensitivity in Cancer (GDSC) project as the benchmark dataset, a 5-fold cross-validation experiment demonstrates that DS-CTR significantly improves drug response prediction performance compared with four categories of state-of-the-art algorithms, in terms of both the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). DS-CTR is further validated by uncovering previously unknown cell-drug associations that are supported by literature evidence. The model may also make the discovery of new anticancer therapeutics in preclinical studies cheaper and faster.
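AUC has a convenient probabilistic reading: it is the probability that a randomly chosen positive (for example, a sensitive cell line-drug pair) receives a higher score than a randomly chosen negative. The following is a minimal sketch of that pairwise (Mann-Whitney) computation, assuming binary labels and real-valued scores; it is not the DS-CTR evaluation code, and the example values are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = sensitive pair) and model scores
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

This pairwise form costs O(P x N) comparisons and is meant for illustration; rank-based implementations scale better to large test sets.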
37
Kurilov R, Haibe-Kains B, Brors B. Assessment of modelling strategies for drug response prediction in cell lines and xenografts. Sci Rep 2020; 10:2849. [PMID: 32071383 PMCID: PMC7028927 DOI: 10.1038/s41598-020-59656-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 08/01/2018] [Accepted: 01/23/2020] [Indexed: 12/20/2022]
Abstract
Data from several large high-throughput drug response screens have recently become available to the scientific community. Although many efforts have been made to use this information to predict drug sensitivity, our ability to accurately predict drug response based on genetic data remains limited. To systematically examine how different aspects of modelling affect the resulting prediction accuracy, we built a range of models for seven drugs (erlotinib, paclitaxel, lapatinib, PLX4720, sorafenib, nutlin-3 and nilotinib) using data from the largest available cell line and xenograft drug sensitivity screens. We found that the drug response metric, the choice of molecular data type and the number of training samples have a substantial impact on prediction accuracy. We also compared drug response prediction with tissue type prediction and found that, unlike drug response, tissue type can be predicted with high accuracy. Furthermore, we assessed our ability to predict drug response in four xenograft cohorts (treated with erlotinib, gemcitabine or paclitaxel) using models trained on cell line data. We could predict response in an erlotinib-treated cohort with moderate accuracy (correlation ≈ 0.5), but were unable to correctly predict responses in cohorts treated with gemcitabine or paclitaxel.
Affiliation(s)
- Roman Kurilov
- Division of Applied Bioinformatics, German Cancer Research Center, Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, Toronto, Ontario, M5G 1L7, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, M5G 1L7, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, M5T 3A1, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, M5G 1L7, Canada
- Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center, Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Core Center Heidelberg, Heidelberg, Germany
38
Wang S, Li J. Modular within and between score for drug response prediction in cancer cell lines. Mol Omics 2020; 16:31-38. [PMID: 31802092 DOI: 10.1039/c9mo00162j] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Abstract
Drug response prediction in cancer cell lines is vital for discovering new anticancer drugs. However, accurately predicting drug responses in cancer cell lines is still a challenging task. In this study, we present a novel computational approach, named MSDRP (modular within and between score for drug response prediction), to predict drug responses in cell lines. The method is based on a constructed heterogeneous drug-cell line network that integrates multiple types of information. Compared with other state-of-the-art methods, MSDRP achieved better predictive performance and identified potential associations between drugs and cell lines that have been confirmed by the published literature. The source code of MSDRP is freely available at https://github.com/shimingwang1994/MSDRP.git.
Affiliation(s)
- Shiming Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
39
He Y, Liu J, Ning X. Drug Selection via Joint Push and Learning to Rank. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:110-123. [PMID: 29994481 DOI: 10.1109/tcbb.2018.2848908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Selecting the right drugs for the right patients is a primary goal of precision medicine. In this article, we consider the problem of cancer drug selection in a learning-to-rank framework. We have formulated the cancer drug selection problem as accurately predicting 1) the ranking positions of sensitive drugs and 2) the ranking orders among sensitive drugs in cancer cell lines based on their responses to cancer drugs. We have developed a new learning-to-rank method, denoted pLETORg, that predicts drug ranking structures in each cell line using drug latent vectors and cell line latent vectors. The pLETORg method learns such latent vectors by explicitly enforcing that, in the drug ranking list of each cell line, the sensitive drugs are pushed above insensitive drugs, while the ranking orders among sensitive drugs are correct. Genomic information on cell lines is leveraged in learning the latent vectors. Our experimental results on a benchmark cell line-drug response dataset demonstrate that pLETORg significantly outperforms the state-of-the-art method in prioritizing new sensitive drugs.
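The push-plus-ordering idea can be sketched as a loss over latent vectors. The code below is a simplified hinge-style surrogate written for illustration, assuming a dot-product score between a cell line vector and drug vectors; it is not the pLETORg objective (which, per the abstract, also leverages genomic information on cell lines), and all names, the `margin` parameter and the vectors are hypothetical.

```python
def score(cell_vec, drug_vec):
    """Predicted preference of a cell line for a drug: latent dot product."""
    return sum(c * d for c, d in zip(cell_vec, drug_vec))

def push_rank_loss(cell_vec, drug_vecs, sensitive_order, insensitive, margin=1.0):
    """Hinge surrogate (illustrative assumption) with two terms:
    push  - each sensitive drug should outscore each insensitive drug by `margin`;
    order - sensitive drugs, listed most to least sensitive in `sensitive_order`,
            should keep that order, again by `margin`."""
    s = {drug: score(cell_vec, vec) for drug, vec in drug_vecs.items()}
    push = sum(max(0.0, margin - (s[p] - s[n]))
               for p in sensitive_order for n in insensitive)
    order = sum(max(0.0, margin - (s[a] - s[b]))
                for i, a in enumerate(sensitive_order)
                for b in sensitive_order[i + 1:])
    return push + order

# Hypothetical 2-D latent vectors for one cell line and three drugs
cell = [1.0, 0.0]
drugs = {"d1": [3.0, 0.0], "d2": [2.0, 0.0], "d3": [0.0, 5.0]}
print(push_rank_loss(cell, drugs, ["d1", "d2"], ["d3"]))  # 0.0
print(push_rank_loss(cell, drugs, ["d2", "d1"], ["d3"]))  # 2.0
```

A loss of zero means every sensitive drug outscores every insensitive drug by at least the margin and the sensitive drugs appear in the required order; in a full method such a loss would be minimized with respect to the latent vectors.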
40
Güvenç Paltun B, Mamitsuka H, Kaski S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief Bioinform 2019; 22:346-359. [PMID: 31838491 PMCID: PMC7820853 DOI: 10.1093/bib/bbz153] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/09/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/17/2022]
Abstract
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and have improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine, because finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, categorizing methods as matrix factorization-based, kernel-based and network-based. We also present a short description of relevant databases used as benchmarks in drug response prediction analyses, followed by a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we demonstrate the advantages of combining multiple heterogeneous data sources in drug sensitivity analysis through an experimental comparison. Contact: betul.guvenc@aalto.fi
Affiliation(s)
- Betül Güvenç Paltun
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Samuel Kaski
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
41
Network-based Biased Tree Ensembles (NetBiTE) for Drug Sensitivity Prediction and Drug Sensitivity Biomarker Identification in Cancer. Sci Rep 2019; 9:15918. [PMID: 31685861 PMCID: PMC6828742 DOI: 10.1038/s41598-019-52093-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/14/2019] [Accepted: 10/07/2019] [Indexed: 12/15/2022]
Abstract
We present the Network-based Biased Tree Ensembles (NetBiTE) method for drug sensitivity prediction and drug sensitivity biomarker identification in cancer using a combination of prior knowledge and gene expression data. Our method consists of a biased tree ensemble built according to a probabilistic bias weight distribution. The bias weight distribution is obtained by assigning high weights to the drug targets and propagating the assigned weights over a protein-protein interaction network such as STRING. The propagation of weights defines neighborhoods of influence around the drug targets and thereby simulates the spread of perturbations within the cell following drug administration. Using a synthetic dataset, we show that biased tree ensembles (BiTE) achieve significant accuracy gains at a much lower computational cost than the unbiased random forest (RF) algorithm. We then apply NetBiTE to the Genomics of Drug Sensitivity in Cancer (GDSC) dataset and demonstrate that NetBiTE outperforms RF in predicting IC50 drug sensitivity, but only for drugs that target membrane receptor pathways (MRPs): the RTK, EGFR and IGFR signaling pathways. Based on the NetBiTE results, we propose that for drugs that inhibit MRPs, the expression of target genes prior to drug administration is a biomarker for IC50 drug sensitivity following drug administration. We further verify and reinforce this proposition through control studies on PI3K/MTOR signaling pathway inhibitors, a drug category that does not target MRPs, and by assigning dummy targets to MRP-inhibiting drugs and investigating the resulting variation in NetBiTE accuracy.
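The propagation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NetBiTE implementation. It assumes a random-walk-with-restart update with a hypothetical restart parameter `alpha`, a toy symmetric PPI graph given as a dict, and a small hypothetical `floor` so that untargeted genes can still be sampled. The resulting weights are turned into a distribution from which each tree split draws candidate genes without replacement.

```python
import random


def propagate_weights(adjacency, targets, alpha=0.5, iterations=20):
    """Spread target weights over an undirected PPI graph (random walk with restart).

    adjacency: dict gene -> list of interaction partners; must be symmetric (hypothetical input).
    targets: drug-target genes that receive the initial high weight.
    alpha: fraction of weight passed to neighbours at each step (assumed value).
    """
    genes = sorted(adjacency)
    prior = {g: 1.0 if g in targets else 0.0 for g in genes}
    weights = dict(prior)
    for _ in range(iterations):
        new_weights = {}
        for g in genes:
            # Each neighbour passes on its weight split evenly among its own partners.
            spread = sum(weights[n] / len(adjacency[n]) for n in adjacency[g])
            new_weights[g] = alpha * spread + (1 - alpha) * prior[g]
        weights = new_weights
    return weights


def sampling_distribution(weights, floor=1e-3):
    """Turn bias weights into sampling probabilities; the floor keeps every gene selectable."""
    total = sum(w + floor for w in weights.values())
    return {g: (w + floor) / total for g, w in weights.items()}


def sample_split_features(distribution, k, rng):
    """Draw k distinct genes for one tree split, favouring high-probability genes."""
    pool = dict(distribution)
    chosen = []
    for _ in range(min(k, len(pool))):
        genes = list(pool)
        gene = rng.choices(genes, weights=[pool[g] for g in genes], k=1)[0]
        chosen.append(gene)
        del pool[gene]
    return chosen


# Toy PPI graph (hypothetical): EGFR is the drug target, TP53 is disconnected.
ppi = {
    "EGFR": ["GRB2", "ERBB2"],
    "GRB2": ["EGFR", "SOS1"],
    "ERBB2": ["EGFR"],
    "SOS1": ["GRB2"],
    "TP53": [],
}
weights = propagate_weights(ppi, targets={"EGFR"})
distribution = sampling_distribution(weights)
features = sample_split_features(distribution, 2, random.Random(0))
```

In this toy graph, weight decays with network distance from the target. Genes that are not connected to any target keep only the floor probability.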
42
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022]
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types, whose analysis has helped identify signatures of cellular sensitivity/resistance to hundreds of chemical compounds. Among these data types, gene expression has proven the most successful for inferring drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise into predictive models, since different drugs act through specific mechanisms that realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for systematically dissecting the genome and predicting drug responses using both drug-unspecific and drug-specific genes. Using tens to a few hundred genes specific to each drug instead of the whole genome, these methodologies achieve better response predictions for the vast majority of the screened drugs. This allows a better understanding and interpretation of drug-specific response mechanisms, which are not necessarily restricted to the drug's known targets.
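The abstract does not say how the drug-specific genes are chosen. As a point of reference, the sketch below shows a generic univariate filter that keeps the `k` genes whose expression correlates most strongly with one drug's response. This is an assumed stand-in, not Parca et al.'s procedure. The dict layout and gene names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_informative_genes(expression, response, k):
    """Keep the k genes whose expression correlates most strongly (in absolute value) with response.

    expression: dict gene -> expression values across cell lines (hypothetical layout).
    response: the drug's response per cell line, in the same cell-line order.
    """
    scores = {gene: abs(pearson(values, response)) for gene, values in expression.items()}
    return sorted(scores, key=lambda gene: (-scores[gene], gene))[:k]


expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [4.0, 3.0, 2.0, 1.0],
    "G3": [1.0, 3.0, 2.0, 4.0],
    "G4": [5.0, 5.0, 5.0, 5.0],
}
response = [0.1, 0.2, 0.3, 0.4]
selected = select_informative_genes(expression, response, k=2)
```

The filter ranks by absolute correlation, so strongly anti-correlated genes (G2 here) are kept alongside positively correlated ones.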
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Gerardo Pepe
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Marco Pietrosanto
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Giulio Galvan
- Department of Information Engineering, University of Florence, Florence, Italy
- Leonardo Galli
- Department of Information Engineering, University of Florence, Florence, Italy
- Antonio Palmeri
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Celgene Institute for Translational Research Europe, Sevilla, Spain
- Marco Sciandrone
- Department of Information Engineering, University of Florence, Florence, Italy
- Fabrizio Ferrè
- Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
- Gabriele Ausiello
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
43
Manica M, Oskooei A, Born J, Subramanian V, Sáez-Rodríguez J, Rodríguez Martínez M. Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Mol Pharm 2019; 16:4797-4806. [DOI: 10.1021/acs.molpharmaceut.9b00520] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/11/2022]
Affiliation(s)
- Jannis Born
- IBM Research, 8803 Zürich, Switzerland
- ETH Zürich, 8092 Zürich, Switzerland
- University of Zürich, 8006 Zürich, Switzerland
44
Tejera E, Carrera I, Jimenes-Vargas K, Armijos-Jaramillo V, Sánchez-Rodríguez A, Cruz-Monteagudo M, Perez-Castillo Y. Cell fishing: A similarity based approach and machine learning strategy for multiple cell lines-compound sensitivity prediction. PLoS One 2019; 14:e0223276. [PMID: 31589649 PMCID: PMC6779297 DOI: 10.1371/journal.pone.0223276] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/22/2019] [Accepted: 09/17/2019] [Indexed: 12/21/2022]
Abstract
The prediction of cell-line sensitivity to a given set of compounds is a very important factor in the optimization of in vitro assays. To date, the most common prediction strategies are based on machine learning or other quantitative structure-activity relationship (QSAR) approaches. In the present research, we propose and discuss a straightforward strategy that is not based on any learned model. Instead, it relies exclusively on the chemical similarity of a query compound to reference compounds with annotated activity against cell lines. We also compare the performance of the proposed method with machine learning predictions on the same problem. A curated database of compound-cell line associations derived from ChEMBL version 22 was created for algorithm construction and cross-validation. Validation was done using 10-fold cross-validation and by testing the models on new data obtained from ChEMBL version 25. In terms of accuracy, both methods perform similarly, with values around 0.65 across 750 cell lines in 10-fold cross-validation experiments. By combining both methods, it is possible to achieve a 66% correct classification rate on more than 26000 newly reported interactions comprising 11000 new compounds. A Web Service implementing the described approaches (both similarity-based and machine learning-based models) is freely available at: http://bioquimio.udla.edu.ec/cellfishing.
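A similarity-based prediction of this kind can be sketched with Tanimoto similarity on fingerprint bit sets and a vote among the most similar annotated references. This is a minimal illustration, not the Cell fishing scoring scheme. Representing fingerprints as Python sets of on-bits, the neighbour count `k`, the majority-vote rule and the per-cell-line reference list are all assumptions made here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_active(query, references, k=3):
    """Majority vote over the k reference compounds most similar to the query.

    references: list of (fingerprint_bits, is_active) pairs for ONE cell line (hypothetical format).
    """
    ranked = sorted(references, key=lambda ref: tanimoto(query, ref[0]), reverse=True)[:k]
    votes = sum(1 for _, is_active in ranked if is_active)
    return 2 * votes > len(ranked)


# Hypothetical annotated reference compounds for a single cell line.
references = [
    ({1, 2, 3}, True),
    ({1, 2, 4}, True),
    ({7, 8, 9}, False),
    ({7, 8}, False),
    ({1, 9}, False),
]
```

No model is trained here. The prediction for a new compound is derived entirely from its nearest annotated neighbours, which mirrors the "no learning" idea described in the abstract.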
Affiliation(s)
- E. Tejera
- Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- I. Carrera
- Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador
- Departamento de Ciências de Computadores, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Karina Jimenes-Vargas
- Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador
- V. Armijos-Jaramillo
- Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- A. Sánchez-Rodríguez
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- Universidad Técnica Particular de Loja, Loja, Ecuador
- M. Cruz-Monteagudo
- Center for Computational Science (CCS), University of Miami (UM), Miami, FL, United States of America
- West Coast University, Miami, Florida, United States of America
- Y. Perez-Castillo
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, Ecuador
45
Abstract
Motivation: Large-scale screenings of cancer cell lines with detailed molecular profiles against libraries of pharmacological compounds are currently being performed in order to gain a better understanding of the genetic component of drug response and to enhance our ability to recommend therapies given a patient's molecular profile. These comprehensive screens differ from the clinical setting, in which (i) medical records only contain the response of a patient to very few drugs, (ii) drugs are recommended by doctors based on their expert judgment and (iii) selecting the most promising therapy is often more important than accurately predicting the sensitivity to all potential drugs. Current regression models for drug sensitivity prediction fail to account for these three properties. Results: We present a machine learning approach, named Kernelized Rank Learning (KRL), that ranks drugs based on their predicted effect per cell line (patient), circumventing the difficult problem of precisely predicting the sensitivity to a given drug. Our approach outperforms several state-of-the-art predictors in drug recommendation, particularly if the training dataset is sparse, and generalizes to patient data. Our work phrases personalized drug recommendation as a new type of machine learning problem with translational potential to the clinic. Availability and implementation: The Python implementation of KRL and scripts for running our experiments are available at https://github.com/BorgwardtLab/Kernelized-Rank-Learning. Supplementary information: Supplementary data are available at Bioinformatics online.
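The ranking view of drug recommendation can be illustrated by how a ranked list is used and scored per cell line. The sketch below does not implement KRL (no kernel or rank-learning objective is shown). It only takes per-drug scores for one cell line, recommends the top `k` drugs, and measures how many of them are among the truly most effective drugs (precision at `k`). The scores are hypothetical. The abstract does not state which ranking metric the paper uses, so precision at `k` is an assumed example.

```python
def top_k(scores, k):
    """Return the k drugs with the highest scores (ties broken by drug name)."""
    return [drug for drug, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def precision_at_k(predicted, observed, k):
    """Fraction of the k recommended drugs that are among the k truly most effective ones."""
    return len(set(top_k(predicted, k)) & set(top_k(observed, k))) / k


# Hypothetical sensitivity scores for one cell line; higher means a more effective drug.
observed = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}
predicted = {"A": 0.8, "B": 0.1, "C": 0.5, "D": 0.3}
recommended = top_k(predicted, 2)
score = precision_at_k(predicted, observed, 2)
```

A ranking-based model is judged only on whether the top of the list is right. That is why it can sidestep precise prediction of every drug's sensitivity value.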
Affiliation(s)
- Xiao He
- Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Lukas Folkman
- Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Karsten Borgwardt
- Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
46
Liu P, Li H, Li S, Leung KS. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019; 20:408. [PMID: 31357929 PMCID: PMC6664725 DOI: 10.1186/s12859-019-2910-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 10/13/2018] [Accepted: 05/21/2019] [Indexed: 12/11/2022]
Abstract
Background: Understanding the phenotypic drug response of cancer cell lines plays a vital role in anti-cancer drug discovery and repurposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to build and test their models. Previously, most research in this area started from the molecular fingerprints or physicochemical features of drugs rather than from their structures. Results: In this paper, a model called twin Convolutional Neural Network for drugs in SMILES format (tCNNS) is introduced for phenotypic screening. tCNNS uses one convolutional network to extract drug features from their simplified molecular input line entry specification (SMILES) strings and another convolutional network to extract cancer cell line features from genetic feature vectors. A fully connected network then predicts the interaction between the drugs and the cancer cell lines. When the training and testing sets are divided based on drug-cell line interaction pairs, tCNNS achieves 0.826 and 0.831 for the mean and top quartile of the coefficient of determination (R²), and 0.909 and 0.912 for the mean and top quartile of the Pearson correlation (Rp). These results are significantly better than those of previous works (Ammad-Ud-Din et al., J Chem Inf Model 54:2347–9, 2014), (Haider et al., PLoS ONE 10:0144490, 2015), (Menden et al., PLoS ONE 8:61318, 2013). However, when the training and testing sets are divided exclusively by drugs or by cell lines, the performance of tCNNS decreases significantly, and Rp and R² drop to barely above 0. Conclusions: Our approach can predict drug effects on cancer cell lines with high accuracy, and its performance remains stable with less but high-quality data and with fewer features for the cancer cell lines. tCNNS can also address the problem of outliers in other feature spaces.
Besides achieving high scores on these statistical metrics, tCNNS also provides some insights into phenotypic screening. However, the performance of tCNNS drops in the blind test. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2910-6) contains supplementary material, which is available to authorized users.
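A convolutional network over SMILES needs a fixed-size numeric input. One common way to provide it is a character-level one-hot matrix, sketched below. The vocabulary construction, the maximum length and the zero-padding scheme are assumptions for illustration; the abstract does not specify tCNNS's exact encoding.

```python
def build_vocabulary(smiles_list):
    """Map every character seen in the training SMILES to a column index."""
    characters = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index for index, ch in enumerate(characters)}


def one_hot_smiles(smiles, vocabulary, max_length):
    """Encode a SMILES string as a max_length x len(vocabulary) 0/1 matrix.

    Strings longer than max_length are truncated; shorter ones are zero-padded.
    Characters missing from the vocabulary raise KeyError.
    """
    matrix = [[0] * len(vocabulary) for _ in range(max_length)]
    for position, ch in enumerate(smiles[:max_length]):
        matrix[position][vocabulary[ch]] = 1
    return matrix


vocabulary = build_vocabulary(["CCO", "c1ccccc1"])
encoded = one_hot_smiles("CCO", vocabulary, max_length=5)
```

Purely character-level tokenisation splits two-letter atoms such as `Cl` or `Br` into two symbols. A convolution sliding along the sequence axis can still learn such local patterns.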
Affiliation(s)
- Pengfei Liu
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Hongjian Li
- SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, N.T., Hong Kong, China
- CUHK-SDU Reproductive Genetics Joint Laboratory, School of Biomedical Sciences, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Shuai Li
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Kwong-Sak Leung
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
47
Sidorov P, Naulaerts S, Ariey-Bonnet J, Pasquier E, Ballester PJ. Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data. Front Chem 2019; 7:509. [PMID: 31380352 PMCID: PMC6646421 DOI: 10.3389/fchem.2019.00509] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 05/17/2019] [Accepted: 07/02/2019] [Indexed: 12/15/2022]
Abstract
Drug combinations are of great interest for cancer treatment. Unfortunately, discovering synergistic combinations by purely experimental means is only feasible for small sets of drugs. In silico modeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to what extent drug combination synergy can be predicted by exploiting the largest dataset available to date (NCI-ALMANAC, with over 290,000 synergy determinations). Each cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5,000 pair-wise drug combinations, 60 cell lines, 4 types of models and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated. The evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the cell line considered, with XGBoost providing slightly better predictions than RF). We have also found that restricting predictions to the most reliable ones results in at least a 2-fold decrease in error with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with partner drugs is better predicted by the models. Despite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic.
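The abstract does not name the RF-specific reliability technique. The sketch below therefore uses an assumed proxy: the spread of the individual trees' predictions for each drug pair, where a smaller spread is treated as more reliable. It then keeps only the most reliable fraction of predictions. The per-tree predictions and pair identifiers are hypothetical, and the paper's actual reliability measure may differ.

```python
import statistics


def predict_with_spread(tree_predictions):
    """Ensemble prediction (mean over trees) and its spread (population standard deviation)."""
    return statistics.mean(tree_predictions), statistics.pstdev(tree_predictions)


def most_reliable(samples, fraction):
    """Keep the given fraction of samples whose per-tree predictions agree most closely.

    samples: list of (sample_id, per_tree_predictions) pairs (hypothetical format).
    Returns (sample_id, mean_prediction, spread) tuples, most reliable first.
    """
    scored = [(sample_id, *predict_with_spread(preds)) for sample_id, preds in samples]
    scored.sort(key=lambda item: item[2])  # smaller spread is treated as more reliable
    keep = max(1, int(len(scored) * fraction))
    return scored[:keep]


samples = [
    ("pair1", [1.0, 1.1, 0.9]),
    ("pair2", [0.0, 2.0, 1.0]),
    ("pair3", [0.5, 0.5, 0.5]),
]
kept = most_reliable(samples, fraction=0.67)
```

Discarding low-agreement predictions trades coverage for accuracy. This is the kind of trade-off behind the error reduction reported for the most reliable predictions.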
Affiliation(s)
- Pavel Sidorov
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
- Stefan Naulaerts
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
- Department of Tumor Immunology, Institut de Duve, Bruxelles, Belgium
- Jérémy Ariey-Bonnet
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
- Eddy Pasquier
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
- Pedro J. Ballester
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
48
Lind AP, Anderson PC. Predicting drug activity against cancer cells by random forest models based on minimal genomic information and chemical properties. PLoS One 2019; 14:e0219774. [PMID: 31295321 PMCID: PMC6622537 DOI: 10.1371/journal.pone.0219774] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 11/01/2018] [Accepted: 07/01/2019] [Indexed: 12/27/2022]
Abstract
A key goal of precision medicine is predicting the best drug therapy for a specific patient from genomic information. In oncology, cancers that appear similar pathologically can vary greatly in how they respond to the same drug. Fortunately, data from high-throughput screening programs often reveal important relationships between the genomic variability of cancer cells and their response to drugs. Nevertheless, many current computational methods for predicting compound activity against cancer cells require large quantities of genomic, epigenomic and additional cellular data to develop and to apply. Here we integrate recent screening data and machine learning to train classification models that predict the activity/inactivity of compounds against cancer cells based on the mutational status of only 145 oncogenes and a set of compound structural descriptors. Using an IC50 of 1 μM as the activity cutoff, our predictive models have a sensitivity of 87% and a specificity of 87%, and yield an area under the receiver operating characteristic curve of 0.94. We also develop regression models to predict log(IC50) values of compounds for cancer cells; these models achieve a Pearson correlation coefficient of 0.86 in cross-validation and 0.65-0.73 on blind test sets. Predictive performance remains strong when as few as 50 oncogenes are included. Finally, even when 40% of experimental IC50 values are missing from the screening data, they can be imputed reliably enough that classification accuracy is not diminished. The presented models are fast to generate and may serve as easily implemented screening tools for personalized oncology, drug repurposing and drug discovery.
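The classification metrics quoted above follow directly from the confusion matrix once IC50 values are binarised at the 1 μM cutoff. The sketch below shows that step with hypothetical measurements and hypothetical classifier outputs. Treating an IC50 exactly equal to the cutoff as active is an assumption.

```python
def is_active(ic50_um, cutoff_um=1.0):
    """Label a compound-cell line pair active when its IC50 (in micromolar) is at or below the cutoff."""
    return ic50_um <= cutoff_um


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    return tp / (tp + fn), tn / (tn + fp)


measured_ic50_um = [0.2, 0.8, 3.0, 10.0, 0.5]   # hypothetical measurements
actual = [is_active(x) for x in measured_ic50_um]
predicted = [True, False, False, True, True]    # hypothetical classifier output
sens, spec = sensitivity_specificity(actual, predicted)
```

Here 2 of the 3 truly active pairs are recovered (sensitivity 2/3). One of the 2 inactive pairs is correctly rejected (specificity 1/2).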
Affiliation(s)
- Alex P. Lind
- Physical Sciences Division, University of Washington Bothell, Bothell, Washington, United States of America
- Peter C. Anderson
- Physical Sciences Division, University of Washington Bothell, Bothell, Washington, United States of America
49
Cortés-Ciriano I, Bender A. KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminform 2019; 11:41. [PMID: 31218493 PMCID: PMC6582521 DOI: 10.1186/s13321-019-0364-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/19/2019] [Accepted: 06/09/2019] [Indexed: 02/08/2023]
Abstract
The application of convolutional neural networks (ConvNets) to high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training or sophisticated pretraining schemes. Here, using 33 IC50 data sets from ChEMBL 23, we show that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekulé structure representations alone. We achieve this by extending existing architectures (AlexNet, DenseNet-201, ResNet152 and VGG-19) that were pretrained on unrelated image data sets. The predictive power of the generated models, which require only standard 2D compound representations as input, is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that simply averaging their outputs yields significantly lower prediction errors for multiple data sets than either model alone, although the effect size is small. This indicates that the features extracted by the convolutional layers of the ConvNets provide predictive signal complementary to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images make it possible to model COX isoform selectivity on a continuous scale, with prediction errors comparable to the uncertainty of the data. Overall, we present a set of ConvNet architectures that predict compound activity from Kekulé structure representations with state-of-the-art performance and require neither the generation of compound descriptors nor sophisticated image processing techniques.
The code needed to reproduce the results presented in this study and all the data sets are provided at https://github.com/isidroc/kekulescope.
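The model-averaging result can be illustrated with a plain element-wise mean of two models' predictions, scored by RMSE. The values below are hypothetical and chosen so that the two models' errors cancel exactly. Real gains, as the abstract notes, are small and arise only when the models make partly independent errors.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def average_ensemble(*model_predictions):
    """Element-wise mean of several models' predictions for the same compounds."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


observed = [5.0, 6.0, 7.0]       # hypothetical pIC50 values
rf_pred = [5.5, 5.5, 7.5]        # hypothetical RF predictions
convnet_pred = [4.5, 6.5, 6.5]   # hypothetical ConvNet predictions
ensemble_pred = average_ensemble(rf_pred, convnet_pred)
```

Each model alone has an RMSE of 0.5 here, while their average has none. This is an idealised version of the "complementary signal" argument.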
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
50
Guan NN, Zhao Y, Wang CC, Li JQ, Chen X, Piao X. Anticancer Drug Response Prediction in Cell Lines Using Weighted Graph Regularized Matrix Factorization. Mol Ther Nucleic Acids 2019; 17:164-174. [PMID: 31265947 PMCID: PMC6610642 DOI: 10.1016/j.omtn.2019.05.017] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 02/25/2019] [Revised: 05/17/2019] [Accepted: 05/20/2019] [Indexed: 12/14/2022]
Abstract
Precision medicine has become a rising concept that depends heavily on identifying individual genomic signatures for different patients. Cancer cell lines can reflect the “omic” diversity of primary tumors, and many studies, both experimental and computational, have used them to investigate cancer biology and drug discovery. In this work, we present a novel method that uses weighted graph regularized matrix factorization (WGRMF) to infer anticancer drug response in cell lines. We constructed p-nearest neighbor graphs to sparsify the drug similarity matrix and the cell line similarity matrix. Using the sparsified matrices in the graph regularization terms, we performed matrix factorization to generate latent matrices for drugs and cell lines. The graph regularization terms, which include neighbor information, help exclude noisy components and improve prediction accuracy. Ten-fold cross-validation was performed, and the Pearson correlation coefficient (PCC), root-mean-square error (RMSE), PCCsr and RMSEsr, averaged over all drugs, were calculated to evaluate WGRMF. On the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, the results are 0.64 ± 0.16, 1.37 ± 0.35, 0.73 ± 0.14 and 1.71 ± 0.44 for PCC, RMSE, PCCsr and RMSEsr, respectively. On the Cancer Cell Line Encyclopedia (CCLE) dataset, WGRMF achieved 0.72 ± 0.09, 0.56 ± 0.19, 0.79 ± 0.07 and 0.69 ± 0.19, respectively. These results show that WGRMF outperforms previous methods. In addition, three types of case studies were carried out based on the prediction results on the GDSC dataset. The results of both cross-validation and the case studies demonstrate the effectiveness of WGRMF for predicting drug response in cell lines.
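The p-nearest-neighbour sparsification step can be sketched as follows. The abstract does not give the weighting rule. The sketch assumes the rule commonly used in graph-regularised matrix factorisation: weight 1 when two entities are in each other's neighbourhoods, 0.5 when only one direction holds, and 0 otherwise. Zeroing the diagonal is also an assumption. In such methods the sparsified matrix typically enters a graph Laplacian regularisation term; that factorisation step is not shown.

```python
def nearest_neighbours(similarity, i, p):
    """Indices of the p entities most similar to entity i, excluding i itself."""
    others = [j for j in range(len(similarity)) if j != i]
    return set(sorted(others, key=lambda j: (-similarity[i][j], j))[:p])


def sparsify(similarity, p):
    """p-nearest-neighbour sparsification of a symmetric similarity matrix.

    Assumed weighting rule: 1 if i and j are in each other's neighbourhoods,
    0.5 if only one direction holds, 0 otherwise. The diagonal is set to 0.
    """
    n = len(similarity)
    neighbours = [nearest_neighbours(similarity, i, p) for i in range(n)]
    sparse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            forward, backward = j in neighbours[i], i in neighbours[j]
            weight = 1.0 if forward and backward else 0.5 if forward or backward else 0.0
            sparse[i][j] = weight * similarity[i][j]
    return sparse


drug_similarity = [   # hypothetical drug-drug similarity matrix
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
sparse = sparsify(drug_similarity, p=1)
```

With `p=1`, only each drug's single closest partner survives. The weak 0.1 link is dropped and the one-directional 0.3 link is halved, which is how sparsification removes noisy similarities before regularisation.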
Affiliation(s)
- Na-Na Guan
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xue Piao
- School of Medical Informatics, Xuzhou Medical University, Xuzhou 221004, China