1
|
Eckhart L, Lenhof K, Rolli LM, Lenhof HP. A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction. Brief Bioinform 2024; 25:bbae242. [PMID: 38797968 PMCID: PMC11128483 DOI: 10.1093/bib/bbae242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024] Open
Abstract
A major challenge of precision oncology is the identification and prioritization of suitable treatment options based on molecular biomarkers of the considered tumor. In pursuit of this goal, large cancer cell line panels have successfully been studied to elucidate the relationship between cellular features and treatment response. Due to the high dimensionality of these datasets, machine learning (ML) is commonly used for their analysis. However, choosing a suitable algorithm and set of input features can be challenging. We performed a comprehensive benchmarking of ML methods and dimension reduction (DR) techniques for predicting drug response metrics. Using the Genomics of Drug Sensitivity in Cancer cell line panel, we trained random forests, neural networks, boosting trees and elastic nets for 179 anti-cancer compounds with feature sets derived from nine DR approaches. We compare the results regarding statistical performance, runtime and interpretability. Additionally, we provide strategies for assessing model performance compared with a simple baseline model and measuring the trade-off between models of different complexity. Lastly, we show that complex ML models benefit from using an optimized DR strategy, and that standard models, even when using considerably fewer features, can still be superior in performance.
Affiliation(s)
- Lea Eckhart
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Kerstin Lenhof
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
|
2
|
Sotudian S, Paschalidis IC. ITNR: Inversion Transformer-based Neural Ranking for cancer drug recommendations. Comput Biol Med 2024; 172:108312. [PMID: 38503090 PMCID: PMC10990436 DOI: 10.1016/j.compbiomed.2024.108312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 03/09/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Personalized drug response prediction is an approach for tailoring effective therapeutic strategies for patients based on their tumors' genomic characterization. While machine learning methods are widely employed in the literature, they often struggle to capture drug-cell line relations across various cell lines. In addressing this challenge, our study introduces a novel listwise Learning-to-Rank (LTR) model named Inversion Transformer-based Neural Ranking (ITNR). ITNR utilizes genomic features and a transformer architecture to decipher functional relationships and construct models that can predict patient-specific drug responses. Our experiments were conducted on three major drug response data sets, showing that ITNR reliably and consistently outperforms state-of-the-art LTR models.
Affiliation(s)
- Shahabeddin Sotudian
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, and Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA
|
3
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review allows a better understanding of the current state of the field and helps identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
4
|
Singh DP, Kaushik B. A systematic literature review for the prediction of anticancer drug response using various machine-learning and deep-learning techniques. Chem Biol Drug Des 2023; 101:175-194. [PMID: 36303299 DOI: 10.1111/cbdd.14164] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/02/2022] [Revised: 10/13/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022]
Abstract
Computational methods have gained prominence in healthcare research. The accessibility of healthcare data has greatly encouraged academicians and researchers to develop implementations that help in the prognosis of cancer drug response. Among various computational methods, machine-learning (ML) and deep-learning (DL) methods provide the most consistent and effectual approaches to handle the serious aftermaths of the deadly disease and the drugs administered to the patients. Hence, this systematic literature review covers studies that have investigated drug discovery and prognosis of anticancer drug response using ML and DL algorithms. For this purpose, PRISMA guidelines have been followed to choose research papers from Google Scholar, PubMed, and Sciencedirect websites. A total count of 105 papers that align with the context of this review were chosen. Further, the review also presents the accuracy of the existing ML and DL methods in the prediction of anticancer drug response. It has been found from the review that, amidst the availability of various studies, there are certain challenges associated with each method. Thus, future researchers can consider these limitations and challenges to develop a prominent anticancer drug response prediction method, and it would be greatly beneficial to medical professionals in administering non-invasive treatment to patients.
Affiliation(s)
- Davinder Paul Singh
  - School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
- Baijnath Kaushik
  - School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
|
5
|
Sotudian S, Paschalidis IC. Machine Learning for Pharmacogenomics and Personalized Medicine: A Ranking Model for Drug Sensitivity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2324-2333. [PMID: 34043512 PMCID: PMC9642333 DOI: 10.1109/tcbb.2021.3084562] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
It is infeasible to test many different chemotherapy drugs on actual patients in large clinical trials, which motivates computational methods with the ability to learn and exploit associations between drug effectiveness and patient characteristics. This work proposes a machine learning approach to infer robust predictors of drug responses from patient genomic information. Rather than predicting the exact drug response on a given cell line, we introduce an elastic-net regression methodology to compare a drug-cell line pair against an alternative pair. Using predicted pairwise comparisons we rank the effectiveness of different drugs on the same cell line. A total of 173 cell lines and 100 drug responses were used in various settings for training and testing the proposed models. By comparing our approach against twelve baseline methods, we demonstrate that it outperforms the state-of-the-art methods in the literature. In contrast to most other methods, the algorithm is able to maintain its high performance even when we use a large number of drugs and few cell lines.
|
6
|
Automatic identification of drug sensitivity of cancer cell with novel regression-based ensemble convolution neural network model. Soft comput 2022. [DOI: 10.1007/s00500-022-07098-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]
|
7
|
Nguyen GTT, Vu HD, Le DH. Integrating Molecular Graph Data of Drugs and Multiple -Omic Data of Cell Lines for Drug Response Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:710-717. [PMID: 34260355 DOI: 10.1109/tcbb.2021.3096960] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
Previous studies have either learned drug features from their string or numeric representations, which are not natural forms of drugs, or only used genomic data of cell lines for the drug response prediction problem. Here, we proposed a deep learning model, GraOmicDRP, to learn drug features from their graph representation and integrate multiple -omic data of cell lines. In GraOmicDRP, drugs are represented as graphs of bindings among atoms; meanwhile, cell lines are depicted by not only genomic but also transcriptomic and epigenomic data. Graph convolutional and convolutional neural networks were used to learn the representation of drugs and cell lines, respectively. A combination of the two representations was then used to represent each drug-cell line pair. Finally, the response value of each pair was predicted by a fully connected network. Experimental results indicate that transcriptomic data performs best among single -omic data types; meanwhile, the combinations of transcriptomic and other -omic data achieved the best performance overall in terms of both Root Mean Square Error and Pearson correlation coefficient. In addition, we also show that GraOmicDRP outperforms some state-of-the-art methods, including ones integrating -omic data with drug information such as GraphDRP, and ones using -omic data without drug information such as DeepDR and MOLI.
|
8
|
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
  - Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
|
9
|
Mahajan RA, Shaikh NK, Tikhe TB, Vyas R, Chavan SM. Hybrid Sea Lion Crow Search Algorithm-Based Stacked Autoencoder for Drug Sensitivity Prediction From Cancer Cell Lines. International Journal of Swarm Intelligence Research 2022. [DOI: 10.4018/ijsir.304723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Cancer is one of the most dreadful diseases worldwide, and providing better therapy to cancer patients remains a major challenge due to the drug resistance of tumor cells. This paper proposes a Sea Lion Crow Search Algorithm (SLCSA) for drug sensitivity prediction. The drug sensitivity of cultured cell lines is predicted using a stacked autoencoder, and the proposed SLCSA is derived by combining Sea Lion Optimization (SLnO) and the Crow Search Algorithm (CSA). The implemented approach achieved maximum testing accuracies of 0.920 for normal, 0.920 for leukemia, 0.912 for NSCLC, and 0.914 for urogenital.
|
10
|
Piyawajanusorn C, Nguyen LC, Ghislat G, Ballester PJ. A gentle introduction to understanding preclinical data for cancer pharmaco-omic modeling. Brief Bioinform 2021; 22:6343527. [PMID: 34368843 DOI: 10.1093/bib/bbab312] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Revised: 06/25/2021] [Accepted: 07/20/2021] [Indexed: 12/16/2022] Open
Abstract
A central goal of precision oncology is to administer an optimal drug treatment to each cancer patient. A common preclinical approach to tackle this problem has been to characterize the tumors of patients at the molecular and drug response levels, and employ the resulting datasets for predictive in silico modeling (mostly using machine learning). Understanding how and why the different variants of these datasets are generated is an important component of this process. This review focuses on providing such an introduction, aimed at scientists with little previous exposure to this research area.
Affiliation(s)
- Chayanit Piyawajanusorn
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
  - Faculty of Medicine and Public Health, HRH Princess Chulabhorn College of Medical Science, Chulabhorn Royal Academy, Bangkok, Thailand
- Linh C Nguyen
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
  - Department of Life Sciences, University of Science and Technology of Hanoi, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Ghita Ghislat
  - U1104, CNRS UMR7280, Centre d'Immunologie de Marseille-Luminy, Inserm, Marseille, France
- Pedro J Ballester
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
|
11
|
Partin A, Brettin T, Evrard YA, Zhu Y, Yoo H, Xia F, Jiang S, Clyde A, Shukla M, Fonstein M, Doroshow JH, Stevens RL. Learning curves for drug response prediction in cancer cell lines. BMC Bioinformatics 2021; 22:252. [PMID: 34001007 PMCID: PMC8130157 DOI: 10.1186/s12859-021-04163-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/29/2020] [Accepted: 05/04/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.
METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models.
RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics.
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
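The abstract above does not give the exact power-law form the authors fitted, so the following is only a minimal sketch of the general idea, assuming the simplest two-parameter form `error ≈ a * m**b` (no irreducible-error offset) and an ordinary least-squares fit in log-log space. The function names and the learning-curve points are hypothetical and synthetic; they are not taken from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**b by least squares on (log size, log error)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(y_mean - b * x_mean)   # intercept in log-log space = log(a)
    return a, b


def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training set size."""
    return a * size ** b


# Synthetic (made-up) learning-curve points that follow 2.0 * m**-0.5 exactly.
train_sizes = [100, 400, 1600, 6400]
val_errors = [2.0 * m ** -0.5 for m in train_sizes]

a, b = fit_power_law(train_sizes, val_errors)
extrapolated = predict_error(a, b, 25600)
print(f"a={a:.3f}, b={b:.3f}, predicted error at m=25600: {extrapolated:.5f}")
```

A fit of this kind is what makes the curve "forward-looking": once `a` and `b` are estimated from existing training sizes, the same expression can be evaluated at a larger, not yet collected sample size. Real learning curves are noisy and often plateau, so a form with an additive offset and a nonlinear fit may be more appropriate in practice.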
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Thomas Brettin
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
- Yvonne A Evrard
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Hyunseung Yoo
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Fangfang Xia
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Songhao Jiang
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Maulik Shukla
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Michael Fonstein
  - Biosciences Division, Argonne National Laboratory, Lemont, IL, USA
- James H Doroshow
  - Division of Cancer Therapeutics and Diagnosis, National Cancer Institute, Bethesda, MD, USA
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
|
12
|
Qiu K, Lee J, Kim H, Yoon S, Kang K. Machine learning based anti-cancer drug response prediction and search for predictor genes using cancer cell line gene expression. Genomics Inform 2021; 19:e10. [PMID: 33840174 PMCID: PMC8042299 DOI: 10.5808/gi.20076] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 02/11/2021] [Indexed: 01/06/2023] Open
Abstract
Although many models have been proposed in recent years to accurately predict the response of drugs in cell lines, understanding the genome related to drug response is also key to completing oncology precision medicine. In this paper, based on the cancer cell line gene expression and the drug response data, we established a reliable and accurate drug response prediction model and found predictor genes for some drugs of interest. To this end, we first performed pre-selection of genes based on the Pearson correlation coefficient and then used an ElasticNet regression model for drug response prediction and fine gene selection. To find a more reliable set of predictor genes, we performed regression twice for each drug, one with IC50 and the other with area under the curve (AUC) (or activity area). For the 12 drugs we tested, the predictive performance in terms of Pearson correlation coefficient exceeded 0.6 and the highest one was 17-AAG for which Pearson correlation coefficient was 0.811 for IC50 and 0.81 for AUC. We identify common predictor genes for IC50 and AUC, with which the performance was similar to those with genes separately found for IC50 and AUC, but with a much smaller number of predictor genes. By using only common predictor genes, the highest performance was AZD6244 (0.8016 for IC50, 0.7945 for AUC) with 321 predictor genes.
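The pre-selection step described in this abstract can be illustrated with a small sketch. It is not the authors' code: the correlation threshold, the gene names, and the toy expression values are all hypothetical, and the subsequent ElasticNet regression step is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(vx * vy)


def preselect_genes(expression, response, min_abs_r):
    """Keep genes whose |Pearson r| with the drug response is at least min_abs_r."""
    scores = {gene: pearson(values, response) for gene, values in expression.items()}
    kept = [gene for gene, r in scores.items() if abs(r) >= min_abs_r]
    # Strongest absolute correlation first.
    return sorted(kept, key=lambda gene: -abs(scores[gene]))


# Toy data: one response value (e.g. a drug sensitivity score) per cell line.
response = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated
    "GENE_C": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated
}

selected = preselect_genes(expression, response, min_abs_r=0.6)
print(selected)
```

Filtering on the absolute value keeps both positively and negatively associated genes, which matters when the retained set is later passed to a sparse regression model for finer selection.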
Affiliation(s)
- Kexin Qiu
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
- JoongHo Lee
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
- HanByeol Kim
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
- Seokhyun Yoon
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
  - Department of Electronics and Electrical Engineering, Dankook University, Yongin 16890, Korea
- Keunsoo Kang
  - Department of Microbiology, Dankook University, Cheonan 31116, Korea
|
13
|
Shah K, Ahmed M, Kazi JU. The Aurora kinase/β-catenin axis contributes to dexamethasone resistance in leukemia. NPJ Precis Oncol 2021; 5:13. [PMID: 33597638 PMCID: PMC7889633 DOI: 10.1038/s41698-021-00148-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/15/2020] [Accepted: 01/12/2021] [Indexed: 02/07/2023] Open
Abstract
Glucocorticoids, such as dexamethasone and prednisolone, are widely used in cancer treatment. Different hematological malignancies respond differently to this treatment which, as could be expected, correlates with treatment outcome. In this study, we have used a glucocorticoid-induced gene signature to develop a deep learning model that can predict dexamethasone sensitivity. By combining gene expression data from cell lines and patients with acute lymphoblastic leukemia, we observed that the model is useful for the classification of patients. Predicted samples have been used to detect deregulated pathways that lead to dexamethasone resistance. Gene set enrichment analysis, peptide substrate-based kinase profiling assay, and western blot analysis identified Aurora kinase, S6K, p38, and β-catenin as key signaling proteins involved in dexamethasone resistance. Deep learning-enabled drug synergy prediction followed by in vitro drug synergy analysis identified kinase inhibitors against Aurora kinase, JAK, S6K, and mTOR that displayed synergy with dexamethasone. Combining pathway enrichment, kinase regulation, and kinase inhibition data, we propose that Aurora kinase or its several direct or indirect downstream kinase effectors such as mTOR, S6K, p38, and JAK may be involved in β-catenin stabilization through phosphorylation-dependent inactivation of GSK-3β. Collectively, our data suggest that activation of the Aurora kinase/β-catenin axis during dexamethasone treatment may contribute to cell survival signaling which is possibly maintained in patients who are resistant to dexamethasone.
Affiliation(s)
- Kinjal Shah
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Mehreen Ahmed
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Julhash U Kazi
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Department of Laboratory Medicine, Lund University, Lund, Sweden
|
14
|
Daoud S, Mdhaffar A, Jmaiel M, Freisleben B. Q-Rank: Reinforcement Learning for Recommending Algorithms to Predict Drug Sensitivity to Cancer Therapy. IEEE J Biomed Health Inform 2020; 24:3154-3161. [PMID: 32750950 DOI: 10.1109/jbhi.2020.3004663] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Abstract
In personalized medicine, a challenging task is to identify the most effective treatment for a patient. In oncology, several computational models have been developed to predict the response of drugs to therapy. However, the performance of these models depends on multiple factors. This paper presents a new approach, called Q-Rank, to predict the sensitivity of cell lines to anti-cancer drugs. Q-Rank integrates different prediction algorithms and identifies a suitable algorithm for a given application. Q-Rank is based on reinforcement learning methods to rank prediction algorithms on the basis of relevant features (e.g., omics characterization). The best-ranked algorithm is recommended and used to predict the response of drugs to therapy. Our experimental results indicate that Q-Rank outperforms the integrated models in predicting the sensitivity of cell lines to different drugs.
|
15
|
Rahman R, Dhruba SR, Matlock K, De-Niz C, Ghosh S, Pal R. Evaluating the consistency of large-scale pharmacogenomic studies. Brief Bioinform 2020; 20:1734-1753. [PMID: 31846027 DOI: 10.1093/bib/bby046] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/02/2018] [Revised: 05/04/2018] [Indexed: 12/21/2022] Open
Abstract
Recent years have seen an increase in the availability of pharmacogenomic databases such as Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) that provide genomic and functional characterization information for multiple cell lines. Studies have alluded to the fact that specific characterizations may be inconsistent between different databases. Analysis of the potential discrepancies in the different databases is highly significant, as these sources are frequently used to analyze and validate methodologies for personalized cancer therapies. In this article, we review the recent developments in investigating the correspondence between different pharmacogenomics databases and discuss the potential factors that require attention when incorporating these sources in any modeling analysis. Furthermore, we explored the consistency among these databases using copulas that can capture nonlinear dependencies between two sets of data.
Affiliation(s)
- Raziur Rahman
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA
- Saugato Rahman Dhruba
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA
- Kevin Matlock
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA
- Carlos De-Niz
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA
- Souparno Ghosh
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA
- Ranadip Pal
  - Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA
|
16
|
Naulaerts S, Menden MP, Ballester PJ. Concise Polygenic Models for Cancer-Specific Identification of Drug-Sensitive Tumors from Their Multi-Omics Profiles. Biomolecules 2020; 10:E963. [PMID: 32604779 PMCID: PMC7356608 DOI: 10.3390/biom10060963] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2020] [Revised: 06/20/2020] [Accepted: 06/22/2020] [Indexed: 12/15/2022] Open
Abstract
In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). A way to generate predictive models for the remaining cases is with suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost integrated with a stringent feature selection approach, which is an algorithm that is advantageous for these high-dimensional problems. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations, gene expression, and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines where gene expression (GEX) and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.
Affiliation(s)
- Stefan Naulaerts
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
  - Ludwig Institute for Cancer Research, de Duve Institute, Université catholique de Louvain, 1200 Brussels, Belgium
- Michael P. Menden
  - Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Department of Biology, Ludwig-Maximilians University Munich, 82152 Planegg-Martinsried, Germany
  - German Centre for Diabetes Research (DZD e.V.), 85764 Neuherberg, Germany
- Pedro J. Ballester
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
|
17
|
Adam G, Rampášek L, Safikhani Z, Smirnov P, Haibe-Kains B, Goldenberg A. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis Oncol 2020; 4:19. [PMID: 32566759 PMCID: PMC7296033 DOI: 10.1038/s41698-020-0122-1] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 08/23/2019] [Accepted: 04/17/2020] [Indexed: 12/24/2022] Open
Abstract
Cancer is a leading cause of death worldwide. Identifying the best treatment using computational models to personalize drug response prediction holds great promise to improve patients' chances of successful recovery. Unfortunately, the computational task of predicting drug response is very challenging, partially due to the limitations of the available data and partially due to algorithmic shortcomings. The recent advances in deep learning may open a new chapter in the search for computational drug response prediction models and ultimately result in more accurate tools for therapy response. This review provides an overview of the computational challenges and advances in drug response prediction, and focuses on comparing the machine learning techniques to be of utmost practical use for clinicians and machine learning non-experts. The incorporation of new data modalities such as single-cell profiling, along with techniques that rapidly find effective drug combinations, will likely be instrumental in improving cancer care.
Affiliation(s)
- George Adam
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
  - Department of Computer Science, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
- Ladislav Rampášek
  - Department of Computer Science, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON Canada
- Zhaleh Safikhani
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON Canada
- Petr Smirnov
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Ontario Institute for Cancer Research, Toronto, ON Canada
- Benjamin Haibe-Kains
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON Canada
  - Department of Computer Science, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON Canada
  - Ontario Institute for Cancer Research, Toronto, ON Canada
- Anna Goldenberg
  - Department of Computer Science, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON Canada
|
18
|
Schätzle LK, Hadizadeh Esfahani A, Schuppert A. Methodological challenges in translational drug response modeling in cancer: A systematic analysis with FORESEE. PLoS Comput Biol 2020; 16:e1007803. [PMID: 32310964 PMCID: PMC7192505 DOI: 10.1371/journal.pcbi.1007803] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/21/2019] [Revised: 04/30/2020] [Accepted: 03/19/2020] [Indexed: 11/23/2022] Open
Abstract
Translational models directly relating drug response specific processes that can be observed in vitro to their in vivo role in cancer patients constitute a crucial part of the development of personalized medication. Unfortunately, current studies often focus on the optimization of isolated model characteristics instead of examining the overall modeling workflow and the interplay of the individual model components. Moreover, they are often limited to specific data sets only. Therefore, they are often confined by the irreproducibility of the results and the non-transferability of the approaches into other contexts. In this study, we present a thorough investigation of translational models and their ability to predict the drug responses of cancer patients originating from diverse data sets using the R-package FORESEE. By systematically scanning the modeling space for optimal combinations of different model settings, we can determine models of extremely high predictivity and work out a few modeling guidelines that promote simplicity. Yet, we identify noise within the data, sample size effects, and drug unspecificity as factors that deteriorate the models’ robustness. Moreover, we show that cell line models of high accuracy do not necessarily excel in predicting drug response processes in patients. We therefore hope to motivate future research to consider in vivo aspects more carefully to ultimately generate deeper insights into applicable precision medicine.
In the context of personalized medicine, finding genomic patterns in a cancer patient that can predict how a specific drug will affect the patient’s survival is of great interest. Translational approaches that directly relate drug response specific processes observed in cell line experiments to their role in cancer patients have the potential to increase the clinical relevance of models. Unfortunately, existing approaches are often irreproducible in other applications. In order to address this irreproducibility aspect, our work comprises a thorough investigation of a diverse set of translational models. In contrast to other approaches that focus on one isolated model characteristic at a time, we examine the overall workflow and the interplay of all model components. Additionally, we validate our models in multiple patient data sets and identify differences between cell line and patient models. While we can establish models of high predictive performance, we also expose the deceptive potential of optimizing methods to a specific use case only by showing that those models do not necessarily depict biological processes. Thus, this study serves as a guide to interpret new approaches in a broader context to avoid the dissemination of noise-driven models that fail to serve in everyday applications.
Affiliation(s)
- Lisa-Katrin Schätzle
  - Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
  - Aachen Institute for Advanced Study in Computational Engineering Science, RWTH Aachen University, Aachen, Germany
- Ali Hadizadeh Esfahani
  - Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
  - Aachen Institute for Advanced Study in Computational Engineering Science, RWTH Aachen University, Aachen, Germany
- Andreas Schuppert
  - Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
  - Aachen Institute for Advanced Study in Computational Engineering Science, RWTH Aachen University, Aachen, Germany
|
19
|
He Y, Liu J, Ning X. Drug Selection via Joint Push and Learning to Rank. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:110-123. [PMID: 29994481 DOI: 10.1109/tcbb.2018.2848908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Selecting the right drugs for the right patients is a primary goal of precision medicine. In this article, we consider the problem of cancer drug selection in a learning-to-rank framework. We formulate the cancer drug selection problem as accurately predicting 1) the ranking positions of sensitive drugs and 2) the ranking orders among sensitive drugs in cancer cell lines based on their responses to cancer drugs. We have developed a new learning-to-rank method, denoted as pLETORg, that predicts drug ranking structures in each cell line using drug latent vectors and cell line latent vectors. The pLETORg method learns such latent vectors by explicitly enforcing that, in the drug ranking list of each cell line, the sensitive drugs are pushed above insensitive drugs, while the ranking orders among sensitive drugs are correct. Genomics information on cell lines is leveraged in learning the latent vectors. Our experimental results on a benchmark cell line-drug response dataset demonstrate that the new pLETORg significantly outperforms the state-of-the-art method in prioritizing new sensitive drugs.
|
20
|
Güvenç Paltun B, Mamitsuka H, Kaski S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief Bioinform 2019; 22:346-359. [PMID: 31838491 PMCID: PMC7820853 DOI: 10.1093/bib/bbz153] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/09/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/17/2022] Open
Abstract
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, by categorizing methods as matrix factorization-based, kernel-based and network-based methods. We also present a short description of relevant databases used as a benchmark in drug response prediction analyses, followed by providing a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources on drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi
Affiliation(s)
- Betül Güvenç Paltun
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Samuel Kaski
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
|
21
|
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022] Open
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis has helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the most successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predicting drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to a few hundred genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms, which are not necessarily restricted to the drug's known targets.
Affiliation(s)
- Luca Parca
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Gerardo Pepe
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Marco Pietrosanto
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Giulio Galvan
  - Department of Information Engineering, University of Florence, Florence, Italy
- Leonardo Galli
  - Department of Information Engineering, University of Florence, Florence, Italy
- Antonio Palmeri
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
  - Celgene Institute for Translational Research Europe, Sevilla, Spain
- Marco Sciandrone
  - Department of Information Engineering, University of Florence, Florence, Italy
- Fabrizio Ferrè
  - Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
- Gabriele Ausiello
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
|
22
|
Manica M, Oskooei A, Born J, Subramanian V, Sáez-Rodríguez J, Rodríguez Martínez M. Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Mol Pharm 2019; 16:4797-4806. [DOI: 10.1021/acs.molpharmaceut.9b00520] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/11/2022]
Affiliation(s)
- Jannis Born
  - IBM Research, 8803 Zürich, Switzerland
  - ETH Zürich, 8092 Zürich, Switzerland
  - University of Zürich, 8006 Zürich, Switzerland
|
23
|
Dhruba SR, Rahman A, Rahman R, Ghosh S, Pal R. Recursive model for dose-time responses in pharmacological studies. BMC Bioinformatics 2019; 20:317. [PMID: 31216980 PMCID: PMC6584530 DOI: 10.1186/s12859-019-2831-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Background: Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors at each dose without capturing the evolution of time curves across dosage. Results: In this article, we propose a parametric model for dose-time responses that follows Gompertz law in time and Hill equation across dose approximately. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dose-time responses with individual level proteomic data. The resultant joint model allows us to predict the dose-response curves over time for new individuals. Conclusion: We have compared the efficacy of our proposed Recursive Hybrid model with individual dose-response predictive models at desired time points. We note that our proposed model exhibits a superior performance compared to the individual ones for both synthetic data and actual pharmacological data. For the desired dose-time varying genetic characterization and drug response values, we have used the HMS-LINCS database and demonstrated the effectiveness of our model for all available anticancer compounds. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2831-4) contains supplementary material, which is available to authorized users.
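For readers unfamiliar with the two building blocks named in this abstract, the following minimal sketch evaluates a textbook Hill dose-response curve and a textbook Gompertz time curve. It is not the authors' Recursive Hybrid model; the function names, the parameterizations, and all parameter values are illustrative assumptions.

```python
import math


def hill_response(dose, e_max, ec50, hill_coeff):
    """Textbook Hill equation: response as a function of dose (dose >= 0)."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    dose_term = dose ** hill_coeff
    return e_max * dose_term / (ec50 ** hill_coeff + dose_term)


def gompertz_response(time, asymptote, displacement, rate):
    """Textbook Gompertz curve: response as a function of time."""
    return asymptote * math.exp(-displacement * math.exp(-rate * time))


# Hypothetical parameter values, chosen only for illustration.
doses = [0.01, 0.1, 1.0, 10.0]
hill_curve = [hill_response(d, e_max=1.0, ec50=1.0, hill_coeff=2.0) for d in doses]

times = [0.0, 24.0, 48.0, 72.0]
gompertz_curve = [
    gompertz_response(t, asymptote=1.0, displacement=3.0, rate=0.05) for t in times
]
```

A quick sanity check on the Hill parameterization is that a dose equal to `ec50` returns half of `e_max`; the Gompertz curve at time zero returns `asymptote * exp(-displacement)`.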
Affiliation(s)
- Saugato Rahman Dhruba
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409, TX, USA
- Aminur Rahman
  - Department of Mathematics and Statistics, Texas Tech University, 1108 Memorial Circle, Lubbock, 79409, TX, USA
- Raziur Rahman
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409, TX, USA
- Souparno Ghosh
  - Department of Mathematics and Statistics, Texas Tech University, 1108 Memorial Circle, Lubbock, 79409, TX, USA
- Ranadip Pal
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409, TX, USA
|
24
|
Rahman R, Perera C, Ghosh S, Pal R. Adaptive Multi-task Elastic Net based feature selection from Pharmacogenomics Databases. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:279-282. [PMID: 30440392 DOI: 10.1109/embc.2018.8512229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Integrating multiple databases of similar tasks is a significant problem in biological data analysis. In this paper, we consider whether feature selection in a single database can benefit from incorporating similar databases. We report that by using adaptive multi-task elastic net for feature selection and Random Forest for prediction, the prediction performance can be improved for pharmacogenomics databases. We also present a simulation study to explain the robust feature selection benefit of adaptive multi-task elastic net when dealing with noisy features.
|
25
|
Sundin I, Peltola T, Micallef L, Afrabandpey H, Soare M, Mamun Majumder M, Daee P, He C, Serim B, Havulinna A, Heckman C, Jacucci G, Marttinen P, Kaski S. Improving genomics-based predictions for precision medicine through active elicitation of expert knowledge. Bioinformatics 2018; 34:i395-i403. [PMID: 29949984 PMCID: PMC6022689 DOI: 10.1093/bioinformatics/bty257] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/03/2022] Open
Abstract
Motivation: Precision medicine requires the ability to predict the efficacies of different treatments for a given individual using high-dimensional genomic measurements. However, identifying predictive features remains a challenge when the sample size is small. Incorporating expert knowledge offers a promising approach to improve predictions, but collecting such knowledge is laborious if the number of candidate features is very large. Results: We introduce a probabilistic framework to incorporate expert feedback about the impact of genomic measurements on the outcome of interest and present a novel approach to collect the feedback efficiently, based on Bayesian experimental design. The new approach outperformed other recent alternatives in two medical applications: prediction of metabolic traits and prediction of sensitivity of cancer cells to different drugs, both using genomic features as predictors. Furthermore, the intelligent approach to collect feedback reduced the workload of the expert to approximately 11%, compared to a baseline approach. Availability and implementation: Source code implementing the introduced computational methods is freely available at https://github.com/AaltoPML/knowledge-elicitation-for-precision-medicine. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Iiris Sundin
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Tomi Peltola
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Luana Micallef
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Homayun Afrabandpey
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Marta Soare
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Muntasir Mamun Majumder
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Pedram Daee
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Chen He
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Baris Serim
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Aki Havulinna
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
  - National Institute for Health and Welfare THL, Helsinki, Finland
- Caroline Heckman
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Giulio Jacucci
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Pekka Marttinen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Samuel Kaski
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
|
26
|
PATRI, a Genomics Data Integration Tool for Biomarker Discovery. Biomed Res Int 2018; 2018:2012078. [PMID: 30065933 PMCID: PMC6051285 DOI: 10.1155/2018/2012078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/20/2018] [Accepted: 05/29/2018] [Indexed: 12/31/2022]
Abstract
The availability of genomic datasets in association with clinical, phenotypic, and drug sensitivity information represents an invaluable source for potential therapeutic applications, supporting the identification of new drug sensitivity biomarkers and pharmacological targets. Drug discovery and precision oncology can largely benefit from the integration of treatment molecular discriminants obtained from cell line models and clinical tumor samples; however, this task demands comprehensive analysis approaches for the discovery of underlying data connections. Here we introduce PATRI (Platform for the Analysis of TRanslational Integrated data), a standalone tool accessible through a user-friendly graphical interface, conceived for the identification of treatment sensitivity biomarkers from user-provided genomics data, associated with information on sample characteristics. PATRI streamlines a translational analysis workflow: first, baseline genomics signatures are statistically identified, differentiating treatment-sensitive from resistant preclinical models; then, these signatures are used for the prediction of treatment sensitivity in clinical samples, via random forest categorization of clinical genomics datasets and statistical evaluation of the relative phenotypic features. The same workflow can also be applied across distinct clinical datasets. The ease of use of the PATRI tool is illustrated with validation analysis examples, performed with sensitivity data for drug treatments with known molecular discriminants.
|
27
|
Kutmon M, Ehrhart F, Willighagen EL, Evelo CT, Coort SL. CyTargetLinker app update: A flexible solution for network extension in Cytoscape. F1000Res 2018; 7:ELIXIR-743. [PMID: 31489175 PMCID: PMC6707396 DOI: 10.12688/f1000research.14613.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Accepted: 08/08/2019] [Indexed: 11/15/2023] Open
Abstract
Here, we present an update of the open-source CyTargetLinker app for Cytoscape (http://apps.cytoscape.org/apps/cytargetlinker) that introduces new automation features. CyTargetLinker provides a simple interface to extend networks with links to relevant data and/or knowledge extracted from so-called linksets. The linksets are provided on the CyTargetLinker website (https://cytargetlinker.github.io/) or can be custom-made for specific use cases. The new automation feature enables users to programmatically execute the app's functionality in Cytoscape (command line tool) and with external tools (e.g. R, Jupyter, Python, etc). This allows users to share their analysis workflows and therefore increase repeatability and reproducibility. Three use cases demonstrate automated workflows, combinations with other Cytoscape apps and core Cytoscape functionality. We first extend a protein-protein interaction network created with the stringApp, with compound-target interactions and disease-gene annotations. In the second use case, we created a workflow to load differentially expressed genes from an experimental dataset and extend it with gene-pathway associations. Lastly, we chose an example outside the biological domain and used CyTargetLinker to create an author-article-journal network for the five authors of this manuscript using a two-step extension mechanism. With 400 downloads per month in the last year and nearly 20,000 downloads in total, CyTargetLinker shows the adoption and relevance of the app in the field of network biology. In August 2019, the original publication was cited in 83 articles demonstrating the applicability in biomedical research.
Affiliation(s)
- Martina Kutmon
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER, The Netherlands
- Friederike Ehrhart
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - GKC-Rett Expertise Centre, Maastricht University Medical Center, Maastricht, 6200 MD, The Netherlands
- Egon L. Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER, The Netherlands
- Susan L. Coort
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
|
28
|
Kutmon M, Ehrhart F, Willighagen EL, Evelo CT, Coort SL. CyTargetLinker app update: A flexible solution for network extension in Cytoscape. F1000Res 2018; 7. [PMID: 31489175 PMCID: PMC6707396 DOI: 10.12688/f1000research.14613.2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Accepted: 08/08/2019] [Indexed: 12/20/2022] Open
Abstract
Here, we present an update of the open-source CyTargetLinker app for Cytoscape (http://apps.cytoscape.org/apps/cytargetlinker) that introduces new automation features. CyTargetLinker provides a simple interface to extend networks with links to relevant data and/or knowledge extracted from so-called linksets. The linksets are provided on the CyTargetLinker website (https://cytargetlinker.github.io/) or can be custom-made for specific use cases. The new automation feature enables users to programmatically execute the app’s functionality in Cytoscape (command line tool) and with external tools (e.g. R, Jupyter, Python, etc). This allows users to share their analysis workflows and therefore increase repeatability and reproducibility. Three use cases demonstrate automated workflows, combinations with other Cytoscape apps and core Cytoscape functionality. We first extend a protein-protein interaction network created with the stringApp, with compound-target interactions and disease-gene annotations. In the second use case, we created a workflow to load differentially expressed genes from an experimental dataset and extend it with gene-pathway associations. Lastly, we chose an example outside the biological domain and used CyTargetLinker to create an author-article-journal network for the five authors of this manuscript using a two-step extension mechanism. With 400 downloads per month in the last year and nearly 20,000 downloads in total, CyTargetLinker shows the adoption and relevance of the app in the field of network biology. In August 2019, the original publication was cited in 83 articles demonstrating the applicability in biomedical research.
Affiliation(s)
- Martina Kutmon
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER, The Netherlands
- Friederike Ehrhart
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - GKC-Rett Expertise Centre, Maastricht University Medical Center, Maastricht, 6200 MD, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 ER, The Netherlands
- Susan L Coort
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
|
29
|
Ammad-ud-din M, Khan SA, Wennerberg K, Aittokallio T. Systematic identification of feature combinations for predicting drug response with Bayesian multi-view multi-task linear regression. Bioinformatics 2017; 33:i359-i368. [PMID: 28881998 PMCID: PMC5870540 DOI: 10.1093/bioinformatics/btx266] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/16/2022]
Abstract
MOTIVATION: A prime challenge in precision cancer medicine is to identify genomic and molecular features that are predictive of drug treatment responses in cancer cells. Although there are several computational models for accurate drug response prediction, these often lack the ability to infer which feature combinations are the most predictive, particularly for high-dimensional molecular datasets. As increasing amounts of diverse genome-wide data sources are becoming available, there is a need to build new computational models that can effectively combine these data sources and identify maximally predictive feature combinations.
RESULTS: We present a novel approach that leverages systematic integration of data sources to identify response-predictive features of multiple drugs. To solve the modeling task, we implement a Bayesian linear regression method. To further improve the usefulness of the proposed model, we exploit the known human cancer kinome for identifying biologically relevant feature combinations. In case studies with a synthetic dataset and two publicly available cancer cell line datasets, we demonstrate the improved accuracy of our method compared to the widely used approaches in drug response analysis. As key examples, our model identifies meaningful combinations of features for the well-known EGFR, ALK, PLK and PDGFR inhibitors.
AVAILABILITY AND IMPLEMENTATION: The source code of the method is available at https://github.com/suleimank/mvlr.
CONTACT: muhammad.ammad-ud-din@helsinki.fi or suleiman.khan@helsinki.fi
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Muhammad Ammad-ud-din
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Suleiman A Khan
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Krister Wennerberg
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
|