1
Mohammadzadeh-Vardin T, Ghareyazi A, Gharizadeh A, Abbasi K, Rabiee HR. DeepDRA: Drug repurposing using multi-omics data integration with autoencoders. PLoS One 2024; 19:e0307649. [PMID: 39058696 PMCID: PMC11280260 DOI: 10.1371/journal.pone.0307649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024]
Abstract
Cancer treatment has become one of the biggest challenges in the world today. Different treatments are used against cancer; drug-based treatments have shown better results. On the other hand, designing new drugs for cancer is costly and time-consuming. Some computational methods, such as machine learning and deep learning, have been suggested to solve these challenges using drug repurposing. Despite the promise of classical machine-learning methods in repurposing cancer drugs and predicting responses, deep-learning methods performed better. This study aims to develop a deep-learning model that predicts cancer drug response based on multi-omics data, drug descriptors, and drug fingerprints and facilitates the repurposing of drugs based on those responses. To reduce multi-omics data's dimensionality, we use autoencoders. As a multi-task learning model, autoencoders are connected to MLPs. We extensively tested our model using three primary datasets: GDSC, CTRP, and CCLE to determine its efficacy. In multiple experiments, our model consistently outperforms existing state-of-the-art methods. Compared to state-of-the-art models, our model achieves an impressive AUPRC of 0.99. Furthermore, in a cross-dataset evaluation, where the model is trained on GDSC and tested on CCLE, it surpasses the performance of three previous works, achieving an AUPRC of 0.72. In conclusion, we presented a deep learning model that outperforms the current state-of-the-art regarding generalization. Using this model, we could assess drug responses and explore drug repurposing, leading to the discovery of novel cancer drugs. Our study highlights the potential for advanced deep learning to advance cancer therapeutic precision.
Affiliation(s)
- Taha Mohammadzadeh-Vardin
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Amin Ghareyazi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Ali Gharizadeh
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Karim Abbasi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
  - Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
- Hamid R. Rabiee
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran

2
Lenhof K, Eckhart L, Rolli LM, Lenhof HP. Trust me if you can: a survey on reliability and interpretability of machine learning approaches for drug sensitivity prediction in cancer. Brief Bioinform 2024; 25:bbae379. [PMID: 39101498 PMCID: PMC11299037 DOI: 10.1093/bib/bbae379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024]
Abstract
With the ever-increasing number of artificial intelligence (AI) systems, mitigating risks associated with their use has become one of the most urgent scientific and societal issues. To this end, the European Union passed the EU AI Act, proposing solution strategies that can be summarized under the umbrella term trustworthiness. In anti-cancer drug sensitivity prediction, machine learning (ML) methods are developed for application in medical decision support systems, which require an extraordinary level of trustworthiness. This review offers an overview of the ML landscape of methods for anti-cancer drug sensitivity prediction, including a brief introduction to the four major ML realms (supervised, unsupervised, semi-supervised, and reinforcement learning). In particular, we address the question to what extent trustworthiness-related properties, more specifically, interpretability and reliability, have been incorporated into anti-cancer drug sensitivity prediction methods over the previous decade. In total, we analyzed 36 papers with approaches for anti-cancer drug sensitivity prediction. Our results indicate that the need for reliability has hardly been addressed so far. Interpretability, on the other hand, has often been considered for model development. However, the concept is rather used intuitively, lacking clear definitions. Thus, we propose an easily extensible taxonomy for interpretability, unifying all prevalent connotations explicitly or implicitly used within the field.
Affiliation(s)
- Kerstin Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lea Eckhart
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany

3
Lenhof K, Eckhart L, Rolli LM, Volkamer A, Lenhof HP. Reliable anti-cancer drug sensitivity prediction and prioritization. Sci Rep 2024; 14:12303. [PMID: 38811639 PMCID: PMC11137046 DOI: 10.1038/s41598-024-62956-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 05/23/2024] [Indexed: 05/31/2024]
Abstract
The application of machine learning (ML) to solve real-world problems bears not only great potential but also high risk. One fundamental challenge in risk mitigation is to ensure the reliability of the ML predictions, i.e., the model error should be minimized, and the prediction uncertainty should be estimated. Especially for medical applications, the importance of reliable predictions cannot be overstated. Here, we address this challenge for anti-cancer drug sensitivity prediction and prioritization. To this end, we present a novel drug sensitivity prediction and prioritization approach guaranteeing user-specified certainty levels. The developed conformal prediction approach is applicable to classification, regression, and simultaneous regression and classification. Additionally, we propose a novel drug sensitivity measure that is based on clinically relevant drug concentrations and enables a straightforward prioritization of drugs for a given cancer sample.
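To make the idea of a user-specified certainty level concrete, the following is a minimal sketch of generic split conformal prediction for regression. It is not the authors' pipeline: the function names, the calibration values, and the choice of absolute residuals as the nonconformity score are all assumptions made for illustration.

```python
import math


def conformal_quantile(residuals, alpha):
    """Split-conformal quantile of absolute calibration residuals.

    With n calibration points, the k-th smallest residual with
    k = ceil((n + 1) * (1 - alpha)) gives intervals that cover the true
    value with probability at least 1 - alpha (under exchangeability).
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this certainty level.
        return math.inf
    return sorted(residuals)[k - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical calibration set: measured vs. predicted drug responses.
cal_true = [10, 12, 9, 15, 11, 14, 13]
cal_pred = [11, 12, 7, 14, 14, 13, 15]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

q = conformal_quantile(residuals, alpha=0.25)  # 75% certainty level
interval = prediction_interval(12, q)
```

Lowering `alpha` widens the interval; when the calibration set is too small for the requested level, the sketch returns an infinite width rather than an interval it cannot justify.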
Affiliation(s)
- Kerstin Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Lea Eckhart
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Andrea Volkamer
  - Center for Bioinformatics, Chair for Data Driven Drug Design, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany

4
Ovchinnikova K, Born J, Chouvardas P, Rapsomaniki M, Kruithof-de Julio M. Overcoming limitations in current measures of drug response may enable AI-driven precision oncology. NPJ Precis Oncol 2024; 8:95. [PMID: 38658785 PMCID: PMC11043358 DOI: 10.1038/s41698-024-00583-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/22/2024] [Indexed: 04/26/2024]
Abstract
Machine learning (ML) models of drug sensitivity prediction are becoming increasingly popular in precision oncology. Here, we identify a fundamental limitation in standard measures of drug sensitivity that hinders the development of personalized prediction models - they focus on absolute effects but do not capture relative differences between cancer subtypes. Our work suggests that using z-scored drug response measures mitigates these limitations and leads to meaningful predictions, opening the door for sophisticated ML precision oncology models.
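As a rough illustration of what a z-scored response measure could look like, the sketch below standardizes each drug's responses across cell lines, so that a value expresses how a cell line responds relative to other cell lines treated with the same drug. The data layout, the names, and the use of the population standard deviation are assumptions; the abstract does not specify the exact procedure.

```python
from statistics import mean, pstdev


def zscore_per_drug(responses):
    """Standardize responses within each drug.

    responses: dict mapping drug -> {cell_line: raw response value}.
    Returns the same structure with z-scores; a drug with no variation
    across cell lines gets 0.0 everywhere.
    """
    scored = {}
    for drug, by_line in responses.items():
        values = list(by_line.values())
        mu = mean(values)
        sigma = pstdev(values)
        scored[drug] = {
            line: (value - mu) / sigma if sigma > 0 else 0.0
            for line, value in by_line.items()
        }
    return scored


# Hypothetical raw responses: drug_b is uniformly "weaker" in absolute
# terms, yet both drugs rank the three cell lines identically.
raw = {
    "drug_a": {"line_1": 1.0, "line_2": 2.0, "line_3": 3.0},
    "drug_b": {"line_1": 10.0, "line_2": 20.0, "line_3": 30.0},
    "drug_c": {"line_1": 5.0, "line_2": 5.0, "line_3": 5.0},
}
z = zscore_per_drug(raw)
```

After standardization, `drug_a` and `drug_b` yield the same z-scores per cell line, which is the sense in which relative rather than absolute effects are captured.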
Affiliation(s)
- Katja Ovchinnikova
  - Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
- Panagiotis Chouvardas
  - Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
  - Department of Urology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Marianna Kruithof-de Julio
  - Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
  - Department of Urology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland

5
Wu G, Zaker A, Ebrahimi A, Tripathi S, Mer AS. Text-mining-based feature selection for anticancer drug response prediction. Bioinformatics Advances 2024; 4:vbae047. [PMID: 38606185 PMCID: PMC11009020 DOI: 10.1093/bioadv/vbae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 04/13/2024]
Abstract
Motivation: Predicting anticancer treatment response from baseline genomic data is a critical obstacle in personalized medicine. Machine learning methods are commonly used for predicting drug response from gene expression data. In the process of constructing these machine learning models, one of the most significant challenges is identifying appropriate features among a massive number of genes.
Results: In this study, we utilize features (genes) extracted by text-mining the scientific literature. Using two independent cancer pharmacogenomic datasets, we demonstrate that text-mining-based features outperform traditional feature selection techniques in machine learning tasks. In addition, our analysis reveals that text-mining feature-based machine learning models trained on in vitro data also perform well when predicting the response of in vivo cancer models. Our results demonstrate that text-mining-based feature selection is an easy-to-implement approach that is suitable for building machine learning models for anticancer drug response prediction.
Availability and implementation: https://github.com/merlab/text_features.
Affiliation(s)
- Grace Wu
  - Division of Engineering Science, University of Toronto, Toronto, M5S2E4, Canada
- Arvin Zaker
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
- Amirhosein Ebrahimi
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
- Shivanshi Tripathi
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
- Arvind Singh Mer
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
  - School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, K1N6N5, Canada

6
Taj F, Stein LD. MMDRP: drug response prediction and biomarker discovery using multi-modal deep learning. Bioinformatics Advances 2024; 4:vbae010. [PMID: 38371918 PMCID: PMC10872075 DOI: 10.1093/bioadv/vbae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 12/01/2023] [Accepted: 01/16/2024] [Indexed: 02/20/2024]
Abstract
Motivation: A major challenge in cancer care is that patients with similar demographics, tumor types, and medical histories can respond quite differently to the same drug regimens. This difference is largely explained by genetic and other molecular variabilities among the patients and their cancers. Efforts in the pharmacogenomics field are underway to understand better the relationship between the genome of the patient's healthy and tumor cells and their response to therapy. To advance this goal, research groups and consortia have undertaken large-scale systematic screening of panels of drugs across multiple cancer cell lines that have been molecularly profiled by genomics, proteomics, and similar techniques. These large drug screening datasets have been applied to the problem of drug response prediction (DRP), the challenge of predicting the response of a previously untested drug/cell-line combination. Although deep learning algorithms outperform traditional methods, there are still many challenges in DRP that ultimately result in these models' low generalizability and hamper their clinical application.
Results: In this article, we describe a novel algorithm that addresses the major shortcomings of current DRP methods by combining multiple cell line characterization data, addressing drug response data skewness, and improving chemical compound representation.
Availability and implementation: MMDRP is implemented as an open-source, Python-based, command-line program and is available at https://github.com/LincolnSteinLab/MMDRP.
Affiliation(s)
- Farzan Taj
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
- Lincoln D Stein
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada

7
Branson N, Cutillas PR, Bessant C. Comparison of multiple modalities for drug response prediction with learning curves using neural networks and XGBoost. Bioinformatics Advances 2023; 4:vbad190. [PMID: 38282976 PMCID: PMC10812874 DOI: 10.1093/bioadv/vbad190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/30/2024]
Abstract
Motivation: Anti-cancer drug response prediction is a central problem within stratified medicine. Transcriptomic profiles of cancer cell lines are typically used for drug response prediction, but we hypothesize that proteomics or phosphoproteomics might be more suitable as they give a more direct insight into cellular processes. However, there has not yet been a systematic comparison between all three of these data types using consistent evaluation criteria.
Results: Because few cell lines have phosphoproteomics profiles, we use learning curves, plots of predictive performance as a function of dataset size, to compare the current performance and predict the future performance of the three omics datasets with more data. We use neural networks and XGBoost and compare them against a simple rule-based benchmark. We show that phosphoproteomics slightly outperforms RNA-seq and proteomics using the 38 cell lines with profiles of all three omics data types. Furthermore, using the 877 cell lines with proteomics and RNA-seq profiles, we show that RNA-seq slightly outperforms proteomics. With the learning curves we predict that the mean squared error using the phosphoproteomics dataset would decrease by ~15% if a dataset of the same size as the proteomics/transcriptomics was collected. For the cell lines with proteomics and RNA-seq profiles the learning curves reveal that for smaller dataset sizes neural networks outperform XGBoost and vice versa for larger datasets. Furthermore, the trajectory of the XGBoost curve suggests that it will improve faster than the neural networks as more data are collected.
Availability and implementation: See https://github.com/Nik-BB/Learning-curves-for-DRP for the code used.
Affiliation(s)
- Nikhil Branson
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom
- Pedro R Cutillas
  - Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Conrad Bessant
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom

8
Greenberg ZF, Graim KS, He M. Towards artificial intelligence-enabled extracellular vesicle precision drug delivery. Adv Drug Deliv Rev 2023:114974. [PMID: 37356623 DOI: 10.1016/j.addr.2023.114974] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/06/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 06/27/2023]
Abstract
Extracellular vesicles (EVs), particularly exosomes, have recently exploded into nanomedicine as an emerging drug delivery approach due to their superior biocompatibility, circulating stability, and bioavailability in vivo. However, EV heterogeneity makes molecular targeting precision a critical challenge. Deciphering the key molecular drivers that control EV tissue targeting specificity is greatly needed. Artificial intelligence (AI) brings powerful prediction ability for guiding the rational design of engineered EVs in precision control for drug delivery. This review focuses on cutting-edge nano-delivery via integrating large-scale EV data with AI to develop AI-directed EV therapies and illuminate the clinical translation potential. We briefly review the current status of EVs in drug delivery, including the current frontier, limitations, and considerations to advance the field. Subsequently, we detail the future of AI in drug delivery and its impact on precision EV delivery. Our review discusses the current universal challenge of standardization and critical considerations when using AI combined with EVs for precision drug delivery. Finally, we conclude this review with a perspective on future clinical translation led by a combined effort of AI and EV research.
Affiliation(s)
- Zachary F Greenberg
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA
- Kiley S Graim
  - Department of Computer & Information Science & Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, 32610, USA
- Mei He
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA

9
Shahzad M, Tahir MA, Alhussein M, Mobin A, Shams Malick RA, Anwar MS. NeuPD-A Neural Network-Based Approach to Predict Antineoplastic Drug Response. Diagnostics (Basel) 2023; 13:2043. [PMID: 37370938 DOI: 10.3390/diagnostics13122043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 06/01/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023]
Abstract
With the advent of high-throughput screening, in silico drug response analysis has opened many research avenues in the field of personalized medicine. For a decade, many different prediction techniques have been proposed for antineoplastic (anti-cancer) drug response, but there is still a need for improvement in drug sensitivity prediction. The intent of this research study is to propose a framework, namely NeuPD, to validate potential anti-cancer drugs against a panel of cancer cell lines in publicly available datasets. The datasets used in this work are Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE). As not all drugs are effective on cancer cell lines, we have worked on 10 essential drugs from the GDSC dataset that have achieved the best modeling results in previous studies. We also extracted 1610 essential oncogene expressions from 983 cell lines from the same dataset. From the CCLE dataset, 16,383 gene expressions from 1037 cell lines and 24 drugs were used in our experiments. For dimensionality reduction, Pearson correlation is applied to best fit the model. We integrate the genomic features of cell lines and drugs' fingerprints to fit the neural network model. To evaluate the proposed NeuPD framework, we used repeated K-fold cross-validation (K = 10, 5 repeats) and report performance in terms of root mean square error (RMSE) and coefficient of determination (R²). The results obtained on the GDSC dataset, measured using these metrics, show that our proposed NeuPD framework outperformed existing approaches with an RMSE of 0.490 and an R² of 0.929.
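The evaluation protocol named in this abstract (10-fold cross-validation repeated 5 times, scored by RMSE and R²) can be sketched in plain Python as follows. This is a generic illustration and not the NeuPD code; the function names, the seed, and the fold-assignment scheme are assumptions.

```python
import math
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for repeated K-fold CV.

    Each repeat reshuffles the sample indices and splits them into k
    disjoint test folds, giving k * repeats splits in total.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


splits = list(repeated_kfold(n_samples=100, k=10, repeats=5))
```

In practice, a model would be fitted on each `train` split and scored on the matching `test` split, with RMSE and R² averaged over all 50 splits.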
Affiliation(s)
- Muhammad Shahzad
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Muhammad Atif Tahir
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Musaed Alhussein
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
- Ansharah Mobin
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Rauf Ahmed Shams Malick
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Muhammad Shahid Anwar
  - Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea

10
Das T, Bhattarai K, Rajaganapathy S, Wang L, Cerhan JR, Zong N. Leveraging multi-source to resolve inconsistency across pharmacogenomic datasets in drug sensitivity prediction. medRxiv 2023:2023.05.25.23290546. [PMID: 37333219 PMCID: PMC10274988 DOI: 10.1101/2023.05.25.23290546] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/20/2023]
Abstract
Pharmacogenomics datasets have been generated for various purposes, such as investigating different biomarkers. However, when studying the same cell line with the same drugs, differences in drug responses exist between studies. These variations arise from factors such as inter-tumoral heterogeneity, experimental standardization, and the complexity of cell subtypes. Consequently, drug response prediction suffers from limited generalizability. To address these challenges, we propose a computational model based on Federated Learning (FL) for drug response prediction. By leveraging three pharmacogenomics datasets (CCLE, GDSC2, and gCSI), we evaluate the performance of our model across diverse cell line-based databases. Our results demonstrate superior predictive performance compared to baseline methods and traditional FL approaches through various experimental tests. This study underscores the potential of employing FL to leverage multiple data sources, enabling the development of generalized models that account for inconsistencies among pharmacogenomics datasets. By addressing the limitations of low generalizability, our approach contributes to advancing drug response prediction in precision oncology.
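For orientation, the aggregation step of standard federated averaging (FedAvg), the kind of traditional FL approach the abstract compares against, can be sketched as below. This is not the model proposed in the preprint; the parameter vectors, client sizes, and function name are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters, weighted by local dataset size.

    client_weights: list of equal-length parameter vectors, one per client.
    client_sizes:   number of training samples held by each client.
    """
    total = sum(client_sizes)
    dimension = len(client_weights[0])
    return [
        sum(weights[j] * size for weights, size in zip(client_weights, client_sizes)) / total
        for j in range(dimension)
    ]


# Hypothetical setting: each dataset acts as one client whose raw data
# never leaves its site; only locally trained parameters are shared.
local_models = [[1.0, 2.0], [3.0, 4.0]]
local_sizes = [1, 3]
global_model = federated_average(local_models, local_sizes)
```

Weighting by dataset size means larger sources pull the shared model further toward their local solution, which is one reason inconsistencies between datasets matter in this setting.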
Affiliation(s)
- Trisha Das
  - University of Illinois Urbana-Champaign, Champaign, Illinois, United States
- Sivaraman Rajaganapathy
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN
- James R. Cerhan
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Nansu Zong
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA

11
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally-verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in their posttranscriptional regulation of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based on only molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have been accumulated in established studies. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we outline a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTIs) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Miaoer Xu
  - Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jinlong Ru
  - Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
- Anna James-Bott
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Xia Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Adam P Cribbs
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK

12
Partin A, Brettin T, Zhu Y, Dolezal JM, Kochanny S, Pearson AT, Shukla M, Evrard YA, Doroshow JH, Stevens RL. Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images. Front Med (Lausanne) 2023; 10:1058919. [PMID: 36960342 PMCID: PMC10027779 DOI: 10.3389/fmed.2023.1058919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 02/10/2023] [Indexed: 03/09/2023]
Abstract
Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods that allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combining single-drug and drug-pair treatments by homogenizing drug representations, and 2) augmenting drug pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs that use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
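One plausible reading of the two augmentation steps is sketched below. It assumes that homogenizing means representing a single-drug treatment as a pair containing the same drug twice, and that pair augmentation means adding each drug pair again with the drug order swapped. The abstract states neither detail, so both are assumptions, as are the field and function names.

```python
def homogenize(sample):
    """Represent every treatment as a drug pair.

    Assumption: a single-drug treatment (d,) becomes the pair (d, d), so
    single-drug and drug-pair samples share one input layout.
    """
    drugs = tuple(sample["drugs"])
    if len(drugs) == 1:
        drugs = (drugs[0], drugs[0])
    return {**sample, "drugs": drugs}


def augment_drug_pairs(samples):
    """Add an order-swapped copy of each true drug pair.

    Assumption: the response does not depend on drug order, so (a, b) and
    (b, a) carry the same label. This doubles the drug-pair samples and
    leaves homogenized single-drug samples unchanged.
    """
    augmented = []
    for sample in map(homogenize, samples):
        augmented.append(sample)
        first, second = sample["drugs"]
        if first != second:
            augmented.append({**sample, "drugs": (second, first)})
    return augmented


# Hypothetical PDX samples: one single-drug and one drug-pair treatment.
samples = [
    {"drugs": ("drug_a",), "response": 1},
    {"drugs": ("drug_a", "drug_b"), "response": 0},
]
dataset = augment_drug_pairs(samples)
```

Because every sample ends up with the same two-slot drug representation, the same network architecture can consume single-drug, drug-pair, and augmented samples alike.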
Collapse
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- James M. Dolezal
- Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Sara Kochanny
- Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Alexander T. Pearson
- Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yvonne A. Evrard
- Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, United States
- James H. Doroshow
- Division of Cancer Therapeutics and Diagnosis, National Cancer Institute, Bethesda, MD, United States
- Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
13
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review makes it possible to better understand the current state of the field and to identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
14
|
Shen B, Feng F, Li K, Lin P, Ma L, Li H. A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications. Brief Bioinform 2023; 24:6961794. [PMID: 36575826 DOI: 10.1093/bib/bbac605] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/08/2022] [Revised: 10/30/2022] [Accepted: 12/09/2022] [Indexed: 12/29/2022] Open
Abstract
Drug response prediction is an important problem in personalized cancer therapy. Among various newly developed models, significant improvement in prediction performance has been reported using deep learning methods. However, systematic comparisons of deep learning methods, especially of the transferability from preclinical models to clinical cohorts, are currently lacking. To provide a more rigorous assessment, the performance of six representative deep learning methods for drug response prediction was benchmarked in multiple application scenarios using nine evaluation metrics, including the overall prediction accuracy, predictability of each drug, potential associated factors and transferability to clinical cohorts. Most methods show promising prediction within cell line datasets, and TGSA, with its lower time cost and better performance, is recommended. Although the performance metrics decrease when applying models trained on cell lines to patients, a certain amount of power to distinguish clinical response on some drugs can be maintained using CRDNN and TGSA. With these assessments, we provide guidance for researchers to choose appropriate methods, as well as insights into future directions for the development of more effective methods in clinical scenarios.
Affiliation(s)
- Bihan Shen
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Fangyoumin Feng
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Kunshi Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Ping Lin
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Liangxiao Ma
- Bio-Med Big Data Center at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Hong Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
|
15
|
Utilization of Cancer Cell Line Screening to Elucidate the Anticancer Activity and Biological Pathways Related to the Ruthenium-Based Therapeutic BOLD-100. Cancers (Basel) 2022; 15:cancers15010028. [PMID: 36612025 PMCID: PMC9817855 DOI: 10.3390/cancers15010028] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/23/2022] [Revised: 11/30/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open
Abstract
BOLD-100 (sodium trans-[tetrachlorobis(1H-indazole)ruthenate(III)]) is a ruthenium-based anticancer compound currently in clinical development. The identification of cancer types that show increased sensitivity towards BOLD-100 can lead to improved developmental strategies. Sensitivity profiling can also identify mechanisms of action that are pertinent for the bioactivity of complex therapeutics. Sensitivity to BOLD-100 was measured in a 319-cancer-cell line panel spanning 24 tissues. BOLD-100's sensitivity profile showed variation across the tissue lineages, including increased response in esophageal, bladder, and hematologic cancers. Multiple cancers, including esophageal, bile duct and colon cancer, had higher relative response to BOLD-100 than to cisplatin. Response to BOLD-100 showed only moderate correlation to anticancer compounds in the Genomics of Drug Sensitivity in Cancer (GDSC) database, as well as no clear theme in bioactivity of correlated hits, suggesting that BOLD-100 may have a differentiated therapeutic profile. The genomic modalities of cancer cell lines were modeled against the BOLD-100 sensitivity profile, which revealed that genes related to ribosomal processes were associated with sensitivity to BOLD-100. Machine learning modeling of the sensitivity profile to BOLD-100 and gene expression data provided moderate predictive value. These findings provide further mechanistic understanding around BOLD-100 and support its development for additional cancer types.
|
16
|
Multi-Omics Alleviates the Limitations of Panel Sequencing for Cancer Drug Response Prediction. Cancers (Basel) 2022; 14:cancers14225604. [PMID: 36428696 PMCID: PMC9688044 DOI: 10.3390/cancers14225604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/10/2022] [Accepted: 11/12/2022] [Indexed: 11/17/2022] Open
Abstract
Comprehensive genomic profiling using cancer gene panels has been shown to improve treatment options for a variety of cancer types. However, genomic aberrations detected via such gene panels do not necessarily serve as strong predictors of drug sensitivity. In this study, using pharmacogenomics datasets of cell lines, patient-derived xenografts, and ex vivo treated fresh tumor specimens, we demonstrate that utilizing the transcriptome on top of gene panel features substantially improves drug response prediction performance in cancer.
|
17
|
Yingtaweesittikul H, Wu J, Mongia A, Peres R, Ko K, Nagarajan N, Suphavilai C. CREAMMIST: an integrative probabilistic database for cancer drug response prediction. Nucleic Acids Res 2022; 51:D1242-D1248. [PMID: 36259664 PMCID: PMC9825458 DOI: 10.1093/nar/gkac911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/18/2022] [Accepted: 10/11/2022] [Indexed: 01/30/2023] Open
Abstract
Extensive in vitro cancer drug screening datasets have enabled scientists to identify biomarkers and develop machine learning models for predicting drug sensitivity. While most advancements have focused on omics profiles, cancer drug sensitivity scores precalculated by the original sources are often used as-is, without consideration for variabilities between studies. It is well-known that significant inconsistencies exist between the drug sensitivity scores across datasets due to differences in experimental setups and preprocessing methods used to obtain the sensitivity scores. As a result, many studies opt to focus only on a single dataset, leading to underutilization of available data and a limited interpretation of cancer pharmacogenomics analysis. To overcome these caveats, we have developed CREAMMIST (https://creammist.mtms.dev), an integrative database that enables users to obtain an integrative dose-response curve that captures uncertainty (or high certainty when multiple datasets align well) across five widely used cancer cell-line drug-response datasets. We utilized a Bayesian framework to systematically integrate all available dose-response values across datasets (>14 million dose-response data points). CREAMMIST provides easy-to-use statistics derived from the integrative dose-response curves for various downstream analyses such as identifying biomarkers, selecting drug concentrations for experiments, and training robust machine learning models.
Affiliation(s)
- Jiaxi Wu
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Aanchal Mongia
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Rafael Peres
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Karrie Ko
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Chayaporn Suphavilai
- To whom correspondence should be addressed. Tel: +65 86213683; Fax: +65 68088292;
|
18
|
Zhao Z, Wang S, Zucknick M, Aittokallio T. Tissue-specific identification of multi-omics features for pan-cancer drug response prediction. iScience 2022; 25:104767. [PMID: 35992090 PMCID: PMC9385562 DOI: 10.1016/j.isci.2022.104767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 06/28/2022] [Accepted: 07/11/2022] [Indexed: 11/29/2022] Open
Abstract
Current statistical models for drug response prediction and biomarker identification fall short in leveraging the shared and unique information from various cancer tissues and multi-omics profiles. We developed the mix-lasso model, which introduces an additional sample group penalty term to capture tissue-specific effects of features on pan-cancer response prediction. The mix-lasso model takes into account both the similarity between drug responses (i.e., multi-task learning) and the heterogeneity between multi-omics data (multi-modal learning). When applied to a large-scale pharmacogenomics dataset from the Cancer Therapeutics Response Portal, mix-lasso enabled accurate drug response predictions and identification of tissue-specific predictive features in the presence of various degrees of missing data, drug-drug correlations, and high-dimensional and correlated genomic and molecular features that often hinder the use of statistical approaches in drug response modeling. Compared to the tree-lasso model, mix-lasso identified a smaller number of tissue-specific features, hence making the model more interpretable and stable for drug discovery applications. Pan-cancer cell lines provide a test bench for exploring gene-drug relationships. Multi-omics data were integrated with pharmacological profiles for joint modeling. Mix-lasso identifies tissue-specific biomarkers predictive of multi-drug responses. Mix-lasso provides a small number of stable features for drug discovery applications.
Affiliation(s)
- Zhi Zhao
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Norway
- Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Norway
- Shixiong Wang
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Norway
- Manuela Zucknick
- Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Norway
- Corresponding author
- Tero Aittokallio
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Norway
- Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Norway
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Finland
- Corresponding author
|
19
|
Construction and Validation of a UPR-Associated Gene Prognostic Model for Head and Neck Squamous Cell Carcinoma. Biomed Res Int 2022; 2022:8677309. [PMID: 35707371 PMCID: PMC9192238 DOI: 10.1155/2022/8677309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 05/12/2022] [Indexed: 11/27/2022]
Abstract
Our study aimed to construct and validate a UPR-associated gene signature to predict HNSCC prognosis. We obtained 544 samples of RNA sequencing data and clinical characteristics from the TCGA database and randomly grouped the samples into training and testing cohorts (1:1 ratio). After identifying 14 UPR-associated genes with LASSO and univariate Cox regression analysis, HNSCC samples were categorized into low-risk (LR) and high-risk (HR) subgroups depending on the risk score. Our analyses indicated that low-risk patients had a much better prognosis in the training and testing cohorts. To predict the HNSCC prognosis with the 14-gene UPR-associated signature, we incorporated the UPR gene risk score, N stage, M stage, and age into a nomogram model. We further explored the sensitivity to anticancer drugs by using the IC50 analysis in two subgroups from the Cancer Genome Project database. The outcomes showed that HR and LR patients were sensitive to AKT inhibitor III and sorafenib, respectively. The immune cell infiltration analysis and GSEA provided strong evidence for elucidating the molecular mechanisms of UPR-associated genes affecting HNSCC. In conclusion, the UPR-associated gene risk score, N stage, M stage, and age can serve as a robust model for predicting prognosis and can improve decision-making at the individual patient level.
|
20
|
Ba-Alawi W, Kadambat Nair S, Li B, Mammoliti A, Smirnov P, Mer AS, Penn LZ, Haibe-Kains B. Bimodal gene expression in cancer patients provides interpretable biomarkers for drug sensitivity. Cancer Res 2022; 82:2378-2387. [PMID: 35536872 DOI: 10.1158/0008-5472.can-21-2395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Revised: 02/24/2022] [Accepted: 05/06/2022] [Indexed: 11/16/2022]
Abstract
Identifying biomarkers predictive of cancer cell response to drug treatment constitutes one of the main challenges in precision oncology. Recent large-scale cancer pharmacogenomic studies have opened new avenues of research to develop predictive biomarkers by profiling thousands of human cancer cell lines at the molecular level and screening them with hundreds of approved drugs and experimental chemical compounds. Many studies have leveraged these data to build predictive models of response using various statistical and machine learning methods. However, a common pitfall of these methods is the lack of interpretability as to how they make predictions, hindering the clinical translation of these models. To alleviate this issue, we used the recent logic modeling approach to develop a new machine learning pipeline that explores the space of bimodally expressed genes in multiple large in vitro pharmacogenomic studies and builds multivariate, nonlinear, yet interpretable logic-based models predictive of drug response. The performance of this approach was showcased in a compendium of the three largest in vitro pharmacogenomic data sets to build robust and interpretable models for 101 drugs that span 17 drug classes with high validation rates in independent datasets. These results, along with in vivo and clinical validation, support a better translation of gene expression biomarkers between model systems using bimodal gene expression.
Affiliation(s)
- Bo Li
- University of Toronto, Toronto, Canada
- Linda Z Penn
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
|
21
|
Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00408-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|