1
Carli F, Di Chiaro P, Morelli M, Arora C, Bisceglia L, De Oliveira Rosa N, Cortesi A, Franceschi S, Lessi F, Di Stefano AL, Santonocito OS, Pasqualetti F, Aretini P, Miglionico P, Diaferia GR, Giannotti F, Liò P, Duran-Frigola M, Mazzanti CM, Natoli G, Raimondi F. Learning and actioning general principles of cancer cell drug sensitivity. Nat Commun 2025; 16:1654. [PMID: 39952993 PMCID: PMC11828915 DOI: 10.1038/s41467-025-56827-5] [Received: 04/22/2024] [Accepted: 02/03/2025] [Indexed: 02/17/2025]
Abstract
High-throughput screening of drug sensitivity of cancer cell lines (CCLs) holds the potential to unlock anti-tumor therapies. In this study, we leverage such datasets to predict drug response using cell line transcriptomics, focusing on models' interpretability and deployment on patients' data. We use large language models (LLMs) to match drug to mechanisms of action (MOA)-related pathways. Genes crucial for prediction are enriched in drug-MOAs, suggesting that our models learn the molecular determinants of response. Furthermore, by using only LLM-curated, MOA-genes, we enhance the predictive accuracy of our models. To enhance translatability, we align RNAseq data from CCLs, used for training, to those from patient samples, used for inference. We validated our approach on TCGA samples, where patients' best scoring drugs match those prescribed for their cancer type. We further predict and experimentally validate effective drugs for the patients of two highly lethal solid tumors, i.e., pancreatic cancer and glioblastoma.
Affiliation(s)
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
  - Department of Computer Science, University of Pisa, Pisa, Italy
- Pierluigi Di Chiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Chakit Arora
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
- Luisa Bisceglia
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
- Alice Cortesi
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Giuseppe R Diaferia
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
  - Botton-Champalimaud Pancreatic Cancer Center, Champalimaud Foundation, Lisbon, Portugal
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
2
Wang C, Kumar GA, Rajapakse JC. Drug discovery and mechanism prediction with explainable graph neural networks. Sci Rep 2025; 15:179. [PMID: 39747341 PMCID: PMC11696803 DOI: 10.1038/s41598-024-83090-3] [Received: 06/16/2024] [Accepted: 12/11/2024] [Indexed: 01/04/2025]
Abstract
Apprehension of drug action mechanism is paramount for drug response prediction and precision medicine. The unprecedented development of machine learning and deep learning algorithms has expedited the drug response prediction research. However, existing methods mainly focus on forward encoding of drugs, which is to obtain an accurate prediction of the response levels, but omitted to decipher the reaction mechanism between drug molecules and genes. We propose the eXplainable Graph-based Drug response Prediction (XGDP) approach that achieves a precise drug response prediction and reveals the comprehensive mechanism of action between drugs and their targets. XGDP represents drugs with molecular graphs, which naturally preserve the structural information of molecules and a Graph Neural Network module is applied to learn the latent features of molecules. Gene expression data from cancer cell lines are incorporated and processed by a Convolutional Neural Network module. A couple of deep learning attribution algorithms are leveraged to interpret interactions between drug molecular features and genes. We demonstrate that XGDP not only enhances the prediction accuracy compared to pioneering works but is also capable of capturing the salient functional groups of drugs and interactions with significant genes of cancer cells.
Affiliation(s)
- Conghao Wang
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
- Gaurav Asok Kumar
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
- Jagath C Rajapakse
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
3
Xiao M, Zheng Q, Popa P, Mi X, Hu J, Zou F, Zou B. Drug molecular representations for drug response predictions: a comprehensive investigation via machine learning methods. Sci Rep 2025; 15:20. [PMID: 39748003 PMCID: PMC11696021 DOI: 10.1038/s41598-024-84711-7] [Received: 08/23/2024] [Accepted: 12/26/2024] [Indexed: 01/04/2025]
Abstract
The integration of drug molecular representations into predictive models for Drug Response Prediction (DRP) is a standard procedure in pharmaceutical research and development. However, the comparative effectiveness of combining these representations with genetic profiles for DRP remains unclear. This study conducts a comprehensive evaluation of the efficacy of various drug molecular representations employing cutting-edge machine learning models under various experimental settings. Our findings reveal that the inclusion of molecular representations from either PubChem fingerprints or SMILES can significantly enhance the performance of DRPs when used in conjunction with deep learning models. However, the optimal choice of drug molecular representation can vary depending on the predictive model and the specific DRP task. The insights derived from our study offer useful guidance on selecting the most suitable drug molecular representations for constructing efficient predictive models for DRPs, aiding for drug repurposing, personalized medicine, and new drug discovery.
Affiliation(s)
- Meisheng Xiao
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Qianhui Zheng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Xinlei Mi
  - Gilead Sciences, Inc, Foster City, USA
- Jianhua Hu
  - Department of Biostatistics, Columbia University, New York, USA
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Baiming Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, USA
4
Firoozbakht F, Yousefi B, Tsoy O, Baumbach J, Schwikowski B. Comparative evaluation of feature reduction methods for drug response prediction. Sci Rep 2024; 14:30885. [PMID: 39730699 DOI: 10.1038/s41598-024-81866-1] [Received: 08/02/2024] [Accepted: 11/29/2024] [Indexed: 12/29/2024]
Abstract
Personalized medicine aims to tailor medical treatments to individual patients, and predicting drug responses from molecular profiles using machine learning is crucial for this goal. However, the high dimensionality of the molecular profiles compared to the limited number of samples presents significant challenges. Knowledge-based feature selection methods are particularly suitable for drug response prediction, as they leverage biological insights to reduce dimensionality and improve model interpretability. This study presents the first comparative evaluation of nine different knowledge-based and data-driven feature reduction methods on cell line and tumor data. Our analysis employs six distinct machine learning models, with a total of more than 6,000 runs to ensure a robust evaluation. Our findings indicate that transcription factor activities outperform other methods in predicting drug responses, effectively distinguishing between sensitive and resistant tumors for seven of the 20 drugs evaluated.
Affiliation(s)
- Farzaneh Firoozbakht
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Behnam Yousefi
  - Computational Systems Biomedicine Lab, Institut Pasteur, Université Paris Cité, Paris, France
  - École Doctorale Complexité du vivant, Sorbonne Université, Paris, France
  - Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf, 20251, Hamburg, Germany
- Olga Tsoy
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
- Benno Schwikowski
  - Computational Systems Biomedicine Lab, Institut Pasteur, Université Paris Cité, Paris, France
5
De Landtsheer S, Badkas A, Kulms D, Sauter T. Model ensembling as a tool to form interpretable multi-omic predictors of cancer pharmacosensitivity. Brief Bioinform 2024; 25:bbae567. [PMID: 39494610 PMCID: PMC11532660 DOI: 10.1093/bib/bbae567] [Received: 07/10/2024] [Revised: 09/23/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024]
Abstract
Stratification of patients diagnosed with cancer has become a major goal in personalized oncology. One important aspect is the accurate prediction of the response to various drugs. It is expected that the molecular characteristics of the cancer cells contain enough information to retrieve specific signatures, allowing for accurate predictions based solely on these multi-omic data. Ideally, these predictions should be explainable to clinicians, in order to be integrated in the patients care. We propose a machine-learning framework based on ensemble learning to integrate multi-omic data and predict sensitivity to an array of commonly used and experimental compounds, including chemotoxic compounds and targeted kinase inhibitors. We trained a set of classifiers on the different parts of our dataset to produce omic-specific signatures, then trained a random forest classifier on these signatures to predict drug responsiveness. We used the Cancer Cell Line Encyclopedia dataset, comprising multi-omic and drug sensitivity measurements for hundreds of cell lines, to build the predictive models, and validated the results using nested cross-validation. Our results show good performance for several compounds (Area under the Receiver-Operating Curve >79%) across the most frequent cancer types. Furthermore, the simplicity of our approach allows to examine which omic layers have a greater importance in the models and identify new putative markers of drug responsiveness. We propose several models based on small subsets of transcriptional markers with the potential to become useful tools in personalized oncology, paving the way for clinicians to use the molecular characteristics of the tumors to predict sensitivity to therapeutic compounds.
Affiliation(s)
- Sébastien De Landtsheer
  - Department of Life Sciences and Medicine, University of Luxembourg, 2, place de l’Université, L4365 Esch-sur-Alzette, Luxembourg
- Apurva Badkas
  - Department of Life Sciences and Medicine, University of Luxembourg, 2, place de l’Université, L4365 Esch-sur-Alzette, Luxembourg
- Dagmar Kulms
  - Experimental Dermatology, Department of Dermatology, Technische Universität-Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases, Technische Universität-Dresden, 01307 Dresden, Germany
- Thomas Sauter
  - Department of Life Sciences and Medicine, University of Luxembourg, 2, place de l’Université, L4365 Esch-sur-Alzette, Luxembourg
6
Hakami MA. Harnessing machine learning potential for personalised drug design and overcoming drug resistance. J Drug Target 2024; 32:918-930. [PMID: 38842417 DOI: 10.1080/1061186x.2024.2365934] [Received: 05/09/2024] [Revised: 06/01/2024] [Accepted: 06/04/2024] [Indexed: 06/07/2024]
Abstract
Drug resistance in cancer treatment presents a significant challenge, necessitating innovative approaches to improve therapeutic efficacy. Integrating machine learning (ML) in cancer research is promising as ML algorithms outrival in analysing complex datasets, identifying patterns, and predicting treatment outcomes. Leveraging diverse data sources such as genomic profiles, clinical records, and drug response assays, ML uncovers molecular mechanisms of drug resistance, enabling personalised treatment, maximising efficacy and minimising adverse effects. Various ML algorithms contribute to the drug discovery process - Random Forest and Decision Trees predict drug-target interactions and aid in virtual screening, and SVM classify leads on bioactivity data. Neural Networks model QSAR to optimise lead compounds and K-means clustering group compounds with similar chemical properties aiding compound selection. Gaussian Processes predict drug responses, Bayesian Networks infer causal relationships, Autoencoders generate novel compounds, and Genetic Algorithms optimise molecular structures. These algorithms collectively enhance efficiency and success rates in drug design endeavours, from lead identification to optimisation and are cost-effective, empowering clinicians with real-time treatment monitoring and improving patient outcomes. This review highlights the immense potential of ML in revolutionising cancer care through effective drug design to reduce drug resistance, and we have also discussed various limitations and research gaps to understand better.
Affiliation(s)
- Mohammed Ageeli Hakami
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Shaqra University, Al-Quwayiyah, Riyadh, Saudi Arabia
7
Matboli M, Abdelbaky I, Khaled A, Khaled R, Hamady S, Farid LM, Abouelkhair MB, El-Attar NE, Farag Fathallah M, Abd El Hamid MS, Elmakromy GM, Ali M. Machine learning based identification potential feature genes for prediction of drug efficacy in nonalcoholic steatohepatitis animal model. Lipids Health Dis 2024; 23:266. [PMID: 39182075 PMCID: PMC11344433 DOI: 10.1186/s12944-024-02231-9] [Received: 05/05/2024] [Accepted: 07/30/2024] [Indexed: 08/27/2024]
Abstract
BACKGROUND: Nonalcoholic Steatohepatitis (NASH) results from complex liver conditions involving metabolic, inflammatory, and fibrogenic processes. Despite its burden, there has been a lack of any approved food-and-drug administration therapy up till now.
PURPOSE: Utilizing machine learning (ML) algorithms, the study aims to identify reliable potential genes to accurately predict the treatment response in the NASH animal model using biochemical and molecular markers retrieved using bioinformatics techniques.
METHODS: The NASH-induced rat models were administered various microbiome-targeted therapies and herbal drugs for 12 weeks, these drugs resulted in reducing hepatic lipid accumulation, liver inflammation, and histopathological changes. The ML model was trained and tested based on the Histopathological NASH score (HPS); while (0-4) HPS considered Improved NASH and (5-8) considered non-improved, confirmed through rats' liver histopathological examination, incorporates 34 features comprising 20 molecular markers (mRNAs-microRNAs-Long non-coding-RNAs) and 14 biochemical markers that are highly enriched in NASH pathogenesis. Six different ML models were used in the proposed model for the prediction of NASH improvement, with Gradient Boosting demonstrating the highest accuracy of 98% in predicting NASH drug response.
FINDINGS: Following a gradual reduction in features, the outcomes demonstrated superior performance when employing the Random Forest classifier, yielding an accuracy of 98.4%. The principal selected molecular features included YAP1, LATS1, NF2, SRD5A3-AS1, FOXA2, TEAD2, miR-650, MMP14, ITGB1, and miR-6881-5P, while the biochemical markers comprised triglycerides (TG), ALT, ALP, total bilirubin (T. Bilirubin), alpha-fetoprotein (AFP), and low-density lipoprotein cholesterol (LDL-C).
CONCLUSION: This study introduced an ML model incorporating 16 noninvasive features, including molecular and biochemical signatures, which achieved high performance and accuracy in detecting NASH improvement. This model could potentially be used as diagnostic tools and to identify target therapies.
Affiliation(s)
- Marwa Matboli
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Ibrahim Abdelbaky
  - Artificial Intelligence Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha City, Egypt
- Abdelrahman Khaled
  - Bioinformatics Group, Center of Informatics Sciences (CIS), School of Information Technology and Computer Sciences, Nile University, Giza, Egypt
- Radwa Khaled
  - Biotechnology/Biomolecular Chemistry Department, Faculty of Science, Cairo University, Cairo, Egypt
  - Basic Sciences Department, Modern University for Technology and Information, Cairo, Egypt
- Laila M Farid
  - Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Noha E El-Attar
  - Information System Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha City, Egypt
  - Faculty of Artificial Intelligence, Delta University for Science and Technology, Gamasa, 35712, Egypt
- Mohamed Farag Fathallah
  - Medical Pathology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
  - Medical Physiology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Manal S Abd El Hamid
  - Medical Physiology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Gena M Elmakromy
  - Endocrinology & Diabetes Mellitus Unit, Department of Internal Medicine, Badr University in Cairo, Badr City, Egypt
- Marwa Ali
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
8
Mohammadzadeh-Vardin T, Ghareyazi A, Gharizadeh A, Abbasi K, Rabiee HR. DeepDRA: Drug repurposing using multi-omics data integration with autoencoders. PLoS One 2024; 19:e0307649. [PMID: 39058696 PMCID: PMC11280260 DOI: 10.1371/journal.pone.0307649] [Received: 05/13/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024]
Abstract
Cancer treatment has become one of the biggest challenges in the world today. Different treatments are used against cancer; drug-based treatments have shown better results. On the other hand, designing new drugs for cancer is costly and time-consuming. Some computational methods, such as machine learning and deep learning, have been suggested to solve these challenges using drug repurposing. Despite the promise of classical machine-learning methods in repurposing cancer drugs and predicting responses, deep-learning methods performed better. This study aims to develop a deep-learning model that predicts cancer drug response based on multi-omics data, drug descriptors, and drug fingerprints and facilitates the repurposing of drugs based on those responses. To reduce multi-omics data's dimensionality, we use autoencoders. As a multi-task learning model, autoencoders are connected to MLPs. We extensively tested our model using three primary datasets: GDSC, CTRP, and CCLE to determine its efficacy. In multiple experiments, our model consistently outperforms existing state-of-the-art methods. Compared to state-of-the-art models, our model achieves an impressive AUPRC of 0.99. Furthermore, in a cross-dataset evaluation, where the model is trained on GDSC and tested on CCLE, it surpasses the performance of three previous works, achieving an AUPRC of 0.72. In conclusion, we presented a deep learning model that outperforms the current state-of-the-art regarding generalization. Using this model, we could assess drug responses and explore drug repurposing, leading to the discovery of novel cancer drugs. Our study highlights the potential for advanced deep learning to advance cancer therapeutic precision.
Affiliation(s)
- Taha Mohammadzadeh-Vardin
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Amin Ghareyazi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Ali Gharizadeh
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Karim Abbasi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
  - Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
- Hamid R. Rabiee
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
9
Dey V, Ning X. Improving Anticancer Drug Selection and Prioritization via Neural Learning to Rank. J Chem Inf Model 2024; 64:4071-4088. [PMID: 38740382 PMCID: PMC11134508 DOI: 10.1021/acs.jcim.3c01060] [Received: 07/12/2023] [Revised: 03/27/2024] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most effective drugs for each cell line still remains a significant challenge. To address this, we developed multiple neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed multiple pairwise and listwise neural ranking methods that learn latent representations of drugs and cell lines and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed neural pairwise and listwise ranking methods, Pair-PushC and List-One on top of the existing methods, pLETORg and ListNet, respectively. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the effective drugs instead of the top effective drug, unlike List-One. We also provide an exhaustive empirical evaluation with state-of-the-art regression and ranking baselines on large-scale data sets across multiple experimental settings. Our results demonstrate that our proposed ranking methods mostly outperform the best baselines with significant improvements of as much as 25.6% in terms of selecting truly effective drugs within the top 20 predicted drugs (i.e., hit@20) across 50% test cell lines. 
Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
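The hit@20 metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes hit@k simply counts the truly effective drugs found among the k highest-scored drugs for one cell line; the paper may define or normalize the metric differently, and the function name `hit_at_k` and the toy data are hypothetical.

```python
from typing import Dict, Set


def hit_at_k(scores: Dict[str, float], effective: Set[str], k: int = 20) -> int:
    """Count truly effective drugs among the k top-scored drugs of one cell line.

    Assumption: a higher score means the drug is predicted to be more effective.
    """
    # Sort by descending score; break ties alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda drug: (-scores[drug], drug))
    return sum(1 for drug in ranked[:k] if drug in effective)


# Toy example with made-up drug names and scores.
toy_scores = {"drugA": 0.9, "drugB": 0.8, "drugC": 0.1, "drugD": 0.5}
toy_effective = {"drugA", "drugC"}
print(hit_at_k(toy_scores, toy_effective, k=2))
```

Averaging this count over all test cell lines would give one dataset-level figure, although whether the authors aggregate it that way is not stated in the abstract.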
Affiliation(s)
- Vishal Dey
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
10
Wu G, Zaker A, Ebrahimi A, Tripathi S, Mer AS. Text-mining-based feature selection for anticancer drug response prediction. Bioinformatics Advances 2024; 4:vbae047. [PMID: 38606185 PMCID: PMC11009020 DOI: 10.1093/bioadv/vbae047] [Received: 08/07/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 04/13/2024]
Abstract
Motivation: Predicting anticancer treatment response from baseline genomic data is a critical obstacle in personalized medicine. Machine learning methods are commonly used for predicting drug response from gene expression data. In the process of constructing these machine learning models, one of the most significant challenges is identifying appropriate features among a massive number of genes.
Results: In this study, we utilize features (genes) extracted using the text-mining of scientific literatures. Using two independent cancer pharmacogenomic datasets, we demonstrate that text-mining-based features outperform traditional feature selection techniques in machine learning tasks. In addition, our analysis reveals that text-mining feature-based machine learning models trained on in vitro data also perform well when predicting the response of in vivo cancer models. Our results demonstrate that text-mining-based feature selection is an easy to implement approach that is suitable for building machine learning models for anticancer drug response prediction.
Availability and implementation: https://github.com/merlab/text_features.
Affiliation(s)
- Grace Wu
  - Division of Engineering Science, University of Toronto, Toronto, M5S2E4, Canada
- Arvin Zaker
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
- Amirhosein Ebrahimi
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
- Shivanshi Tripathi
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
- Arvind Singh Mer
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
  - School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, K1N6N5, Canada
11
Qin Y, Huo M, Liu X, Li SC. Biomarkers and computational models for predicting efficacy to tumor ICI immunotherapy. Front Immunol 2024; 15:1368749. [PMID: 38524135 PMCID: PMC10957591 DOI: 10.3389/fimmu.2024.1368749] [Received: 01/11/2024] [Accepted: 02/27/2024] [Indexed: 03/26/2024]
Abstract
Numerous studies have shown that immune checkpoint inhibitor (ICI) immunotherapy has great potential as a cancer treatment, leading to significant clinical improvements in numerous cases. However, it benefits a minority of patients, underscoring the importance of discovering reliable biomarkers that can be used to screen for potential beneficiaries and ultimately reduce the risk of overtreatment. Our comprehensive review focuses on the latest advancements in predictive biomarkers for ICI therapy, particularly emphasizing those that enhance the efficacy of programmed cell death protein 1 (PD-1)/programmed cell death-ligand 1 (PD-L1) inhibitors and cytotoxic T-lymphocyte antigen-4 (CTLA-4) inhibitors immunotherapies. We explore biomarkers derived from various sources, including tumor cells, the tumor immune microenvironment (TIME), body fluids, gut microbes, and metabolites. Among them, tumor cells-derived biomarkers include tumor mutational burden (TMB) biomarker, tumor neoantigen burden (TNB) biomarker, microsatellite instability (MSI) biomarker, PD-L1 expression biomarker, mutated gene biomarkers in pathways, and epigenetic biomarkers. TIME-derived biomarkers include immune landscape of TIME biomarkers, inhibitory checkpoints biomarkers, and immune repertoire biomarkers. We also discuss various techniques used to detect and assess these biomarkers, detailing their respective datasets, strengths, weaknesses, and evaluative metrics. Furthermore, we present a comprehensive review of computer models for predicting the response to ICI therapy. The computer models include knowledge-based mechanistic models and data-based machine learning (ML) models. Among the knowledge-based mechanistic models are pharmacokinetic/pharmacodynamic (PK/PD) models, partial differential equation (PDE) models, signal networks-based models, quantitative systems pharmacology (QSP) models, and agent-based models (ABMs). 
ML models include linear regression models, logistic regression models, support vector machine (SVM)/random forest/extra trees/k-nearest neighbors (KNN) models, artificial neural network (ANN) and deep learning models. Additionally, there are hybrid models of systems biology and ML. We summarized the details of these models, outlining the datasets they utilize, their evaluation methods/metrics, and their respective strengths and limitations. By summarizing the major advances in the research on predictive biomarkers and computer models for the therapeutic effect and clinical utility of tumor ICI, we aim to assist researchers in choosing appropriate biomarkers or computer models for research exploration and help clinicians conduct precision medicine by selecting the best biomarkers.
Collapse
Affiliation(s)
- Yurong Qin
- Department of Computer Science, City University of Hong Kong, Kowloon, China
- City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China
| | - Miaozhe Huo
- Department of Computer Science, City University of Hong Kong, Kowloon, China
- City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China
| | - Xingwu Liu
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
| | - Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Kowloon, China
- City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China
|
12
|
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies for selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, i.e., selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
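As a rough illustration of three of the selection strategies named in this abstract (random, greedy, and uncertainty), the following minimal Python sketch picks the next batch of cell lines to screen. It is not the authors' implementation: the function name `select_batch`, the inputs `pred_mean` and `pred_std` (for example, the mean and spread of an ensemble's predictions), and the convention that a lower predicted value means a more sensitive cell line are all assumptions made for this example.

```python
import random
from typing import List, Sequence


def select_batch(
    pred_mean: Sequence[float],
    pred_std: Sequence[float],
    k: int,
    strategy: str = "greedy",
    seed: int = 0,
) -> List[int]:
    """Return the indices of up to k unscreened cell lines to test next.

    pred_mean / pred_std are hypothetical model outputs per candidate cell line.
    Assumption: a lower predicted response value means a more sensitive cell line.
    """
    n = len(pred_mean)
    if len(pred_std) != n:
        raise ValueError("pred_mean and pred_std must have the same length")
    k = min(k, n)
    candidates = list(range(n))

    if strategy == "random":
        # Baseline: ignore the model and sample uniformly.
        return sorted(random.Random(seed).sample(candidates, k))
    if strategy == "greedy":
        # Exploit: screen the candidates predicted to respond best.
        ranked = sorted(candidates, key=lambda i: pred_mean[i])
    elif strategy == "uncertainty":
        # Explore: screen the candidates the model is least sure about.
        ranked = sorted(candidates, key=lambda i: -pred_std[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


if __name__ == "__main__":
    means = [0.9, 0.2, 0.5, 0.7]
    stds = [0.05, 0.01, 0.30, 0.10]
    for name in ("random", "greedy", "uncertainty"):
        print(name, select_batch(means, stds, k=2, strategy=name))
```

In an active learning loop, the selected cell lines would be screened, their measured responses added to the training data, and the model refit before the next call. The hybrid and diversity strategies mentioned in the abstract are not covered by this sketch.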
Affiliation(s)
- Priyanka Vasanthakumari
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (A.P.); (M.S.); (F.X.); (O.N.)
| | - Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (A.P.); (M.S.); (F.X.); (O.N.)
| | - Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (T.B.); (R.L.S.)
| | - Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (A.P.); (M.S.); (F.X.); (O.N.)
| | - Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (A.P.); (M.S.); (F.X.); (O.N.)
| | - Fangfang Xia
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (A.P.); (M.S.); (F.X.); (O.N.)
| | - Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (A.P.); (M.S.); (F.X.); (O.N.)
| | - Michael Ryan Weil
- Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
| | - Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (T.B.); (R.L.S.)
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
|
13
|
Bayer FP, Gander M, Kuster B, The M. CurveCurator: a recalibrated F-statistic to assess, classify, and explore significance of dose-response curves. Nat Commun 2023; 14:7902. [PMID: 38036588 PMCID: PMC10689459 DOI: 10.1038/s41467-023-43696-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/25/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023]
Abstract
Dose-response curves are key metrics in pharmacology and biology to assess phenotypic or molecular actions of bioactive compounds in a quantitative fashion. Yet, it is often unclear whether or not a measured response significantly differs from a curve without regulation, particularly in high-throughput applications or unstable assays. Treating potency and effect size estimates from random and true curves with the same level of confidence can lead to incorrect hypotheses and issues in training machine learning models. Here, we present CurveCurator, an open-source software that provides reliable dose-response characteristics by computing p-values and false discovery rates based on a recalibrated F-statistic and a target-decoy procedure that considers dataset-specific effect size distributions. The application of CurveCurator to three large-scale datasets enables a systematic drug mode of action analysis and demonstrates its scalable utility across several application areas, facilitated by a performant, interactive dashboard for fast data exploration.
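For orientation, the sketch below computes the classical (textbook) F-statistic that compares a fitted dose-response curve against a flat, no-regulation null model. This is deliberately not CurveCurator's recalibrated statistic or its target-decoy procedure, whose details are not given in this abstract. The function name, the default of four curve parameters (as in a four-parameter logistic fit), and the example values are assumptions for illustration only.

```python
import math
from typing import Sequence


def classical_f_statistic(
    observed: Sequence[float],
    fitted: Sequence[float],
    n_curve_params: int = 4,
) -> float:
    """Classical F-statistic: fitted curve versus an intercept-only (flat) null.

    observed: measured responses at each dose.
    fitted:   curve model predictions at the same doses (fit done elsewhere).
    n_curve_params: number of free curve parameters (assumed 4 here).
    """
    n = len(observed)
    if len(fitted) != n:
        raise ValueError("observed and fitted must have the same length")
    if n_curve_params < 2:
        raise ValueError("the curve model needs more parameters than the null")
    if n <= n_curve_params:
        raise ValueError("need more data points than curve parameters")

    mean = sum(observed) / n
    # Residual sum of squares under the flat null model (one parameter: the mean).
    rss_null = sum((y - mean) ** 2 for y in observed)
    # Residual sum of squares under the fitted curve.
    rss_curve = sum((y - f) ** 2 for y, f in zip(observed, fitted))

    df_num = n_curve_params - 1  # extra parameters relative to the null
    df_den = n - n_curve_params  # residual degrees of freedom of the curve fit
    if rss_curve == 0.0:
        return math.inf  # a perfect fit gives an unbounded statistic
    return ((rss_null - rss_curve) / df_num) / (rss_curve / df_den)


if __name__ == "__main__":
    obs = [1.0, 0.9, 0.5, 0.1, 0.0]
    fit = [1.0, 0.8, 0.5, 0.2, 0.0]
    print(classical_f_statistic(obs, fit))
```

A large value indicates that the curve explains much more variance than a flat line. Converting it to a p-value requires a reference distribution, and calibrating that step for dose-response fits is the problem the abstract describes.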
Affiliation(s)
- Florian P Bayer
- Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Manuel Gander
- Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Bernhard Kuster
- Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, 80336, Munich, Germany
| | - Matthew The
- Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354, Freising, Germany.
|
14
|
Piochi LF, Preto AJ, Moreira IS. DELFOS-drug efficacy leveraging forked and specialized networks-benchmarking scRNA-seq data in multi-omics-based prediction of cancer sensitivity. Bioinformatics 2023; 39:btad645. [PMID: 37862234 PMCID: PMC10627353 DOI: 10.1093/bioinformatics/btad645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/28/2023] [Accepted: 10/19/2023] [Indexed: 10/22/2023]
Abstract
MOTIVATION: Cancer is currently one of the most notorious diseases, with over 1 million deaths in the European Union alone in 2022. As each tumor can be composed of diverse cell types with distinct genotypes, cancer cells can acquire resistance to different compounds. Moreover, anticancer drugs can display severe side effects, compromising patient well-being. Therefore, novel strategies for identifying the optimal set of compounds to treat each tumor have become an important research topic in recent decades. RESULTS: To address this challenge, we developed a novel drug response prediction algorithm called Drug Efficacy Leveraging Forked and Specialized networks (DELFOS). Our model learns from multi-omics data from over 65 cancer cell lines, as well as structural data from over 200 compounds, for the prediction of drug sensitivity. We also evaluated the benefits of incorporating single-cell expression data to predict drug response. DELFOS was validated using datasets with unseen cell lines or drugs and compared with other state-of-the-art algorithms, achieving a high prediction performance on several correlation and error metrics. Overall, DELFOS can effectively leverage multi-omics data for the prediction of drug responses in thousands of drug-cell line pairs. AVAILABILITY AND IMPLEMENTATION: The DELFOS pipeline and associated data are available at github.com/MoreiraLAB/delfos.
Affiliation(s)
- Luiz Felipe Piochi
- Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
- CNC—Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- CIBB—Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
| | - António J Preto
- CNC—Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- CIBB—Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
- PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Coimbra 3030-789, Portugal
| | - Irina S Moreira
- Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
- CNC—Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- CIBB—Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
|
15
|
Flanary VL, Fisher JL, Wilk EJ, Howton TC, Lasseigne BN. Computational Advancements in Cancer Combination Therapy Prediction. JCO Precis Oncol 2023; 7:e2300261. [PMID: 37824797 DOI: 10.1200/po.23.00261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 07/20/2023] [Accepted: 08/15/2023] [Indexed: 10/14/2023]
Abstract
Given the high attrition rate of de novo drug discovery and limited efficacy of single-agent therapies in cancer treatment, combination therapy prediction through in silico drug repurposing has risen as a time- and cost-effective alternative for identifying novel and potentially efficacious therapies for cancer. The purpose of this review is to provide an introduction to computational methods for cancer combination therapy prediction and to summarize recent studies that implement each of these methods. A systematic search of the PubMed database was performed, focusing on studies published within the past 10 years. Our search included reviews and articles of ongoing and retrospective studies. We prioritized articles with findings that suggest considerations for improving combination therapy prediction methods over providing a meta-analysis of all currently available cancer combination therapy prediction methods. Computational methods used for drug combination therapy prediction in cancer research include networks, regression-based machine learning, classifier machine learning models, and deep learning approaches. Each method class has its own advantages and disadvantages, so careful consideration is needed to determine the most suitable class when designing a combination therapy prediction method. Future directions to improve current combination therapy prediction technology include incorporation of disease pathobiology, drug characteristics, patient multiomics data, and drug-drug interactions to determine maximally efficacious and tolerable drug regimens for cancer. As computational methods improve in their capability to integrate patient, drug, and disease data, more comprehensive models can be developed to more accurately predict safe and efficacious combination drug therapies for cancer and other complex diseases.
Affiliation(s)
- Victoria L Flanary
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Jennifer L Fisher
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Elizabeth J Wilk
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Timothy C Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
| | - Brittany N Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
|
16
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023]
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps to better understand the current state of the field and to identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Thomas S. Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Austin Clyde
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Jamie Overbeek
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
17
|
Shen B, Feng F, Li K, Lin P, Ma L, Li H. A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications. Brief Bioinform 2023; 24:6961794. [PMID: 36575826 DOI: 10.1093/bib/bbac605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/08/2022] [Revised: 10/30/2022] [Accepted: 12/09/2022] [Indexed: 12/29/2022]
Abstract
Drug response prediction is an important problem in personalized cancer therapy. Among various newly developed models, significant improvement in prediction performance has been reported using deep learning methods. However, systematic comparisons of deep learning methods, especially of the transferability from preclinical models to clinical cohorts, are currently lacking. To provide a more rigorous assessment, the performance of six representative deep learning methods for drug response prediction was benchmarked in multiple application scenarios using nine evaluation metrics, including the overall prediction accuracy, predictability of each drug, potential associated factors and transferability to clinical cohorts. Most methods show promising prediction within cell line datasets, and TGSA, with its lower time cost and better performance, is recommended. Although the performance metrics decrease when applying models trained on cell lines to patients, a certain amount of power to distinguish clinical response on some drugs can be maintained using CRDNN and TGSA. With these assessments, we provide guidance for researchers to choose appropriate methods, as well as insights into future directions for the development of more effective methods in clinical scenarios.
Affiliation(s)
- Bihan Shen
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Fangyoumin Feng
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Kunshi Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Ping Lin
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Liangxiao Ma
- Bio-Med Big Data Center at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Hong Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
|
18
|
Antiproliferative Activity Predictor: A New Reliable In Silico Tool for Drug Response Prediction against NCI60 Panel. Int J Mol Sci 2022; 23:ijms232214374. [PMID: 36430850 PMCID: PMC9694168 DOI: 10.3390/ijms232214374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Revised: 11/13/2022] [Accepted: 11/16/2022] [Indexed: 11/22/2022]
Abstract
In vitro antiproliferative assays still represent one of the most important tools in the anticancer drug discovery field, especially to gain insights into the mechanisms of action of anticancer small molecules. The NCI-DTP (National Cancer Institute Developmental Therapeutics Program) undoubtedly represents the most famous project aimed at rapidly testing thousands of compounds against multiple tumor cell lines (NCI60). The large amount of biological data stored in the National Cancer Institute (NCI) database and many other databases has led researchers in the fields of computational biology and medicinal chemistry to develop tools to predict the anticancer properties of new agents in advance. In this work, based on the available antiproliferative data collected by the NCI and the manipulation of molecular descriptors, we propose the new in silico Antiproliferative Activity Predictor (AAP) tool to calculate the GI50 values of input structures against the NCI60 panel. This ligand-based protocol, validated by both internal and external sets of structures, has proven to be highly reliable and robust. The obtained GI50 values of a test set of 99 structures present an error of less than ±1 unit. The AAP is more powerful for GI50 calculation in the range of 4-6, showing that the results strictly correlate with the experimental data. The encouraging results were further supported by the examination of an in-house database of curcumin analogues that have already been studied as antiproliferative agents. The AAP tool identified several potentially active compounds, and a subsequent evaluation of a set of molecules selected by the NCI for the one-dose/five-dose antiproliferative assays confirmed the great potential of our protocol for the development of new anticancer small molecules. The integration of the AAP tool in the free web service DRUDIT provides an interesting device for the discovery and/or optimization of anticancer drugs to the medicinal chemistry community. 
The training set will be updated with new NCI-tested compounds to cover more chemical spaces, activities, and cell lines. Currently, the same protocol is being developed for predicting the TGI (total growth inhibition) and LC50 (median lethal concentration) parameters to estimate toxicity profiles of small molecules.
|
19
|
Shin J, Piao Y, Bang D, Kim S, Jo K. DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer. Int J Mol Sci 2022; 23:13919. [PMID: 36430395 PMCID: PMC9699175 DOI: 10.3390/ijms232213919] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2022] [Revised: 10/27/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022]
Abstract
Some of the recent studies on drug sensitivity prediction have applied graph neural networks to leverage prior knowledge on the drug structure or gene network, and other studies have focused on the interpretability of the model to delineate the mechanism governing the drug response. However, it is crucial to make a prediction model that is both knowledge-guided and interpretable, so that the prediction accuracy is improved and practical use of the model can be enhanced. We propose an interpretable model called DRPreter (drug response predictor and interpreter) that predicts the anticancer drug response. DRPreter learns cell line and drug information with graph neural networks; the cell-line graph is further divided into multiple subgraphs with domain knowledge on biological pathways. A type-aware transformer in DRPreter helps detect relationships between pathways and a drug, highlighting important pathways that are involved in the drug response. Extensive experiments on the GDSC (Genomics of Drug Sensitivity and Cancer) dataset demonstrate that the proposed method outperforms state-of-the-art graph-based models for drug response prediction. In addition, DRPreter detected putative key genes and pathways for specific drug-cell-line pairs with supporting evidence in the literature, implying that our model can help interpret the mechanism of action of the drug.
Affiliation(s)
- Jihye Shin
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Yinhua Piao
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
| | - Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- AIGENDRUG Co., Ltd., Seoul 08826, Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
- MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
| | - Kyuri Jo
- Department of Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
|
20
|
Cheng X, Dai C, Wen Y, Wang X, Bo X, He S, Peng S. NeRD: a multichannel neural network to predict cellular response of drugs by integrating multidimensional data. BMC Med 2022; 20:368. [PMID: 36244991 PMCID: PMC9575288 DOI: 10.1186/s12916-022-02549-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/02/2022] [Accepted: 09/01/2022] [Indexed: 11/17/2022]
Abstract
BACKGROUND: Considering the heterogeneity of tumors, it is a key issue in precision medicine to predict the drug response of each individual. The accumulation of various types of drug informatics and multi-omics data facilitates the development of efficient models for drug response prediction. However, the selection of high-quality data sources and the design of suitable methods remain a challenge. METHODS: In this paper, we design NeRD, a multidimensional data integration model based on the PRISM drug response database, to predict the cellular response of drugs. Four feature extractors, including drug structure extractor (DSE), molecular fingerprint extractor (MFE), miRNA expression extractor (mEE), and copy number extractor (CNE), are designed for different types and dimensions of data. A fully connected network is used to fuse all features and make predictions. RESULTS: Experimental results demonstrate the effective integration of the global and local structural features of drugs, as well as the features of cell lines from different omics data. For all metrics tested on the PRISM database, NeRD surpassed previous approaches. We also verified that NeRD has strong reliability in the prediction results of new samples. Moreover, unlike other algorithms, when the amount of training data was reduced, NeRD maintained stable performance. CONCLUSIONS: NeRD's feature fusion provides a new idea for drug response prediction, which is of great significance for precise cancer treatment.
Affiliation(s)
- Xiaoxiao Cheng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Chong Dai
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Yuqi Wen
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Xiaoqi Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Song He
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
|