1
Hu X, Zhang P, Zhang J, Deng L. DeepFusionCDR: Employing Multi-Omics Integration and Molecule-Specific Transformers for Enhanced Prediction of Cancer Drug Responses. IEEE J Biomed Health Inform 2024; 28:6248-6258. [PMID: 38935469 DOI: 10.1109/jbhi.2024.3417014] [Indexed: 06/29/2024]
Abstract
Deep learning approaches have demonstrated remarkable potential in predicting cancer drug responses (CDRs), using cell line and drug features. However, existing methods predominantly rely on single-omics data of cell lines, potentially overlooking the complex biological mechanisms governing cell line responses. This paper introduces DeepFusionCDR, a novel approach employing unsupervised contrastive learning to amalgamate multi-omics features, including mutation, transcriptome, methylome, and copy number variation data, from cell lines. Furthermore, we incorporate molecular SMILES-specific transformers to derive drug features from their chemical structures. The unified multi-omics and drug signatures are combined, and a multi-layer perceptron (MLP) is applied to predict IC50 values for cell line-drug pairs. Moreover, this MLP can discern whether a cell line is resistant or sensitive to a particular drug. We assessed DeepFusionCDR's performance on the GDSC dataset and juxtaposed it against cutting-edge methods, demonstrating its superior performance in regression and classification tasks. We also conducted ablation studies and case analyses to exhibit the effectiveness and versatility of our proposed approach. Our results underscore the potential of DeepFusionCDR to enhance CDR predictions by harnessing the power of multi-omics fusion and molecular-specific transformers. The prediction of DeepFusionCDR on TCGA patient data and case study highlight the practical application scenarios of DeepFusionCDR in real-world environments.
2
Saranya KR, Vimina ER. DRN-CDR: A cancer drug response prediction model using multi-omics and drug features. Comput Biol Chem 2024; 112:108175. [PMID: 39191166 DOI: 10.1016/j.compbiolchem.2024.108175] [Received: 02/20/2024] [Revised: 08/09/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024]
Abstract
Cancer drug response (CDR) prediction is an important area of research that aims to personalize cancer therapy, optimizing treatment plans for maximum effectiveness while minimizing potential negative effects. Despite the advancements in Deep learning techniques, the effective integration of multi-omics data for drug response prediction remains challenging. In this paper, a regression method using Deep ResNet for CDR (DRN-CDR) prediction is proposed. We aim to explore the potential of considering sole cancer genes in drug response prediction. Here the multi-omics data such as gene expressions, mutation data, and methylation data along with the molecular structural information of drugs were integrated to predict the IC50 values of drugs. Drug features are extracted by employing a Uniform Graph Convolution Network, while Cell line features are extracted using a combination of Convolutional Neural Network and Fully Connected Networks. These features are then concatenated and fed into a deep ResNet for the prediction of IC50 values between Drug - Cell line pairs. The proposed method yielded higher Pearson's correlation coefficient (rp) of 0.7938 with lowest Root Mean Squared Error (RMSE) value of 0.92 when compared with similar methods of tCNNS, MOLI, DeepCDR, TGSA, NIHGCN, DeepTTA, GraTransDRP and TSGCNN. Further, when the model is extended to a classification problem to categorize drugs as sensitive or resistant, we achieved AUC and AUPR measures of 0.7623 and 0.7691, respectively. The drugs such as Tivozanib, SNX-2112, CGP-60474, PHA-665752, Foretinib etc., exhibited low median IC50 values and were found to be effective anti-cancer drugs. The case studies with different TCGA cancer types also revealed the effectiveness of SNX-2112, CGP-60474, Foretinib, Cisplatin, Vinblastine etc. This consistent pattern strongly suggests the effectiveness of the model in predicting CDR.
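The two regression metrics quoted in this abstract, Pearson's correlation coefficient and RMSE, can be sketched as follows. This is a minimal illustration using the standard textbook definitions; the function names and the toy values are hypothetical and are not taken from the DRN-CDR paper.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Toy (made-up) log IC50 values: every prediction is offset by +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(observed, predicted)
err = rmse(observed, predicted)
```

In this toy case the constant offset leaves the correlation at 1 while the RMSE is 0.5, which is why the two metrics are usually reported together.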
Affiliation(s)
- K R Saranya
  - Department of Computer Science and IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India
- E R Vimina
  - Department of Computer Science and IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India
3
Li X, Shi X, Li Y, Wang L. MCMVDRP: a multi-channel multi-view deep learning framework for cancer drug response prediction. J Integr Bioinform 2024:jib-2024-0026. [PMID: 39238451 DOI: 10.1515/jib-2024-0026] [Received: 05/16/2024] [Accepted: 07/09/2024] [Indexed: 09/07/2024]
Abstract
Drug therapy remains the primary approach to treating tumours. Variability among cancer patients, including variations in genomic profiles, often results in divergent therapeutic responses to analogous anti-cancer drug treatments within the same cohort of cancer patients. Hence, predicting the drug response by analysing the genomic profile characteristics of individual patients holds significant research importance. With the notable progress in machine learning and deep learning, many effective methods have emerged for predicting drug responses utilizing features from both drugs and cell lines. However, these methods are inadequate in capturing a sufficient number of features inherent to drugs. Consequently, we propose a representational approach for drugs that incorporates three distinct types of features: the molecular graph, the SMILES strings, and the molecular fingerprints. In this study, a novel deep learning model, named MCMVDRP, is introduced for the prediction of cancer drug responses. In our proposed model, an amalgamation of these extracted features is performed, followed by the utilization of fully connected layers to predict the drug response based on the IC50 values. Experimental results demonstrate that the presented model outperforms current state-of-the-art models in performance.
Affiliation(s)
- Xiangyu Li
  - School of Information and Electronics, Beijing Institute of Technology, Beijing, China
- Xiumin Shi
  - School of Information and Electronics, Beijing Institute of Technology, Beijing, China
- Yuxuan Li
  - School of Information and Electronics, Beijing Institute of Technology, Beijing, China
- Lu Wang
  - Department of Critical Care Medicine, Renmin Hospital of Wuhan University, Wuhan, Hubei, China
4
Xu M, Zhu Z, Zhao Y, He K, Huang Q, Zhao Y. RedCDR: Dual Relation Distillation for Cancer Drug Response Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1468-1479. [PMID: 38776197 DOI: 10.1109/tcbb.2024.3404262] [Indexed: 05/24/2024]
Abstract
Based on multi-omics data and drug information, predicting the response of cancer cell lines to drugs is a crucial area of research in modern oncology, as it can promote the development of personalized treatments. Despite the promising performance achieved by existing models, most of them overlook the variations among different omics and lack effective integration of multi-omics data. Moreover, the explicit modeling of cell line/drug attribute and cell line-drug association has not been thoroughly investigated in existing approaches. To address these issues, we propose RedCDR, a dual relation distillation model for cancer drug response (CDR) prediction. Specifically, a parallel dual-branch architecture is designed to enable both the independent learning and interactive fusion feasible for cell line/drug attribute and cell line-drug association information. To facilitate the adaptive interacting integration of multi-omics data, the proposed multi-omics encoder introduces the multiple similarity relations between cell lines and takes the importance of different omics data into account. To accomplish knowledge transfer from the two independent attribute and association branches to their fusion, a dual relation distillation mechanism consisting of representation distillation and prediction distillation is presented. Experiments conducted on the GDSC and CCLE datasets show that RedCDR outperforms previous state-of-the-art approaches in CDR prediction.
5
Huang Z, Fan Z, Shen S, Wu M, Deng L. MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning. Bioinformatics 2024; 40:ii190-ii197. [PMID: 39230706 PMCID: PMC11373324 DOI: 10.1093/bioinformatics/btae386] [Indexed: 09/05/2024]
Abstract
MOTIVATION Effective molecular representation is critical in drug development. The complex nature of molecules demands comprehensive multi-view representations, considering 1D, 2D, and 3D aspects, to capture diverse perspectives. Obtaining representations that encompass these varied structures is crucial for a holistic understanding of molecules in drug-related contexts. RESULTS In this study, we introduce an innovative multi-view contrastive learning framework for molecular representation, denoted as MolMVC. Initially, we use a Transformer encoder to capture 1D sequence information and a Graph Transformer to encode the intricate 2D and 3D structural details of molecules. Our approach incorporates a novel attention-guided augmentation scheme, leveraging prior knowledge to create positive samples tailored to different molecular data views. To align multi-view molecular positive samples effectively in latent space, we introduce an adaptive multi-view contrastive loss (AMCLoss). In particular, we calculate AMCLoss at various levels within the model to effectively capture the hierarchical nature of the molecular information. Eventually, we pre-train the encoders via minimizing AMCLoss to obtain the molecular representation, which can be used for various down-stream tasks. In our experiments, we evaluate the performance of our MolMVC on multiple tasks, including molecular property prediction (MPP), drug-target binding affinity (DTA) prediction and cancer drug response (CDR) prediction. The results demonstrate that the molecular representation learned by our MolMVC can enhance the predictive accuracy on these tasks and also reduce the computational costs. Furthermore, we showcase MolMVC's efficacy in drug repositioning across a spectrum of drug-related applications. AVAILABILITY AND IMPLEMENTATION The code and pre-trained model are publicly available at https://github.com/Hhhzj-7/MolMVC.
Affiliation(s)
- Zhijian Huang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ziyu Fan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Siyuan Shen
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Wu
  - Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
6
Yeh SJ, Paithankar S, Chen R, Xing J, Sun M, Liu K, Zhou J, Chen B. TransCell: In Silico Characterization of Genomic Landscape and Cellular Responses by Deep Transfer Learning. Genomics, Proteomics & Bioinformatics 2024; 22:qzad008. [PMID: 39240541 PMCID: PMC11378636 DOI: 10.1093/gpbjnl/qzad008] [Received: 10/27/2022] [Revised: 06/30/2023] [Accepted: 09/20/2023] [Indexed: 09/07/2024]
Abstract
Gene expression profiling of new or modified cell lines becomes routine today; however, obtaining comprehensive molecular characterization and cellular responses for a variety of cell lines, including those derived from underrepresented groups, is not trivial when resources are minimal. Using gene expression to predict other measurements has been actively explored; however, systematic investigation of its predictive power in various measurements has not been well studied. Here, we evaluated commonly used machine learning methods and presented TransCell, a two-step deep transfer learning framework that utilized the knowledge derived from pan-cancer tumor samples to predict molecular features and responses. Among these models, TransCell had the best performance in predicting metabolite, gene effect score (or genetic dependency), and drug sensitivity, and had comparable performance in predicting mutation, copy number variation, and protein expression. Notably, TransCell improved the performance by over 50% in drug sensitivity prediction and achieved a correlation of 0.7 in gene effect score prediction. Furthermore, predicted drug sensitivities revealed potential repurposing candidates for new 100 pediatric cancer cell lines, and predicted gene effect scores reflected BRAF resistance in melanoma cell lines. Together, we investigated the predictive power of gene expression in six molecular measurement types and developed a web portal (http://apps.octad.org/transcell/) that enables the prediction of 352,000 genomic and cellular response features solely from gene expression profiles.
Affiliation(s)
- Shan-Ju Yeh
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Shreya Paithankar
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Ruoqiao Chen
  - Department of Pharmacology and Toxicology, Michigan State University, Grand Rapids, MI 49503, USA
- Jing Xing
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Mengying Sun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Ke Liu
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Jiayu Zhou
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Bin Chen
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
  - Department of Pharmacology and Toxicology, Michigan State University, Grand Rapids, MI 49503, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
7
Chen HO, Cui YC, Lin PC, Chiang JH. An Innovative Multi-Omics Model Integrating Latent Alignment and Attention Mechanism for Drug Response Prediction. J Pers Med 2024; 14:694. [PMID: 39063948 PMCID: PMC11277895 DOI: 10.3390/jpm14070694] [Received: 05/27/2024] [Revised: 06/18/2024] [Accepted: 06/24/2024] [Indexed: 07/28/2024]
Abstract
By using omics, we can now examine all components of biological systems simultaneously. Deep learning-based drug prediction methods have shown promise by integrating cancer-related multi-omics data. However, the complex interaction between genes poses challenges in accurately projecting multi-omics data. In this research, we present a predictive model for drug response that incorporates diverse types of omics data, comprising genetic mutation, copy number variation, methylation, and gene expression data. This study proposes latent alignment for information mismatch in integration, which is achieved through an attention module capturing interactions among diverse types of omics data. The latent alignment and attention modules significantly improve predictions, outperforming the baseline model, with MSE = 1.1333, F1-score = 0.5342, and AUROC = 0.5776. High accuracy was achieved in predicting drug responses for piplartine and tenovin-6, while the accuracy was comparatively lower for mitomycin-C and obatoclax. The latent alignment module exclusively outperforms the baseline model, enhancing the MSE by 0.2375, the F1-score by 4.84%, and the AUROC by 6.1%. Similarly, the attention module only improves these metrics by 0.1899, 2.88%, and 2.84%, respectively. In the interpretability case study, panobinostat exhibited the most effective predicted response, with a value of -4.895. We provide reliable insights for drug selection in personalized medicine by identifying crucial genetic factors influencing drug response.
Affiliation(s)
- Hui-O Chen
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 701, Taiwan
  - Institute of Medical Informatics, National Cheng Kung University, Tainan 701, Taiwan
- Yuan-Chi Cui
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 701, Taiwan
  - Institute of Medical Informatics, National Cheng Kung University, Tainan 701, Taiwan
- Peng-Chan Lin
  - Department of Oncology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 701, Taiwan
  - Department of Genomic Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 701, Taiwan
- Jung-Hsien Chiang
  - Department of Oncology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 701, Taiwan
  - Department of Genomic Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 701, Taiwan
8
Abinas V, Abhinav U, Haneem EM, Vishnusankar A, Nazeer KAA. Integration of autoencoder and graph convolutional network for predicting breast cancer drug response. J Bioinform Comput Biol 2024; 22:2450013. [PMID: 39051144 DOI: 10.1142/s0219720024500136] [Indexed: 07/27/2024]
Abstract
Background and objectives: Breast cancer is the most prevalent type of cancer among women. The effectiveness of anticancer pharmacological therapy may get adversely affected by tumor heterogeneity that includes genetic and transcriptomic features. This leads to clinical variability in patient response to therapeutic drugs. Anticancer drug design and cancer understanding require precise identification of cancer drug responses. The performance of drug response prediction models can be improved by integrating multi-omics data and drug structure data. Methods: In this paper, we propose an Autoencoder (AE) and Graph Convolutional Network (AGCN) for drug response prediction, which integrates multi-omics data and drug structure data. Specifically, we first converted the high dimensional representation of each omic data to a lower dimensional representation using an AE for each omic data set. Subsequently, these individual features are combined with drug structure data obtained using a Graph Convolutional Network and given to a Convolutional Neural Network to calculate IC50 values for every combination of cell lines and drugs. Then a threshold IC50 value is obtained for each drug by performing K-means clustering of their known IC50 values. Finally, with the help of this threshold value, cell lines are classified as either sensitive or resistant to each drug. Results: Experimental results indicate that AGCN has an accuracy of 0.82 and performs better than many existing methods. In addition to that, we have done external validation of AGCN using data taken from The Cancer Genome Atlas (TCGA) clinical database, and we got an accuracy of 0.91. Conclusion: According to the results obtained, concatenating multi-omics data with drug structure data using AGCN for drug response prediction tasks greatly improves the accuracy of the prediction task.
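The per-drug thresholding step described in this abstract can be sketched as a one-dimensional K-means with two clusters. The abstract does not state how the threshold is derived from the clusters, so taking the midpoint between the two centroids is an assumption here, as are the function name, the toy values, and the convention that lower values mean "sensitive".

```python
def two_means_threshold(values, max_iters=100):
    """Split 1-D values into two clusters (K-means, k=2) and return the
    midpoint between the two centroids as a threshold (assumed rule)."""
    low, high = min(values), max(values)
    if low == high:
        return low  # all values identical: no meaningful split
    c_low, c_high = low, high  # initialise centroids at the extremes
    for _ in range(max_iters):
        boundary = (c_low + c_high) / 2
        group_low = [v for v in values if v <= boundary]
        group_high = [v for v in values if v > boundary]
        new_low = sum(group_low) / len(group_low)
        new_high = sum(group_high) / len(group_high)
        if (new_low, new_high) == (c_low, c_high):
            break  # assignments stopped changing
        c_low, c_high = new_low, new_high
    return (c_low + c_high) / 2

# Toy (made-up) log IC50 values for one drug across six cell lines.
ic50 = [-2.0, -1.5, -1.0, 3.0, 3.5, 4.0]
threshold = two_means_threshold(ic50)
# Assumed convention: a lower IC50 means the cell line is more sensitive.
labels = ["sensitive" if v < threshold else "resistant" for v in ic50]
```

A library implementation such as a general-purpose K-means would give the same split on well-separated data; the hand-written version only keeps the sketch dependency-free.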
Affiliation(s)
- V Abinas
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- U Abhinav
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- E M Haneem
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- A Vishnusankar
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- K A Abdul Nazeer
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
9
Castilho RM, Castilho LS, Palomares BH, Squarize CH. Determinants of Chromatin Organization in Aging and Cancer-Emerging Opportunities for Epigenetic Therapies and AI Technology. Genes (Basel) 2024; 15:710. [PMID: 38927646 PMCID: PMC11202709 DOI: 10.3390/genes15060710] [Received: 03/31/2024] [Revised: 05/21/2024] [Accepted: 05/26/2024] [Indexed: 06/28/2024]
Abstract
This review article critically examines the pivotal role of chromatin organization in gene regulation, cellular differentiation, disease progression and aging. It explores the dynamic between the euchromatin and heterochromatin, coded by a complex array of histone modifications that orchestrate essential cellular processes. We discuss the pathological impacts of chromatin state misregulation, particularly in cancer and accelerated aging conditions such as progeroid syndromes, and highlight the innovative role of epigenetic therapies and artificial intelligence (AI) in comprehending and harnessing the histone code toward personalized medicine. In the context of aging, this review explores the use of AI and advanced machine learning (ML) algorithms to parse vast biological datasets, leading to the development of predictive models for epigenetic modifications and providing a framework for understanding complex regulatory mechanisms, such as those governing cell identity genes. It supports innovative platforms like CEFCIG for high-accuracy predictions and tools like GridGO for tailored ChIP-Seq analysis, which are vital for deciphering the epigenetic landscape. The review also casts a vision on the prospects of AI and ML in oncology, particularly in the personalization of cancer therapy, including early diagnostics and treatment optimization for diseases like head and neck and colorectal cancers by harnessing computational methods, AI advancements and integrated clinical data for a transformative impact on healthcare outcomes.
Affiliation(s)
- Rogerio M. Castilho
  - Laboratory of Epithelial Biology, Department of Periodontics and Oral Medicine, School of Dentistry, University of Michigan, Ann Arbor, MI 48109-1078, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109-1078, USA
- Leonard S. Castilho
  - Laboratory of Epithelial Biology, Department of Periodontics and Oral Medicine, School of Dentistry, University of Michigan, Ann Arbor, MI 48109-1078, USA
- Bruna H. Palomares
  - Oral Diagnosis Department, Piracicaba School of Dentistry, State University of Campinas, Piracicaba 13414-903, Sao Paulo, Brazil
- Cristiane H. Squarize
  - Laboratory of Epithelial Biology, Department of Periodontics and Oral Medicine, School of Dentistry, University of Michigan, Ann Arbor, MI 48109-1078, USA
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109-1078, USA
10
Dey V, Ning X. Improving Anticancer Drug Selection and Prioritization via Neural Learning to Rank. J Chem Inf Model 2024; 64:4071-4088. [PMID: 38740382 PMCID: PMC11134508 DOI: 10.1021/acs.jcim.3c01060] [Received: 07/12/2023] [Revised: 03/27/2024] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most effective drugs for each cell line still remains a significant challenge. To address this, we developed multiple neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed multiple pairwise and listwise neural ranking methods that learn latent representations of drugs and cell lines and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed neural pairwise and listwise ranking methods, Pair-PushC and List-One on top of the existing methods, pLETORg and ListNet, respectively. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the effective drugs instead of the top effective drug, unlike List-One. We also provide an exhaustive empirical evaluation with state-of-the-art regression and ranking baselines on large-scale data sets across multiple experimental settings. Our results demonstrate that our proposed ranking methods mostly outperform the best baselines with significant improvements of as much as 25.6% in terms of selecting truly effective drugs within the top 20 predicted drugs (i.e., hit@20) across 50% test cell lines. 
Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
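The hit@20 figure mentioned in this abstract counts truly effective drugs that land in the top of a predicted ranking. A minimal sketch of such a metric is shown below; reading hit@k as a count (rather than a yes/no flag) is an assumption, and the function name, drug names, and scores are hypothetical.

```python
def hits_at_k(scores, effective, k):
    """Count how many truly effective drugs appear among the k drugs
    with the highest predicted scores for one cell line."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for drug in ranked[:k] if drug in effective)

# Toy predicted scores for one cell line (higher = predicted more effective).
predicted_scores = {"drugA": 0.9, "drugB": 0.1, "drugC": 0.7, "drugD": 0.4}
truly_effective = {"drugA", "drugD"}
hits = hits_at_k(predicted_scores, truly_effective, k=2)
```

With k=2 the top of the ranking is drugA and drugC, so only one of the two effective drugs is recovered; averaging this count over test cell lines gives a dataset-level score.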
Affiliation(s)
- Vishal Dey
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
11
Li Y, Liu B, Deng J, Guo Y, Du H. Image-based molecular representation learning for drug development: a survey. Brief Bioinform 2024; 25:bbae294. [PMID: 38920347 PMCID: PMC11200195 DOI: 10.1093/bib/bbae294] [Received: 03/12/2024] [Revised: 05/19/2024] [Accepted: 06/08/2024] [Indexed: 06/27/2024]
Abstract
Artificial intelligence (AI) powered drug development has received remarkable attention in recent years. It addresses the limitations of traditional experimental methods that are costly and time-consuming. While there have been many surveys attempting to summarize related research, they only focus on general AI or specific aspects such as natural language processing and graph neural network. Considering the rapid advance on computer vision, using the molecular image to enable AI appears to be a more intuitive and effective approach since each chemical substance has a unique visual representation. In this paper, we provide the first survey on image-based molecular representation for drug development. The survey proposes a taxonomy based on the learning paradigms in computer vision and reviews a large number of corresponding papers, highlighting the contributions of molecular visual representation in drug development. Besides, we discuss the applications, limitations and future directions in the field. We hope this survey could offer valuable insight into the use of image-based molecular representation learning in the context of drug development.
Affiliation(s)
- Yue Li
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Bingyan Liu
  - School of Computer Science, Beijing University of Posts and Telecommunications, No.10 Xituchen Street, 100876, Beijing, China
- Jinyan Deng
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Yi Guo
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
- Hongbo Du
  - Division of Gastroenterology, Dongzhimen Hospital, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
  - Institute of Liver Disease, Beijing University of Chinese Medicine, No. 5 Haiyun Warehouse, 100700, Beijing, China
12
Hajim WI, Zainudin S, Mohd Daud K, Alheeti K. Optimized models and deep learning methods for drug response prediction in cancer treatments: a review. PeerJ Comput Sci 2024; 10:e1903. [PMID: 38660174 PMCID: PMC11042005 DOI: 10.7717/peerj-cs.1903] [Received: 09/05/2023] [Accepted: 01/31/2024] [Indexed: 04/26/2024]
Abstract
Recent advancements in deep learning (DL) have played a crucial role in aiding experts to develop personalized healthcare services, particularly in drug response prediction (DRP) for cancer patients. The contribution of DL techniques to this field is significant, and they have proven indispensable in the medical field. This review aims to analyze the diverse effectiveness of various DL models in making these predictions, drawing on research published from 2017 to 2023. We utilized the VOS-Viewer 1.6.18 software to create a word cloud from the titles and abstracts of the selected studies. This study offers insights into the focus areas within DL models used for drug response. The word cloud revealed a strong link between certain keywords and grouped themes, highlighting terms such as deep learning, machine learning, precision medicine, precision oncology, drug response prediction, and personalized medicine. To advance DRP using DL, researchers need to work on enhancing the models' generalizability and interoperability. It is also crucial to develop models that not only accurately represent various architectures but also simplify these architectures, balancing the complexity with the predictive capabilities. In the future, researchers should try to combine methods that make DL models easier to understand; this will make DRP reviews more open and help doctors trust the decisions made by DL models in cancer DRP.
Affiliation(s)
- Wesam Ibrahim Hajim
- Department of Applied Geology, College of Sciences, Tikrit University, Tikrit, Salah ad Din, Iraq
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Suhaila Zainudin
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Kauthar Mohd Daud
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Khattab Alheeti
- Department of Computer Networking Systems, College of Computer Sciences and Information Technology, University of Anbar, Al Anbar, Ramadi, Iraq
|
13
|
Sharma R, Saghapour E, Chen JY. An NLP-based technique to extract meaningful features from drug SMILES. iScience 2024; 27:109127. [PMID: 38455979 PMCID: PMC10918220 DOI: 10.1016/j.isci.2024.109127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 09/30/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024] Open
Abstract
NLP is a well-established field in ML for developing language models that capture the sequence of words in a sentence. Similarly, drug molecule structures can also be represented as sequences using the SMILES notation. However, unlike natural language texts, special characters in drug SMILES have specific meanings and cannot be ignored. We introduce a novel NLP-based method that extracts interpretable sequences and essential features from drug SMILES notation using N-grams. Our method compares these features to Morgan fingerprint bit-vectors using UMAP-based embedding, and we validate its effectiveness through two personalized drug screening (PSD) case studies. Our NLP-based features are sparse and, when combined with gene expressions and disease phenotype features, produce better ML models for PSD. This approach provides a new way to analyze drug molecule structures represented as SMILES notation, which can help accelerate drug discovery efforts. We have also made our method accessible through a Python library.
Affiliation(s)
- Rahul Sharma
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Ehsan Saghapour
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Jake Y. Chen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
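Illustrative note on entry 13: the abstract of Sharma et al. describes extracting N-gram features from drug SMILES strings. A minimal sketch of what such a feature extraction could look like is given below, assuming character-level N-grams. The function names `smiles_ngrams` and `ngram_matrix` are hypothetical, and the paper's own tokenization and Python library are not reproduced here. Character-level splitting is a simplification: multi-character atoms such as `Cl` are split across N-grams.

```python
from collections import Counter


def smiles_ngrams(smiles: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams in a SMILES string.

    Special characters such as '(', ')' and '=' are kept, since in SMILES
    they carry structural meaning (branches, bond orders).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def ngram_matrix(smiles_list, n: int = 2):
    """Build a count table: a sorted n-gram vocabulary plus one row per molecule.

    Most entries are zero for realistic vocabularies, which is why such
    features are naturally sparse.
    """
    counts = [smiles_ngrams(s, n) for s in smiles_list]
    vocab = sorted(set().union(*counts)) if counts else []
    rows = [[c.get(gram, 0) for gram in vocab] for c in counts]
    return vocab, rows


if __name__ == "__main__":
    # Ethanol and acetic acid, written as SMILES.
    vocab, rows = ngram_matrix(["CCO", "CC(=O)O"], n=2)
    print(vocab)
    for row in rows:
        print(row)
```

The resulting rows could be concatenated with other feature blocks (for example gene expression) before model training, which is the kind of combination the abstract reports.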
|
14
|
Lao C, Zheng P, Chen H, Liu Q, An F, Li Z. DeepAEG: a model for predicting cancer drug response based on data enhancement and edge-collaborative update strategies. BMC Bioinformatics 2024; 25:105. [PMID: 38461284 PMCID: PMC10925015 DOI: 10.1186/s12859-024-05723-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Accepted: 02/27/2024] [Indexed: 03/11/2024] Open
Abstract
MOTIVATION The prediction of cancer drug response is a challenging subject in modern personalized cancer therapy due to the uncertainty of drug efficacy and the heterogeneity of patients. It has been shown that the characteristics of the drug itself and the genomic characteristics of the patient can greatly influence the results of cancer drug response. Therefore, accurate, efficient, and comprehensive methods for drug feature extraction and genomics integration are crucial to improve the prediction accuracy. RESULTS Accurate prediction of cancer drug response is vital for guiding the design of anticancer drugs. In this study, we propose an end-to-end deep learning model named DeepAEG which is based on a complete-graph update mode to predict IC50. Specifically, we integrate an edge update mechanism on the basis of a hybrid graph convolutional network to comprehensively learn the potential high-dimensional representation of topological structures in drugs, including atomic characteristics and chemical bond information. Additionally, we present a novel approach for enhancing simplified molecular input line entry specification data by employing sequence recombination to eliminate the defect of single sequence representation of drug molecules. Our extensive experiments show that DeepAEG outperforms other existing methods across multiple evaluation parameters in multiple test sets. Furthermore, we identify several potential anticancer agents, including bortezomib, which has proven to be an effective clinical treatment option. Our results highlight the potential value of DeepAEG in guiding the design of specific cancer treatment regimens.
Affiliation(s)
- Chuanqi Lao
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
- Pengfei Zheng
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
- Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
- Qiao Liu
- Department of Statistics, Stanford University, Stanford, Palo Alto, CA, 94305, USA
- Feng An
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
- Zhao Li
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
|
15
|
Lin CX, Guan Y, Li HD. Artificial intelligence approaches for molecular representation in drug response prediction. Curr Opin Struct Biol 2024; 84:102747. [PMID: 38091924 DOI: 10.1016/j.sbi.2023.102747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/26/2023] [Accepted: 11/26/2023] [Indexed: 02/09/2024]
Abstract
Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter we discuss ways in which these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.
Affiliation(s)
- Cui-Xiang Lin
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan Province, PR China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, PR China
|
16
|
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including a random, greedy, uncertainty, diversity, combination of greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approach. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
Affiliation(s)
- Priyanka Vasanthakumari
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Michael Ryan Weil
- Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
- Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
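Illustrative note on entry 16: the abstract of Vasanthakumari et al. compares several strategies for choosing which cell lines to screen next, including greedy and uncertainty-based selection. The sketch below shows how two such selection rules might differ, under stated assumptions: each unscreened cell line has a model-predicted response score (higher taken to mean more responsive) and an uncertainty estimate, for example the spread across an ensemble. The tuple layout and the name `select_batch` are hypothetical and are not the authors' implementation.

```python
def select_batch(candidates, k, strategy="greedy"):
    """Pick k cell lines to screen next.

    candidates: list of (name, predicted_response, uncertainty) tuples.
    strategy:   "greedy" ranks by predicted response (exploit likely hits);
                "uncertainty" ranks by model uncertainty (explore where the
                model is least sure).
    """
    if strategy == "greedy":
        key = lambda c: c[1]
    elif strategy == "uncertainty":
        key = lambda c: c[2]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    ranked = sorted(candidates, key=key, reverse=True)
    return [name for name, _, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical pool of unscreened cell lines.
    pool = [("A", 0.9, 0.1), ("B", 0.2, 0.8), ("C", 0.6, 0.5)]
    print(select_batch(pool, 2, "greedy"))
    print(select_batch(pool, 2, "uncertainty"))
```

In a full active learning loop, the selected cell lines would be screened, their measured responses added to the training set, and the model refit before the next selection round.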
|
17
|
Yang Y, Li P. GPDRP: a multimodal framework for drug response prediction with graph transformer. BMC Bioinformatics 2023; 24:484. [PMID: 38105227 PMCID: PMC10726525 DOI: 10.1186/s12859-023-05618-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 12/13/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND In the field of computational personalized medicine, drug response prediction (DRP) is a critical issue. However, existing studies often characterize drugs as strings, a representation that does not align with the natural description of molecules. Additionally, they ignore gene pathway-specific combinatorial implications. RESULTS In this study, we propose the drug Graph and gene Pathway based Drug response prediction method (GPDRP), a new multimodal deep learning model for predicting drug responses based on drug molecular graphs and gene pathway activity. In GPDRP, drugs are represented by molecular graphs, while cell lines are described by gene pathway activity scores. The model separately learns these two types of data using Graph Neural Networks (GNN) with Graph Transformers and deep neural networks. Predictions are subsequently made through fully connected layers. CONCLUSIONS Our results indicate that the Graph Transformer-based model delivers superior performance. We apply GPDRP to bulk RNA-sequencing data from hundreds of cancer cell lines, and it outperforms some recently published models. Furthermore, the generalizability and applicability of GPDRP are demonstrated through its predictions on unknown drug-cell line pairs and xenografts. This underscores the interpretability achieved by incorporating gene pathways.
Affiliation(s)
- Yingke Yang
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Peiluan Li
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Longmen Laboratory, Luoyang, 471003, China
|
18
|
Liu Y, Tong S, Chen Y. HMM-GDAN: Hybrid multi-view and multi-scale graph duplex-attention networks for drug response prediction in cancer. Neural Netw 2023; 167:213-222. [PMID: 37660670 DOI: 10.1016/j.neunet.2023.08.036] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/24/2022] [Revised: 06/01/2023] [Accepted: 08/20/2023] [Indexed: 09/05/2023]
Abstract
Precision medicine is devoted to discovering personalized therapy for complex and difficult diseases like cancer. Many machine learning approaches have been developed for drug response prediction towards precision medicine. Nevertheless, multi-view graph learning schemes based on genetic profiles have not yet been explored for drug response prediction in previous works. Furthermore, multi-scale latent feature fusion is not considered sufficiently in the existing frameworks of graph neural networks (GNNs). Previous works on drug response prediction mainly depend on sequence data or single-view graph data. In this paper, we propose to construct a multi-view graph from multi-omics data and STRING protein-protein association data, and develop a new GNN architecture for drug response prediction in cancer. Specifically, we propose hybrid multi-view and multi-scale graph duplex-attention networks (HMM-GDAN), in which both a multi-view self-attention mechanism and a view-level attention mechanism are devised to capture the complementary information of views and emphasize the importance of each view collaboratively, and rich multi-scale features are constructed and integrated to further form high-level representations for better prediction. Experiments on the GDSC2 dataset verify the superiority of the proposed HMM-GDAN when compared with state-of-the-art baselines. The effectiveness of the multi-view and multi-scale strategies is demonstrated by the ablation study.
Affiliation(s)
- Youfa Liu
- College of Informatics, Huazhong Agricultural University, PR China
- Shufan Tong
- College of Informatics, Huazhong Agricultural University, PR China
- Yongyong Chen
- School of Computer Science, Harbin Institute of Technology (Shenzhen), PR China
|
19
|
Zhao H, Zhang X, Zhao Q, Li Y, Wang J. MSDRP: a deep learning model based on multisource data for predicting drug response. Bioinformatics 2023; 39:btad514. [PMID: 37606993 PMCID: PMC10474952 DOI: 10.1093/bioinformatics/btad514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 07/30/2023] [Accepted: 08/21/2023] [Indexed: 08/23/2023] Open
Abstract
MOTIVATION Cancer heterogeneity drastically affects cancer therapeutic outcomes. Predicting drug response in vitro is expected to help formulate personalized therapy regimens. In recent years, several computational models based on machine learning and deep learning have been proposed to predict drug response in vitro. However, most of these methods capture drug features based on a single drug description (e.g. drug structure), without considering the relationships between drugs and biological entities (e.g. target, diseases, and side effects). Moreover, most of these methods collect features separately for drugs and cell lines but fail to consider the pairwise interactions between drugs and cell lines. RESULTS In this paper, we propose a deep learning framework, named MSDRP for drug response prediction. MSDRP uses an interaction module to capture interactions between drugs and cell lines, and integrates multiple associations/interactions between drugs and biological entities through similarity network fusion algorithms, outperforming some state-of-the-art models in all performance measures for all experiments. The experimental results of de novo test and independent test demonstrate the excellent performance of our model for new drugs. Furthermore, several case studies illustrate the rationality for using feature vectors derived from drug similarity matrices from multisource data to represent drugs and the interpretability of our model. AVAILABILITY AND IMPLEMENTATION The codes of MSDRP are available at https://github.com/xyzhang-10/MSDRP.
Affiliation(s)
- Haochen Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiaoyu Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0001, United States
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
20
|
Oloulade BM, Gao J, Chen J, Al-Sabri R, Wu Z. Cancer drug response prediction with surrogate modeling-based graph neural architecture search. Bioinformatics 2023; 39:btad478. [PMID: 37555809 PMCID: PMC10432359 DOI: 10.1093/bioinformatics/btad478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/01/2023] [Revised: 06/01/2023] [Accepted: 08/08/2023] [Indexed: 08/10/2023] Open
Abstract
MOTIVATION Understanding drug-response differences in cancer treatments is one of the most challenging aspects of personalized medicine. Recently, graph neural networks (GNNs) have become state-of-the-art methods in many graph representation learning scenarios in bioinformatics. However, building an optimal handcrafted GNN model for a particular drug sensitivity dataset requires manual design and fine-tuning of the hyperparameters for the GNN model, which is time-consuming and requires expert knowledge. RESULTS In this work, we propose AutoCDRP, a novel framework for automated cancer drug-response predictor using GNNs. Our approach leverages surrogate modeling to efficiently search for the most effective GNN architecture. AutoCDRP uses a surrogate model to predict the performance of GNN architectures sampled from a search space, allowing it to select the optimal architecture based on evaluation performance. Hence, AutoCDRP can efficiently identify the optimal GNN architecture by exploring the performance of all GNN architectures in the search space. Through comprehensive experiments on two benchmark datasets, we demonstrate that the GNN architecture generated by AutoCDRP surpasses state-of-the-art designs. Notably, the optimal GNN architecture identified by AutoCDRP consistently outperforms the best baseline architecture from the first epoch, providing further evidence of its effectiveness. AVAILABILITY AND IMPLEMENTATION https://github.com/BeObm/AutoCDRP.
Affiliation(s)
- Jianliang Gao
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jiamin Chen
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Raeed Al-Sabri
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Zhenpeng Wu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
21
|
Zhan Y, Guo J, Philip Chen CL, Meng XB. iBT-Net: an incremental broad transformer network for cancer drug response prediction. Brief Bioinform 2023:bbad256. [PMID: 37429577 DOI: 10.1093/bib/bbad256] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Revised: 05/30/2023] [Accepted: 06/15/2023] [Indexed: 07/12/2023] Open
Abstract
Predicting cancer drug response is an important research topic in modern precision medicine. Because chemical structures are incomplete and gene features are complex, however, designing efficient data-driven methods for predicting drug response remains ongoing work. Moreover, since clinical data cannot easily be obtained all at once, data-driven methods may require relearning when new data become available, resulting in increased time consumption and cost. To address these issues, an incremental broad Transformer network (iBT-Net) is proposed for cancer drug response prediction. Alongside the gene expression features learned from cancer cell lines, structural features are extracted from drugs by a Transformer. A broad learning system is then designed to integrate the learned gene features and the structural features of drugs to predict the response. With the capability of incremental learning, the proposed method can use new data to improve its prediction performance without full retraining. Experiments and comparison studies demonstrate the effectiveness and superiority of iBT-Net under different experimental configurations and continuous data learning.
Affiliation(s)
- Yongkang Zhan
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- Jifeng Guo
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- C L Philip Chen
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- Brain and Affective Cognitive Research Center, Pazhou Lab, 510335, China
- Xian-Bing Meng
- School of Electromechanical Engineering, Guangdong University of Technology, 510006, China
|
22
|
Huang Z, Zhang P, Deng L. DeepCoVDR: deep transfer learning with graph transformer and cross-attention for predicting COVID-19 drug response. Bioinformatics 2023; 39:i475-i483. [PMID: 37387168 DOI: 10.1093/bioinformatics/btad244] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The coronavirus disease 2019 (COVID-19) remains a global public health emergency. Although people, especially those with underlying health conditions, could benefit from several approved COVID-19 therapeutics, the development of effective antiviral COVID-19 drugs is still a very urgent problem. Accurate and robust drug response prediction to a new chemical compound is critical for discovering safe and effective COVID-19 therapeutics. RESULTS In this study, we propose DeepCoVDR, a novel COVID-19 drug response prediction method based on deep transfer learning with graph transformer and cross-attention. First, we adopt a graph transformer and feed-forward neural network to mine the drug and cell line information. Then, we use a cross-attention module that calculates the interaction between the drug and cell line. After that, DeepCoVDR combines drug and cell line representation and their interaction features to predict drug response. To solve the problem of SARS-CoV-2 data scarcity, we apply transfer learning and use the SARS-CoV-2 dataset to fine-tune the model pretrained on the cancer dataset. The experiments of regression and classification show that DeepCoVDR outperforms baseline methods. We also evaluate DeepCoVDR on the cancer dataset, and the results indicate that our approach has high performance compared with other state-of-the-art methods. Moreover, we use DeepCoVDR to predict COVID-19 drugs from FDA-approved drugs and demonstrate the effectiveness of DeepCoVDR in identifying novel COVID-19 drugs. AVAILABILITY AND IMPLEMENTATION https://github.com/Hhhzj-7/DeepCoVDR.
Affiliation(s)
- Zhijian Huang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Pan Zhang
- Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha 410083, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
23
|
Baptista D, Ferreira PG, Rocha M. A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer. PLoS Comput Biol 2023; 19:e1010200. [PMID: 36952569 PMCID: PMC10072473 DOI: 10.1371/journal.pcbi.1010200] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/13/2022] [Revised: 04/04/2023] [Accepted: 02/08/2023] [Indexed: 03/25/2023] Open
Abstract
One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possibilities is huge and efficient search strategies are required. Machine Learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact: limiting gene expression data to cancer or drug response-specific genes improved performance. Drug features appeared to be more predictive of drug response, with a 41% increase in coefficient of determination (R²) and 26% increase in Spearman correlation relative to a baseline model that used only cell line and drug identifiers. Molecular fingerprint-based drug representations performed slightly better than learned representations: ECFP4 fingerprints increased R² by 5.3% and Spearman correlation by 2.8% relative to the best learned representations. In general, fully connected feature-encoding subnetworks outperformed other architectures. DL outperformed other ML methods by more than 35% (R²) and 14% (Spearman). Additionally, an ensemble combining the top DL and ML models improved performance by about 6.5% (R²) and 4% (Spearman). Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way. The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.
Affiliation(s)
- Delora Baptista
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga, Guimarães, Portugal
- Pedro G Ferreira
- Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal
- INESC TEC, Porto, Portugal
- Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- i3s - Instituto de Investigação e Inovação em Saúde da Universidade do Porto, Porto, Portugal
- Miguel Rocha
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga, Guimarães, Portugal
|
24
|
Chu T, Nguyen TT, Hai BD, Nguyen QH, Nguyen T. Graph Transformer for Drug Response Prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1065-1072. [PMID: 36107906 DOI: 10.1109/tcbb.2022.3206888] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 05/04/2023]
Abstract
BACKGROUND Previous models have shown that learning drug features from their graph representation is more efficient than learning from their string or numeric representations. Furthermore, integrating multi-omics data of cell lines increases the performance of drug response prediction. However, these models have shown drawbacks in extracting drug features from the graph representation and in incorporating redundant information from multi-omics data. This paper proposes a deep learning model, GraTransDRP, to improve drug representation and reduce information redundancy. First, the Graph transformer was utilized to extract the drug representation more efficiently. Next, convolutional neural networks (CNNs) were used to learn the mutation, methylation, and transcriptomics features. However, the dimension of the transcriptomics features was up to 17737. Therefore, KernelPCA was applied to the transcriptomics features to reduce the dimension and transform them into a dense representation before putting them through the CNN model. Finally, drug and omics features were combined to predict a response value by a fully connected network. Experimental results show that our model outperforms some state-of-the-art methods, including GraphDRP and GraOmicDRP.
|
25
|
Badwan BA, Liaropoulos G, Kyrodimos E, Skaltsas D, Tsirigos A, Gorgoulis VG. Machine learning approaches to predict drug efficacy and toxicity in oncology. Cell Rep Methods 2023; 3:100413. [PMID: 36936080 PMCID: PMC10014302 DOI: 10.1016/j.crmeth.2023.100413] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 02/25/2023]
Abstract
In recent years, there has been a surge of interest in using machine learning algorithms (MLAs) in oncology, particularly for biomedical applications such as drug discovery, drug repurposing, diagnostics, clinical trial design, and pharmaceutical production. MLAs have the potential to provide valuable insights and predictions in these areas by representing both the disease state and the therapeutic agents used to treat it. To fully utilize the capabilities of MLAs in oncology, it is important to understand the fundamental concepts underlying these algorithms and how they can be applied to assess the efficacy and toxicity of therapeutics. In this perspective, we lay out approaches to represent both the disease state and the therapeutic agents used by MLAs to derive novel insights and make relevant predictions.
Affiliation(s)
- Efthymios Kyrodimos
  - First ENT Department, Hippocration Hospital, National Kapodistrian University of Athens, Athens, GR 11527, Greece
- Aristotelis Tsirigos
  - Department of Medicine, New York University School of Medicine, New York, NY 10016, USA
  - Department of Pathology, New York University School of Medicine, New York, NY 10016, USA
- Vassilis G. Gorgoulis
  - Intelligencia Inc, New York, NY 10014, USA
  - Department of Histology and Embryology, Faculty of Medicine, School of Health Sciences, National Kapodistrian University of Athens, Athens 11527, Greece
  - Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
  - Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
  - Molecular and Clinical Cancer Sciences, Manchester Cancer Research Centre, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4GJ, UK
26
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
27
Wang H, Dai C, Wen Y, Wang X, Liu W, He S, Bo X, Peng S. GADRP: graph convolutional networks and autoencoders for cancer drug response prediction. Brief Bioinform 2023; 24:6865039. [PMID: 36460622 DOI: 10.1093/bib/bbac501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 12/04/2022] Open
Abstract
Drug response prediction in cancer cell lines is of great significance in personalized medicine. In this study, we propose GADRP, a cancer drug response prediction model based on graph convolutional networks (GCNs) and autoencoders (AEs). We first use a stacked deep AE to extract low-dimensional representations from cell line features, and then construct a sparse drug-cell line pair (DCP) network incorporating drug, cell line, and DCP similarity information. Next, an initial residual and layer attention-based GCN (ILGCN), which can alleviate the over-smoothing problem, is used to learn DCP features. Finally, a fully connected network is employed to make predictions. Benchmarking results demonstrate that GADRP can significantly improve prediction performance on all metrics compared with baselines on five datasets. In particular, experiments on predicting unknown DCP responses, drug-cancer tissue associations, and drug-pathway associations illustrate the predictive power of GADRP. All results highlight the effectiveness of GADRP in predicting drug responses, and its potential value in guiding anti-cancer drug selection.
Affiliation(s)
- Hong Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Chong Dai
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yuqi Wen
  - Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaoqi Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Wenjuan Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Song He
  - Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
  - The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha 410082, China
28
Shen B, Feng F, Li K, Lin P, Ma L, Li H. A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications. Brief Bioinform 2023; 24:6961794. [PMID: 36575826 DOI: 10.1093/bib/bbac605] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/08/2022] [Revised: 10/30/2022] [Accepted: 12/09/2022] [Indexed: 12/29/2022] Open
Abstract
Drug response prediction is an important problem in personalized cancer therapy. Among various newly developed models, significant improvement in prediction performance has been reported using deep learning methods. However, systematic comparisons of deep learning methods, especially of the transferability from preclinical models to clinical cohorts, are currently lacking. To provide a more rigorous assessment, six representative deep learning methods for drug response prediction were benchmarked in multiple application scenarios using nine evaluation metrics, including the overall prediction accuracy, predictability of each drug, potential associated factors and transferability to clinical cohorts. Most methods show promising prediction within cell line datasets, and TGSA, with its lower time cost and better performance, is recommended. Although the performance metrics decrease when applying models trained on cell lines to patients, a certain amount of power to distinguish clinical response on some drugs can be maintained using CRDNN and TGSA. With these assessments, we provide guidance for researchers to choose appropriate methods, as well as insights into future directions for the development of more effective methods in clinical scenarios.
Affiliation(s)
- Bihan Shen
  - Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Fangyoumin Feng
  - Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Kunshi Li
  - Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Ping Lin
  - Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Liangxiao Ma
  - Bio-Med Big Data Center at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Hong Li
  - Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
29
Qi R, Zou Q. Trends and Potential of Machine Learning and Deep Learning in Drug Study at Single-Cell Level. Research (Washington, D.C.) 2023; 6:0050. [PMID: 36930772 PMCID: PMC10013796 DOI: 10.34133/research.0050] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 09/22/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023]
Abstract
Cancer treatments always face challenging problems, particularly drug resistance due to tumor cell heterogeneity. The existing datasets include the relationship between gene expression and drug sensitivities; however, the majority are based on tissue-level studies. Studying drugs at the single-cell level is a promising way to overcome minimal residual disease caused by subclonal resistant cancer cells retained after initial curative therapy. Fortunately, machine learning techniques can help us understand how different types of cells respond to different cancer drugs from the perspective of single-cell gene expression. Good modeling using single-cell data and drug response information will not only improve machine learning for cell-drug outcome prediction but also facilitate the discovery of drugs for specific cancer subgroups and specific cancer treatments. In this paper, we review machine learning and deep learning approaches in drug research. By analyzing the application of these methods on cancer cell lines and single-cell data and comparing the technical gap between single-cell sequencing data analysis and single-cell drug sensitivity analysis, we hope to explore the trends and potential of drug research at the single-cell data level and provide more inspiration for drug research at the single-cell level. We anticipate that this review will stimulate the innovative use of machine learning methods to address new challenges in precision medicine more broadly.
Affiliation(s)
- Ren Qi
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
30
Didachos C, Kintos DP, Fousteris M, Mylonas P, Kanavos A. An Optimized Cloud Computing Method for Extracting Molecular Descriptors. Advances in Experimental Medicine and Biology 2023; 1424:247-254. [PMID: 37486501 DOI: 10.1007/978-3-031-31982-2_28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2023]
Abstract
Extracting molecular descriptors from chemical compounds is an essential preprocessing phase for developing accurate classification models. Supervised machine learning algorithms offer the capability to detect "hidden" patterns that may exist in a large dataset of compounds, which are represented by their molecular descriptors. Assuming that molecules with similar structure tend to share similar physicochemical properties, large chemical libraries can be screened by applying similarity sourcing techniques in order to detect potential bioactive compounds against a molecular target. However, the process of generating these compound features is time-consuming. Our proposed methodology not only employs cloud computing to accelerate the process of extracting molecular descriptors but also introduces an optimized approach to utilize the computational resources in the most efficient way.
Affiliation(s)
- Christos Didachos
  - Computer Engineering and Informatics Department, University of Patras, Patras, Greece
- Phivos Mylonas
  - Department of Informatics, Ionian University, Corfu, Greece
- Andreas Kanavos
  - Department of Informatics, Ionian University, Corfu, Greece
31
Lee M, Kim PJ, Joe H, Kim HG. Gene-centric multi-omics integration with convolutional encoders for cancer drug response prediction. Comput Biol Med 2022; 151:106192. [PMID: 36327883 DOI: 10.1016/j.compbiomed.2022.106192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION Tumor heterogeneity, including genetic and transcriptomic characteristics, can reduce the efficacy of anticancer pharmacological therapy, resulting in clinical variability in patient response to therapeutic medications. Multi-omics integration can allow in silico models to provide an additional perspective on a biological system. METHODS In this study, we propose a gene-centric multi-channel (GCMC) architecture to integrate multi-omics for predicting cancer drug response. GCMC transforms multi-omics profiles into a three-dimensional tensor with an additional dimension for omics types. GCMC's convolutional encoders capture multi-omics profiles for each gene and yield gene-centric features to predict drug responses. RESULTS We evaluated GCMC on various datasets, including The Cancer Genome Atlas (TCGA) patients, patient-derived xenograft (PDX) mouse models, and the Genomics of Drug Sensitivity in Cancer (GDSC) cell line datasets. GCMC achieved better performance than baseline models, including single-omics models, in more than 75% of 265 drugs from GDSC cell line datasets. Furthermore, as for the clinical applicability of GCMC, it achieved the best performance on TCGA and PDX datasets in terms of both AUPR and AUC. We also analyzed the models' capability of integrating multi-omics profiles by measuring the contribution ratio of omics types. GCMC can incorporate multi-omics profiles in various manners to enhance performance for each drug type. These results suggest that GCMC can improve performance and feature extraction capability by integrating multi-omics profiles in a gene-centric manner.
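The tensor construction described in this abstract can be illustrated with a minimal sketch. The axis order (samples × genes × omics types), the nested-dictionary input layout, the fill value for missing measurements, and the function name `build_gene_centric_tensor` are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical input layout: profiles[omics_type][sample][gene] -> value
Profiles = Dict[str, Dict[str, Dict[str, float]]]


def build_gene_centric_tensor(
    profiles: Profiles,
    samples: List[str],
    genes: List[str],
    omics_types: List[str],
    fill: float = 0.0,
) -> List[List[List[float]]]:
    """Stack per-omics matrices into tensor[sample][gene][omics].

    Each gene row carries one channel per omics type, so a convolution
    over the last axis would see all omics measurements of one gene together.
    Missing measurements are replaced by `fill` (an assumption).
    """
    tensor = []
    for sample in samples:
        gene_rows = []
        for gene in genes:
            channels = [
                profiles[omics].get(sample, {}).get(gene, fill)
                for omics in omics_types
            ]
            gene_rows.append(channels)
        tensor.append(gene_rows)
    return tensor


# Toy data, invented for the example only.
profiles = {
    "expression": {"cellA": {"TP53": 2.5, "KRAS": 1.0}, "cellB": {"TP53": 0.5}},
    "mutation": {"cellA": {"TP53": 1.0}, "cellB": {"KRAS": 1.0}},
}
tensor = build_gene_centric_tensor(
    profiles, ["cellA", "cellB"], ["TP53", "KRAS"], ["expression", "mutation"]
)
```

Here `tensor[0][0]` holds the expression and mutation channels of TP53 in the first sample, which is the gene-centric view the abstract refers to.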
Affiliation(s)
- Munhwan Lee
  - Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
- Pil-Jong Kim
  - Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
- Hyunwhan Joe
  - Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
- Hong-Gee Kim
  - Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
32
Wang C, Lye X, Kaalia R, Kumar P, Rajapakse JC. Deep learning and multi-omics approach to predict drug responses in cancer. BMC Bioinformatics 2022; 22:632. [PMID: 36443676 PMCID: PMC9703655 DOI: 10.1186/s12859-022-04964-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 09/25/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Cancers are genetically heterogeneous, so anticancer drugs show varying degrees of effectiveness on patients due to their differing genetic profiles. Knowing patients' responses to numerous cancer drugs is needed for personalized cancer treatment. By using molecular profiles of cancer cell lines available from the Cancer Cell Line Encyclopedia (CCLE) and anticancer drug responses available in the Genomics of Drug Sensitivity in Cancer (GDSC), we build computational models to predict anticancer drug responses from molecular features. RESULTS We propose a novel deep neural network model that integrates multi-omics data available as gene expressions, copy number variations, gene mutations, reverse phase protein array expressions, and metabolomics expressions, in order to predict cellular responses to known anti-cancer drugs. We employ a novel graph embedding layer that incorporates interactome data as prior information for prediction. Moreover, we propose a novel attention layer that effectively combines different omics features, taking their interactions into account. The network outperformed feedforward neural networks and reported 0.90 for [Formula: see text] values for prediction of drug responses from cancer cell line data available in CCLE and GDSC. CONCLUSION The outstanding results of our experiments demonstrate that the proposed method is capable of capturing the interactions of genes and proteins, and integrating multi-omics features effectively. Furthermore, both the results of ablation studies and the investigations of the attention layer imply that gene mutation has a greater influence on the prediction of drug responses than other omics data types. Therefore, we conclude that our approach not only predicts the anti-cancer drug response precisely but also provides insights into the reaction mechanisms of cancer cell lines and drugs.
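As a rough illustration of how an attention layer can weight omics types when combining them, the sketch below applies generic dot-product attention pooling over per-omics feature vectors. It is not the attention layer proposed in the paper, which also models interactions between omics features; the query vector, the feature values, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    features: Dict[str, List[float]], query: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse per-omics vectors into one vector with attention weights.

    Each omics vector is scored by its dot product with `query`
    (a stand-in for a learned parameter), the scores are normalized
    with softmax, and the fused vector is the weighted sum.
    """
    names = list(features)
    scores = [sum(f * q for f, q in zip(features[n], query)) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Toy vectors, invented for the example only.
omics_features = {
    "expression": [0.2, 0.8],
    "mutation": [1.0, 0.5],
    "copy_number": [0.1, 0.1],
}
fused, weights = attention_fuse(omics_features, query=[1.0, 1.0])
```

The returned `weights` dictionary is the kind of quantity one could inspect to compare the relative contribution of omics types, in the spirit of the attention-layer investigation the abstract mentions.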
Affiliation(s)
- Conghao Wang
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore (GRID grid.59025.3b; ISNI 0000 0001 2224 0361)
- Xintong Lye
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore (GRID grid.59025.3b; ISNI 0000 0001 2224 0361)
- Rama Kaalia
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore (GRID grid.59025.3b; ISNI 0000 0001 2224 0361)
- Parvin Kumar
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore (GRID grid.59025.3b; ISNI 0000 0001 2224 0361)
- Jagath C. Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore (GRID grid.59025.3b; ISNI 0000 0001 2224 0361)
33
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly shortens the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
  - Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
- Enas Elgeldawi
  - Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Heba Aboul Ella
  - Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
- Mamdouh M. Gomaa
  - Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Aboul Ella Hassanien
  - Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
34
Shin J, Piao Y, Bang D, Kim S, Jo K. DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer. Int J Mol Sci 2022; 23:13919. [PMID: 36430395 PMCID: PMC9699175 DOI: 10.3390/ijms232213919] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/30/2022] [Revised: 10/27/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Some of the recent studies on drug sensitivity prediction have applied graph neural networks to leverage prior knowledge on the drug structure or gene network, and other studies have focused on the interpretability of the model to delineate the mechanism governing the drug response. However, it is crucial to make a prediction model that is both knowledge-guided and interpretable, so that the prediction accuracy is improved and practical use of the model can be enhanced. We propose an interpretable model called DRPreter (drug response predictor and interpreter) that predicts the anticancer drug response. DRPreter learns cell line and drug information with graph neural networks; the cell-line graph is further divided into multiple subgraphs with domain knowledge on biological pathways. A type-aware transformer in DRPreter helps detect relationships between pathways and a drug, highlighting important pathways that are involved in the drug response. Extensive experiments on the GDSC (Genomics of Drug Sensitivity in Cancer) dataset demonstrate that the proposed method outperforms state-of-the-art graph-based models for drug response prediction. In addition, DRPreter detected putative key genes and pathways for specific drug-cell-line pairs with supporting evidence in the literature, implying that our model can help interpret the mechanism of action of the drug.
Affiliation(s)
- Jihye Shin
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - AIGENDRUG Co., Ltd., Seoul 08826, Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
  - MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
- Kyuri Jo
  - Department of Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
35
Cheng X, Dai C, Wen Y, Wang X, Bo X, He S, Peng S. NeRD: a multichannel neural network to predict cellular response of drugs by integrating multidimensional data. BMC Med 2022; 20:368. [PMID: 36244991 PMCID: PMC9575288 DOI: 10.1186/s12916-022-02549-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Accepted: 09/01/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Considering the heterogeneity of tumors, it is a key issue in precision medicine to predict the drug response of each individual. The accumulation of various types of drug informatics and multi-omics data facilitates the development of efficient models for drug response prediction. However, the selection of high-quality data sources and the design of suitable methods remain a challenge. METHODS In this paper, we design NeRD, a multidimensional data integration model based on the PRISM drug response database, to predict the cellular response of drugs. Four feature extractors, including drug structure extractor (DSE), molecular fingerprint extractor (MFE), miRNA expression extractor (mEE), and copy number extractor (CNE), are designed for different types and dimensions of data. A fully connected network is used to fuse all features and make predictions. RESULTS Experimental results demonstrate the effective integration of the global and local structural features of drugs, as well as the features of cell lines from different omics data. For all metrics tested on the PRISM database, NeRD surpassed previous approaches. We also verified that NeRD has strong reliability in the prediction results of new samples. Moreover, unlike other algorithms, when the amount of training data was reduced, NeRD maintained stable performance. CONCLUSIONS NeRD's feature fusion provides a new idea for drug response prediction, which is of great significance for precise cancer treatment.
Affiliation(s)
- Xiaoxiao Cheng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chong Dai
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
  - Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
- Yuqi Wen
  - Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
- Xiaoqi Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiaochen Bo
  - Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
- Song He
  - Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
  - The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
36
Peng W, Liu H, Dai W, Yu N, Wang J. Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions. Bioinformatics 2022; 38:4546-4553. [PMID: 35997568 DOI: 10.1093/bioinformatics/btac574] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/03/2022] [Revised: 07/26/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Due to cancer heterogeneity, the therapeutic effect may not be the same when a cohort of patients of the same cancer type receive the same treatment. Anticancer drug response prediction may help develop personalized therapy regimens to increase survival and reduce patients' expenses. Recently, graph neural network-based methods have aroused widespread interest and achieved impressive results on the drug response prediction task. However, most of them apply graph convolution to process cell line-drug bipartite graphs while ignoring the intrinsic differences between cell line and drug nodes. Moreover, most of these methods aggregate node-wise neighbor features but fail to consider the element-wise interaction between cell lines and drugs. RESULTS This work proposes a neighborhood interaction (NI)-based heterogeneous graph convolution network method, namely NIHGCN, for anticancer drug response prediction in an end-to-end way. First, it constructs a heterogeneous network consisting of drugs, cell lines and the known drug response information. Cell line gene expression and drug molecular fingerprints are linearly transformed and input as node attributes into an interaction module. The interaction module consists of a parallel graph convolution network layer and an NI layer: the graph convolution layer aggregates node-level features from neighbors, and the NI layer considers element-level interactions with neighbors. Finally, the drug response predictions are made by calculating the linear correlation coefficients of feature representations of cell lines and drugs. We have conducted extensive experiments to assess the effectiveness of our model on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. It has achieved the best performance compared with the state-of-the-art algorithms, especially in predicting drug responses for new cell lines, new drugs and targeted drugs. Furthermore, our model, once trained on the GDSC dataset, can be successfully applied to predict samples of PDX and TCGA, which verifies the transferability of our model from in vitro cell lines to in vivo datasets. AVAILABILITY AND IMPLEMENTATION The source code can be obtained from https://github.com/weiba/NIHGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
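The final prediction step named in this abstract, scoring a cell line-drug pair by the linear correlation coefficient of their feature representations, can be sketched as follows. The embeddings are invented toy values and the helper names are hypothetical; how NIHGCN maps the coefficient to a response label is not stated in the abstract and is not modeled here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant vectors
    return cov / math.sqrt(var_x * var_y)


def score_pairs(
    cell_embeddings: List[List[float]], drug_embeddings: List[List[float]]
) -> List[List[float]]:
    """Return scores[i][j], the correlation of cell line i with drug j."""
    return [[pearson(c, d) for d in drug_embeddings] for c in cell_embeddings]


# Toy embeddings, invented for the example only.
cells = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
drugs = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
scores = score_pairs(cells, drugs)
```

Because the score depends only on how the two vectors co-vary, rescaling either embedding by a positive constant leaves the prediction unchanged.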
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
- Hancheng Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
- Ning Yu
  - Department of Computing Sciences, The College at Brockport, State University of New York, Brockport, NY 14422, USA
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, P.R. China
37
Chen J, Hao L, Qian X, Lin L, Pan Y, Han X. Machine learning models based on immunological genes to predict the response to neoadjuvant therapy in breast cancer patients. Front Immunol 2022; 13:948601. [PMID: 35935976 PMCID: PMC9352856 DOI: 10.3389/fimmu.2022.948601] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/20/2022] [Accepted: 06/29/2022] [Indexed: 12/13/2022] Open
Abstract
Breast cancer (BC) is the most common malignancy worldwide and neoadjuvant therapy (NAT) plays an important role in the treatment of patients with early BC. However, only a subset of BC patients can achieve pathological complete response (pCR) and benefit from NAT. It is therefore necessary to predict the responses to NAT. Although many models to predict the response to NAT based on gene expression determined by the microarray platform have been proposed, their applications in clinical practice are limited due to the data normalization methods during model building and the disadvantages of the microarray platform compared with the RNA-seq platform. In this study, we first reconfirmed the correlation between immune profiles and pCR in an RNA-seq dataset. Then, we employed multiple machine learning algorithms and a model stacking strategy to build an immunological gene based model (Ipredictor model) and an immunological gene and receptor status based model (ICpredictor model) in the RNA-seq dataset. The areas under the receiver operating characteristic curves for the Ipredictor and ICpredictor models were 0.745 and 0.769 in an independent external test set based on the RNA-seq platform, and were 0.716 and 0.752 in another independent external test set based on the microarray platform. Furthermore, we found that the predictive score of the Ipredictor model was correlated with immune microenvironment and genomic aberration markers. These results demonstrated that the models can accurately predict the response to NAT for BC patients and will contribute to individualized therapy.
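For readers unfamiliar with the metric reported above, the area under the receiver operating characteristic curve equals the probability that a randomly chosen responder receives a higher predictive score than a randomly chosen non-responder. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the label convention (1 for pCR, 0 otherwise) are assumptions for illustration and are not taken from the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: iterable of 0/1 outcomes (assumed here: 1 = pCR, 0 = no pCR).
    scores: iterable of predictive scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic loop is adequate for cohorts of a few hundred patients; a rank-based formulation gives the same value more efficiently on larger sets.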
Affiliation(s)
- Jian Chen
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Clinical Research Center for Cancer Bioimmunotherapy of Anhui Province, Hefei, China
- Li Hao
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Clinical Research Center for Cancer Bioimmunotherapy of Anhui Province, Hefei, China
- Xiaojun Qian
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Clinical Research Center for Cancer Bioimmunotherapy of Anhui Province, Hefei, China
- Lin Lin
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Clinical Research Center for Cancer Bioimmunotherapy of Anhui Province, Hefei, China
- Yueyin Pan
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Clinical Research Center for Cancer Bioimmunotherapy of Anhui Province, Hefei, China
- Xinghua Han
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Clinical Research Center for Cancer Bioimmunotherapy of Anhui Province, Hefei, China
38
Hostallero DE, Li Y, Emad A. Looking at the BiG Picture: Incorporating bipartite graphs in drug response prediction. Bioinformatics 2022; 38:3609-3620. [PMID: 35674359 DOI: 10.1093/bioinformatics/btac383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2021] [Revised: 04/17/2022] [Accepted: 06/01/2022] [Indexed: 12/15/2022]
Abstract
MOTIVATION The increasing number of publicly available databases containing drugs' chemical structures, their response in cell lines, and molecular profiles of the cell lines has drawn attention to the problem of drug response prediction. However, many existing methods do not fully leverage the information that is shared among cell lines and drugs with similar structure. As such, drug similarities in terms of cell line responses and chemical structures could prove to be useful in forming drug representations to improve drug response prediction accuracy. RESULTS We present two deep learning approaches, BiG-DRP and BiG-DRP+, for drug response prediction. Our models take advantage of the drugs' chemical structure and the underlying relationships of drugs and cell lines through a bipartite graph and a heterogeneous graph convolutional network that incorporate sensitive and resistant cell line information in forming drug representations. Evaluation of our methods and other state-of-the-art models in different scenarios shows that incorporating this bipartite graph significantly improves the prediction performance. Additionally, genes that contribute significantly to the performance of our models also point to important biological processes and signaling pathways. Analysis of predicted drug response of patients' tumors using our model revealed important associations between mutations and drug sensitivity, illustrating the utility of our model in pharmacogenomics studies. AVAILABILITY AND IMPLEMENTATION An implementation of the algorithms in Python is provided at https://github.com/ddhostallero/BiG-DRP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Earl Hostallero
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada
- Mila, Quebec AI Institute, Montreal, QC H2S 3H1, Canada
- Yihui Li
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada
- Amin Emad
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada
- Mila, Quebec AI Institute, Montreal, QC H2S 3H1, Canada
- The Rosalind and Morris Goodman Cancer Institute, Montreal, QC H3A 1A3, Canada
39
Park A, Joo M, Kim K, Son WJ, Lim G, Lee J, Kim JH, Lee DH, Nam S. A comprehensive evaluation of regression-based drug responsiveness prediction models, using cell viability inhibitory concentrations (IC50 values). Bioinformatics 2022; 38:2810-2817. [PMID: 35561188 DOI: 10.1093/bioinformatics/btac177] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/02/2021] [Revised: 03/06/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Predicting drug response is critical for precision medicine. Diverse methods have predicted drug responsiveness, as measured by the half-maximal drug inhibitory concentration (IC50), in cultured cells. Although IC50s are continuous, traditional prediction models have dealt mainly with binary classification of responsiveness. However, since there are few regression-based IC50 predictions, comprehensive evaluations of regression-based IC50 prediction models, including machine learning (ML) and deep learning (DL), for diverse data types and dataset sizes, have not been addressed. RESULTS Here, we constructed 11 input data settings, including multi-omics settings, with varying dataset sizes, then evaluated the performance of regression-based ML and DL models to predict IC50s. DL models considered two convolutional neural network architectures: CDRScan and residual neural network (ResNet). ResNet was introduced in regression-based DL models for predicting drug response for the first time. As a result, DL models performed better than ML models in all the settings. Also, ResNet performed better than or comparable to CDRScan and ML models in all settings. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available in GitHub at https://github.com/labnams/IC50evaluation. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aron Park
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
- Minjae Joo
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
- Won-Joon Son
- Samsung Advanced Institute of Technology, Samsung Electronics, Suwon, Gyeonggi-do 16678, Korea
- GyuTae Lim
- Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
- Jinhyuk Lee
- Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
- Department of Bioinformatics, University of Sciences and Technology, Daejeon 34113, Korea
- Jung Ho Kim
- Department of Internal Medicine, Gachon University Gil Medical Center, Gachon University School of Medicine, Incheon 21565, Korea
- Dae Ho Lee
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
- Department of Internal Medicine, Gachon University Gil Medical Center, Gachon University School of Medicine, Incheon 21565, Korea
- Seungyoon Nam
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
- AI Convergence Center for Medical Science, Department of Genome Medicine and Science, Gachon University Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea
- Department of Life Sciences, Gachon University, Seongnam, Gyeonggi-do 13120, Korea
40
Ma T, Liu Q, Li H, Zhou M, Jiang R, Zhang X. DualGCN: a dual graph convolutional network model to predict cancer drug response. BMC Bioinformatics 2022; 23:129. [PMID: 35428192 PMCID: PMC9011932 DOI: 10.1186/s12859-022-04664-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/24/2022] [Accepted: 04/04/2022] [Indexed: 11/11/2022]
Abstract
Background Drug resistance is a critical obstacle in cancer therapy. Discovering cancer drug response is important to improve anti-cancer drug treatment and guide anti-cancer drug design. Abundant genomic and drug response resources of cancer cell lines provide unprecedented opportunities for such study. However, cancer cell lines cannot fully reflect heterogeneous tumor microenvironments. Transferring knowledge studied from in vitro cell lines to single-cell and clinical data will be a promising direction to better understand drug resistance. Most current studies include single nucleotide variants (SNV) as features and focus on improving predictive ability of cancer drug response on cell lines. However, obtaining accurate SNVs from clinical tumor samples and single-cell data is not reliable. This makes it difficult to generalize such SNV-based models to clinical tumor data or single-cell level studies in the future. Results We present a new method, DualGCN, a unified Dual Graph Convolutional Network model to predict cancer drug response. DualGCN encodes both chemical structures of drugs and omics data of biological samples using graph convolutional networks. Then the two embeddings are fed into a multilayer perceptron to predict drug response. DualGCN incorporates prior knowledge on cancer-related genes and protein–protein interactions, and outperforms most state-of-the-art methods while avoiding using large-scale SNV data. Conclusions The proposed method outperforms most state-of-the-art methods in predicting cancer drug response without the use of large-scale SNV data. These favorable results indicate its potential to be extended to clinical and single-cell tumor samples and advancements in precision medicine.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04664-4.
41
Farina E, Nabhen JJ, Dacoregio MI, Batalini F, Moraes FY. An overview of artificial intelligence in oncology. Future Sci OA 2022; 8:FSO787. [PMID: 35369274 PMCID: PMC8965797 DOI: 10.2144/fsoa-2021-0074] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 06/11/2021] [Accepted: 01/19/2022] [Indexed: 11/23/2022]
Abstract
Cancer is associated with significant morbidity and mortality globally. Advances in screening, diagnosis, management and survivorship have been substantial in recent decades; however, challenges in providing personalized and data-oriented care remain. Artificial intelligence (AI), a branch of computer science used for predictions and automation, has emerged as a potential solution to improve the healthcare journey and to promote precision in healthcare. AI applications in oncology include, but are not limited to, optimization of cancer research, improvement of clinical practice (e.g., prediction of the association of multiple parameters and outcomes, such as prognosis and response) and better understanding of tumor molecular biology. In this review, we examine the current state of AI in oncology, including fundamentals, current applications, limitations and future perspectives.
Affiliation(s)
- Eduardo Farina
- Department of Radiology, Federal University of São Paulo, SP, 04021-001, Brazil; Diagnósticos da America SA (Dasa), 05425-020, Brazil
- Jacqueline J Nabhen
- School of Medicine, Federal University of Paraná, Curitiba, PR, 80060-000, Brazil
- Maria Inez Dacoregio
- School of Medicine, State University of Centro-Oeste, Guarapuava, PR, 85040-167, Brazil
- Felipe Batalini
- Department of Medicine, Division of Medical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Fabio Y Moraes
- Department of Oncology, Division of Radiation Oncology, Queen's University, Kingston, ON, K7L 3N6, Canada
42
Jiang L, Jiang C, Yu X, Fu R, Jin S, Liu X. DeepTTA: a transformer-based model for predicting cancer drug response. Brief Bioinform 2022; 23:6554594. [PMID: 35348595 DOI: 10.1093/bib/bbac100] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 11/23/2021] [Revised: 02/08/2022] [Accepted: 02/27/2022] [Indexed: 12/27/2022]
Abstract
Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anti-cancer activity is generally validated by in vitro cellular experiments. Therefore, accurate prediction of cancer drug response is a critical and challenging task for anti-cancer drug design and precision medicine. With the development of pharmacogenomics, the combination of efficient drug feature extraction methods and omics data has made it possible to use computational models to assist in drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anti-cancer drug responses. Specifically, DeepTTA uses transcriptomic gene expression data and chemical substructures of drugs for drug response prediction. Compared to existing methods, DeepTTA achieved higher performance in terms of root mean square error, Pearson correlation coefficient and Spearman's rank correlation coefficient on multiple test sets. Moreover, we discovered that anti-cancer drugs bortezomib and dactinomycin provide a potential therapeutic option with multiple clinical indications. With its excellent performance, DeepTTA is expected to be an effective method in cancer drug design.
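Two of the evaluation metrics named above, root mean square error and Spearman's rank correlation coefficient, can be computed as in the following minimal sketch. It is a generic illustration and not DeepTTA's evaluation code; the function names and the choice to assign tied values their average rank are assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom > 0 else 0.0


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

RMSE measures the absolute size of the prediction error, while Spearman's coefficient only checks whether drug-cell line pairs are ordered correctly, so the two can disagree when predictions are monotone but poorly calibrated.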
Affiliation(s)
- Likun Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Changzhi Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xinyu Yu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Rao Fu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Shuting Jin
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Xiangrong Liu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
43
Nguyen GTT, Vu HD, Le DH. Integrating Molecular Graph Data of Drugs and Multiple -Omic Data of Cell Lines for Drug Response Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:710-717. [PMID: 34260355 DOI: 10.1109/tcbb.2021.3096960] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
Previous studies have either learned drugs' features from their string or numeric representations, which are not natural forms of drugs, or only used genomic data of cell lines for the drug response prediction problem. Here, we proposed a deep learning model, GraOmicDRP, to learn drugs' features from their graph representation and integrate multiple -omic data of cell lines. In GraOmicDRP, drugs are represented as graphs of bonds among atoms; meanwhile, cell lines are depicted by not only genomic but also transcriptomic and epigenomic data. Graph convolutional and convolutional neural networks were used to learn the representation of drugs and cell lines, respectively. A combination of the two representations was then used to represent each drug-cell line pair. Finally, the response value of each pair was predicted by a fully connected network. Experimental results indicate that transcriptomic data perform best among single -omic data; meanwhile, the combinations of transcriptomic and other -omic data achieved the best performance overall in terms of both Root Mean Square Error and Pearson correlation coefficient. In addition, we also show that GraOmicDRP outperforms some state-of-the-art methods, including ones integrating -omic data with drug information such as GraphDRP, and ones using -omic data without drug information such as DeepDR and MOLI.
44
Xia F, Allen J, Balaprakash P, Brettin T, Garcia-Cardona C, Clyde A, Cohn J, Doroshow J, Duan X, Dubinkina V, Evrard Y, Fan YJ, Gans J, He S, Lu P, Maslov S, Partin A, Shukla M, Stahlberg E, Wozniak JM, Yoo H, Zaki G, Zhu Y, Stevens R. A cross-study analysis of drug response prediction in cancer cell lines. Brief Bioinform 2022; 23:bbab356. [PMID: 34524425 PMCID: PMC8769697 DOI: 10.1093/bib/bbab356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/19/2021] [Revised: 07/26/2021] [Accepted: 08/11/2021] [Indexed: 11/28/2022]
Abstract
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
Affiliation(s)
- Austin Clyde
- Argonne National Laboratory
- University of Chicago
- Ya Ju Fan
- Lawrence Livermore National Laboratory
- Pinyi Lu
- Frederick National Laboratory for Cancer Research
- George Zaki
- Frederick National Laboratory for Cancer Research
- Rick Stevens
- Argonne National Laboratory
- University of Chicago
45
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
46
He S, Zhao D, Ling Y, Cai H, Cai Y, Zhang J, Wang L. Machine Learning Enables Accurate and Rapid Prediction of Active Molecules Against Breast Cancer Cells. Front Pharmacol 2022; 12:796534. [PMID: 34975493 PMCID: PMC8719637 DOI: 10.3389/fphar.2021.796534] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/17/2021] [Accepted: 12/02/2021] [Indexed: 12/22/2022]
Abstract
Breast cancer (BC) has surpassed lung cancer as the most frequently occurring cancer, and it is the leading cause of cancer-related death in women. Therefore, there is an urgent need to discover or design new drug candidates for BC treatment. In this study, we first collected a series of structurally diverse datasets consisting of 33,757 active and 21,152 inactive compounds for 13 breast cancer cell lines and one normal breast cell line commonly used in in vitro antiproliferative assays. Predictive models were then developed using five conventional machine learning algorithms, including naïve Bayesian, support vector machine, k-Nearest Neighbors, random forest, and extreme gradient boosting, as well as five deep learning algorithms, including deep neural networks, graph convolutional networks, graph attention network, message passing neural networks, and Attentive FP. A total of 476 single models and 112 fusion models were constructed based on three types of molecular representations including molecular descriptors, fingerprints, and graphs. The evaluation results demonstrate that the best model for each BC cell subtype can achieve high predictive accuracy for the test sets with AUC values of 0.689–0.993. Moreover, important structural fragments related to BC cell inhibition were identified and interpreted. To facilitate the use of the model, an online webserver called ChemBC (http://chembc.idruglab.cn/) and its local version software (https://github.com/idruglab/ChemBC) were developed to predict whether compounds have potential inhibitory activity against BC cells.
Affiliation(s)
- Shuyun He
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Yanle Ling
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Hanxuan Cai
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Yike Cai
- Center for Certification and Evaluation, Guangdong Drug Administration, Guangzhou, China
- Jiquan Zhang
- State Key Laboratory of Functions and Applications of Medicinal Plants, College of Pharmacy, Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, Guizhou Medical University, Guiyang, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
47
Zhu Y, Ouyang Z, Chen W, Feng R, Chen DZ, Cao J, Wu J. TGSA: protein-protein association-based twin graph neural networks for drug response prediction with similarity augmentation. Bioinformatics 2022; 38:461-468. [PMID: 34559177 DOI: 10.1093/bioinformatics/btab650] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/17/2021] [Revised: 08/16/2021] [Accepted: 09/24/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Drug response prediction (DRP) plays an important role in precision medicine (e.g. for cancer analysis and treatment). Recent advances in deep learning algorithms make it possible to predict drug responses accurately based on genetic profiles. However, existing methods ignore the potential relationships among genes. In addition, similarity among cell lines/drugs was rarely considered explicitly. RESULTS We propose a novel DRP framework, called TGSA, to make better use of prior domain knowledge. TGSA consists of Twin Graph neural networks for Drug Response Prediction (TGDRP) and a Similarity Augmentation (SA) module to fuse fine-grained and coarse-grained information. Specifically, TGDRP abstracts cell lines as graphs based on STRING protein-protein association networks and uses Graph Neural Networks (GNNs) for representation learning. SA views DRP as an edge regression problem on a heterogeneous graph and utilizes GNNs to smooth the representations of similar cell lines/drugs. Besides, we introduce an auxiliary pre-training strategy to remedy the identified limitations of scarce data and poor out-of-distribution generalization. Extensive experiments on the GDSC2 dataset demonstrate that our TGSA consistently outperforms all the state-of-the-art baselines under various experimental settings. We further evaluate the effectiveness and contributions of each component of TGSA via ablation experiments. The promising performance of TGSA shows enormous potential for clinical applications in precision medicine. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/violet-sto/TGSA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yiheng Zhu: College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
- Zhenqiu Ouyang: Polytechnic Institute, Zhejiang University, Hangzhou 310000, China
- Wenbo Chen: Polytechnic Institute, Zhejiang University, Hangzhou 310000, China
- Ruiwei Feng: College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
- Danny Z Chen: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Ji Cao: College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310000, China
- Jian Wu: Department of Ophthalmology of the Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310000, China
|
48
|
Nguyen T, Nguyen GTT, Nguyen T, Le DH. Graph Convolutional Networks for Drug Response Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:146-154. [PMID: 33606633 DOI: 10.1109/tcbb.2021.3060430] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Indexed: 06/12/2023]
Abstract
BACKGROUND Drug response prediction is an important problem in computational personalized medicine. Many machine-learning-based methods, especially deep learning-based ones, have been proposed for this task. However, these methods often represent the drugs as strings, which are not a natural way to depict molecules. Also, interpretation (e.g., what are the mutation or copy number aberration contributing to the drug response) has not been considered thoroughly. METHODS In this study, we propose a novel method, GraphDRP, based on graph convolutional network for the problem. In GraphDRP, drugs were represented in molecular graphs directly capturing the bonds among atoms, meanwhile cell lines were depicted as binary vectors of genomic aberrations. Representative features of drugs and cell lines were learned by convolution layers, then combined to represent for each drug-cell line pair. Finally, the response value of each drug-cell line pair was predicted by a fully-connected neural network. Four variants of graph convolutional networks were used for learning the features of drugs. RESULTS We found that GraphDRP outperforms tCNNS in all performance measures for all experiments. Also, through saliency maps of the resulting GraphDRP models, we discovered the contribution of the genomic aberrations to the responses. CONCLUSION Representing drugs as graphs can improve the performance of drug response prediction. Availability of data and materials: Data and source code can be downloaded at https://github.com/hauldhut/GraphDRP.
|
49
|
Pepe G, Carrino C, Parca L, Helmer-Citterich M. Dissecting the Genome for Drug Response Prediction. Methods Mol Biol 2022; 2449:187-196. [PMID: 35507263 DOI: 10.1007/978-1-0716-2095-3_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The prediction of the cancer cell lines sensitivity to a specific treatment is one of the current challenges in precision medicine. With omics and pharmacogenomics data being available for over 1000 cancer cell lines, several machine learning and deep learning algorithms have been proposed for drug sensitivity prediction. However, deciding which omics data to use and which computational methods can efficiently incorporate data from different sources is the challenge which several research groups are working on. In this review, we summarize recent advances in the representative computational methods that have been developed in the last 2 years on three public datasets: COSMIC, CCLE, NCI-60. These methods aim to improve the prediction of the cancer cell lines sensitivity to a given treatment by incorporating drug's chemical information in the input or using a priori feature selection. Finally, we discuss the latest published method which aims to improve the prediction of clinical drug response of real patients starting from cancer cell line molecular profiles.
Affiliation(s)
- Gerardo Pepe: Department of Biology, Centro di Bioinformatica Molecolare, University of Rome "Tor Vergata", Rome, Italy
- Chiara Carrino: Department of Biology, Centro di Bioinformatica Molecolare, University of Rome "Tor Vergata", Rome, Italy
- Luca Parca: Italian Space Agency, Via del Politecnico snc, Rome, Italy
- Manuela Helmer-Citterich: Department of Biology, Centro di Bioinformatica Molecolare, University of Rome "Tor Vergata", Rome, Italy
|
50
|
Akbari F, Peymani M, Salehzadeh A, Ghaedi K. Identification of modules based on integrative analysis for drug prediction in colorectal cancer. Gene Rep 2021. [DOI: 10.1016/j.genrep.2021.101403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|