1. Patne AY, Dhulipala SM, Lawless W, Prakash S, Mohapatra SS, Mohapatra S. Drug Discovery in the Age of Artificial Intelligence: Transformative Target-Based Approaches. Int J Mol Sci 2024; 25:12233. [PMID: 39596300 PMCID: PMC11594879 DOI: 10.3390/ijms252212233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/01/2024] [Accepted: 11/06/2024] [Indexed: 11/28/2024] Open
Abstract
The complexities inherent in drug development are multi-faceted and often hamper accuracy, speed and efficiency, thereby limiting success. This review explores how recent developments in machine learning (ML) are significantly impacting target-based drug discovery, particularly in small-molecule approaches. The Simplified Molecular Input Line Entry System (SMILES), which translates a chemical compound's three-dimensional structure into a string of symbols, is now widely used in drug design, mining, and repurposing. Utilizing ML and natural language processing techniques, SMILES has revolutionized lead identification, high-throughput screening and virtual screening. ML models enhance the accuracy of predicting binding affinity and selectivity, reducing the need for extensive experimental screening. Additionally, deep learning, with its strengths in analyzing spatial and sequential data through convolutional neural networks (CNNs) and recurrent neural networks (RNNs), shows promise for virtual screening, target identification, and de novo drug design. Fragment-based approaches also benefit from ML algorithms and techniques like generative adversarial networks (GANs), which predict fragment properties and binding affinities, aiding in hit selection and design optimization. Structure-based drug design, which relies on high-resolution protein structures, leverages ML models for accurate predictions of binding interactions. While challenges such as interpretability and data quality remain, ML's transformative impact accelerates target-based drug discovery, increasing efficiency and innovation. Its potential to deliver new and improved treatments for various diseases is significant.
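To make the idea of treating SMILES as text concrete, here is a minimal sketch of a SMILES tokenizer of the kind that usually precedes any language-model-style processing. The regular expression and the function name `tokenize_smiles` are illustrative assumptions, not taken from the review, and the pattern covers only common organic-subset symbols.

```python
import re

# Illustrative token pattern (an assumption, not from the review):
# bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, bond/branch symbols, and one-digit ring closures.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#\-\+\\/:~@\?\*\$\.\(\)]|\d)"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly on unknown symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens

# Aspirin written as SMILES, then split into model-ready tokens.
aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(tokenize_smiles(aspirin))
```

The resulting token list can then be mapped to integer ids and fed to a sequence model, which is the sense in which natural language processing techniques apply to molecules.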
Affiliation(s)
- Akshata Yashwant Patne
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Taneja College of Pharmacy Graduate Programs, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
- Sai Madhav Dhulipala
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- William Lawless
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
- Satya Prakash
  - Biomedical Technology and Cell Therapy Research Laboratory, Department of Biomedical Engineering, Faculty of Medicine and Health Sciences, McGill University, 3775 University Street, Montreal, QC H3A 2B4, Canada
- Shyam S. Mohapatra
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Taneja College of Pharmacy Graduate Programs, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
- Subhra Mohapatra
  - Center for Research and Education in Nanobioengineering, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Taneja College of Pharmacy Graduate Programs, MDC30, 12908 USF Health Drive, Tampa, FL 33612, USA
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Research Service, James A. Haley Veterans Hospital, Tampa, FL 33612, USA
2. Matboli M, Al-Amodi HS, Khaled A, Khaled R, Ali M, Kamel HFM, Hamid MSAEL, ELsawi HA, Habib EK, Youssef I. Integrating molecular, biochemical, and immunohistochemical features as predictors of hepatocellular carcinoma drug response using machine-learning algorithms. Front Mol Biosci 2024; 11:1430794. [PMID: 39479501 PMCID: PMC11521808 DOI: 10.3389/fmolb.2024.1430794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Accepted: 09/27/2024] [Indexed: 11/02/2024] Open
Abstract
Introduction: Liver cancer, particularly hepatocellular carcinoma (HCC), remains a significant global health concern due to its high prevalence and heterogeneous nature. Despite the existence of approved drugs for HCC treatment, the scarcity of predictive biomarkers limits their effective utilization. Integrating diverse data types could improve drug response prediction, ultimately enabling personalized HCC management. Method: In this study, we developed multiple supervised machine learning models to predict treatment response. These models utilized classifiers such as logistic regression (LR), k-nearest neighbors (kNN), neural networks (NN), support vector machines (SVM), and random forests (RF) using a comprehensive set of molecular, biochemical, and immunohistochemical features as targets of three drugs: Pantoprazole, Cyanidin 3-glycoside (Cyan), and Hesperidin. A set of performance metrics for the complete and reduced models was reported, including accuracy, precision, recall (sensitivity), specificity, and the Matthews Correlation Coefficient (MCC). Results and Discussion: Notably, the NN achieved the best prediction accuracy: the combined model using molecular and biochemical features reached an accuracy of 0.9693 ± 0.0105 and an average area under the ROC curve (AUC) of 0.94 ± 0.06 across three cross-validation iterations. We also found seven molecular features, seven biochemical features, and one immunohistochemistry feature as promising biomarkers of treatment response. This comprehensive method has the potential to significantly advance personalized HCC therapy by allowing for more precise drug response estimation and assisting in the identification of effective treatment strategies.
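The metrics named in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not data from the study.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Matthews Correlation Coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,        # sensitivity
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical counts for responders (positive) vs non-responders (negative).
for name, value in binary_metrics(tp=45, fp=5, tn=40, fn=10).items():
    print(f"{name}: {value:.3f}")
```

MCC is worth reporting alongside accuracy because it stays near zero for a classifier that only exploits class imbalance.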
Affiliation(s)
- Marwa Matboli
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Faculty of Oral and Dental Medicine, Misr International University (MIU), Cairo, Egypt
- Hiba S. Al-Amodi
  - Biochemistry Department, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Abdelrahman Khaled
  - Bioinformatics Group, Center of Informatics Sciences (CIS), School of Information Technology and Computer Sciences, Nile University, Giza, Egypt
- Radwa Khaled
  - Biotechnology/Biomolecular Chemistry Department, Faculty of Science, Cairo University, Giza, Egypt
- Marwa Ali
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Hala F. M. Kamel
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Biochemistry Department, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Hind A. ELsawi
  - Department of Internal Medicine, Badr University in Cairo, Badr, Egypt
- Eman K. Habib
  - Department of Anatomy and Cell Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Department of Anatomy and Cell Biology, Faculty of Medicine, Galala University, Suez, Egypt
- Ibrahim Youssef
  - Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt
3. Vishwakarma S, Hernandez-Hernandez S, Ballester PJ. Graph neural networks are promising for phenotypic virtual screening on cancer cell lines. Biol Methods Protoc 2024; 9:bpae065. [PMID: 39502795 PMCID: PMC11537795 DOI: 10.1093/biomethods/bpae065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 08/20/2024] [Accepted: 09/02/2024] [Indexed: 11/08/2024] Open
Abstract
Artificial intelligence is increasingly driving early drug design, offering novel approaches to virtual screening. Phenotypic virtual screening (PVS) aims to predict how cancer cell lines respond to different compounds by focusing on observable characteristics rather than specific molecular targets. Some studies have suggested that deep learning may not be the best approach for PVS. However, these studies are limited by the small number of tested molecules and by not employing suitable performance metrics or dissimilar-molecules splits, which better mimic the challenging chemical diversity of real-world screening libraries. Here we prepared 60 datasets, each containing approximately 30 000-50 000 molecules tested for their growth inhibitory activities on one of the NCI-60 cancer cell lines. We conducted multiple performance evaluations of each of the five machine learning algorithms for PVS on these 60 problem instances. To provide an even more comprehensive evaluation, we used two model validation types: the random split and the dissimilar-molecules split. Overall, about 14 440 training runs across datasets were carried out per algorithm. The models were primarily evaluated using hit rate, a more suitable metric in VS contexts. The results show that all models are more challenged by test molecules that are substantially different from those in the training data. In both validation types, the D-MPNN algorithm, a graph-based deep neural network, was found to be the most suitable for building predictive models for this PVS problem.
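Hit rate, the main metric here, can be read as the fraction of true actives among the top-ranked molecules a model would send for testing. The following is a minimal sketch under that reading; the `top_fraction` cutoff, the function name, and the toy data are assumptions, since the abstract does not state the exact definition used.

```python
def hit_rate(scores: list[float], is_active: list[int], top_fraction: float = 0.01) -> float:
    """Fraction of true actives among the highest-scored molecules.

    top_fraction is a hypothetical cutoff; the study's exact choice is not
    given in the abstract.
    """
    n_selected = max(1, int(len(scores) * top_fraction))
    # Rank molecule indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:n_selected]
    return sum(is_active[i] for i in selected) / n_selected

# Toy screen: four molecules, select the top half by predicted score.
scores = [0.9, 0.8, 0.1, 0.7]
actives = [1, 0, 1, 1]
print(hit_rate(scores, actives, top_fraction=0.5))
```

Unlike a global error measure, this only rewards getting the top of the ranking right, which matches how a screening campaign spends its experimental budget.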
Affiliation(s)
- Sachin Vishwakarma
  - Evotec SAS (France), Toulouse, France
  - Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, United Kingdom
4. Xu M, Zhu Z, Zhao Y, He K, Huang Q, Zhao Y. RedCDR: Dual Relation Distillation for Cancer Drug Response Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1468-1479. [PMID: 38776197 DOI: 10.1109/tcbb.2024.3404262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2024]
Abstract
Based on multi-omics data and drug information, predicting the response of cancer cell lines to drugs is a crucial area of research in modern oncology, as it can promote the development of personalized treatments. Despite the promising performance achieved by existing models, most of them overlook the variations among different omics and lack effective integration of multi-omics data. Moreover, the explicit modeling of cell line/drug attributes and cell line-drug associations has not been thoroughly investigated in existing approaches. To address these issues, we propose RedCDR, a dual relation distillation model for cancer drug response (CDR) prediction. Specifically, a parallel dual-branch architecture is designed to enable both independent learning and interactive fusion of cell line/drug attribute and cell line-drug association information. To facilitate the adaptive, interactive integration of multi-omics data, the proposed multi-omics encoder introduces multiple similarity relations between cell lines and takes the importance of different omics data into account. To accomplish knowledge transfer from the two independent attribute and association branches to their fusion, a dual relation distillation mechanism consisting of representation distillation and prediction distillation is presented. Experiments conducted on the GDSC and CCLE datasets show that RedCDR outperforms previous state-of-the-art approaches in CDR prediction.
5. Xie X, Wang F, Wang G, Zhu W, Du X, Wang H. Learning the cellular activity representation based on gene regulatory networks for prediction of tumor response to drugs. Artif Intell Med 2024; 152:102864. [PMID: 38640702 DOI: 10.1016/j.artmed.2024.102864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 01/28/2024] [Accepted: 03/30/2024] [Indexed: 04/21/2024]
Abstract
Predicting the response of tumor cells to anti-tumor drugs is critical to realizing cancer precision medicine. Currently, most existing methods ignore the regulatory relationships between genes and thus have unsatisfactory predictive performance. In this paper, we propose to predict anti-tumor drug efficacy via learning the activity representation of tumor cells based on a priori knowledge of gene regulation networks (GRNs). Specifically, the method simulates the cellular biosystem by synthesizing a cell-gene activity network and then infers a new low-dimensional activity representation for tumor cells from the raw high-dimensional expression profile. The simulated cell-gene network mainly comprises known gene regulatory networks collected from multiple resources and fuses tumor cells by linking them to hotspot genes that are over- or under-expressed in them. The resulting activity representation not only reflects the shallow expression profile (hotspot genes) but also mines in-depth information on gene regulation activity in tumor cells before treatment. Finally, we build deep learning models on the activity representation for predicting drug efficacy in tumor cells. Experimental results on the benchmark GDSC dataset demonstrate the superior performance of the proposed method over state-of-the-art (SOTA) methods, with the highest AUC of 0.954 in efficacy label prediction and the best R² of 0.834 in the regression of half maximal inhibitory concentration (IC50) values, suggesting the potential value of the proposed method in practice.
Affiliation(s)
- Xinping Xie
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Fengting Wang
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, Hefei, China
- Guanfu Wang
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Weiwei Zhu
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, Hefei, China
  - Zhongqi AI Lab, Hefei, China
- Xiaodong Du
  - Experimental Teaching Center, Hefei University, Hefei, China
- Hongqiang Wang
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, Hefei, China
  - Zhongqi AI Lab, Hefei, China
6. Sotudian S, Paschalidis IC. ITNR: Inversion Transformer-based Neural Ranking for cancer drug recommendations. Comput Biol Med 2024; 172:108312. [PMID: 38503090 PMCID: PMC10990436 DOI: 10.1016/j.compbiomed.2024.108312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 03/09/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Personalized drug response prediction is an approach for tailoring effective therapeutic strategies for patients based on their tumors' genomic characterization. While machine learning methods are widely employed in the literature, they often struggle to capture drug-cell line relations across various cell lines. In addressing this challenge, our study introduces a novel listwise Learning-to-Rank (LTR) model named Inversion Transformer-based Neural Ranking (ITNR). ITNR utilizes genomic features and a transformer architecture to decipher functional relationships and construct models that can predict patient-specific drug responses. Our experiments were conducted on three major drug response data sets, showing that ITNR reliably and consistently outperforms state-of-the-art LTR models.
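As background on what "listwise" means, the sketch below implements a generic ListNet-style top-one loss, which compares the whole predicted ranking of drugs for one cell line against the observed responses. This is a stand-in for illustration only: the abstract does not specify ITNR's loss, and the function names and toy scores are assumptions.

```python
import math

def softmax(values: list[float]) -> list[float]:
    """Numerically stable softmax over one list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(predicted_scores: list[float], true_relevance: list[float]) -> float:
    """ListNet-style top-one cross-entropy over a single ranked list.

    Generic illustration; not the loss used by ITNR.
    """
    target = softmax(true_relevance)
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# One cell line, three candidate drugs; higher relevance = better response.
relevance = [3.0, 2.0, 1.0]
print(listwise_loss([3.0, 2.0, 1.0], relevance))  # ranking agrees with the data
print(listwise_loss([1.0, 2.0, 3.0], relevance))  # ranking is reversed, larger loss
```

Because the loss is computed over the entire list at once, a model trained this way is penalized for ordering errors among drugs rather than for per-drug regression error.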
Affiliation(s)
- Shahabeddin Sotudian
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, and Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA
7. Kim H, Lee ER, Park S. Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model. Sci Rep 2023; 13:21979. [PMID: 38081913 PMCID: PMC10713553 DOI: 10.1038/s41598-023-48903-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 11/30/2023] [Indexed: 10/16/2024] Open
Abstract
Due to the prevalence of complex data, data heterogeneity is often observed in contemporary scientific studies and various applications. Motivated by studies on cancer cell lines, we consider the analysis of heterogeneous subpopulations with binary responses and high-dimensional covariates. In many practical scenarios, it is common to use a single regression model for the entire data set. To do this effectively, it is critical to quantify the heterogeneity of the effect of covariates across subpopulations through appropriate statistical inference. However, the high dimensionality and discrete nature of the data can lead to challenges in inference. Therefore, we propose a novel statistical inference method for a high-dimensional logistic regression model that accounts for heterogeneous subpopulations. Our primary goal is to investigate heterogeneity across subpopulations by testing the equivalence of the effect of a covariate and the significance of the overall effects of a covariate. To achieve overall sparsity of the coefficients and their fusions across subpopulations, we employ a fused group Lasso penalization method. In addition, we develop a statistical inference method that incorporates bias correction of the proposed penalized method. To address computational issues due to the nonlinear log-likelihood and the fused Lasso penalty, we propose a computationally efficient and fast algorithm by adapting the ideas of the proximal gradient method and the alternating direction method of multipliers (ADMM) to our settings. Furthermore, we develop non-asymptotic analyses for the proposed fused group Lasso and prove that the debiased test statistics admit chi-squared approximations even in the presence of high-dimensional variables. In simulations, the proposed test outperforms existing methods. The practical effectiveness of the proposed method is demonstrated by analyzing data from the Cancer Cell Line Encyclopedia (CCLE).
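Proximal gradient and ADMM solvers for Lasso-type penalties are built from closed-form proximal operators. The sketch below shows two standard ones, soft-thresholding for the plain Lasso and group soft-thresholding for the group Lasso. It illustrates the building blocks only and is not the authors' algorithm, whose fused group penalty adds terms not covered here; the function names are assumptions.

```python
import math

def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def group_soft_threshold(v: list[float], t: float) -> list[float]:
    """Proximal operator of t * ||v||_2: shrink the whole group at once."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        # The entire group is zeroed, which is what yields group sparsity.
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]

# A coefficient group for one covariate across two subpopulations.
print(soft_threshold(3.0, 1.0))
print(group_soft_threshold([3.0, 4.0], 2.5))
print(group_soft_threshold([0.3, 0.4], 1.0))
```

A proximal gradient iteration alternates a gradient step on the smooth log-likelihood with one of these shrinkage steps, which is how exact zeros appear in the fitted coefficients.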
Affiliation(s)
- Hyunjin Kim
  - Department of Statistics, Sungkyunkwan University, Seoul, 100190, South Korea
- Eun Ryung Lee
  - Department of Statistics, Sungkyunkwan University, Seoul, 100190, South Korea
- Seyoung Park
  - Department of Statistics, Sungkyunkwan University, Seoul, 100190, South Korea
8. Ma X, Tang Y, Wang C, Li Y, Zhang J, Luo Y, Xu Z, Wu F, Wang S. Interpretable XGBoost-SHAP Model Predicts Nanoparticles Delivery Efficiency Based on Tumor Genomic Mutations and Nanoparticle Properties. ACS Applied Bio Materials 2023; 6:4326-4335. [PMID: 37683105 DOI: 10.1021/acsabm.3c00527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/10/2023]
Abstract
Understanding the complex interaction between nanoparticles (NPs) and tumors in vivo and how it dominates the delivery efficiency of NPs is critical for the translation of nanomedicine. Herein, we proposed an interpretable XGBoost-SHAP model by integrating the information on NPs physicochemical properties and tumor genomic profile to predict the delivery efficiency. The correlation coefficients were 0.66, 0.75, and 0.54 for the prediction of maximum delivery efficiency, delivery efficiency at 24 and 168 h postinjection for test sets. The analysis of the feature importance revealed that the tumor genomic mutations and their interaction with NPs properties played important roles in the delivery of NPs. The biological pathways of the NP-delivery-related genes were further explored through gene ontology enrichment analysis. Our work provides a pipeline to predict and explain the delivery efficiency of NPs to heterogeneous tumors and highlights the power of simultaneously using omics data and interpretable machine learning algorithms for discovering interactions between NPs and individual tumors, which is important for the development of personalized precision nanomedicine.
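SHAP attributions rest on Shapley values, which average each feature's marginal contribution over all feature subsets. The brute-force sketch below computes them directly from that definition for a toy additive model. Practical SHAP implementations for tree models use much faster algorithms; the function name, `value_fn`, and the toy weights are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features: int) -> list[float]:
    """Exact Shapley values by enumerating every feature subset.

    value_fn maps a set of feature indices to the model output obtained
    when only those features are 'present'. Cost grows as 2**n_features,
    so this is for illustration on tiny problems only.
    """
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = set(subset)
                phi[i] += weight * (value_fn(without_i | {i}) - value_fn(without_i))
    return phi

# Toy additive "model": each present feature adds a fixed amount.
weights = [1.0, 2.0, 3.0]
phi = shapley_values(lambda present: sum(weights[j] for j in present), 3)
print(phi)
```

For an additive model each feature's Shapley value equals its own contribution, and in general the values sum to the difference between the full prediction and the empty-set baseline.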
Affiliation(s)
- Xingqun Ma
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
  - Department of Oncology, Nanjing Baiyi Hospital, Jinling Clinical College of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Yuxia Tang
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Chuanbing Wang
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Yang Li
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Jiulou Zhang
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Yafei Luo
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Ziqing Xu
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Feiyun Wu
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
- Shouju Wang
  - Laboratory of Molecular Imaging, Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000, China
9. DasGupta R, Yap A, Yaqing EY, Chia S. Evolution of precision oncology-guided treatment paradigms. WIREs Mech Dis 2023; 15:e1585. [PMID: 36168283 DOI: 10.1002/wsbm.1585] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/07/2022] [Revised: 06/30/2022] [Accepted: 07/11/2022] [Indexed: 01/31/2023]
Abstract
Cancer treatment is gradually evolving from the classical use of nonspecific cytotoxic drugs targeting generic mechanisms of cell growth and proliferation. Instead, new "patient-specific treatment paradigms" that are based on an individual patient's tumor-specific molecular features are emerging, and these include "druggable" genomic alterations such as oncogenic driver mutations, downstream activities of cancer-signaling pathways, and the expression of specific genes involved in tumorigenesis and cancer progression. This evolving landscape of making evidence-based treatment decisions forms the foundation of precision oncology, which aims to deliver "the right drug, to the right patient and at the right time". The long-term vision for this approach is to maximize the treatment efficacy while minimizing exposure to ineffective therapy and reducing co-morbidity-related side effects. Successful clinical translation and implementation of this vision have the potential to revolutionize treatment paradigms from predominantly reactive, to more evidence-based, proactive and predictive care. In this article, we review the past and current approaches in precision oncology, and describe their remarkable power and limitations. We also speculate on the evolution of newly emerging methodologies of the future that can be used to address some of the key challenges associated with the existing paradigms. This article is categorized under: Cancer > Genetics/Genomics/Epigenetics Cancer > Molecular and Cellular Physiology Cancer > Computational Models.
Affiliation(s)
- Ramanuj DasGupta
  - Laboratory of Precision Oncology and Cancer Evolution, Genome Institute of Singapore, A*STAR, Singapore, Singapore
  - Cancer Science Institute, National University of Singapore, Singapore, Singapore
- Aixin Yap
  - Laboratory of Precision Oncology and Cancer Evolution, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Elena Yong Yaqing
  - Laboratory of Precision Oncology and Cancer Evolution, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Shumei Chia
  - Laboratory of Precision Oncology and Cancer Evolution, Genome Institute of Singapore, A*STAR, Singapore, Singapore
10. Xi J, Wang D, Yang X, Zhang W, Huang Q. Cancer omic data based explainable AI drug recommendation inference: A traceability perspective for explainability. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
11. Utilization of Cancer Cell Line Screening to Elucidate the Anticancer Activity and Biological Pathways Related to the Ruthenium-Based Therapeutic BOLD-100. Cancers (Basel) 2022; 15:cancers15010028. [PMID: 36612025 PMCID: PMC9817855 DOI: 10.3390/cancers15010028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2022] [Revised: 11/30/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open
Abstract
BOLD-100 (sodium trans-[tetrachlorobis(1H indazole)ruthenate(III)]) is a ruthenium-based anticancer compound currently in clinical development. The identification of cancer types that show increased sensitivity towards BOLD-100 can lead to improved developmental strategies. Sensitivity profiling can also identify mechanisms of action that are pertinent for the bioactivity of complex therapeutics. Sensitivity to BOLD-100 was measured in a 319-cancer-cell line panel spanning 24 tissues. BOLD-100's sensitivity profile showed variation across the tissue lineages, including increased response in esophageal, bladder, and hematologic cancers. Multiple cancers, including esophageal, bile duct and colon cancer, had higher relative response to BOLD-100 than to cisplatin. Response to BOLD-100 showed only moderate correlation to anticancer compounds in the Genomics of Drug Sensitivity in Cancer (GDSC) database, as well as no clear theme in bioactivity of correlated hits, suggesting that BOLD-100 may have a differentiated therapeutic profile. The genomic modalities of cancer cell lines were modeled against the BOLD-100 sensitivity profile, which revealed that genes related to ribosomal processes were associated with sensitivity to BOLD-100. Machine learning modeling of the sensitivity profile to BOLD-100 and gene expression data provided moderate predictive value. These findings provide further mechanistic understanding around BOLD-100 and support its development for additional cancer types.
12. Shin J, Piao Y, Bang D, Kim S, Jo K. DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer. Int J Mol Sci 2022; 23:13919. [PMID: 36430395 PMCID: PMC9699175 DOI: 10.3390/ijms232213919] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2022] [Revised: 10/27/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Some of the recent studies on drug sensitivity prediction have applied graph neural networks to leverage prior knowledge on the drug structure or gene network, and other studies have focused on the interpretability of the model to delineate the mechanism governing the drug response. However, it is crucial to make a prediction model that is both knowledge-guided and interpretable, so that the prediction accuracy is improved and practical use of the model can be enhanced. We propose an interpretable model called DRPreter (drug response predictor and interpreter) that predicts the anticancer drug response. DRPreter learns cell line and drug information with graph neural networks; the cell-line graph is further divided into multiple subgraphs with domain knowledge on biological pathways. A type-aware transformer in DRPreter helps detect relationships between pathways and a drug, highlighting important pathways that are involved in the drug response. Extensive experiments on the GDSC (Genomics of Drug Sensitivity in Cancer) dataset demonstrate that the proposed method outperforms state-of-the-art graph-based models for drug response prediction. In addition, DRPreter detected putative key genes and pathways for specific drug-cell-line pairs with supporting evidence in the literature, implying that our model can help interpret the mechanism of action of the drug.
Affiliation(s)
- Jihye Shin
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - AIGENDRUG Co., Ltd., Seoul 08826, Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
  - MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
- Kyuri Jo
  - Department of Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
13. Ahn S, Lee SE, Kim MH. Random-forest model for drug-target interaction prediction via Kullbeck-Leibler divergence. J Cheminform 2022; 14:67. [PMID: 36192818 PMCID: PMC9531514 DOI: 10.1186/s13321-022-00644-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Accepted: 09/11/2022] [Indexed: 12/04/2022] Open
Abstract
Virtual screening has significantly improved the success rate of early stage drug discovery. Recent virtual screening methods have improved owing to advances in machine learning and chemical information. Among these advances, the creative extraction of drug features is important for predicting drug–target interaction (DTI), which is a large-scale virtual screening of known drugs. Herein, we report Kullback–Leibler divergence (KLD) as a DTI feature and a feature-driven classification model applicable to DTI prediction. For this purpose, E3FP three-dimensional (3D) molecular fingerprints of drugs as a molecular representation allow the computation of 3D similarities between ligands within each target (Q–Q matrix), to identify the uniqueness of pharmacological targets, and of those between a query and a ligand (Q–L vector) in DTIs. The 3D similarity matrices are transformed into probability density functions via kernel density estimation as a nonparametric estimation. Each density model can exploit the characteristics of each pharmacological target and measure the quasi-distance between the ligands. Furthermore, we developed a random forest model from the KLD feature vectors to successfully predict DTIs for 17 representative targets (mean accuracy: 0.882, out-of-bag score estimate: 0.876, ROC AUC: 0.990). The method is also applicable to 2D chemical similarity.
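The pipeline described here turns similarity values into densities and then compares densities with a divergence. A minimal sketch of those two steps follows, using a Gaussian kernel density estimate evaluated on a grid and a discretized Kullback-Leibler divergence. The bandwidth, grid, smoothing constant, and similarity values are all illustrative assumptions, not the authors' settings or data.

```python
import math

def gaussian_kde(samples: list[float], grid: list[float], bandwidth: float) -> list[float]:
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples) / norm
        for g in grid
    ]

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """Discrete KL(p || q) after smoothing by eps and renormalizing."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    p_sum, q_sum = sum(p), sum(q)
    return sum((a / p_sum) * math.log((a / p_sum) / (b / q_sum)) for a, b in zip(p, q))

# Hypothetical similarity values in [0, 1]: ligand-ligand within a target
# versus query-ligand for a candidate compound.
grid = [i / 100 for i in range(101)]
within_target = gaussian_kde([0.80, 0.85, 0.90], grid, bandwidth=0.05)
query_to_ligands = gaussian_kde([0.20, 0.25, 0.30], grid, bandwidth=0.05)
print(kl_divergence(query_to_ligands, within_target))
```

KL divergence is not symmetric, which is why the abstract calls it a quasi-distance; swapping the arguments generally gives a different value.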
Affiliation(s)
- Sangjin Ahn
- Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea; Department of Artificial Intelligence, Ajou University, Suwon, 16499, Republic of Korea
- Si Eun Lee
- Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
- Mi-Hyun Kim
- Gachon Institute of Pharmaceutical Science and Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
14
Gut Microbiota in Nutrition and Health with a Special Focus on Specific Bacterial Clusters. Cells 2022; 11:3091. [PMID: 36231053 PMCID: PMC9563262 DOI: 10.3390/cells11193091] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/02/2022] [Revised: 09/21/2022] [Accepted: 09/24/2022] [Indexed: 11/25/2022] Open
Abstract
Health is influenced by how the gut microbiome develops as a result of external and internal factors, such as nutrition, the environment, medication use, age, sex, and genetics. Alpha and beta diversity metrics and (enterotype) clustering methods are commonly employed to perform population studies and to analyse the effects of various treatments, yet, with the continuous development of (new) sequencing technologies, and as various omics fields as a result become more accessible for investigation, increasingly sophisticated methodologies are needed and indeed being developed in order to disentangle the complex ways in which the gut microbiome and health are intertwined. Diseases of affluence, such as type 2 diabetes (T2D) and cardiovascular diseases (CVD), are commonly linked to species associated with the Bacteroides enterotype(s) and a decline of various (beneficial) complex microbial trophic networks, which are in turn linked to the aforementioned factors. In this review, we (1) explore the effects that some of the most common internal and external factors have on the gut microbiome composition and how these in turn relate to T2D and CVD, and (2) discuss research opportunities enabled by and the limitations of some of the latest technical developments in the microbiome sector, including the use of artificial intelligence (AI), strain tracking, and peak to trough ratios.
15
Gene expression based inference of cancer drug sensitivity. Nat Commun 2022; 13:5680. [PMID: 36167836 PMCID: PMC9515171 DOI: 10.1038/s41467-022-33291-z] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 11/18/2021] [Accepted: 09/12/2022] [Indexed: 11/09/2022] Open
Abstract
Inter and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for imparting differential drug responses in cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine learning based personalized therapy recommendations using the molecular profiles of cancer specimens. In this study, we introduce Precily, a predictive modeling approach to infer treatment response in cancers using gene expression data. In this context, we demonstrate the benefits of considering pathway activity estimates in tandem with drug descriptors as features. We apply Precily on single-cell and bulk RNA sequencing data associated with hundreds of cancer cell lines. We then assess the predictability of treatment outcomes using our in-house prostate cancer cell line and xenografts datasets exposed to differential treatment conditions. Further, we demonstrate the applicability of our approach on patient drug response data from The Cancer Genome Atlas and an independent clinical study describing the treatment journey of three melanoma patients. Our findings highlight the importance of chemo-transcriptomics approaches in cancer treatment selection. Predicting treatment response in cancer remains a highly complex task. Here, the authors develop Precily, a deep neural network framework to predict treatment response in cancer by considering gene expression, pathway activity estimates and drug features, and test this method in multiple datasets and preclinical models.
16
Liu Y, Xiao W, Zhang H, Xin L, Li X, Pan F. Chemotherapy drug potency assessment method of ovarian cancer cells by digital holography microscopy. Biomed Opt Express 2022; 13:4370-4385. [PMID: 36032571 PMCID: PMC9408259 DOI: 10.1364/boe.465149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 07/03/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Abstract
Drug potency assessment plays a crucial role in cancer chemotherapy. Selecting appropriate chemotherapy drugs can reduce the impact on the patient's physical condition and achieve a better therapeutic effect. Various methods have been used for in vitro drug susceptibility assays, but few studies have quantitatively calculated morphology and texture parameters from phase imaging for drug potency assessment. In this study, digital holography microscopy was used to obtain phase images of ovarian cancer cells after adding three different drugs, namely Cisplatin, Adriamycin, and 5-fluorouracil. From the reconstructed phase images, four time-varying parameters of the ovarian cancer cells were calculated: average height, projected area, cluster shade, and entropy. The half-inhibitory concentration of the cells under each drug was then calculated from these four parameters. The half-inhibitory concentration, which directly reflects drug potency, is related to the morphological and texture features extracted from the phase images by numerical fitting, and a new method for calculating it is proposed on this basis. The results show that the morphological and texture feature parameters can be used to evaluate the sensitivity of ovarian cancer cells to different drugs by numerically fitting the half-inhibitory concentration, which provides a new approach to drug potency assessment before chemotherapy for ovarian cancer.
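The abstract does not specify the fitting procedure, so the following is only a generic sketch of how a half-inhibitory concentration can be obtained by numerical fitting. It assumes a Hill-type dose-response curve and fractional inhibition values in (0, 1); how a morphological or texture parameter would be converted to such a fraction is not given in the source. The function name `fit_ic50` and the data are hypothetical. The Hill curve is linearised as logit(I) = n·ln(c) − n·ln(IC50) and fitted by least squares.

```python
import numpy as np

def fit_ic50(concentrations, inhibition):
    """Estimate IC50 and Hill slope from fractional inhibition values in (0, 1)."""
    log_c = np.log(np.asarray(concentrations, dtype=float))
    frac = np.asarray(inhibition, dtype=float)
    logit = np.log(frac / (1.0 - frac))
    slope, intercept = np.polyfit(log_c, logit, 1)  # logit = slope*log(c) + intercept
    ic50 = float(np.exp(-intercept / slope))        # concentration where logit == 0
    return ic50, float(slope)

# Hypothetical dose-response data generated from a known curve (IC50 = 2.0, slope = 1.5).
true_ic50, true_slope = 2.0, 1.5
conc = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
inhib = conc ** true_slope / (true_ic50 ** true_slope + conc ** true_slope)

ic50, slope = fit_ic50(conc, inhib)
print(f"estimated IC50 = {ic50:.3f}, Hill slope = {slope:.3f}")
```

With noisy measurements a non-linear least-squares fit of the Hill curve itself would usually be preferred, since the logit transform amplifies noise near 0 and 1.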
Affiliation(s)
- Yakun Liu
- Key Laboratory of Precision Opto-mechatronics Technology, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
- Wen Xiao
- Key Laboratory of Precision Opto-mechatronics Technology, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
- Huanzhi Zhang
- Department of Obstetrics and Gynecology, Peking University People's Hospital, Beijing 100044, China
- Lu Xin
- Key Laboratory of Precision Opto-mechatronics Technology, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
- Xiaoping Li
- Department of Obstetrics and Gynecology, Peking University People's Hospital, Beijing 100044, China
- Feng Pan
- Key Laboratory of Precision Opto-mechatronics Technology, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
17
Sotudian S, Paschalidis IC. Machine Learning for Pharmacogenomics and Personalized Medicine: A Ranking Model for Drug Sensitivity Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2324-2333. [PMID: 34043512 PMCID: PMC9642333 DOI: 10.1109/tcbb.2021.3084562] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
It is infeasible to test many different chemotherapy drugs on actual patients in large clinical trials, which motivates computational methods with the ability to learn and exploit associations between drug effectiveness and patient characteristics. This work proposes a machine learning approach to infer robust predictors of drug responses from patient genomic information. Rather than predicting the exact drug response on a given cell line, we introduce an elastic-net regression methodology to compare a drug-cell line pair against an alternative pair. Using predicted pairwise comparisons we rank the effectiveness of different drugs on the same cell line. A total of 173 cell lines and 100 drug responses were used in various settings for training and testing the proposed models. By comparing our approach against twelve baseline methods, we demonstrate that it outperforms the state-of-the-art methods in the literature. In contrast to most other methods, the algorithm is able to maintain its high performance even when we use a large number of drugs and few cell lines.
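The idea of ranking drugs from predicted pairwise comparisons can be sketched as below. This is a simplified illustration under several assumptions that are not in the source: a single cell line, synthetic drug feature vectors, a linear ground truth, and a small proximal-gradient elastic-net solver written for self-containment (the function `elastic_net` and all data are hypothetical). Each training example is the feature difference of two drugs, and the target is the difference of their responses.

```python
import numpy as np

def elastic_net(X, y, alpha=0.01, l1_ratio=0.5, n_iter=500):
    """Fit elastic-net regression (no intercept) by proximal gradient descent."""
    n, d = X.shape
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + l2)  # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n + l2 * w           # gradient of the smooth part
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * l1, 0.0)  # soft-thresholding
    return w

rng = np.random.default_rng(1)
n_features = 5
w_true = np.array([1.5, -2.0, 0.0, 0.5, 0.0])   # hypothetical ground truth

# Training drugs on one cell line: features and noisy responses.
F_train = rng.normal(size=(40, n_features))
r_train = F_train @ w_true + 0.1 * rng.normal(size=40)

# Pairwise examples: feature difference -> response difference.
idx_a = rng.integers(0, 40, size=300)
idx_b = rng.integers(0, 40, size=300)
X_pairs = F_train[idx_a] - F_train[idx_b]
y_pairs = r_train[idx_a] - r_train[idx_b]
w_hat = elastic_net(X_pairs, y_pairs)

# Rank unseen drugs by how many predicted pairwise comparisons they win.
F_test = rng.normal(size=(8, n_features))
r_test = F_test @ w_true
pred_diff = (F_test[:, None, :] - F_test[None, :, :]) @ w_hat   # shape (8, 8)
wins = (pred_diff > 0).sum(axis=1)
ranking = np.argsort(-wins)
print("predicted ranking (best first):", ranking.tolist())
```

Because only the sign of each predicted difference is used, the ranking is insensitive to errors in the absolute response scale, which is one motivation for comparing pairs instead of predicting exact responses.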
18
Lo-Thong-Viramoutou O, Charton P, Cadet XF, Grondin-Perez B, Saavedra E, Damour C, Cadet F. Non-linearity of Metabolic Pathways Critically Influences the Choice of Machine Learning Model. Front Artif Intell 2022; 5:744755. [PMID: 35757298 PMCID: PMC9226554 DOI: 10.3389/frai.2022.744755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Accepted: 04/29/2022] [Indexed: 11/13/2022] Open
Abstract
The use of machine learning (ML) in life sciences has gained wide interest over the past years, as it speeds up the development of high performing models. Important modeling tools in biology have proven their worth for pathway design, such as mechanistic models and metabolic networks, as they allow better understanding of mechanisms involved in the functioning of organisms. However, little has been done on the use of ML to model metabolic pathways, and the degree of non-linearity associated with them is not clear. Here, we report the construction of different metabolic pathways with several linear and non-linear ML models. Different types of data are used; they lead to the prediction of important biological data, such as pathway flux and final product concentration. A comparison reveals that the data features impact model performance and highlights the effectiveness of non-linear models (e.g., QRF: RMSE = 0.021 nmol·min⁻¹ and R² = 1 vs. Bayesian GLM: RMSE = 1.379 nmol·min⁻¹ and R² = 0.823). It turns out that the greater the degree of non-linearity of the pathway, the better suited a non-linear model will be. Therefore, a decision-making support for pathway modeling is established. These findings generally support the hypothesis that non-linear aspects predominate within the metabolic pathways. This must be taken into account when devising possible applications of these pathways for the identification of biomarkers of diseases (e.g., infections, cancer, neurodegenerative diseases) or the optimization of industrial production processes.
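The qualitative point, that a linear model underfits a saturating flux while a non-linear one does not, can be reproduced on a toy example. The sketch below is not the paper's setup: it swaps in a single Michaelis-Menten rate law for the pathway, ordinary least squares for the linear model, and a k-nearest-neighbour average for the non-linear model (the paper reports QRF and Bayesian GLM). All parameter values are hypothetical.

```python
import numpy as np

def michaelis_menten(s, vmax=1.0, km=1.0):
    """Saturating (non-linear) flux as a function of substrate concentration."""
    return vmax * s / (km + s)

def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(2)
s_train = rng.uniform(0.0, 10.0, size=200)
v_train = michaelis_menten(s_train)
s_test = np.linspace(0.5, 9.5, 50)
v_test = michaelis_menten(s_test)

# Linear model: ordinary least squares on substrate concentration.
slope, intercept = np.polyfit(s_train, v_train, 1)
v_linear = slope * s_test + intercept

# Non-linear model: average of the k nearest training points.
k = 5
nearest = np.argsort(np.abs(s_test[:, None] - s_train[None, :]), axis=1)[:, :k]
v_knn = v_train[nearest].mean(axis=1)

print(f"linear RMSE: {rmse(v_test, v_linear):.4f}")
print(f"k-NN RMSE:   {rmse(v_test, v_knn):.4f}")
```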
Affiliation(s)
- Ophélie Lo-Thong-Viramoutou
- University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris, France
- Laboratory of Excellence GR-Ex, Paris, France
- Laboratory DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Philippe Charton
- University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris, France
- Laboratory of Excellence GR-Ex, Paris, France
- Laboratory DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Brigitte Grondin-Perez
- EnergyLab, EA 4079, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Emma Saavedra
- Departamento de Bioquímica, Instituto Nacional de Cardiología Ignacio Chávez, Mexico City, Mexico
- Cédric Damour
- EnergyLab, EA 4079, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
- Frédéric Cadet
- University of Paris, BIGR—Biologie Intégrée du Globule Rouge, Inserm, UMR_S1134, Paris, France
- Laboratory of Excellence GR-Ex, Paris, France
- Laboratory DSIMB, UMR_S1134, BIGR, Inserm, Faculty of Sciences and Technology, University of La Reunion, Saint-Denis, France
19
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. Appl Intell 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
20
Zhang L, Yuan Y, Yu J, Liu H. SEMCM: A Self-Expressive Matrix Completion Model for Anti-cancer Drug Sensitivity Prediction. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220302123118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Genomic data sets generated by several recent large scale high-throughput screening efforts pose a thorny computational challenge for anticancer drug sensitivity prediction.
Objective:
We aimed to design an algorithm model that would predict missing elements in incomplete matrices and could be applicable to drug response prediction programs.
Method:
We developed a novel self-expressive matrix completion model to improve the predictive performance of drug response prediction problems. The model is based on the idea of subspace clustering and as a convex problem, it can be solved by alternating direction method of multipliers. The original incomplete matrix can be filled through model training and parameters updated iteratively.
Results:
We applied SEMCM to Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets to predict unknown response values. A large number of experiments have proved that the algorithm has good prediction results and stability, which are better than several existing advanced drug sensitivity prediction and matrix completion algorithms. Without modeling mutation information, SEMCM could correctly predict cell line-drug associations for mutated cell lines and wild cell lines. SEMCM can also be used for drug repositioning. The newly predicted drug responses of GDSC dataset suggest that BL-41 was highly sensitive to Bortezomib. Moreover, the sensitivity of A172 and NCI-H1437 to Paclitaxel was roughly the same.
Conclusion:
We report an efficient anticancer drug sensitivity prediction algorithm which is open-source and can predict the unknown responses of cancer cell lines to drugs. Experimental results prove that our method can not only improve the prediction accuracy but also can be applied to drug repositioning.
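To make the matrix-completion framing concrete, here is a minimal sketch of filling missing entries of a cell line by drug response matrix. It deliberately swaps in a much simpler technique than the one described above: iterated truncated SVD (often called hard-impute) instead of a self-expressive model solved by the alternating direction method of multipliers. The matrix, its rank, the missing fraction, and the function name `complete_low_rank` are all hypothetical.

```python
import numpy as np

def complete_low_rank(M, observed, rank=2, n_iter=200):
    """Fill unobserved entries of M by iterating a truncated SVD (hard-impute)."""
    filled = np.where(observed, M, M[observed].mean())   # start from the observed mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(observed, M, low_rank)          # keep known entries fixed
    return filled

rng = np.random.default_rng(3)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # synthetic rank-2 responses
observed = rng.random(truth.shape) < 0.7                      # about 70% of entries known
M = np.where(observed, truth, 0.0)                            # zeros are placeholders only

completed = complete_low_rank(M, observed)
hidden = ~observed
rmse_completed = float(np.sqrt(np.mean((completed[hidden] - truth[hidden]) ** 2)))
rmse_mean_fill = float(np.sqrt(np.mean((M[observed].mean() - truth[hidden]) ** 2)))
print(f"RMSE on hidden entries, low-rank completion: {rmse_completed:.4f}")
print(f"RMSE on hidden entries, mean fill:           {rmse_mean_fill:.4f}")
```

The comparison against a mean-fill baseline on held-out entries mirrors how such methods are usually evaluated: entries with known responses are masked, predicted, and scored.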
Affiliation(s)
- Lin Zhang
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- Yuwei Yuan
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- Jian Yu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
- Hui Liu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
21
Pouryahya M, Oh JH, Mathews JC, Belkhatir Z, Moosmüller C, Deasy JO, Tannenbaum AR. Pan-Cancer Prediction of Cell-Line Drug Sensitivity Using Network-Based Methods. Int J Mol Sci 2022; 23:1074. [PMID: 35163005 PMCID: PMC8835038 DOI: 10.3390/ijms23031074] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/10/2021] [Revised: 01/15/2022] [Accepted: 01/17/2022] [Indexed: 01/02/2023] Open
Abstract
The development of reliable predictive models for individual cancer cell lines to identify an optimal cancer drug is a crucial step to accelerate personalized medicine, but vast differences in cancer cell lines and drug characteristics make it quite challenging to develop predictive models that result in high predictive power and explain the similarity of cell lines or drugs. Our study proposes a novel network-based methodology that breaks the problem into smaller, more interpretable problems to improve the predictive power of anti-cancer drug responses in cell lines. For the drug-sensitivity study, we used the GDSC database for 915 cell lines and 200 drugs. The theory of optimal mass transport was first used to separately cluster cell lines and drugs, using gene-expression profiles and extensive cheminformatic drug features, represented in a form of data networks. To predict cell-line specific drug responses, random forest regression modeling was separately performed for each cell-line drug cluster pair. Post-modeling biological analysis was further performed to identify potential biological correlates associated with drug responses. The network-based clustering method resulted in 30 distinct cell-line drug cluster pairs. Predictive modeling on each cell-line-drug cluster outperformed alternative computational methods in predicting drug responses. We found that among the four drugs top-ranked with respect to prediction performance, three targeted the PI3K/mTOR signaling pathway. Predictive modeling on clustered subsets of cell lines and drugs improved the prediction accuracy of cell-line specific drug responses. Post-modeling analysis identified plausible biological processes associated with drug responses.
Affiliation(s)
- Maryam Pouryahya
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- James C. Mathews
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Zehor Belkhatir
- School of Engineering and Sustainable Development, De Montfort University, Leicester LE1 9BH, UK
- Caroline Moosmüller
- Department of Mathematics, University of California at San Diego, La Jolla, CA 92093, USA
- Joseph O. Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Allen R. Tannenbaum
- Departments of Computer Science and Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, USA
22
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
23
Deng J, Yang Z, Ojima I, Samaras D, Wang F. Artificial intelligence in drug discovery: applications and techniques. Brief Bioinform 2021; 23:6420092. [PMID: 34734228 DOI: 10.1093/bib/bbab430] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 08/02/2021] [Revised: 08/02/2021] [Accepted: 09/18/2021] [Indexed: 12/23/2022] Open
Abstract
Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade. Various AI techniques have been used in many drug discovery applications, such as virtual screening and drug design. In this survey, we first give an overview on drug discovery and discuss related applications, which can be reduced to two major tasks, i.e. molecular property prediction and molecule generation. We then present common data resources, molecule representations and benchmark platforms. As a major part of the survey, AI techniques are dissected into model architectures and learning paradigms. To reflect the technical development of AI in drug discovery over the years, the surveyed works are organized chronologically. We expect that this survey provides a comprehensive review on AI in drug discovery. We also provide a GitHub repository with a collection of papers (and codes, if applicable) as a learning resource, which is regularly updated.
Affiliation(s)
- Jianyuan Deng
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11790, USA
- Zhibo Yang
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
- Iwao Ojima
- Department of Chemistry, Stony Brook University, Stony Brook, NY 11790, USA
- Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
- Fusheng Wang
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11790, USA; Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
24
Selvaraj C, Chandra I, Singh SK. Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries. Mol Divers 2021; 26:1893-1913. [PMID: 34686947 PMCID: PMC8536481 DOI: 10.1007/s11030-021-10326-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 04/05/2021] [Accepted: 09/24/2021] [Indexed: 12/27/2022]
Abstract
The global spread of COVID-19 has made pharmaceutical drug development a pressing and active area of research. Developing new drug molecules against any disease is a costly and lengthy process, yet it continues uninterrupted. The critical point in drug design is to use the available data resources to find novel leads. Once the drug target is identified, several interdisciplinary areas work together with artificial intelligence (AI) and machine learning (ML) methods to obtain enriched drug candidates. These AI and ML methods are applied at every step of computer-aided drug design, and integrating them results in a high success rate for hit compounds. In addition, the integration of AI and ML with high-dimensional data, together with their powerful capacity, has moved the field a step forward. Predicting clinical trial outcomes with integrated AI/ML models could further decrease the cost of clinical trials while also improving the success rate. In this review, we discuss the AI and ML methods that support computer-aided drug design, along with the challenges and opportunities for the pharmaceutical industry, including AI- and ML-based prediction for high-throughput virtual screening from the available data. With this integration of AI and ML, hit identification has gained momentum, with considerable success in providing novel drugs.
Affiliation(s)
- Chandrabose Selvaraj
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
- Ishwar Chandra
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
- Sanjeev Kumar Singh
- CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
25
Cabrera-Garcia D, Warm D, de la Fuente P, Fernández-Sánchez MT, Novelli A, Villanueva-Balsera JM. Early prediction of developing spontaneous activity in cultured neuronal networks. Sci Rep 2021; 11:20407. [PMID: 34650146 PMCID: PMC8516856 DOI: 10.1038/s41598-021-99538-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2021] [Accepted: 09/27/2021] [Indexed: 11/18/2022] Open
Abstract
Synchronization and bursting activity are intrinsic electrophysiological properties of in vivo and in vitro neural networks. During early development, cortical cultures exhibit a wide repertoire of synchronous bursting dynamics whose characterization may help to understand the parameters governing the transition from immature to mature networks. Here we used machine learning techniques to characterize and predict the developing spontaneous activity in mouse cortical neurons on microelectrode arrays (MEAs) during the first three weeks in vitro. Network activity at three stages of early development was defined by 18 electrophysiological features of spikes, bursts, synchrony, and connectivity. The variability of neuronal network activity during early development was investigated by applying k-means and self-organizing map (SOM) clustering analysis to features of bursts and synchrony. These electrophysiological features were predicted at the third week in vitro with high accuracy from those at earlier times using three machine learning models: Multivariate Adaptive Regression Splines, Support Vector Machines, and Random Forest. Our results indicate that initial patterns of electrical activity during the first week in vitro may already predetermine the final development of the neuronal network activity. The methodological approach used here may be applied to explore the biological mechanisms underlying the complex dynamics of spontaneous activity in developing neuronal cultures.
Affiliation(s)
- David Cabrera-Garcia
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
- Department of Synapse and Network Development, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, The Netherlands
- Davide Warm
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
- Institute of Physiology, University Medical Center of the Johannes Gutenberg University Mainz, Duesbergweg 6, 55128, Mainz, Germany
- Pablo de la Fuente
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
- M Teresa Fernández-Sánchez
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
- Antonello Novelli
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
- Department of Psychology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, Institute for Sanitary Research of the Princedom of Asturias (ISPA), 33006, Oviedo, Spain
26
Sharifi-Noghabi H, Jahangiri-Tazehkand S, Smirnov P, Hon C, Mammoliti A, Nair SK, Mer AS, Ester M, Haibe-Kains B. Drug sensitivity prediction from cell line-based pharmacogenomics data: guidelines for developing machine learning models. Brief Bioinform 2021; 22:6348324. [PMID: 34382071 DOI: 10.1093/bib/bbab294] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/12/2021] [Revised: 06/29/2021] [Accepted: 07/10/2021] [Indexed: 11/13/2022] Open
Abstract
The goal of precision oncology is to tailor treatment for patients individually using the genomic profile of their tumors. Pharmacogenomics datasets such as cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task of precision oncology. Machine learning methods have been employed to predict drug sensitivity based on the multiple omics data available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines for different aspects of training gene expression-based predictors using cell line datasets. These guidelines provide extensive analysis of the generalization of drug sensitivity predictors and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity. The application of these guidelines in future studies will enable the development of more robust preclinical biomarkers.
Affiliation(s)
- Hossein Sharifi-Noghabi
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada; Vancouver Prostate Center, Vancouver, British Columbia, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Soheil Jahangiri-Tazehkand
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Petr Smirnov
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Casey Hon
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Anthony Mammoliti
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Arvind Singh Mer
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada; Vancouver Prostate Center, Vancouver, British Columbia, Canada
- Benjamin Haibe-Kains
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; Ontario Institute for Cancer Research, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
27
Koras K, Kizling E, Juraeva D, Staub E, Szczurek E. Interpretable deep recommender system model for prediction of kinase inhibitor efficacy across cancer cell lines. Sci Rep 2021; 11:15993. [PMID: 34362938 PMCID: PMC8346627 DOI: 10.1038/s41598-021-94564-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/22/2021] [Accepted: 07/06/2021] [Indexed: 01/02/2023] Open
Abstract
Computational models for drug sensitivity prediction have the potential to significantly improve personalized cancer medicine. Drug sensitivity assays, combined with profiling of cancer cell lines and drugs, are becoming increasingly available for training such models. Multiple methods have been proposed for predicting drug sensitivity from cancer cell line features, some in a multi-task fashion. So far, no such model has leveraged drug inhibition profiles. Importantly, multi-task models require a tailored approach to model interpretability. In this work, we develop DEERS, a neural network recommender system for kinase inhibitor sensitivity prediction. The model utilizes molecular features of the cancer cell lines and kinase inhibition profiles of the drugs. DEERS incorporates two autoencoders to project cell line and drug features into 10-dimensional hidden representations and a feed-forward neural network to combine them into response prediction. We propose a novel interpretability approach, which in addition to the set of modeled features also considers the genes and processes outside of this set. Our approach outperforms simpler matrix factorization models, achieving R [Formula: see text] 0.82 correlation between true and predicted response for the unseen cell lines. The interpretability analysis identifies 67 biological processes that drive the cell line sensitivity to particular compounds. Detailed case studies are shown for PHA-793887, XMD14-99 and Dabrafenib.
Affiliation(s)
- Krzysztof Koras
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Ewa Kizling
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Dilafruz Juraeva
  - Oncology Bioinformatics, Translational Medicine, Merck Healthcare KGaA, Darmstadt, Germany
- Eike Staub
  - Oncology Bioinformatics, Translational Medicine, Merck Healthcare KGaA, Darmstadt, Germany
- Ewa Szczurek
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
|
28
|
Al-Jarf R, de Sá AGC, Pires DEV, Ascher DB. pdCSM-cancer: Using Graph-Based Signatures to Identify Small Molecules with Anticancer Properties. J Chem Inf Model 2021; 61:3314-3322. [PMID: 34213323 PMCID: PMC8317153 DOI: 10.1021/acs.jcim.1c00168] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/14/2022]
Abstract
The development of new, effective, and safe drugs to treat cancer remains a challenging and time-consuming task due to limited hit rates, restraining subsequent development efforts. Despite the impressive progress of quantitative structure–activity relationship and machine learning-based models that have been developed to predict molecule pharmacodynamics and bioactivity, they have had mixed success at identifying compounds with anticancer properties against multiple cell lines. Here, we have developed a novel predictive tool, pdCSM-cancer, which uses a graph-based signature representation of the chemical structure of a small molecule in order to accurately predict molecules likely to be active against one or multiple cancer cell lines. pdCSM-cancer represents the most comprehensive anticancer bioactivity prediction platform developed to date, comprising trained and validated models on experimental data of the growth inhibition concentration (GI50%) effects, including over 18,000 compounds, on 9 tumor types and 74 distinct cancer cell lines. Across 10-fold cross-validation, it achieved Pearson’s correlation coefficients of up to 0.74 and comparable performance of up to 0.67 across independent, non-redundant blind tests. Leveraging the insights from these cell line-specific models, we developed a generic predictive model to identify molecules active in at least 60 cell lines. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.94 on 10-fold cross-validation and up to 0.94 on independent non-redundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a valuable resource for optimizing and enriching screening libraries for the identification of effective and safe anticancer molecules. To provide a simple and integrated platform to rapidly screen for potential biologically active molecules with favorable anticancer properties, we made pdCSM-cancer freely available online at http://biosig.unimelb.edu.au/pdcsm_cancer.
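The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch of how samples might be partitioned into folds. The helper name `k_fold_indices`, the shuffling seed, and the toy sample count are assumptions made for illustration; they are not taken from pdCSM-cancer, whose exact splitting procedure is not described here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: every sample lands in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # A stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Toy usage: 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the per-fold scores (for example Pearson's correlation) then summarized.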
Affiliation(s)
- Raghad Al-Jarf
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Alex G C de Sá
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia; Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom
|
29
|
A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests. Genes (Basel) 2021; 12:genes12060933. [PMID: 34207374 PMCID: PMC8235396 DOI: 10.3390/genes12060933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/17/2021] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 01/01/2023] Open
Abstract
The identification of genomic alterations in tumor tissues, including somatic mutations, deletions, and gene amplifications, produces large amounts of data, which can be correlated with a diversity of therapeutic responses. We aimed to provide a methodological framework to discover pharmacogenomic interactions based on Random Forests. We matched two databases from the Cancer Cell Line Encyclopaedia (CCLE) project, and the Genomics of Drug Sensitivity in Cancer (GDSC) project. For a total of 648 shared cell lines, we considered 48,270 gene alterations from CCLE as input features and the area under the dose-response curve (AUC) for 265 drugs from GDSC as the outcomes. A three-step reduction to 501 alterations was performed, selecting known driver genes and excluding very frequent/infrequent alterations and redundant ones. For each model, we used the concordance correlation coefficient (CCC) for assessing the predictive performance, and permutation importance for assessing the contribution of each alteration. In a reasonable computational time (56 min), we identified 12 compounds whose response was at least fairly sensitive (CCC > 20) to the alteration profiles. Some diversities were found in the sets of influential alterations, providing clues to discover significant drug-gene interactions. The proposed methodological framework can be helpful for mining pharmacogenomic interactions.
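The concordance correlation coefficient (CCC) used above as the performance measure can be sketched as follows. This is a minimal illustration of Lin's CCC using population (1/n) moments; the function name and the toy vectors are assumptions, and the study's own implementation and reporting scale are not given in the abstract.

```python
from statistics import fmean


def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient, in the range [-1, 1].

    Unlike Pearson's r, it also penalizes a shift or scale difference
    between predictions and observations.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * cov / denom


# Toy usage: predictions that are perfectly correlated but offset by 1.
ccc_shifted = concordance_correlation([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In the toy case Pearson's r would be 1, whereas the CCC is lower because of the constant offset, which is why the CCC is a stricter agreement measure.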
|
30
|
Tan X, Yu Y, Duan K, Zhang J, Sun P, Sun H. Current Advances and Limitations of Deep Learning in Anticancer Drug Sensitivity Prediction. Curr Top Med Chem 2021; 20:1858-1867. [PMID: 32648840 DOI: 10.2174/1568026620666200710101307] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/11/2020] [Revised: 04/02/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023]
Abstract
Anticancer drug screening can accelerate drug discovery to save the lives of cancer patients, but cancer heterogeneity makes this screening challenging. The prediction of anticancer drug sensitivity is useful for anticancer drug development and the identification of biomarkers of drug sensitivity. Deep learning, as a branch of machine learning, is an important aspect of in silico research. Its outstanding computational performance means that it has been used for many biomedical purposes, such as medical image interpretation, biological sequence analysis, and drug discovery. Several studies have predicted anticancer drug sensitivity based on deep learning algorithms. The field of deep learning has made progress regarding model performance and multi-omics data integration. However, deep learning is limited by the number of studies performed and data sources available, so it is not perfect as a pre-clinical approach for use in the anticancer drug screening process. Improving the performance of deep learning models is a pressing issue for researchers. In this review, we introduce the research of anticancer drug sensitivity prediction and the use of deep learning in this research area. To provide a reference for future research, we also review some common data sources and machine learning methods. Lastly, we discuss the advantages and disadvantages of deep learning, as well as the limitations and future perspectives regarding this approach.
Affiliation(s)
- Xian Tan
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yang Yu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Kaiwen Duan
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingbo Zhang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Pingping Sun
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Hui Sun
  - College of Humanities and Sciences of Northeast Normal University, Changchun 130117, China
|
31
|
Qiu K, Lee J, Kim H, Yoon S, Kang K. Machine learning based anti-cancer drug response prediction and search for predictor genes using cancer cell line gene expression. Genomics Inform 2021; 19:e10. [PMID: 33840174 PMCID: PMC8042299 DOI: 10.5808/gi.20076] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/18/2020] [Accepted: 02/11/2021] [Indexed: 01/06/2023] Open
Abstract
Although many models have been proposed in recent years to accurately predict the response of drugs in cell lines, understanding the genome related to drug response is also key to completing precision medicine in oncology. In this paper, based on cancer cell line gene expression and drug response data, we established a reliable and accurate drug response prediction model and found predictor genes for some drugs of interest. To this end, we first performed pre-selection of genes based on the Pearson correlation coefficient and then used an ElasticNet regression model for drug response prediction and fine gene selection. To find a more reliable set of predictor genes, we performed regression twice for each drug, once with IC50 and once with area under the curve (AUC) (or activity area). For the 12 drugs we tested, the predictive performance in terms of Pearson correlation coefficient exceeded 0.6, and the highest was for 17-AAG, for which the Pearson correlation coefficient was 0.811 for IC50 and 0.81 for AUC. We identified common predictor genes for IC50 and AUC, with which the performance was similar to that obtained with genes found separately for IC50 and AUC, but with a much smaller number of predictor genes. Using only common predictor genes, the highest performance was for AZD6244 (0.8016 for IC50, 0.7945 for AUC) with 321 predictor genes.
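The first step described above, pre-selecting genes by their Pearson correlation with drug response, can be sketched in a few lines. The function names, the cut-off value, and the toy expression values are hypothetical and chosen only for illustration; the abstract does not state the threshold the authors used, and the subsequent ElasticNet regression step is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: treated here as uninformative
    return sxy / sqrt(sxx * syy)


def preselect_genes(expression, response, threshold=0.5):
    """Keep genes whose |r| with the response reaches the (assumed) threshold.

    expression maps a gene name to its expression values across cell lines;
    the result is ordered from strongest to weakest absolute correlation.
    """
    scores = {gene: pearson(values, response) for gene, values in expression.items()}
    kept = [gene for gene, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda gene: -abs(scores[gene]))


# Toy data: four made-up genes measured in five cell lines.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0],
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0],
    "GENE_D": [1.0, 3.0, 2.0, 3.0, 1.0],
}
response = [0.9, 2.1, 2.9, 4.2, 5.1]
selected = preselect_genes(expression, response, threshold=0.5)
```

Filtering on the absolute correlation keeps both positively and negatively associated genes, which a regularized regression could then prune further.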
Affiliation(s)
- Kexin Qiu
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
- JoongHo Lee
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
- HanByeol Kim
  - Department of Computer Science, Dankook University, Yongin 16890, Korea
- Seokhyun Yoon
  - Department of Computer Science, Dankook University, Yongin 16890, Korea; Department of Electronics and Electrical Engineering, Dankook University, Yongin 16890, Korea
- Keunsoo Kang
  - Department of Microbiology, Dankook University, Cheonan 31116, Korea
|
32
|
Li Y, Zhang L, Zhang Y, Wen H, Huang J, Shen Y, Li H. A Random Forest Model for Predicting Social Functional Improvement in Chinese Patients with Schizophrenia After 3 Months of Atypical Antipsychotic Monopharmacy: A Cohort Study. Neuropsychiatr Dis Treat 2021; 17:847-857. [PMID: 33776440 PMCID: PMC7989048 DOI: 10.2147/ndt.s280757] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2020] [Accepted: 03/02/2021] [Indexed: 12/27/2022] Open
Abstract
PURPOSE Impaired social functions contribute to the burden of schizophrenia patients and their families, but predictive tools of social functioning prognosis and specific factors are undefined in Chinese clinical practice. This article explores a machine learning tool to identify whether patients will achieve significant social functional improvement after 3 months of atypical antipsychotic monopharmacy and finds the defined risk factors using a multicenter clinical study. PATIENTS AND METHODS A multicenter study on atypical antipsychotic (AAP) treatment in Chinese patients with schizophrenia (SALT-C) was conducted from July 2011 to August 2018. Data from 550 patients with AAP monopharmacy from their baseline to their 3-month follow-up were used to establish machine learning tools after screening. The positive outcome was an increase in the Personal and Social Performance (PSP) scale score by ≥10 points. The predictors were a range of investigator-rated assessments on symptoms, functioning, the safety of AAPs and illness history. The Least Absolute Shrinkage and Selection Operator (LASSO) was used for the feature screening and ranking of the predicted variables. The random forest algorithm and five-fold cross-validation for optimizing the model were selected to ensure the generalizability and precision. RESULTS There were 137 patients (mean [SD] age, 41.1 [16.8] years; 77 [58.8%] female) who had a good social functional prognosis. A lower PSP score, taking a mood stabilizer, a high total Positive and Negative Symptom Scale (PANSS) and PANSS general subscale score, unemployment, a hepatic injury with medication, comorbid cardiovascular disease and being male predicted poor PSP outcomes. 
The generalizability of the PSP predictive tool was estimated with the precision-recall curve (accuracy of 79.5%, negative predictive value of 92.6% and positive predictive value of 57.1%) and receiver operating characteristic curve (ROC) (specificity of 81.8% and sensitivity of 78.7%). CONCLUSION The machine learning tool established using our current real-world data could assist in predicting PSP outcome by several clinical factors.
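The performance figures quoted above (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are made up for illustration and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")

    def ratio(numerator, denominator):
        # An empty denominator leaves the metric undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall for the positive class
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),          # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```

Predictive values depend on how common the positive outcome is in the cohort, so a high negative predictive value alongside a modest positive predictive value is typical when the positive outcome is the minority class.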
Affiliation(s)
- Yange Li
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Lei Zhang
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Yan Zhang
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Hui Wen
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Jingjing Huang
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Psychotic Disorders, Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- Yifeng Shen
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Psychotic Disorders, Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Clinical Research Center for Mental Health, Shanghai, People's Republic of China
- Huafang Li
  - Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Psychotic Disorders, Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Clinical Research Center for Mental Health, Shanghai, People's Republic of China
|
33
|
Ahmed KT, Park S, Jiang Q, Yeu Y, Hwang T, Zhang W. Network-based drug sensitivity prediction. BMC Med Genomics 2020; 13:193. [PMID: 33371891 PMCID: PMC7771088 DOI: 10.1186/s12920-020-00829-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/10/2020] [Accepted: 11/17/2020] [Indexed: 12/15/2022] Open
Abstract
Background Drug sensitivity prediction and drug responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network on drug sensitivity prediction is investigated in this study. Methods In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed and both models integrate gene network information directly into neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction. Results In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to deep neural network model; (4) the prediction performance is drug dependent and it may relate to the drug’s mechanism of action. 
Conclusions The network-based feature selection method and prediction models improve the performance of drug response prediction. The relations between the genomic features are more robust and stable than the correlation between each individual genomic feature and the drug response in high-dimension, low-sample-size genomic datasets.
Affiliation(s)
- Khandakar Tanvir Ahmed
  - Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
- Sunho Park
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9211 Euclid Ave, Cleveland, OH, 44106, USA
- Qibing Jiang
  - Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
- Yunku Yeu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9211 Euclid Ave, Cleveland, OH, 44106, USA
- TaeHyun Hwang
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9211 Euclid Ave, Cleveland, OH, 44106, USA
- Wei Zhang
  - Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
|
34
|
Itzhacky N, Sharan R. Prediction of cancer dependencies from expression data using deep learning. Mol Omics 2020; 17:66-71. [PMID: 33135031 DOI: 10.1039/d0mo00042f] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022]
Abstract
Detecting cancer dependencies is key to disease treatment. Recent efforts have mapped gene dependencies and drug sensitivities in hundreds of cancer cell lines. These data allow us to learn for the first time models of tumor vulnerabilities and apply them to suggest novel drug targets. Here we devise novel deep learning methods for predicting gene dependencies and drug sensitivities from gene expression measurements. By combining dimensionality reduction strategies, we are able to learn accurate models that outperform simpler neural networks or linear models.
Affiliation(s)
- Nitay Itzhacky
  - School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
|
35
|
Kleandrova VV, Scotti MT, Scotti L, Nayarisseri A, Speck-Planche A. Cell-based multi-target QSAR model for design of virtual versatile inhibitors of liver cancer cell lines. SAR QSAR Environ Res 2020; 31:815-836. [PMID: 32967475 DOI: 10.1080/1062936x.2020.1818617] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/14/2020] [Accepted: 08/31/2020] [Indexed: 06/11/2023]
Abstract
Liver cancers are among the leading fatal diseases among malignant neoplasms. Current chemotherapeutic treatments used to fight these illnesses have become less efficient in terms of both efficacy and safety. Therefore, there is a great need to search for new anti-liver cancer agents, and this search can be accelerated by using computer-aided drug discovery approaches. In this work, we report the development of the first cell-based multi-target model based on quantitative structure-activity relationships (CBMT-QSAR) for the design and prediction of chemicals as anticancer agents against 17 liver cancer cell lines. Having good quality and predictive power (accuracy higher than 80%) in both the training and test sets, the CBMT-QSAR model was employed as a tool to directly extract suitable fragments from the physicochemical and structural interpretations of the molecular descriptors. Some of these desirable fragments were assembled, leading to the virtual design of eight molecules with drug-like properties, six of which were predicted to be versatile anticancer agents against the 17 liver cancer cell lines reported here.
Affiliation(s)
- V V Kleandrova
  - Laboratory of Fundamental and Applied Research of Quality and Technology of Food Production, Moscow State University of Food Production, Moscow, Russian Federation
- M T Scotti
  - Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
- L Scotti
  - Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
- A Nayarisseri
  - In Silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
- A Speck-Planche
  - Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
|
36
|
Li Z, Lam YW, Liu Q, Lau AYK, Yu Au-Yeung H, Chan RHM. Machine Learning-Driven Drug Discovery: Prediction of Structure-Cytotoxicity Correlation Leads to Identification of Potential Anti-Leukemia Compounds. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:5464-5467. [PMID: 33019216 DOI: 10.1109/embc44109.2020.9175850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
In vitro cytotoxicity screening is a crucial step of anticancer drug discovery. The application of deep learning methodology is gaining increasing attention in processing drug screening data and studying the anticancer mechanisms of chemical compounds. In this work, we explored the use of convolutional neural networks in modeling the anticancer efficacy of small molecules. In particular, we presented a VGG19 model trained on 2D structural formulae to predict the growth-inhibitory effects of compounds against the leukemia cell line CCRF-CEM, without any use of chemical descriptors. The model achieved a normalized RMSE of 15.76% on predicting growth inhibition and a Pearson Correlation Coefficient of 0.72 between predicted and experimental data, demonstrating strong predictive power in this task. Furthermore, we implemented the Layer-wise Relevance Propagation technique to interpret the network and visualize the chemical groups predicted by the model to contribute to toxicity with human-readable representations. Clinical relevance: This work predicts the cytotoxicity of chemical compounds against human leukemic lymphoblast CCRF-CEM cell lines on a continuous scale, which only requires 2D images of the structural formulae of the compounds as inputs. Knowledge of the structure-toxicity relationship of small molecules will potentially increase the hit rate of primary drug screening assays.
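The two evaluation measures reported above, normalized RMSE and the Pearson correlation coefficient, can be computed as in the sketch below. The abstract does not say how the RMSE was normalized, so dividing by the range of the observed values is an assumption made here; the function names and toy values are likewise illustrative.

```python
from math import sqrt


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed convention)."""
    n = len(y_true)
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    span = max(y_true) - min(y_true)
    if span == 0:
        raise ValueError("observed values are constant; range is zero")
    return rmse / span


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy growth-inhibition values (percent); not data from the study.
observed = [0.0, 50.0, 100.0]
predicted = [10.0, 50.0, 90.0]
nrmse = normalized_rmse(observed, predicted)
r = pearson_r(observed, predicted)
```

The toy case shows why both measures are usually reported together: the predictions are perfectly correlated with the observations, yet the normalized RMSE is non-zero because they are compressed toward the middle of the scale.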
|
37
|
An B, Zhang Q, Fang Y, Chen M, Qin Y. Iterative sure independent ranking and screening for drug response prediction. BMC Med Inform Decis Mak 2020; 20:224. [PMID: 32962705 PMCID: PMC7507262 DOI: 10.1186/s12911-020-01240-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/30/2020] [Accepted: 09/02/2020] [Indexed: 11/19/2022] Open
Abstract
Background Prediction of drug response based on multi-omics data is a crucial task in the research of personalized cancer therapy. Results We proposed an iterative sure independent ranking and screening (ISIRS) scheme to select drug response-associated features and applied it to the Cancer Cell Line Encyclopedia (CCLE) dataset. For each drug in CCLE, we incorporated multi-omics data including copy number alterations, mutation and gene expression and selected up to 50 features using ISIRS. A linear regression model based on the selected features was then used to predict the drug response. Cross-validation tests show that our prediction accuracies are higher than those of existing methods for most drugs. Conclusions Our study indicates that the features selected by the marginal utility measure, which measures the conditional probability of drug responses given the feature, are helpful for drug response prediction.
Affiliation(s)
- Biao An
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Qianwen Zhang
  - College of Information Technology, Shanghai Ocean University, Shanghai, China; Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Yun Fang
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Ming Chen
  - College of Information Technology, Shanghai Ocean University, Shanghai, China; Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Yufang Qin
  - College of Information Technology, Shanghai Ocean University, Shanghai, China; Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
|
38
|
Nakano T, Takeda S, Brown JB. Active learning effectively identifies a minimal set of maximally informative and asymptotically performant cytotoxic structure-activity patterns in NCI-60 cell lines. RSC Med Chem 2020; 11:1075-1087. [PMID: 33479700 PMCID: PMC7513593 DOI: 10.1039/d0md00110d] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/08/2020] [Accepted: 06/30/2020] [Indexed: 11/21/2022] Open
Abstract
The NCI-60 cancer cell line screening panel has provided insights for development of subtype-specific chemical therapies and repurposing. By extracting chemical structure and cytotoxicity patterns, virtual screening potentially complements the availability of high-throughput assay platforms and improves bioactive compound discovery rates by computational prefiltering of candidate compound libraries. Many groups report high prediction performances in computational models of NCI-60 data when using cross-validation or similar techniques, yet prospective therapy development in novel cancers may have little to no such data and further may not have the resources to perform hit identification using large compound libraries. In contrast to bulk screening and analysis, the active learning methodology has demonstrated how to identify compounds for screening in small batches and update computational models iteratively, leading to predictive models with a minimum number of compounds, and importantly clarifying data volumes at which limits in predictive ability are achieved. Here, in replicate per-cell line experiments using 50% of data (∼20 000 compounds) as the external prediction target, predictive limits are reproducibly demonstrated at the stage of systematic selection of 10-30% of the incorporable half. The pattern was consistent across all 60 cell lines. Limits of predictability are found to be correlated to the doubling times of cell lines and the number of cellular response discontinuities (activity cliffs) present per cell line. Organization into chemical scaffolds delineated degrees of predictive challenge. These results provide key insights for strategies in developing new inhibitors in existing cell lines or for future automated therapy selection in personalized oncotherapy.
Affiliation(s)
- Takumi Nakano
  - Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
- Shunichi Takeda
  - Kyoto University Graduate School of Medicine, Department of Radiation Genetics, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
- J B Brown
  - Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan
|
39
|
Yuan R, Chen S, Wang Y. Computational Prediction of Drug Responses in Cancer Cell Lines From Cancer Omics and Detection of Drug Effectiveness Related Methylation Sites. Front Genet 2020; 11:917. [PMID: 32849855 PMCID: PMC7426400 DOI: 10.3389/fgene.2020.00917] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2020] [Accepted: 07/23/2020] [Indexed: 12/13/2022] Open
Abstract
Accurately predicting the response of a cancer patient to a therapeutic agent remains an important challenge in precision medicine. With the rise of data science, researchers have applied computational models to study the drug inhibition effects on cancers based on cancer genomics and transcriptomics. Moreover, a common epigenetic modification, DNA methylation, has been related to the occurrence and development of cancer, as well as drug effectiveness. Therefore, exploring the relationship between DNA methylation and drug effectiveness can help improve drug response prediction. Here, we proposed a computational model to predict drug responses in cancers through integration of cancer genomics, transcriptomics, epigenomics, and compound chemical properties. Meanwhile, we applied a regularized regression model (Least Absolute Shrinkage and Selection Operator, lasso) to detect the methylation sites that were closely related to drug effectiveness. The prediction models were trained on a well-known pharmacogenomics data resource, Genomics of Drug Sensitivity in Cancer (GDSC). The cross-validation indicates that the performance of the prediction model using DNA methylation is comparable to that of using other cancer omics, including oncogene mutation and gene expression data. It indicates the important role of DNA methylation in prediction of drug responses. Encyclopedia of DNA Elements (ENCODE) and Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining (TRRUST2) database analyses suggest that the methylation sites associated with drug effectiveness are mainly located in the transcription factor (TF) binding region. Therefore, we hypothesized that the sensitivity of cancer cells to drugs could be regulated by changing the methylation modification of the TF binding region. In conclusion, we confirmed the important role of DNA methylation in prediction of drug responses, and provided some methylation sites that are closely related to drug effectiveness, which may be a valuable regulatory target for improving drug treatment effects in cancer patients.
Affiliation(s)
- Rui Yuan
  - Key Laboratory of Plateau Biological Adaptation and Evolution, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - University of Chinese Academy of Sciences, Beijing, China
- Shilong Chen
  - Key Laboratory of Plateau Biological Adaptation and Evolution, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Institute of Sanjiangyuan National Park, Chinese Academy of Sciences, Xining, China
- Yongcui Wang
  - Key Laboratory of Plateau Biological Adaptation and Evolution, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
  - Qinghai Provincial Key Laboratory of Crop Molecular Breeding, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
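The abstract of Yuan et al. above mentions lasso regression as the tool for picking out methylation sites related to drug effectiveness. The following is a minimal sketch of how a lasso fit can zero out uninformative features, using cyclic coordinate descent in plain Python. It is not the authors' implementation: the function names, the toy matrix, and the penalty value `alpha` are all assumptions made for illustration, and no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

    X is a list of sample rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Mean squared value of each column, used to scale each coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Hypothetical toy data: three "sites"; only site 0 drives the response.
X = [
    [1.0, 0.1, -0.2],
    [-1.0, 0.2, 0.1],
    [2.0, -0.1, 0.0],
    [-2.0, 0.0, 0.2],
    [0.5, -0.2, -0.1],
    [-0.5, 0.0, 0.0],
]
y = [3.0 * row[0] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
# Features with a non-zero coefficient are the ones the lasso "selects".
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

In this toy setting the L1 penalty sets the coefficients of the two uninformative columns to exactly zero and slightly shrinks the coefficient of the informative one, which is the property that makes the lasso usable as a feature selector.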
|
40
|
Naulaerts S, Menden MP, Ballester PJ. Concise Polygenic Models for Cancer-Specific Identification of Drug-Sensitive Tumors from Their Multi-Omics Profiles. Biomolecules 2020; 10:E963. [PMID: 32604779 PMCID: PMC7356608 DOI: 10.3390/biom10060963] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2020] [Revised: 06/20/2020] [Accepted: 06/22/2020] [Indexed: 12/15/2022] Open
Abstract
In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). A way to generate predictive models for the remaining cases is with suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost integrated with a stringent feature selection approach, which is an algorithm that is advantageous for these high-dimensional problems. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations, gene expression, and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines where gene expression (GEX) and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.
Affiliation(s)
- Stefan Naulaerts
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
  - Ludwig Institute for Cancer Research, de Duve Institute, Université catholique de Louvain, 1200 Brussels, Belgium
- Michael P. Menden
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Department of Biology, Ludwig-Maximilians University Munich, 82152 Planegg-Martinsried, Germany
  - German Centre for Diabetes Research (DZD e.V.), 85764 Neuherberg, Germany
- Pedro J. Ballester
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
  - Institut Paoli-Calmettes, F-13009 Marseille, France
  - Aix-Marseille Université, F-13284 Marseille, France
  - CNRS UMR7258, F-13009 Marseille, France
|
41
|
Deng S, Sun Y, Zhao T, Hu Y, Zang T. A Review of Drug Side Effect Identification Methods. Curr Pharm Des 2020; 26:3096-3104. [PMID: 32532187 DOI: 10.2174/1381612826666200612163819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Accepted: 05/18/2020] [Indexed: 11/22/2022]
Abstract
Drug side effects have become an important indicator for evaluating the safety of drugs. There are two main factors in the frequent occurrence of drug safety problems; on the one hand, the clinical understanding of drug side effects is insufficient, leading to frequent adverse drug reactions, while on the other hand, due to the long-term period and complexity of clinical trials, side effects of approved drugs on the market cannot be reported in a timely manner. Therefore, many researchers have focused on developing methods to identify drug side effects. In this review, we summarize the methods of identifying drug side effects and common databases in this field. We classified methods of identifying side effects into four categories: biological experimental, machine learning, text mining and network methods. We point out the key points of each kind of method. In addition, we also explain the advantages and disadvantages of each method. Finally, we propose future research directions.
Affiliation(s)
- Shuai Deng
  - College of Science, Beijing Forestry University, Beijing, China
- Yige Sun
  - Microbiology Department, Harbin Medical University, Harbin, 150081, China
- Tianyi Zhao
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Hu
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
42
|
Koras K, Juraeva D, Kreis J, Mazur J, Staub E, Szczurek E. Feature selection strategies for drug sensitivity prediction. Sci Rep 2020; 10:9377. [PMID: 32523056 PMCID: PMC7287073 DOI: 10.1038/s41598-020-65927-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 05/06/2020] [Indexed: 12/16/2022] Open
Abstract
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with the best performing example of Dabrafenib. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are indicative for therapy design.
Affiliation(s)
- Krzysztof Koras
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Dilafruz Juraeva
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Julian Kreis
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Johanna Mazur
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Eike Staub
  - Merck Healthcare KGaA, Translational Medicine, Department of Bioinformatics, Darmstadt, Germany
- Ewa Szczurek
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
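The abstract of Koras et al. above reports model quality as the correlation r between observed and predicted drug response on a test set. As a small illustration of that metric, here is a sketch of the Pearson correlation coefficient in plain Python. The function name and the response values are hypothetical and are not taken from the paper or from GDSC.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of centred cross-products and centred squares.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0.0 or var_pred == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical observed vs. predicted responses for four test cell lines.
observed_response = [1.0, 2.0, 3.0, 4.0]
predicted_response = [1.5, 1.0, 3.5, 3.0]
r = pearson_r(observed_response, predicted_response)
print(round(r, 2))
```

A value of r close to 1 means the predictions rank and scale the responses much as the measurements do; it says nothing about systematic offset, so it is usually read alongside an error metric.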
|
43
|
The Road Not Taken with Pyrrole-Imidazole Polyamides: Off-Target Effects and Genomic Binding. Biomolecules 2020; 10:biom10040544. [PMID: 32260120 PMCID: PMC7226143 DOI: 10.3390/biom10040544] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2020] [Revised: 03/16/2020] [Accepted: 03/19/2020] [Indexed: 12/20/2022] Open
Abstract
The high sequence specificity of minor groove-binding N-methylpyrrole-N-methylimidazole polyamides has enabled significant advances in cancer and disease biology, yet there have been few comprehensive reports on their off-target effects, most likely because of the lack of available tools for evaluating genomic binding, an essential aspect that has gone seriously underexplored. Compared to other N-heterocycles, the off-target effects of these polyamides and their specificity for the DNA minor groove and primary base pair recognition require the development of new analytical methods, which are missing in the field today. This review aims to highlight the current progress in deciphering the off-target effects of these N-heterocyclic molecules and suggests new ways that next-generation sequencing can be used in addressing off-target effects.
|
44
|
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022] Open
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
|
45
|
Chen J, Zhang L. A survey and systematic assessment of computational methods for drug response prediction. Brief Bioinform 2020; 22:232-246. [PMID: 31927568 DOI: 10.1093/bib/bbz164] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Drug response prediction arises from both basic and clinical research of personalized therapy, as well as drug discovery for cancers. With gene expression profiles and other omics data being available for over 1000 cancer cell lines and tissues, different machine learning approaches have been applied to drug response prediction. These methods appear in a body of literature and have been evaluated on different datasets with only one or two accuracy metrics. We systematically assess 17 representative methods for drug response prediction, which have been developed in the past 5 years, on four large public datasets in nine metrics. This study provides insights and lessons for future research into drug response prediction.
|
46
|
Güvenç Paltun B, Mamitsuka H, Kaski S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief Bioinform 2019; 22:346-359. [PMID: 31838491 PMCID: PMC7820853 DOI: 10.1093/bib/bbz153] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/17/2022] Open
Abstract
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, by categorizing methods as matrix factorization-based, kernel-based and network-based methods. We also present a short description of relevant databases used as a benchmark in drug response prediction analyses, followed by providing a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources on drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi
Affiliation(s)
- Betül Güvenç Paltun
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Samuel Kaski
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
|
47
|
Koromina M, Pandi MT, Patrinos GP. Rethinking Drug Repositioning and Development with Artificial Intelligence, Machine Learning, and Omics. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2019; 23:539-548. [PMID: 31651216 DOI: 10.1089/omi.2019.0151] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
Pharmaceutical industry and the art and science of drug development are sorely in need of novel transformative technologies in the current age of digital health and artificial intelligence (AI). Often described as game-changing technologies, AI and machine learning algorithms have slowly but surely begun to revolutionize pharmaceutical industry and drug development over the past 5 years. In this expert review, we describe the most frequently used machine learning algorithms in drug development pipelines and the -omics databases well poised to support machine learning and drug discovery. Subsequently, we analyze the emerging new computational approaches to drug discovery and the in silico pipelines for drug repositioning and the synergies among -omics system sciences, AI and machine learning. As with system sciences, AI and machine learning embody a system scale and Big Data driven vision for drug discovery and development. We conclude with a future outlook on the ways in which machine learning approaches can be implemented to buttress and expedite drug discovery and precision medicine. As AI and machine learning are rapidly entering pharmaceutical industry and the art and science of drug development, we need to critically examine the attendant prospects and challenges to benefit patients and public health.
Affiliation(s)
- Maria Koromina
  - Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
- Maria-Theodora Pandi
  - Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
- George P Patrinos
  - Laboratory of Pharmacogenomics and Individualized Therapy, Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
  - Department of Pathology, College of Medicine and Health Sciences, United Arab Emirates University, Al-Ain, Abu Dhabi
  - Zayed Center of Health Sciences, United Arab Emirates University, Al-Ain, Abu Dhabi
|
48
|
Basu A, Mitra R, Liu H, Schreiber SL, Clemons PA. RWEN: response-weighted elastic net for prediction of chemosensitivity of cancer cell lines. Bioinformatics 2019; 34:3332-3339. [PMID: 29688307 DOI: 10.1093/bioinformatics/bty199] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2017] [Accepted: 04/10/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation: In recent years there have been several efforts to generate sensitivity profiles of collections of genomically characterized cell lines to panels of candidate therapeutic compounds. These data provide the basis for the development of in silico models of sensitivity based on cellular, genetic, or expression biomarkers of cancer cells. However, a remaining challenge is an efficient way to identify accurate sets of biomarkers to validate. To address this challenge, we developed methodology using gene-expression profiles of human cancer cell lines to predict the responses of these cell lines to a panel of compounds. Results: We developed an iterative weighting scheme which, when applied to elastic net, a regularized regression method, significantly improves the overall accuracy of predictions, particularly in the highly sensitive response region. In addition to application of these methods to actual chemical sensitivity data, we investigated the effects of sample size, number of features, model sparsity, signal-to-noise ratio, and feature correlation on predictive performance using a simulation framework, particularly for situations where the number of covariates is much larger than sample size. While our method aims to be useful in therapeutic discovery and understanding of the basic mechanisms of action of drugs and their targets, it is generally applicable in any domain where predictions of extreme responses are of highest importance. Availability and implementation: The iterative and other weighting algorithms were implemented in R. The code is available at https://github.com/kiwtir/RWEN. The CTRP data are available at ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.1_2016_pub_NatChemBiol_12_109/ and the Sanger data at ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amrita Basu
  - Chemical Biology & Therapeutics Science Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Ritwik Mitra
  - Operational Research and Financial Engineering, Princeton University, Princeton, NJ, USA
- Han Liu
  - Operational Research and Financial Engineering, Princeton University, Princeton, NJ, USA
- Stuart L Schreiber
  - Chemical Biology & Therapeutics Science Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Paul A Clemons
  - Chemical Biology & Therapeutics Science Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
|
49
|
Vucicevic J, Nikolic K, Mitchell JB. Rational Drug Design of Antineoplastic Agents Using 3D-QSAR, Cheminformatic, and Virtual Screening Approaches. Curr Med Chem 2019; 26:3874-3889. [DOI: 10.2174/0929867324666170712115411] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Revised: 06/06/2017] [Accepted: 06/13/2017] [Indexed: 01/07/2023]
Abstract
Background: Computer-Aided Drug Design has strongly accelerated the development of novel antineoplastic agents by helping in hit identification, optimization, and evaluation. Results: Computational approaches such as cheminformatic search, virtual screening, pharmacophore modeling, molecular docking and dynamics have been developed and applied to explain the activity of bioactive molecules, design novel agents, increase the success rate of drug research, and decrease the total costs of drug discovery. Similarity searches and virtual screening are used to identify molecules with an increased probability to interact with drug targets of interest, while the other computational approaches are applied for the design and evaluation of molecules with enhanced activity and improved safety profile. Conclusion: This review describes the main in silico techniques used in rational drug design of antineoplastic agents and presents optimal combinations of computational methods for the design of more efficient antineoplastic drugs.
Affiliation(s)
- Jelica Vucicevic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Vojvode Stepe 450, 11000 Belgrade, Serbia
- Katarina Nikolic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Vojvode Stepe 450, 11000 Belgrade, Serbia
- John B.O. Mitchell
  - EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, St Andrews KY16 9ST, United Kingdom
|
50
|
Hussain S, Ferzund J, Ul-Haq R. Prediction of Drug Target Sensitivity in Cancer Cell Lines Using Apache Spark. J Comput Biol 2019; 26:882-889. [DOI: 10.1089/cmb.2018.0102] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Affiliation(s)
- Shahid Hussain
  - Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan
- Javed Ferzund
  - Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan
- Raza Ul-Haq
  - Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan
|