1
Lenhof K, Eckhart L, Rolli LM, Volkamer A, Lenhof HP. Reliable anti-cancer drug sensitivity prediction and prioritization. Sci Rep 2024; 14:12303. [PMID: 38811639 PMCID: PMC11137046 DOI: 10.1038/s41598-024-62956-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 05/23/2024] [Indexed: 05/31/2024]
Abstract
The application of machine learning (ML) to real-world problems bears not only great potential but also high risk. One fundamental challenge in risk mitigation is to ensure the reliability of ML predictions, i.e., the model error should be minimized and the prediction uncertainty should be estimated. Especially for medical applications, the importance of reliable predictions cannot be overstated. Here, we address this challenge for anti-cancer drug sensitivity prediction and prioritization. To this end, we present a novel drug sensitivity prediction and prioritization approach guaranteeing user-specified certainty levels. The developed conformal prediction approach is applicable to classification, regression, and simultaneous regression and classification. Additionally, we propose a novel drug sensitivity measure that is based on clinically relevant drug concentrations and enables a straightforward prioritization of drugs for a given cancer sample.
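To illustrate how a user-specified certainty level can translate into a prediction interval, the following is a minimal sketch of generic split conformal regression. It is not the authors' pipeline: the absolute-residual nonconformity score and the function names `conformal_quantile` and `prediction_interval` are assumptions chosen for illustration.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration residual.

    alpha is the tolerated error rate, so 1 - alpha is the certainty level.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples to support this certainty level.
        return math.inf
    return sorted(residuals)[rank - 1]


def prediction_interval(point_prediction, calibration_true, calibration_pred, alpha=0.1):
    """Symmetric interval around a point prediction (hypothetical helper).

    The calibration samples must be held out from model training.
    """
    residuals = [abs(t - p) for t, p in zip(calibration_true, calibration_pred)]
    q = conformal_quantile(residuals, alpha)
    return point_prediction - q, point_prediction + q


# Example: four held-out calibration samples, certainty level 80 %.
low, high = prediction_interval(10, [0, 0, 0, 0], [1, 2, 3, 4], alpha=0.2)
print(low, high)
```

Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least 1 - alpha on average over samples; the guarantee is marginal, not per sample.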
Affiliation(s)
- Kerstin Lenhof
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Lea Eckhart
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Andrea Volkamer
- Center for Bioinformatics, Chair for Data Driven Drug Design, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
- Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, 66123, Saarbrücken, Saarland, Germany
2
Eckhart L, Lenhof K, Rolli LM, Lenhof HP. A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction. Brief Bioinform 2024; 25:bbae242. [PMID: 38797968 PMCID: PMC11128483 DOI: 10.1093/bib/bbae242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024]
Abstract
A major challenge of precision oncology is the identification and prioritization of suitable treatment options based on molecular biomarkers of the considered tumor. In pursuit of this goal, large cancer cell line panels have successfully been studied to elucidate the relationship between cellular features and treatment response. Due to the high dimensionality of these datasets, machine learning (ML) is commonly used for their analysis. However, choosing a suitable algorithm and set of input features can be challenging. We performed a comprehensive benchmarking of ML methods and dimension reduction (DR) techniques for predicting drug response metrics. Using the Genomics of Drug Sensitivity in Cancer cell line panel, we trained random forests, neural networks, boosting trees and elastic nets for 179 anti-cancer compounds with feature sets derived from nine DR approaches. We compare the results regarding statistical performance, runtime and interpretability. Additionally, we provide strategies for assessing model performance compared with a simple baseline model and measuring the trade-off between models of different complexity. Lastly, we show that complex ML models benefit from using an optimized DR strategy, and that standard models, even when using considerably fewer features, can still be superior in performance.
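One common way to compare a model with a simple baseline is to measure how much it reduces the test error of a predictor that always outputs the mean training response. The sketch below shows that idea only; the abstract does not state which baseline or error measure the authors used, so the mean predictor, the mean squared error, and the function names are assumptions.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def improvement_over_mean_baseline(y_train, y_test, y_pred):
    """Relative MSE reduction versus always predicting the training mean.

    1.0 means a perfect model, 0.0 means no better than the baseline,
    and negative values mean worse than the baseline.
    Assumes the baseline error on the test set is non-zero.
    """
    baseline_value = mean(y_train)
    baseline_mse = mse(y_test, [baseline_value] * len(y_test))
    return 1.0 - mse(y_test, y_pred) / baseline_mse


# Toy example with made-up response values.
print(improvement_over_mean_baseline([1, 2, 3], [1, 3], [1.5, 2.5]))
```

A score near zero for a complex model is a signal that its extra complexity and runtime are not paying off for that compound.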
Affiliation(s)
- Lea Eckhart
- Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Kerstin Lenhof
- Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Lisa-Marie Rolli
- Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Hans-Peter Lenhof
- Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
3
Li L, Yang L, Yang L, He C, He Y, Chen L, Dong Q, Zhang H, Chen S, Li P. Network pharmacology: a bright guiding light on the way to explore the personalized precise medication of traditional Chinese medicine. Chin Med 2023; 18:146. [PMID: 37941061 PMCID: PMC10631104 DOI: 10.1186/s13020-023-00853-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 10/22/2023] [Indexed: 11/10/2023]
Abstract
Network pharmacology can ascertain the therapeutic mechanism of drugs for treating diseases at the level of biological targets and pathways. The study of the mechanisms of traditional Chinese medicine (TCM), which is characterized by multi-component, multi-targeted, and integrative efficacy, is well suited to network pharmacology. Currently, network pharmacology is widely used to clarify the mechanism of the physiological activity of TCM. In this review, we comprehensively summarize the application of network pharmacology in TCM to reveal its potential for verifying the phenotype and underlying causes of diseases and for realizing the personalized and accurate application of TCM. We searched Web of Science, PubMed, Google Scholar, and the China National Knowledge Infrastructure for literature from the last decade, using "TCM network pharmacology" and "network pharmacology" as keywords. The origins, development, and application of network pharmacology are closely correlated with the study of TCM, which has been applied in China for thousands of years. Network pharmacology and TCM share the same core idea and promote each other. A well-defined research strategy for network pharmacology has been utilized in several aspects of TCM research, including the elucidation of the biological basis of diseases and syndromes, the prediction of TCM targets, the screening of TCM active compounds, and the deciphering of the mechanisms of TCM in treating diseases. However, several factors limit its application, such as the selection of databases and algorithms, the unstable quality of the research results, and the lack of standardization. This review aims to provide references and ideas for the research of TCM and to encourage the personalized and precise use of Chinese medicine.
Affiliation(s)
- Ling Li
- School of Comprehensive Health Management, Xihua University, Chengdu, Sichuan, China
- Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
- Lele Yang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
- Zhuhai UM Science and Technology Research Institute, Zhuhai, Guangdong, China
- Liuqing Yang
- School of Food and Bioengineering, Xihua University, Chengdu, Sichuan, China
- Chunrong He
- School of Food and Bioengineering, Xihua University, Chengdu, Sichuan, China
- Yuxin He
- School of Food and Bioengineering, Xihua University, Chengdu, Sichuan, China
- Liping Chen
- School of Comprehensive Health Management, Xihua University, Chengdu, Sichuan, China
- Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
- Qin Dong
- School of Food and Bioengineering, Xihua University, Chengdu, Sichuan, China
- Huaiying Zhang
- School of Comprehensive Health Management, Xihua University, Chengdu, Sichuan, China
- Shiyun Chen
- School of Food and Bioengineering, Xihua University, Chengdu, Sichuan, China
- Peng Li
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
4
Badwan BA, Liaropoulos G, Kyrodimos E, Skaltsas D, Tsirigos A, Gorgoulis VG. Machine learning approaches to predict drug efficacy and toxicity in oncology. Cell Rep Methods 2023; 3:100413. [PMID: 36936080 PMCID: PMC10014302 DOI: 10.1016/j.crmeth.2023.100413] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 02/25/2023]
Abstract
In recent years, there has been a surge of interest in using machine learning algorithms (MLAs) in oncology, particularly for biomedical applications such as drug discovery, drug repurposing, diagnostics, clinical trial design, and pharmaceutical production. MLAs have the potential to provide valuable insights and predictions in these areas by representing both the disease state and the therapeutic agents used to treat it. To fully utilize the capabilities of MLAs in oncology, it is important to understand the fundamental concepts underlying these algorithms and how they can be applied to assess the efficacy and toxicity of therapeutics. In this perspective, we lay out approaches to represent both the disease state and the therapeutic agents used by MLAs to derive novel insights and make relevant predictions.
Affiliation(s)
- Efthymios Kyrodimos
- First ENT Department, Hippocration Hospital, National Kapodistrian University of Athens, Athens, GR 11527, Greece
- Aristotelis Tsirigos
- Department of Medicine, New York University School of Medicine, New York, NY 10016, USA
- Department of Pathology, New York University School of Medicine, New York, NY 10016, USA
- Vassilis G. Gorgoulis
- Intelligencia Inc, New York, NY 10014, USA
- Department of Histology and Embryology, Faculty of Medicine, School of Health Sciences, National Kapodistrian University of Athens, Athens 11527, Greece
- Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Molecular and Clinical Cancer Sciences, Manchester Cancer Research Centre, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4GJ, UK
5
Liu XY, Mei XY. Prediction of drug sensitivity based on multi-omics data using deep learning and similarity network fusion approaches. Front Bioeng Biotechnol 2023; 11:1156372. [PMID: 37139048 PMCID: PMC10150883 DOI: 10.3389/fbioe.2023.1156372] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/01/2023] [Accepted: 03/31/2023] [Indexed: 05/05/2023]
Abstract
With the rapid development of multi-omics technologies and the accumulation of large-scale bio-datasets, many studies have sought a more comprehensive understanding of human diseases and drug sensitivity from multiple biomolecules, such as DNA, RNA, proteins and metabolites. Single-omics data make it difficult to analyze complex disease pathology and drug pharmacology systematically and comprehensively. Approaches based on molecularly targeted therapy face challenges such as insufficient target gene labeling ability and the absence of clear targets for non-specific chemotherapeutic drugs. Consequently, the integrated analysis of multi-omics data has become a new direction for exploring the mechanisms of diseases and drugs. However, the available drug sensitivity prediction models based on multi-omics data still suffer from overfitting, lack of interpretability, difficulties in integrating heterogeneous data, and limited prediction accuracy. In this paper, we propose a novel drug sensitivity prediction (NDSP) model based on deep learning and similarity network fusion approaches, which extracts drug targets using an improved sparse principal component analysis (SPCA) method for each omics dataset and constructs sample similarity networks based on the sparse feature matrices. The fused similarity networks are then fed into a deep neural network for training, which greatly reduces the data dimensionality and lowers the risk of overfitting. We use three types of omics data (RNA sequencing, copy number aberration and methylation) and select 35 drugs from Genomics of Drug Sensitivity in Cancer (GDSC) for experiments, including Food and Drug Administration (FDA)-approved targeted drugs, FDA-unapproved targeted drugs and non-specific therapies. Compared with some current deep learning methods, our proposed method can extract highly interpretable biological features to achieve highly accurate sensitivity prediction of targeted and non-specific cancer drugs, which is beneficial for the development of precision oncology beyond targeted therapy.
Affiliation(s)
- Xiao-Ying Liu
- Guangdong Polytechnic of Science and Technology, Zhuhai, China
- *Correspondence: Xiao-Ying Liu,
- Xin-Yue Mei
- Institute of Systems Engineering, Macau University of Science and Technology, Taipa, China
6
Lenhof K, Eckhart L, Gerstner N, Kehl T, Lenhof HP. Simultaneous regression and classification for drug sensitivity prediction using an advanced random forest method. Sci Rep 2022; 12:13458. [PMID: 35931707 PMCID: PMC9356072 DOI: 10.1038/s41598-022-17609-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 07/28/2022] [Indexed: 12/02/2022]
Abstract
Machine learning methods trained on cancer cell line panels are intensively studied for the prediction of optimal anti-cancer therapies. While classification approaches distinguish effective from ineffective drugs, regression approaches aim to quantify the degree of drug effectiveness. However, the high specificity of most anti-cancer drugs induces a skewed distribution of drug response values in favor of the more drug-resistant cell lines, negatively affecting the classification performance (class imbalance) and regression performance (regression imbalance) for the sensitive cell lines. Here, we present a novel approach called SimultAneoUs Regression and classificatiON Random Forests (SAURON-RF) based on the idea of performing a joint regression and classification analysis. We demonstrate that SAURON-RF improves the classification and regression performance for the sensitive cell lines at the expense of a moderate loss for the resistant ones. Furthermore, our results show that simultaneous classification and regression can be superior to regression or classification alone.
Affiliation(s)
- Kerstin Lenhof
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus (E2.1), 66123, Saarbrücken, Saarland, Germany
- Lea Eckhart
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus (E2.1), 66123, Saarbrücken, Saarland, Germany
- Nico Gerstner
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus (E2.1), 66123, Saarbrücken, Saarland, Germany
- Tim Kehl
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus (E2.1), 66123, Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus (E2.1), 66123, Saarbrücken, Saarland, Germany
7
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
8
Miranda SP, Baião FA, Fleck JL, Piccolo SR. Predicting drug sensitivity of cancer cells based on DNA methylation levels. PLoS One 2021; 16:e0238757. [PMID: 34506489 PMCID: PMC8432830 DOI: 10.1371/journal.pone.0238757] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/21/2020] [Accepted: 06/28/2021] [Indexed: 01/22/2023]
Abstract
Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
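For readers unfamiliar with the area under the receiver operating characteristic curve quoted above, a minimal sketch of one standard way to compute it follows: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. sensitive), 0 for the negative class.
    scores: predicted scores, higher meaning more likely positive.
    Requires at least one sample of each class.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.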
Affiliation(s)
- Sofia P. Miranda
- Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Fernanda A. Baião
- Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Julia L. Fleck
- Mines Saint-Etienne, Univ Clermont Auvergne, CNRS, UMR 6158 LIMOS, Centre CIS, Saint-Etienne, France
- Stephen R. Piccolo
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
9
Abstract
This perspective article gathers the latest developments in mathematical and computational oncology tools that exploit network approaches for the mathematical modelling, analysis, and simulation of cancer development and therapy design. It encourages the community to explore new paths and synergies under the umbrella of the Special Issue “Networks in Cancer: From Symmetry Breaking to Targeted Therapy”. The focus of the perspective is to demonstrate how networks can model the physics, analyse the interactions, and predict the evolution of the multiple processes behind tumour-host encounters across multiple scales. From agent-based modelling and mechano-biology to machine learning and predictive modelling, the perspective motivates a methodology well suited to mathematical and computational oncology and suggests approaches that mark a viable path towards adoption in the clinic.
10
Jin I, Nam H. HiDRA: Hierarchical Network for Drug Response Prediction with Attention. J Chem Inf Model 2021; 61:3858-3867. [PMID: 34342985 DOI: 10.1021/acs.jcim.1c00706] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Abstract
Understanding differences in drug responses between patients is crucial for delivering effective cancer treatment. We describe an interpretable AI model for use in predicting drug responses in cancer cells at the gene, molecular pathway, and drug level, which we have called the hierarchical network for drug response prediction with attention. We found that the model shows better accuracy in predicting drugs having efficacy against a given cell line than other state-of-the-art methods, with a root mean squared error of 1.0064, a Pearson's correlation coefficient of 0.9307, and an R² value of 0.8647. We also confirmed that the model gives high attention to drug-target genes and cancer-related pathways when predicting a response. The validity of predicted results was proven by in vitro cytotoxicity assay. Overall, we propose that our hierarchical and interpretable AI-based model is capable of interpreting intrinsic characteristics of cancer cells and drugs for accurate prediction of cancer-drug responses.
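The three figures reported in this abstract are standard regression metrics. The sketch below shows their usual definitions on made-up numbers; it assumes the coefficient of determination is computed as 1 - SS_res/SS_tot, which the abstract does not state, and the function name `regression_metrics` is illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, Pearson correlation, R^2) for paired observations.

    R^2 is taken here as 1 - SS_res / SS_tot (an assumption about the
    definition), which in general differs from the squared Pearson correlation.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    covariance = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    rmse = math.sqrt(ss_res / n)
    pearson = covariance / math.sqrt(ss_tot * ss_pred)
    r_squared = 1.0 - ss_res / ss_tot
    return rmse, pearson, r_squared


# Predictions shifted by a constant: perfectly correlated, yet R^2 is low.
print(regression_metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example shows why the two measures can disagree: a systematic offset leaves the correlation at 1 but lowers R². Under this definition R² never exceeds the squared correlation, which fits the reported pair (0.9307 squared is roughly 0.866, slightly above 0.8647).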
Affiliation(s)
- Iljung Jin
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Republic of Korea
11
Biological knowledge-slanted random forest approach for the classification of calcified aortic valve stenosis. BioData Min 2021; 14:35. [PMID: 34301292 PMCID: PMC8305490 DOI: 10.1186/s13040-021-00269-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/02/2021] [Accepted: 07/18/2021] [Indexed: 11/29/2022]
Abstract
Background: Calcific aortic valve stenosis (CAVS) is a fatal disease and there is no pharmacological treatment to prevent the progression of CAVS. This study aims to identify genes potentially implicated in CAVS in patients with congenital bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV) in comparison with patients having normal valves, using a knowledge-slanted random forest (RF). Results: This study implemented a knowledge-slanted RF that uses information extracted from a protein-protein interaction network to rank genes and modifies their selection probability accordingly when drawing the candidate split-variables. A total of 15,191 genes were assessed in 19 valves with CAVS (BAV, n = 10; TAV, n = 9) and 8 normal valves. The performance of the model was evaluated using accuracy, sensitivity, and specificity to discriminate cases with CAVS. A comparison with conventional RF was also performed. The proposed approach achieved higher accuracy than conventional RF when classifying cases with BAV and TAV separately (slanted RF: 59.3% versus 40.7%). When patients with BAV and TAV were grouped against patients with normal valves, the addition of prior biological information made no difference, with an accuracy of 92.6%. Conclusion: The knowledge-slanted RF approach reflected prior biological knowledge, leading to better precision in distinguishing between cases with BAV, TAV, and normal valves. The results of this study suggest that the integration of biological knowledge can be useful during difficult classification tasks.
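The core idea of modifying each gene's selection probability when candidate split-variables are drawn can be sketched as weighted sampling without replacement. This is only an illustration of that one step, not the study's implementation: the gene names, the weights standing in for network-derived ranks, and the function name `draw_candidate_features` are all hypothetical.

```python
import random


def draw_candidate_features(weights, k, rng):
    """Draw k distinct features, each pick proportional to its weight.

    weights: dict mapping feature name to a positive weight (hypothetical
             stand-in for a network-derived prior).
    k:       number of candidate split-variables, at most len(weights).
    rng:     a random.Random instance, passed in for reproducibility.
    """
    remaining = dict(weights)
    chosen = []
    for _ in range(k):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[name] for name in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


# A conventional random forest corresponds to all weights being equal.
rng = random.Random(0)
print(draw_candidate_features({"geneA": 5.0, "geneB": 1.0, "geneC": 1.0}, 2, rng))
```

Genes with larger weights are offered to the tree as split candidates more often, which is how prior knowledge can steer the forest without changing the split criterion itself.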
12
In Silico Inference of Synthetic Cytotoxic Interactions from Paclitaxel Responses. Int J Mol Sci 2021; 22:ijms22031097. [PMID: 33499282 PMCID: PMC7865701 DOI: 10.3390/ijms22031097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 11/16/2022]
Abstract
To exploit negatively interacting pairs of cancer somatic mutations in chemotherapy responses or synthetic cytotoxicity (SC), we systematically determined mutational pairs that had significantly lower paclitaxel half maximal inhibitory concentration (IC50) values. We evaluated 407 cell lines with somatic mutation profiles and estimated their copy number and drug-inhibitory concentrations in Genomics of Drug Sensitivity in Cancer (GDSC) database. The SC effect of 142 mutated gene pairs on response to paclitaxel was successfully cross-validated using human cancer datasets for urogenital cancers available in The Cancer Genome Atlas (TCGA) database. We further analyzed the cumulative effect of increasing SC pair numbers on the TP53 tumor suppressor gene. Patients with TCGA bladder and urogenital cancer exhibited improved cancer survival rates as the number of disrupted SC partners (i.e., SYNE2, SON, and/or PRY) of TP53 increased. The prognostic effect of SC burden on response to paclitaxel treatment could be differentiated from response to other cytotoxic drugs. Thus, the concept of pairwise SC may aid the identification of novel therapeutic and prognostic targets.
13
Huang LC, Yeung W, Wang Y, Cheng H, Venkat A, Li S, Ma P, Rasheed K, Kannan N. Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model for protein kinase inhibitor response prediction. BMC Bioinformatics 2020; 21:520. [PMID: 33183223 PMCID: PMC7664030 DOI: 10.1186/s12859-020-03842-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/09/2020] [Accepted: 10/27/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. RESULTS: In this study, we propose a multi-component Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response ([Formula: see text]) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein-protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed [Formula: see text] values in cell-based assays. CONCLUSIONS: By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.
Affiliation(s)
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Wayland Yeung
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Ye Wang
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Huimin Cheng
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Aarya Venkat
- Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
- Sheng Li
- Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
- Ping Ma
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
- Khaled Rasheed
- Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
14
Cadow J, Born J, Manica M, Oskooei A, Rodríguez Martínez M. PaccMann: a web service for interpretable anticancer compound sensitivity prediction. Nucleic Acids Res 2020; 48:W502-W508. [PMID: 32402082 PMCID: PMC7319576 DOI: 10.1093/nar/gkaa327] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/02/2020] [Revised: 04/06/2020] [Accepted: 04/22/2020] [Indexed: 12/19/2022]
Abstract
The identification of new targeted and personalized therapies for cancer requires the fast and accurate assessment of the drug efficacy of potential compounds against a particular biomolecular sample. It has been suggested that the integration of complementary sources of information might strengthen the accuracy of a drug efficacy prediction model. Here, we present a web-based platform for the Prediction of AntiCancer Compound sensitivity with Multimodal Attention-based Neural Networks (PaccMann). PaccMann is trained on public transcriptomic cell line profiles, compound structure information and drug sensitivity screenings, and outperforms state-of-the-art methods on anticancer drug sensitivity prediction. On the open-access web service (https://ibm.biz/paccmann-aas), users can select a known drug compound or design their own compound structure in an interactive editor, perform in-silico drug testing and investigate compound efficacy on publicly available or user-provided transcriptomic profiles. PaccMann leverages methods for model interpretability and outputs confidence scores as well as attention heatmaps that highlight the genes and chemical sub-structures that were most important for making a prediction, hence facilitating the understanding of the model's decision making and the involved biochemical processes. We hope to serve the community with a toolbox for fast and efficient validation in drug repositioning or lead compound identification regimes.
Affiliation(s)
- Joris Cadow
  - Computational Systems Biology Group, IBM Research Europe, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Jannis Born
  - Computational Systems Biology Group, IBM Research Europe, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
  - Machine Learning & Computational Biology Lab, D-BSSE, ETH Zürich, Mattenstrasse 26, Basel, 4058, Switzerland
- Matteo Manica
  - Computational Systems Biology Group, IBM Research Europe, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Ali Oskooei
  - Computational Systems Biology Group, IBM Research Europe, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- María Rodríguez Martínez
  - Computational Systems Biology Group, IBM Research Europe, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
|
15
|
Pélissier A, Akrout Y, Jahn K, Kuipers J, Klein U, Beerenwinkel N, Rodríguez Martínez M. Computational Model Reveals a Stochastic Mechanism behind Germinal Center Clonal Bursts. Cells 2020; 9:E1448. [PMID: 32532145 PMCID: PMC7349200 DOI: 10.3390/cells9061448] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/15/2020] [Revised: 05/29/2020] [Accepted: 06/08/2020] [Indexed: 02/06/2023] Open
Abstract
Germinal centers (GCs) are specialized compartments within the secondary lymphoid organs where B cells proliferate, differentiate, and mutate their antibody genes in response to the presence of foreign antigens. Through the GC lifespan, interclonal competition between B cells leads to increased affinity of the B cell receptors for antigens accompanied by a loss of clonal diversity, although the mechanisms underlying clonal dynamics are not completely understood. We present here a multi-scale quantitative model of the GC reaction that integrates an intracellular component, accounting for the genetic events that shape B cell differentiation, and an extracellular stochastic component, which accounts for the random cellular interactions within the GC. In addition, B cell receptors are represented as sequences of nucleotides that mature and diversify through somatic hypermutations. We exploit extensive experimental characterizations of the GC dynamics to parameterize our model, and visualize affinity maturation by means of evolutionary phylogenetic trees. Our explicit modeling of B cell maturation enables us to characterise the evolutionary processes and competition at the heart of the GC dynamics, and explains the emergence of clonal dominance as a result of initially small stochastic advantages in the affinity to antigen. Interestingly, a subset of the GC undergoes massive expansion of higher-affinity B cell variants (clonal bursts), leading to a loss of clonal diversity at a significantly faster rate than in GCs that do not exhibit clonal dominance. Our work contributes towards an in silico vaccine design, and has implications for the better understanding of the mechanisms underlying autoimmune disease and GC-derived lymphomas.
Affiliation(s)
- Aurélien Pélissier
  - IBM Research Zurich, 8803 Rüschlikon, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Ulf Klein
  - Leeds Institute of Medical Research at St. James’s, University of Leeds, Leeds LS9 7TF, UK
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
|
16
|
Crawford J, Greene CS. Incorporating biological structure into machine learning models in biomedicine. Curr Opin Biotechnol 2020; 63:126-134. [PMID: 31962244 PMCID: PMC7308204 DOI: 10.1016/j.copbio.2019.12.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/17/2019] [Revised: 12/17/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022]
Abstract
In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful. The area of research would benefit from performant open source implementations and independent benchmarking efforts.
Affiliation(s)
- Jake Crawford
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, United States
|
17
|
Gene regulatory network analysis with drug sensitivity reveals synergistic effects of combinatory chemotherapy in gastric cancer. Sci Rep 2020; 10:3932. [PMID: 32127608 PMCID: PMC7054272 DOI: 10.1038/s41598-020-61016-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/16/2019] [Accepted: 02/19/2020] [Indexed: 12/14/2022] Open
Abstract
The combination of docetaxel, cisplatin, and fluorouracil (DCF) is highly synergistic in advanced gastric cancer. We aimed to explain these synergistic effects at the molecular level. Thus, we constructed a weighted correlation network using the differentially expressed genes between Stage I and IV gastric cancer based on The Cancer Genome Atlas (TCGA), and three modules were derived. Next, we investigated the correlation between the eigengene of the expression of the gene network modules and the chemotherapeutic drug response to DCF from the Genomics of Drug Sensitivity in Cancer (GDSC) database. The three modules were associated with functions related to cell migration, angiogenesis, and the immune response. The eigengenes of the three modules had a high correlation with DCF (−0.41, −0.40, and −0.15). The eigengenes of the three modules tended to increase as the stage increased. Advanced gastric cancer was affected by the interaction among the modules with three functions, namely cell migration, angiogenesis, and the immune response, all of which are related to metastasis. The weighted correlation network analysis model proved the complementary effects of DCF at the molecular level and thus could be used as a unique methodology to determine the optimal combination of chemotherapy drugs for patients with gastric cancer.
|
18
|
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022] Open
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis has helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the most successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predicting drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to a few hundred genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms, which are not necessarily restricted to the drug's known targets.
Affiliation(s)
- Luca Parca
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Gerardo Pepe
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Marco Pietrosanto
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Giulio Galvan
  - Department of Information Engineering, University of Florence, Florence, Italy
- Leonardo Galli
  - Department of Information Engineering, University of Florence, Florence, Italy
- Antonio Palmeri
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
  - Celgene Institute for Translational Research Europe, Sevilla, Spain
- Marco Sciandrone
  - Department of Information Engineering, University of Florence, Florence, Italy
- Fabrizio Ferrè
  - Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
- Gabriele Ausiello
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
|
19
|
Manica M, Oskooei A, Born J, Subramanian V, Sáez-Rodríguez J, Rodríguez Martínez M. Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Mol Pharm 2019; 16:4797-4806. [DOI: 10.1021/acs.molpharmaceut.9b00520] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/11/2022]
Affiliation(s)
- Jannis Born
  - IBM Research, 8803 Zürich, Switzerland
  - ETH Zürich, 8092 Zürich, Switzerland
  - University of Zürich, 8006 Zürich, Switzerland
|