1
|
Sengupta A, Singh SK, Kumar R. Support Vector Machine-Based Prediction Models for Drug Repurposing and Designing Novel Drugs for Colorectal Cancer. ACS Omega 2024; 9:18584-18592. [PMID: 38680332 PMCID: PMC11044175 DOI: 10.1021/acsomega.4c01195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/28/2024] [Accepted: 03/29/2024] [Indexed: 05/01/2024]
Abstract
Colorectal cancer (CRC) has witnessed a concerning increase in incidence and poses a significant therapeutic challenge due to its poor prognosis. There is a pressing demand to identify novel drug therapies to combat CRC. In this study, we addressed this need by utilizing the pharmacological profiles of anticancer drugs from the Genomics of Drug Sensitivity in Cancer (GDSC) database and developed QSAR models using the Support Vector Machine (SVM) algorithm to predict alternative and promiscuous anticancer compounds for CRC treatment. Our QSAR models demonstrated their robustness by achieving a high coefficient of determination (R²) after 10-fold cross-validation. For 12 CRC cell lines, R² ranged from 0.609 to 0.827. The highest performance was achieved for the SW1417 and GP5d cell lines, with R² values of 0.827 and 0.786, respectively. Further, we listed the most common chemical descriptors in the drug profiles of the CRC cell lines and reported the correlation of these descriptors with drug activity. The KRFP314 fingerprint was the most frequently occurring descriptor, followed closely by the KRFPC314 fingerprint, within the drug profiles of the CRC cell lines. Beyond predictive modeling, we also confirmed the applicability of our QSAR models via in silico methods by conducting descriptor-drug analyses and recapitulating drug-to-oncogene relationships. We also identified two potential anti-CRC FDA-approved drugs, viomycin and diamorphine, using the QSAR models. To make our findings easily accessible, we have incorporated these models into a user-friendly prediction Web server named "ColoRecPred", available at https://project.iith.ac.in/cgntlab/colorecpred. We anticipate that this Web server can be used to screen chemical libraries for potential anti-CRC drugs.
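The cross-validated coefficient of determination quoted in this abstract can be illustrated with a short sketch. This is not the authors' pipeline: the SVM is replaced here by a one-descriptor least-squares line so the example stays dependency-free, and the helper names (`k_fold_indices`, `fit_line`, `cross_validated_r2`) are hypothetical. It also assumes that R² is computed on the pooled out-of-fold predictions, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k interleaved folds (hypothetical helper)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept; a stand-in for the SVM regressor."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_r2(xs: Sequence[float], ys: Sequence[float], k: int = 10) -> float:
    """R² over pooled held-out predictions from k-fold cross-validation."""
    preds = [0.0] * len(xs)
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        for i in test_idx:
            # Each sample is predicted only by a model that never saw it.
            preds[i] = slope * xs[i] + intercept
    return r_squared(ys, preds)
```

Because every prediction comes from a model trained without that sample, the resulting R² is an estimate of performance on unseen compounds rather than a measure of fit to the training data.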
Affiliation(s)
- Avik Sengupta
- Department of Biotechnology, Indian Institute of Technology Hyderabad, Kandi, Telangana 502284, India
| | - Saurabh Kumar Singh
- Department of Chemistry, Indian Institute of Technology Hyderabad, Kandi, Telangana 502284, India
| | - Rahul Kumar
- Department of Biotechnology, Indian Institute of Technology Hyderabad, Kandi, Telangana 502284, India
| |
|
2
|
Al-Jarf R, de Sá AGC, Pires DEV, Ascher DB. pdCSM-cancer: Using Graph-Based Signatures to Identify Small Molecules with Anticancer Properties. J Chem Inf Model 2021; 61:3314-3322. [PMID: 34213323 PMCID: PMC8317153 DOI: 10.1021/acs.jcim.1c00168] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/14/2022]
Abstract
The development of new, effective, and safe drugs to treat cancer remains a challenging and time-consuming task due to limited hit rates, restraining subsequent development efforts. Despite the impressive progress of quantitative structure–activity relationship and machine learning-based models that have been developed to predict molecule pharmacodynamics and bioactivity, they have had mixed success at identifying compounds with anticancer properties against multiple cell lines. Here, we have developed a novel predictive tool, pdCSM-cancer, which uses a graph-based signature representation of the chemical structure of a small molecule in order to accurately predict molecules likely to be active against one or multiple cancer cell lines. pdCSM-cancer represents the most comprehensive anticancer bioactivity prediction platform developed to date, comprising models trained and validated on experimental data of the growth inhibition concentration (GI50%) effects, including over 18,000 compounds, on 9 tumor types and 74 distinct cancer cell lines. Across 10-fold cross-validation, it achieved Pearson’s correlation coefficients of up to 0.74 and comparable performance of up to 0.67 across independent, non-redundant blind tests. Leveraging the insights from these cell line-specific models, we developed a generic predictive model to identify molecules active in at least 60 cell lines. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.94 on 10-fold cross-validation and up to 0.94 on independent non-redundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a valuable resource for optimizing and enriching screening libraries for the identification of effective and safe anticancer molecules. To provide a simple and integrated platform to rapidly screen for potential biologically active molecules with favorable anticancer properties, we made pdCSM-cancer freely available online at http://biosig.unimelb.edu.au/pdcsm_cancer.
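The AUC reported for the generic active/inactive model can be read as the probability that a randomly chosen active molecule receives a higher score than a randomly chosen inactive one. The sketch below computes it directly from that pairwise definition. It is an illustration only, not code from pdCSM-cancer; the function name `roc_auc` and the toy data are assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of actives and inactives.

    labels: 1 for active, 0 for inactive.
    scores: predicted score, higher meaning "more likely active".
    Ties between an active and an inactive count as half a win.
    """
    actives = [s for l, s in zip(labels, scores) if l == 1]
    inactives = [s for l, s in zip(labels, scores) if l == 0]
    if not actives or not inactives:
        raise ValueError("AUC needs at least one active and one inactive molecule")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Toy example: one inactive (0.5) outranks one active (0.4).
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic loop is fine for small sets; for libraries with many thousands of compounds a rank-based formulation would be the usual choice.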
Affiliation(s)
- Raghad Al-Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
| | - Alex G C de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
| | - David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia; Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom
| |
|
3
|
Cabrera-Andrade A, López-Cortés A, Jaramillo-Koupermann G, González-Díaz H, Pazos A, Munteanu CR, Pérez-Castillo Y, Tejera E. A Multi-Objective Approach for Anti-Osteosarcoma Cancer Agents Discovery through Drug Repurposing. Pharmaceuticals (Basel) 2020; 13:ph13110409. [PMID: 33266378 PMCID: PMC7700154 DOI: 10.3390/ph13110409] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/02/2020] [Revised: 11/11/2020] [Accepted: 11/12/2020] [Indexed: 02/08/2023] Open
Abstract
Osteosarcoma is the most common type of primary malignant bone tumor. Although nowadays 5-year survival rates can reach up to 60–70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors in treatments. We developed a multi-objective algorithm for the repurposing of new anti-osteosarcoma drugs, based on the modeling of molecules with described activity for HOS, MG63, SAOS2, and U2OS cell lines in the ChEMBL database. Several predictive models were obtained for each cell line and those with accuracy greater than 0.8 were integrated into a desirability function for the final multi-objective model. An exhaustive exploration of model combinations was carried out to obtain the best multi-objective model in virtual screening. For the top 1% of the screened list, the final model showed a BEDROC = 0.562, EF = 27.6, and AUC = 0.653. The repositioning was performed on 2218 molecules described in DrugBank. Within the top-ranked drugs, we found: temsirolimus, paclitaxel, sirolimus, everolimus, and cabazitaxel, which are antineoplastic drugs described in clinical trials for cancer in general. Interestingly, we found several broad-spectrum antibiotics and antiretroviral agents. This powerful model predicts several drugs that should be studied in depth to find new chemotherapy regimens and to propose new strategies for osteosarcoma treatment.
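The enrichment factor (EF) quoted for the top 1% of the screened list compares the hit rate among the top-ranked molecules with the hit rate in the whole library. A minimal sketch of that ratio follows. It is not the authors' implementation: the function name, the rounding of the cutoff, and the toy data are assumptions, and BEDROC is not covered.

```python
from typing import Sequence


def enrichment_factor(labels: Sequence[int], scores: Sequence[float],
                      fraction: float = 0.01) -> float:
    """EF = (actives in top fraction / size of top fraction) / (actives / library size).

    labels: 1 for a known active, 0 otherwise.
    scores: model score, higher meaning "ranked earlier".
    """
    n = len(labels)
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("EF is undefined without any actives")
    # Assumed convention: round the cutoff and keep at least one molecule.
    n_top = max(1, int(round(fraction * n)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    return (top_actives / n_top) / (total_actives / n)


# Toy library: 100 molecules, 5 actives, all ranked first.
toy_labels = [1] * 5 + [0] * 95
toy_scores = [float(100 - i) for i in range(100)]
ef_top10 = enrichment_factor(toy_labels, toy_scores, fraction=0.10)
```

An EF of 1 means the ranking is no better than random selection; the maximum attainable value depends on the fraction chosen and on how many actives the library contains.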
Affiliation(s)
- Alejandro Cabrera-Andrade
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
- Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito 170125, Ecuador
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Correspondence: (A.C.-A.); (E.T.)
| | - Andrés López-Cortés
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
- Latin American Network for Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28029 Madrid, Spain
| | - Gabriela Jaramillo-Koupermann
- Laboratorio de Biología Molecular, Subproceso de Anatomía Patológica, Hospital de Especialidades Eugenio Espejo, Quito 170403, Ecuador
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, and Basque Center for Biophysics CSIC-UPV/EHU, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
| | - Alejandro Pazos
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
| | - Cristian R. Munteanu
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
| | - Yunierkis Pérez-Castillo
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
- Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170125, Ecuador
| | - Eduardo Tejera
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
- Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de Las Américas, Quito 170125, Ecuador
- Correspondence: (A.C.-A.); (E.T.)
| |
|
4
|
González-Paz L, Paz JL, Vera-Villalobos J, Alvarado YJ. Compuestos Fitoquímicos Dirigidos al Bloqueo de la Polimerasa Viral del SARS-CoV-2 Causante del COVID-19: un Análisis Comparativo de Funciones de Puntuación para Acoplamientos con Interés Biomédico [Phytochemical Compounds Targeting the Blockade of the Viral Polymerase of SARS-CoV-2, the Cause of COVID-19: A Comparative Analysis of Scoring Functions for Dockings of Biomedical Interest]. Revista Politécnica 2020. [DOI: 10.33333/rp.vol46n1.01] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
The global COVID-19 pandemic caused by SARS-CoV-2 has made it necessary to seek treatment alternatives. The WHO has recommended the FDA-approved drug Remdesivir, which targets the viral RNA polymerase. In addition, natural compounds with antiviral properties have been evaluated computationally. However, these studies focus on using the scoring function of the AutoDock Vina (ADV) algorithm to predict candidates. Here we propose to evaluate the phytochemicals piperine (Piperina_ID_638024), EPGG (EPGG_ID_65064), curcumin (Curcumina_ID_969516), and capsaicin (Capsaicina_ID_1548943) against the SARS-CoV-2 RNA polymerase (PDB_ID_6NUR), using Remdesivir (Remdesivir_ID_121304016) as a control, through a computational, comparative, and multivariate analysis of the ADV, PLANTS, MolDock, Rerank, and DockT scoring functions, taking into account ligand solubility and the hydrophobicity of the cavities involved in the interactions, in order to increase the accuracy of predicting the best dockings of natural compounds against COVID-19. We found that 4 of the 5 scoring functions (all except ADV) predicted the thermodynamically most favorable docking with piperine, surpassing Remdesivir. We also observed that the PLANTS, ADV, and DockT scores are affected by ligand solubility and cavity hydrophobicity. Under the conditions of this study, we conclude that the MolDock and Rerank algorithms are better suited for rapid screening and reordering of dockings when working with soluble ligands (Rp = 0.70 for both), regardless of their polarity, and when targeting hydrophobic cavities of the SARS-CoV-2 RNA polymerase (Rp = 0.95 and Rp = 0.90, respectively), especially for computational approaches in the context of drug research against COVID-19.
|
5
|
Sidorov P, Naulaerts S, Ariey-Bonnet J, Pasquier E, Ballester PJ. Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data. Front Chem 2019; 7:509. [PMID: 31380352 PMCID: PMC6646421 DOI: 10.3389/fchem.2019.00509] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 05/17/2019] [Accepted: 07/02/2019] [Indexed: 12/15/2022] Open
Abstract
Drug combinations are of great interest for cancer treatment. Unfortunately, the discovery of synergistic combinations by purely experimental means is only feasible on small sets of drugs. In silico modeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to what extent drug combination synergy can be predicted by exploiting the largest available dataset to date (NCI-ALMANAC, with over 290,000 synergy determinations). Each cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5,000 pair-wise drug combinations, 60 cell lines, 4 types of models, and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated. The evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the considered cell line, with XGBoost providing slightly better predictions than RF). We have also found that restricting to the most reliable synergy predictions results in at least a 2-fold error decrease with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with other partner drugs is better predicted by the models. Despite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic.
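The per-cell-line Pearson correlations reported above compare predicted synergy scores with measured ones. A minimal sketch of the statistic is given below, assuming two equal-length lists of numbers. The function name `pearson_r` and the toy values are illustrative and do not come from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between measured values (xs) and predictions (ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Toy check: predictions that are a linear rescaling of the measurements.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r_toy = pearson_r(measured, predicted)
```

Note that Pearson correlation is insensitive to scale and offset, so a model can correlate well with the measurements while still being biased in absolute terms; error-based metrics complement it.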
Affiliation(s)
- Pavel Sidorov
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
| | - Stefan Naulaerts
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
- Department of Tumor Immunology, Institut de Duve, Bruxelles, Belgium
| | - Jérémy Ariey-Bonnet
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
| | - Eddy Pasquier
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
| | - Pedro J. Ballester
- CRCM, INSERM, Cancer Research Center of Marseille, Institut Paoli-Calmettes, Aix-Marseille Univ, CNRS, Marseille, France
| |
|
6
|
Dang CC, Peón A, Ballester PJ. Unearthing new genomic markers of drug response by improved measurement of discriminative power. BMC Med Genomics 2018; 11:10. [PMID: 29409485 PMCID: PMC5801688 DOI: 10.1186/s12920-018-0336-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/14/2016] [Accepted: 01/29/2018] [Indexed: 12/29/2022] Open
Abstract
Background: Oncology drugs are only effective in a small proportion of cancer patients. Our current ability to identify these responsive patients before treatment is still poor in most cases. Thus, there is a pressing need to discover response markers for marketed and research oncology drugs. Screening these drugs against a large panel of cancer cell lines has led to the discovery of new genomic markers of in vitro drug response. However, while the identification of such markers among thousands of candidate drug-gene associations in the data is error-prone, an appraisal of the effectiveness of such a detection task is currently lacking. Methods: Here we present a new non-parametric method for measuring the discriminative power of a drug-gene association. Unlike parametric statistical tests, the adopted non-parametric test has the advantage of not making strong assumptions about the data, which can distort the identification of genomic markers. Furthermore, we introduce a new benchmark to further validate these markers in vitro using more recent data not used to identify the markers. Results: The application of this new methodology has led to the identification of 128 new genomic markers distributed across 61% of the analysed drugs, including 5 drugs without previously known markers, which were missed by the MANOVA test initially applied to analyse data from the Genomics of Drug Sensitivity in Cancer consortium. Conclusions: Discovering markers using more than one statistical test and testing them on independent data is unusual. We found this helpful to discard statistically significant drug-gene associations that were actually spurious correlations. This approach also revealed new, independently validated, in vitro markers of drug response such as Temsirolimus-CDKN2A (resistance) and Gemcitabine-EWS_FLI1 (sensitivity).
Electronic supplementary material The online version of this article (10.1186/s12920-018-0336-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Cuong C Dang
- Cancer Research Center of Marseille, INSERM U1068, F-13009, Marseille, France; Institut Paoli-Calmettes, F-13009, Marseille, France; Aix-Marseille Université, F-13284, Marseille, France; CNRS UMR7258, F-13009, Marseille, France
| | - Antonio Peón
- Cancer Research Center of Marseille, INSERM U1068, F-13009, Marseille, France; Institut Paoli-Calmettes, F-13009, Marseille, France; Aix-Marseille Université, F-13284, Marseille, France; CNRS UMR7258, F-13009, Marseille, France
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009, Marseille, France; Institut Paoli-Calmettes, F-13009, Marseille, France; Aix-Marseille Université, F-13284, Marseille, France; CNRS UMR7258, F-13009, Marseille, France
| |
|
7
|
Naulaerts S, Dang CC, Ballester PJ. Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours. Oncotarget 2017; 8:97025-97040. [PMID: 29228590 PMCID: PMC5722542 DOI: 10.18632/oncotarget.20923] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/16/2017] [Accepted: 08/14/2017] [Indexed: 02/07/2023] Open
Abstract
Cancer drug therapies are only effective in a small proportion of patients. To make things worse, our ability to identify these responsive patients before administering a treatment is generally very limited. The recent arrival of large-scale pharmacogenomic data sets, which measure the sensitivity of molecularly profiled cancer cell lines to a panel of drugs, has boosted research on the discovery of drug sensitivity markers. However, no systematic comparison of widely-used single-gene markers with multi-gene machine-learning markers exploiting genomic data has so far been conducted. We therefore assessed the performance offered by these two types of models in discriminating between cell lines that are sensitive and resistant to a given drug. This was carried out for each of 127 considered drugs using genomic data characterising the cell lines. We found that the proportion of cell lines predicted to be sensitive that are actually sensitive (precision) varies strongly with the drug and type of model used. Furthermore, the proportion of sensitive cell lines that are correctly predicted as sensitive (recall) of the best single-gene marker was lower than that of the multi-gene marker in 118 of the 127 tested drugs. We conclude that single-gene markers are only able to identify those drug-sensitive cell lines with the considered actionable mutation, unlike multi-gene markers that can in principle combine multiple gene mutations to identify additional sensitive cell lines. We also found that cell line sensitivities to some drugs (e.g. Temsirolimus, 17-AAG or Methotrexate) are better predicted by these machine-learning models.
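The two definitions given in this abstract translate directly into code. The sketch below follows them literally: precision is the share of cell lines predicted sensitive that are actually sensitive, and recall is the share of sensitive cell lines that are predicted sensitive. The function name and the toy marker predictions are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, 1 = sensitive, 0 = resistant."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    # Return 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy panel of 8 cell lines, 4 of them sensitive.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
# A hypothetical single-gene marker flags only the line carrying the mutation.
single_gene = [1, 0, 0, 0, 0, 0, 0, 0]
# A hypothetical multi-gene model flags more lines, with one false alarm.
multi_gene = [1, 1, 1, 0, 1, 0, 0, 0]

single_pr = precision_recall(actual, single_gene)
multi_pr = precision_recall(actual, multi_gene)
```

In this toy panel the single-gene marker is never wrong when it fires but misses three of the four sensitive lines, which mirrors the precision versus recall trade-off the abstract describes.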
Affiliation(s)
- Stefan Naulaerts
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; CNRS UMR7258, Marseille, France
| | - Cuong C Dang
- Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
| | - Pedro J Ballester
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; CNRS UMR7258, Marseille, France
| |
|
8
|
Qureshi A, Kaur G, Kumar M. AVCpred: an integrated web server for prediction and design of antiviral compounds. Chem Biol Drug Des 2017; 89:74-83. [PMID: 27490990 PMCID: PMC7162012 DOI: 10.1111/cbdd.12834] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 04/13/2016] [Revised: 07/21/2016] [Accepted: 07/25/2016] [Indexed: 12/11/2022]
Abstract
Viral infections constantly jeopardize global public health due to the lack of effective antiviral therapeutics. Therefore, there is an imperative need to speed up the drug discovery process to identify novel and efficient drug candidates. In this study, we have developed quantitative structure-activity relationship (QSAR)-based models for predicting antiviral compounds (AVCs) against deadly viruses like human immunodeficiency virus (HIV), hepatitis C virus (HCV), hepatitis B virus (HBV), human herpesvirus (HHV) and 26 others using publicly available experimental data from the ChEMBL bioactivity database. Support vector machine (SVM) models achieved a maximum Pearson correlation coefficient of 0.72, 0.74, 0.66, 0.68, and 0.71 in regression mode and a maximum Matthews correlation coefficient of 0.91, 0.93, 0.70, 0.89, and 0.71, respectively, in classification mode during 10-fold cross-validation. Furthermore, similar performance was observed on the independent validation sets. We have integrated these models in the AVCpred web server, freely available at http://crdd.osdd.net/servers/avcpred. In addition, the datasets are provided in a searchable format. We hope this web server will assist researchers in the identification of potential antiviral agents. It would also save time and cost by prioritizing new drugs against viruses before their synthesis and experimental testing.
Affiliation(s)
- Abid Qureshi
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Gazaldeep Kaur
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Manoj Kumar
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| |
|
9
|
Nguyen L, Dang CC, Ballester PJ. Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data. F1000Res 2016; 5. [PMID: 28299173 PMCID: PMC5310525 DOI: 10.12688/f1000research.10529.2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Accepted: 03/10/2017] [Indexed: 12/19/2022] Open
Abstract
Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those by Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data.
Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than standard k-fold cross-validation.
Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG.
Conclusions: Thanks to this unbiased validation, we now know that this type of models can predict in vitro tumour response to some of these drugs. These models can thus be further investigated on in vivo tumour models. R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at http://ballester.marseille.inserm.fr/gdsc.transcriptomicDatav2.tar.gz.
Affiliation(s)
- Linh Nguyen
- Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France
| | - Cuong C Dang
- Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France
| |
|
10
|
Nguyen L, Dang CC, Ballester PJ. Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data. F1000Res 2016; 5. [PMID: 28299173 DOI: 10.12688/f1000research.10529.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Accepted: 12/28/2016] [Indexed: 12/30/2022] Open
Abstract
Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those by the Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than standard k-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: Thanks to this unbiased validation, we now know that this type of model can predict in vitro tumour response to some of these drugs. These models can thus be further investigated on in vivo tumour models. R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at http://ballester.marseille.inserm.fr/gdsc.transcriptomicDatav2.tar.gz.
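The precision/recall trade-off described in this abstract can be illustrated with a minimal Python sketch. The labels below are hypothetical toy data, not GDSC results, and the names `precision_recall`, `marker_pred`, and `model_pred` are assumptions introduced only for illustration; the authors' own benchmark code is the R archive linked above.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, where 1 = sensitive cell line."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sensitive, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # resistant, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sensitive, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 10 cell lines, 5 of them truly sensitive to a drug.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# A marker-like predictor that flags very few cell lines.
marker_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# A model-like predictor that flags more cell lines, with one false alarm.
model_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

marker_pr = precision_recall(y_true, marker_pred)
model_pr = precision_recall(y_true, model_pred)
print("marker (precision, recall):", marker_pr)
print("model  (precision, recall):", model_pr)
```

In this toy case the marker-like predictor is never wrong when it flags a cell line but misses most sensitive ones, while the model-like predictor recovers more of them at a small cost in precision.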
Affiliation(s)
- Linh Nguyen
  - Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France
- Cuong C Dang
  - Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France
- Pedro J Ballester
  - Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France
|
11
|
Singh H, Kumar R, Singh S, Chaudhary K, Gautam A, Raghava GPS. Prediction of anticancer molecules using hybrid model developed on molecules screened against NCI-60 cancer cell lines. BMC Cancer 2016; 16:77. [PMID: 26860193 PMCID: PMC4748564 DOI: 10.1186/s12885-016-2082-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 08/17/2015] [Accepted: 01/21/2016] [Indexed: 11/16/2022]
Abstract
Background: In the past, numerous quantitative structure-activity relationship (QSAR) based models have been developed for predicting anticancer activity for a specific class of molecules against different cancer drug targets. In contrast, limited attempts have been made to predict the anticancer activity of a diverse class of chemicals against a wide variety of cancer cell lines. In this study, we describe a hybrid method developed on thousands of anticancer and non-anticancer molecules tested against National Cancer Institute (NCI) 60 cancer cell lines. Results: Our analysis of anticancer molecules revealed that the majority of anticancer molecules contain 18–24 carbon atoms and are dominated by functional groups like R2NH, R3N, ROH, RCOR, and ROR. It was also observed that certain substructures (e.g., 1-methoxy-4-methylbenzene, 1-methoxybenzene, nitrobenzene, indole, propenylbenzene) are more abundant in anticancer molecules. Next, we developed anticancer molecule prediction models using various machine-learning techniques and achieved a maximum Matthews correlation coefficient (MCC) of 0.81 with 90.40% accuracy using support vector machine (SVM) based models. In another approach, a novel similarity or potency score based method was developed using selected fragments/fingerprints and achieved a maximum MCC of 0.82 with 90.65% accuracy. Finally, we combined the strengths of the above methods and developed a hybrid method with a maximum MCC of 0.85 and 92.47% accuracy. Conclusions: We developed a hybrid method utilizing the best of the machine learning and potency score based methods. The highly accurate hybrid method can be used for classification of anticancer and non-anticancer molecules. To assist the scientific community working in the field of anticancer drug discovery, we integrated the hybrid and potency methods in a web server, CancerIN. This server provides various facilities, including virtual screening of anticancer molecules, analog-based drug design, and similarity search against known anticancer molecules (http://crdd.osdd.net/oscadd/cancerin). Electronic supplementary material: The online version of this article (doi:10.1186/s12885-016-2082-y) contains supplementary material, which is available to authorized users.
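The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The following is a minimal Python sketch of the standard formulas; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all molecules that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 100 molecules, 50 anticancer and 50 non-anticancer.
example_acc = accuracy(tp=45, tn=45, fp=5, fn=5)
example_mcc = mcc(tp=45, tn=45, fp=5, fn=5)
print(f"accuracy = {example_acc:.2f}, MCC = {example_mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size.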
Affiliation(s)
- Harinder Singh
  - Bioinformatics Centre, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
- Rahul Kumar
  - Bioinformatics Centre, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
- Sandeep Singh
  - Bioinformatics Centre, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
- Kumardeep Chaudhary
  - Bioinformatics Centre, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
- Ankur Gautam
  - Bioinformatics Centre, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
- Gajendra P S Raghava
  - Bioinformatics Centre, Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
|
12
|
Maindola P, Jamal S, Grover A. Cheminformatics Based Machine Learning Models for AMA1-RON2 Abrogators for Inhibiting Plasmodium falciparum Erythrocyte Invasion. Mol Inform 2015; 34:655-64. [PMID: 27490966 DOI: 10.1002/minf.201400139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/17/2014] [Accepted: 02/21/2015] [Indexed: 01/15/2023]
Abstract
Malaria remains a dreadful disease, putting about 3.4 billion people at risk every year and causing about 627 thousand deaths worldwide. Existing treatments based on quinines and artemisinin-based combination therapies have started to encounter resistance, creating a pressing need for anti-malarials with different mechanisms of action. In this respect, erythrocyte invasion by Plasmodium is crucial because, as an obligate intracellular parasite, it must invade host cells. This process is mediated by the interaction between the conserved Apical Membrane Antigen (AMA1) and Rhoptry Neck (RON2) proteins, which is required for successful invasion of erythrocytes by Plasmodium and for manifestation of the disease. Here, using the physicochemical properties of compounds from a confirmatory high-throughput screen that tested their ability to disrupt this molecular interaction, we trained supervised classifiers and validated their robustness with various statistical parameters. The best model was used to screen new compounds from the Traditional Chinese Medicine Database. Some of the best hits are already in use as anti-malarials, and the model predicts that an essential part of their effectiveness is likely due to inhibition of the AMA1-RON2 interaction. Pharmacophoric features have also been identified to facilitate the design of possible leads.
Affiliation(s)
- Priyank Maindola
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi-110067, India phone/fax: +91-11-26738728; fax: +91-11-26702040
- Salma Jamal
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi-110067, India phone/fax: +91-11-26738728; fax: +91-11-26702040
- Abhinav Grover
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi-110067, India phone/fax: +91-11-26738728; fax: +91-11-26702040
|