1
Myung Y, de Sá AGC, Ascher DB. Deep-PK: deep learning for small molecule pharmacokinetic and toxicity prediction. Nucleic Acids Res 2024:gkae254. [PMID: 38634808 DOI: 10.1093/nar/gkae254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 03/20/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024] Open
Abstract
Evaluating pharmacokinetic properties of small molecules is considered a key feature in most drug development and high-throughput screening processes. Generally, pharmacokinetics, which represent the fate of drugs in the human body, are described from four perspectives: absorption, distribution, metabolism and excretion, all of which are closely related to a fifth perspective, toxicity (ADMET). Since obtaining ADMET data from in vitro, in vivo or pre-clinical stages is time-consuming and expensive, many efforts have been made to predict ADMET properties via computational approaches. However, the majority of available methods are limited in their ability to provide pharmacokinetics and toxicity for diverse targets, ensure good overall accuracy, and offer ease of use, interpretability and extensibility for further optimizations. Here, we introduce Deep-PK, a deep learning-based pharmacokinetic and toxicity prediction, analysis and optimization platform. We applied graph neural networks and graph-based signatures as a graph-level feature to yield the best predictive performance across 73 endpoints, including 64 ADMET and 9 general properties. With these powerful models, Deep-PK supports molecular optimization and interpretation, aiding users in optimizing and understanding pharmacokinetics and toxicity for given input molecules. Deep-PK is freely available at https://biosig.lab.uq.edu.au/deeppk/.
Affiliation(s)
- Yoochan Myung
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Alex G C de Sá
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria 3010, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria 3010, Australia
2
Soh CH, de Sá AGC, Potter E, Halabi A, Ascher DB, Marwick TH. Use of the energy waveform electrocardiogram to detect subclinical left ventricular dysfunction in patients with type 2 diabetes mellitus. Cardiovasc Diabetol 2024; 23:91. [PMID: 38448993 PMCID: PMC10918872 DOI: 10.1186/s12933-024-02141-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 01/22/2024] [Indexed: 03/08/2024] Open
Abstract
BACKGROUND Recent guidelines propose N-terminal pro-B-type natriuretic peptide (NT-proBNP) for recognition of asymptomatic left ventricular (LV) dysfunction (Stage B Heart Failure, SBHF) in type 2 diabetes mellitus (T2DM). Wavelet transform-based signal processing transforms electrocardiogram (ECG) waveforms into an energy distribution waveform (ew)ECG, providing frequency and energy features that machine learning can use as additional inputs to improve the identification of SBHF. Accordingly, we sought to determine whether a machine learning model based on ewECG features was superior to NT-proBNP, as well as to a conventional screening tool, the Atherosclerosis Risk in Communities (ARIC) HF risk score, in SBHF screening among patients with T2DM. METHODS Participants in two clinical trials of SBHF (defined as diastolic dysfunction [DD], reduced global longitudinal strain [GLS ≤ 18%] or LV hypertrophy [LVH]) in T2DM underwent 12-lead ECG with additional ewECG features and echocardiography. Supervised machine learning was adopted to identify the optimal combination of ewECG extracted features for SBHF screening in 178 participants in one trial and tested in 97 participants in the other trial. The accuracy of the ewECG model in SBHF screening was compared with NT-proBNP and ARIC HF. RESULTS SBHF was identified in 128 (72%) participants in the training dataset (median 72 years, 41% female) and 64 (66%) in the validation dataset (median 70 years, 43% female). Fifteen ewECG features showed an area under the curve (AUC) of 0.81 (95% CI 0.787-0.794) in identifying SBHF, significantly better than both NT-proBNP (AUC 0.56, 95% CI 0.44-0.68, p < 0.001) and ARIC HF (AUC 0.67, 95% CI 0.56-0.79, p = 0.002). ewECG features also led to robust models screening for DD (AUC 0.74, 95% CI 0.73-0.74), reduced GLS (AUC 0.76, 95% CI 0.73-0.74) and LVH (AUC 0.90, 95% CI 0.88-0.89).
CONCLUSIONS Machine learning-based modelling using additional ewECG extracted features is superior to NT-proBNP and ARIC HF in SBHF screening among patients with T2DM, providing an alternative HF screening strategy for asymptomatic patients and potentially acting as a guidance tool to determine those who require an echocardiogram to confirm diagnosis. Trial registration LEAVE-DM, ACTRN 12619001393145 and Vic-ELF, ACTRN 12617000116325.
Affiliation(s)
- Cheng Hwee Soh
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
- Alex G C de Sá
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Australia
  - Systems and Computational Biology, Bio21 Institute, Parkville, Australia
- Elizabeth Potter
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
- Amera Halabi
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
- David B Ascher
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Australia
  - Systems and Computational Biology, Bio21 Institute, Parkville, Australia
- Thomas H Marwick
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
  - Menzies Institute for Medical Research, Hobart, Australia
3
Nguyen TB, de Sá AGC, Rodrigues CHM, Pires DEV, Ascher DB. LEGO-CSM: a tool for functional characterisation of proteins. Bioinformatics 2023:btad402. [PMID: 37382560 DOI: 10.1093/bioinformatics/btad402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 02/22/2023] [Accepted: 06/27/2023] [Indexed: 06/30/2023]
Abstract
MOTIVATION With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterising protein functions. LEGO-CSM is a comprehensive web-based resource that fills this gap by applying well-established and robust graph-based signatures to supervised learning models, using both protein sequence and structure information to accurately model protein function in terms of Subcellular Localisation, Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. RESULTS We show that our models perform as well as or better than alternative approaches, achieving Area Under the Receiver Operating Characteristic Curve (ROC AUC) of up to 0.93 for subcellular localisation, up to 0.93 for EC and up to 0.81 for GO terms on independent blind tests. AVAILABILITY LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thanh Binh Nguyen
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Alex G C de Sá
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Carlos H M Rodrigues
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Douglas E V Pires
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville, Victoria 3052, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria 3010, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville, Victoria 3052, Australia
4
Iftkhar S, de Sá AGC, Velloso JPL, Aljarf R, Pires DEV, Ascher DB. cardioToxCSM: A Web Server for Predicting Cardiotoxicity of Small Molecules. J Chem Inf Model 2022; 62:4827-4836. [PMID: 36219164 DOI: 10.1021/acs.jcim.2c00822] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
The design of novel, safe, and effective drugs to treat human diseases is a challenging venture, with toxicity being one of the main sources of attrition at later stages of development. Failure due to toxicity incurs a significant increase in costs and time to market, with multiple drugs being withdrawn from the market due to their adverse effects. Cardiotoxicity, for instance, was responsible for the failure of drugs such as fenspiride, propoxyphene, and valdecoxib. While significant effort has been dedicated to mitigating this issue by developing computational approaches that aim to identify molecules likely to be toxic, including quantitative structure-activity relationship models and machine learning methods, current approaches present limited performance and interpretability. To overcome these limitations, we propose a new web-based computational method, cardioToxCSM, which can predict six types of cardiac toxicity outcomes (arrhythmia, cardiac failure, heart block, hERG toxicity, hypertension, and myocardial infarction) efficiently and accurately. cardioToxCSM was developed using the concept of graph-based signatures, molecular descriptors, toxicophore matchings, and molecular fingerprints, leveraging explainable machine learning, and was validated internally via different cross-validation schemes and externally via low-redundancy blind sets. The models presented robust performances with areas under ROC curves of up to 0.898 on 5-fold cross-validation, consistent with metrics on blind tests. Additionally, our models provide interpretation of the predictions by identifying whether substructures that are commonly enriched in toxic compounds were present. We believe cardioToxCSM will provide valuable insight into the potential cardiotoxicity of small molecules early in drug screening efforts. The method is made freely available as a web server at https://biosig.lab.uq.edu.au/cardiotoxcsm.
Affiliation(s)
- Saba Iftkhar
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Alex G C de Sá
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- João P L Velloso
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Raghad Aljarf
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- Douglas E V Pires
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
5
de Sá AGC, Long Y, Portelli S, Pires DEV, Ascher DB. toxCSM: comprehensive prediction of small molecule toxicity profiles. Brief Bioinform 2022; 23:6673851. [PMID: 35998885 DOI: 10.1093/bib/bbac337] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/19/2022] [Revised: 07/17/2022] [Accepted: 07/23/2022] [Indexed: 01/29/2023] Open
Abstract
Drug discovery is a lengthy, costly and high-risk endeavour that is further complicated by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools have emerged that aim to improve success rates through timely pre-screening of drug candidates. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental and human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson's correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at http://biosig.lab.uq.edu.au/toxcsm.
Affiliation(s)
- Alex G C de Sá
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, 3010, Australia
- Yangyang Long
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3052, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- Douglas E V Pires
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3052, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, 3010, Australia
6
Uthayopas K, de Sá AGC, Alavi A, Pires DEV, Ascher DB. TSMDA: Target and symptom-based computational model for miRNA-disease-association prediction. Mol Ther Nucleic Acids 2021; 26:536-546. [PMID: 34631283 PMCID: PMC8479276 DOI: 10.1016/j.omtn.2021.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2021] [Accepted: 08/19/2021] [Indexed: 02/06/2023]
Abstract
The emergence of high-throughput sequencing techniques has revealed a primary role of microRNAs (miRNAs) in a wide range of diseases, including cancers and neurodegenerative disorders. Understanding novel relationships between miRNAs and diseases can potentially unveil complex pathogenesis mechanisms, leading to effective diagnosis and treatment. The investigation of novel miRNA-disease associations, however, is currently costly and time-consuming. Over the years, several computational models have been proposed to prioritize potential miRNA-disease associations, but with limited usability or predictive capability. To fill this gap, we introduce TSMDA, a novel machine-learning method that leverages target and symptom information and negative sample selection to predict miRNA-disease associations. TSMDA significantly outperforms similar methods, achieving an area under the receiver operating characteristic (ROC) curve (AUC) of 0.989 and 0.982 under 5-fold cross-validation and blind test, respectively. We also demonstrate the capability of the method to uncover potential miRNA-disease associations in breast, prostate, and lung cancers, as case studies. We believe TSMDA will be an invaluable tool for the community to explore and prioritize potentially new miRNA-disease associations for further experimental characterization. The method was made available as a freely accessible and user-friendly web interface at http://biosig.unimelb.edu.au/tsmda/.
Affiliation(s)
- Korawich Uthayopas
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Alex G C de Sá
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, VIC, Australia
- Azadeh Alavi
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville 3052, VIC, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, VIC, Australia
  - Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
7
Abstract
The development of new, effective, and safe drugs to treat cancer remains a challenging and time-consuming task due to limited hit rates, restraining subsequent development efforts. Despite the impressive progress of quantitative structure–activity relationship and machine learning-based models that have been developed to predict molecule pharmacodynamics and bioactivity, they have had mixed success at identifying compounds with anticancer properties against multiple cell lines. Here, we have developed a novel predictive tool, pdCSM-cancer, which uses a graph-based signature representation of the chemical structure of a small molecule in order to accurately predict molecules likely to be active against one or multiple cancer cell lines. pdCSM-cancer represents the most comprehensive anticancer bioactivity prediction platform developed to date, comprising models trained and validated on experimental data of the growth inhibition concentration (GI50%) effects, including over 18,000 compounds, on 9 tumor types and 74 distinct cancer cell lines. Across 10-fold cross-validation, it achieved Pearson’s correlation coefficients of up to 0.74 and comparable performance of up to 0.67 across independent, non-redundant blind tests. Leveraging the insights from these cell line-specific models, we developed a generic predictive model to identify molecules active in at least 60 cell lines. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.94 on 10-fold cross-validation and up to 0.94 on independent non-redundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a valuable resource for optimizing and enriching screening libraries for the identification of effective and safe anticancer molecules. To provide a simple and integrated platform to rapidly screen for potential biologically active molecules with favorable anticancer properties, we made pdCSM-cancer freely available online at http://biosig.unimelb.edu.au/pdcsm_cancer.
Affiliation(s)
- Raghad Al-Jarf
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Alex G C de Sá
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
  - Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
  - Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom