1
Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024; 64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/25/2023] [Indexed: 08/31/2023]
Abstract
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
Affiliation(s)
- Lewis Mervin
- AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
- Tobias Morawietz
- Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Noé Sturm
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Lukas Friedrich
- Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Adam Zalewski
- Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
- Anastasia Pentina
- Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Lina Humbeck
- BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Martijn Oldenhof
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ritsuya Niwayama
- Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Nikolas Fechner
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Jaak Simm
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Adam Arany
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Rama Jabal
- Iktos, 65 rue de Prony, Paris 75017, France
- Arina Afanasyeva
- Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Regis Loeb
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Shlok Verma
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Simon Harnqvist
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Matthew Holmes
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Balazs Pejo
- Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Nicholas Holway
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Arne Dieckmann
- Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
- Nicola Rieke
- NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
- Djork-Arné Clevert
- Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
- Michael Krug
- Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Christopher Luscombe
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Darren Green
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Peter Ertl
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Peter Antal
- Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
- David Marcus
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Hideyoshi Fuji
- Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Stephen Pickett
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
- Gergely Acs
- Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
- Eric Boniface
- Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
- Bernd Beck
- BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
- Yax Sun
- Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
- Arnaud Gohier
- Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
- Friedrich Rippmann
- Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
- Ola Engkvist
- AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
- Andreas H. Göller
- Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
- Yves Moreau
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
- Ansgar Schuffenhauer
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
- Hugo Ceulemans
- Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
2
Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021; 61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/01/2023]
Abstract
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Vinicius M Alves
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Nicole Kleinstreuer
- Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States; National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
- Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Marc C Nicklaus
- Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
- Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
3
Vahedi N, Mohammadhosseini M, Nekoei M. QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches. CURR ANAL CHEM 2020. [DOI: 10.2174/1573411016999200518083359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Background:
The poly(ADP-ribose) polymerases (PARPs) are a nuclear enzyme superfamily present in eukaryotes.
Methods:
In the present report, efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors with the most significant contributions to the overall inhibitory activity.
Results:
The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that non-linear modeling approaches, including SVM and ANN, provided considerably better predictive capability.
Conclusion:
Among the constructed models, and in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q² LOO and Q² LGO), as well as R² and the F-statistic for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were better for the GA-ANN model.
Affiliation(s)
- Nafiseh Vahedi
- Department of Chemistry, College of Basic Sciences, Shahrood Branch, Islamic Azad University, Shahrood, Iran
- Majid Mohammadhosseini
- Department of Chemistry, College of Basic Sciences, Shahrood Branch, Islamic Azad University, Shahrood, Iran
- Mehdi Nekoei
- Department of Chemistry, College of Basic Sciences, Shahrood Branch, Islamic Azad University, Shahrood, Iran
4
Yuan Y, Pei J, Lai L. LigBuilder V3: A Multi-Target de novo Drug Design Approach. Front Chem 2020; 8:142. [PMID: 32181242 PMCID: PMC7059350 DOI: 10.3389/fchem.2020.00142] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 10/29/2019] [Accepted: 02/14/2020] [Indexed: 12/18/2022]
Abstract
With the rapid development of systems-based pharmacology and poly-pharmacology, method development for the rational design of multi-target drugs has become urgent. In this paper, we present the first de novo multi-target drug design program, LigBuilder V3, which can be used to design ligands to target multiple receptors, multiple binding sites of one receptor, or various conformations of one receptor. LigBuilder V3 is generally applicable in de novo multi-target drug design and optimization, especially for the design of concise ligands for protein targets with large differences in binding sites. To demonstrate the utility of LigBuilder V3, we have used it to design dual-functional inhibitors targeting HIV protease and HIV reverse transcriptase with three different strategies: multi-target de novo design, multi-target growing, and multi-target linking. The designed compounds were computationally validated by MM/GBSA binding free energy estimation as high-potential multi-target inhibitors of both HIV protease and HIV reverse transcriptase. The LigBuilder V3 program can be downloaded at http://www.pkumdl.cn/ligbuilder3/.
Affiliation(s)
- Yaxia Yuan
- Beijing National Laboratory for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Luhua Lai
- Beijing National Laboratory for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, China; Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China; Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
5
Nava Lara RA, Aguilera-Mendoza L, Brizuela CA, Peña A, Del Rio G. Heterologous Machine Learning for the Identification of Antimicrobial Activity in Human-Targeted Drugs. Molecules 2019; 24:molecules24071258. [PMID: 30935109 PMCID: PMC6479866 DOI: 10.3390/molecules24071258] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/01/2019] [Revised: 03/09/2019] [Accepted: 03/14/2019] [Indexed: 12/13/2022]
Abstract
The emergence of microbes resistant to common antibiotics represents a current threat to human health. It has been recently recognized that non-antibiotic labeled drugs may promote antibiotic-resistance mechanisms in the human microbiome by presenting a secondary antibiotic activity; hence, the development of computer-assisted procedures to identify antibiotic activity in human-targeted compounds may assist in preventing the emergence of resistant microbes. In this regard, it is worth noting that while most antibiotics used to treat human infectious diseases are non-peptidic compounds, most known antimicrobials nowadays are peptides. Therefore, all computer-based models aimed at predicting antimicrobials either use small datasets of non-peptidic compounds, rendering predictions with poor reliability, or predict antimicrobial peptides that are not currently used in humans. Here we report a machine-learning-based approach trained to identify gut antimicrobial compounds; a unique aspect of our model is the use of heterologous training sets, in which peptide and non-peptide antimicrobial compounds were used to increase the size of the training data set. Our results show that combining peptide and non-peptide antimicrobial compounds rendered the best classification of gut antimicrobial compounds. Furthermore, this classification model was tested on the latest human-approved drugs with the expectation of identifying antibiotics with broad-spectrum activity, and our results show that the model rendered predictions consistent with current knowledge about broad-spectrum antibiotics. Therefore, heterologous machine learning rendered an efficient computational approach to classify antimicrobial compounds.
Affiliation(s)
- Rodrigo A Nava Lara
- Department of biochemistry and structural biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico.
- Carlos A Brizuela
- Computer Science Department, CICESE Research Center, Ensenada, Baja California 22860, Mexico.
- Antonio Peña
- Department of genetics, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico.
- Gabriel Del Rio
- Department of biochemistry and structural biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico.
6
Khan K, Kar S, Sanderson H, Roy K, Leszczynski J. Ecotoxicological Modeling, Ranking and Prioritization of Pharmaceuticals Using QSTR and i‐QSTTR Approaches: Application of 2D and Fragment Based Descriptors. Mol Inform 2018; 38:e1800078. [DOI: 10.1002/minf.201800078] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/18/2018] [Accepted: 11/01/2018] [Indexed: 12/22/2022]
Affiliation(s)
- Kabiruddin Khan
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Supratik Kar
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS-39217, USA
- Hans Sanderson
- Department of Environmental Science, Section for Toxicology and Chemistry, Aarhus University, Frederiksborgvej 399, DK-4000 Roskilde, Denmark
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Jerzy Leszczynski
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS-39217, USA
7
Kumar A, Sharma A. Computational Modeling of Multi-target-Directed Inhibitors Against Alzheimer’s Disease. NEUROMETHODS 2018. [DOI: 10.1007/978-1-4939-7404-7_19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/09/2022]
8
Chen H, Bauer U, Engkvist O. Merged Multiple Ligands. METHODS AND PRINCIPLES IN MEDICINAL CHEMISTRY 2017. [DOI: 10.1002/9783527674381.ch9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
Affiliation(s)
- Hongming Chen
- Discovery Sciences, Innovative Medicines and Early Development; AstraZeneca; Pepparedsleden 1, 431 83 Mölndal, Sweden
- Udo Bauer
- Cardiovascular and Metabolic Diseases, Innovative Medicines and Early Development; AstraZeneca; Pepparedsleden 1, 431 83 Mölndal, Sweden
- Ola Engkvist
- Discovery Sciences, Innovative Medicines and Early Development; AstraZeneca; Pepparedsleden 1, 431 83 Mölndal, Sweden
9
A study of the Immune Epitope Database for some fungi species using network topological indices. Mol Divers 2017; 21:713-718. [DOI: 10.1007/s11030-017-9749-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/19/2016] [Accepted: 05/09/2017] [Indexed: 10/19/2022]
10
Chiddarwar RK, Rohrer SG, Wolf A, Tresch S, Wollenhaupt S, Bender A. In silico target prediction for elucidating the mode of action of herbicides including prospective validation. J Mol Graph Model 2016; 71:70-79. [PMID: 27846423 DOI: 10.1016/j.jmgm.2016.10.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/27/2016] [Accepted: 10/25/2016] [Indexed: 01/04/2023]
Abstract
The rapid emergence of pesticide resistance has given rise to a demand for herbicides with new modes of action (MoA). In the agrochemical sector, with the availability of experimental high throughput screening (HTS) data, it is now possible to utilize in silico target prediction methods in the early discovery phase to suggest the MoA of a compound via data mining of bioactivity data. While this approach is established in the pharmaceutical context, in the agrochemical area it poses rather different challenges, as we have found in this work, partially due to different chemistry, but even more so due to different (usually smaller) amounts of data and different ways of conducting HTS. With the aim of applying computational methods to facilitate herbicide target identification, 48,000 bioactivity data points against 16 herbicide targets were processed to train Laplacian-modified Naïve Bayesian (NB) classification models. The herbicide target prediction model ("HerbiMod") is an ensemble of 16 binary classification models, which are evaluated by internal, external and prospective validation sets. In addition to the experimental inactives, 10,000 random agrochemical inactives were included in the training process, which was shown to improve the overall balanced accuracy of our models up to 40%. For all the models, balanced accuracy of ≥80% was achieved in five-fold cross-validation. Ranking of target predictions was addressed by means of z-scores, which improved predictivity over using raw scores alone. An external test set of 247 compounds from ChEMBL and a prospective test set of 394 compounds from BASF SE, tested against five well-studied herbicide targets (ACC, ALS, HPPD, PDS and PROTOX), were used for further validation. Only 4% of the compounds in the external test set lay within the applicability domain, and extrapolation (and correct prediction) was hence impossible, which on one hand was surprising, and on the other hand illustrated the utility of using applicability domains in the first place. However, balanced accuracy better than 60% was achieved on the prospective test set, where all the compounds fell within the applicability domain, which underlines the possibility of using target prediction in the area of agrochemicals as well.
Affiliation(s)
- Rucha K Chiddarwar
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Sebastian G Rohrer
- Global Research Crop Protection, BASF SE, Speyerer Strasse 2, 67177 Limburgerhof, Germany
- Antje Wolf
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
- Stefan Tresch
- Global Research Crop Protection, BASF SE, Speyerer Strasse 2, 67177 Limburgerhof, Germany
- Sabrina Wollenhaupt
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
- Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
11
Speck-Planche A, Kleandrova VV, Ruso JM, Cordeiro MNDS. First Multitarget Chemo-Bioinformatic Model To Enable the Discovery of Antibacterial Peptides against Multiple Gram-Positive Pathogens. J Chem Inf Model 2016; 56:588-98. [PMID: 26960000 DOI: 10.1021/acs.jcim.5b00630] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
Antimicrobial peptides (AMPs) have emerged as promising therapeutic alternatives to fight against the diverse infections caused by different pathogenic microorganisms. In this context, theoretical approaches in bioinformatics have paved the way toward the creation of several in silico models capable of predicting antimicrobial activities of peptides. All current models have several significant handicaps, which prevent the efficient search for highly active AMPs. Here, we introduce the first multitarget (mt) chemo-bioinformatic model devoted to performing alignment-free prediction of antibacterial activity of peptides against multiple Gram-positive bacterial strains. The model was constructed from a data set containing 2488 cases of AMPs sequences assayed against at least 1 out of 50 Gram-positive bacterial strains. This mt-chemo-bioinformatic model displayed percentages of correct classification higher than 90.00% in both training and prediction (test) sets. For the first time, two computational approaches derived from basic concepts in genetics and molecular biology were applied, allowing the calculations of the relative contributions of any amino acid (in a defined position) to the antibacterial activity of an AMP and depending on the bacterial strain used in the biological assay. The present mt-chemo-bioinformatic model constitutes a powerful tool to enable the discovery of potent and versatile AMPs.
Affiliation(s)
- Alejandro Speck-Planche
- Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain; REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
- Valeria V Kleandrova
- Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, 125080 Moscow, Russia
- Juan M Ruso
- Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- M N D S Cordeiro
- REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
12
Fernandez-Lozano C, Cuiñas RF, Seoane JA, Fernández-Blanco E, Dorado J, Munteanu CR. Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models. J Theor Biol 2015; 384:50-8. [DOI: 10.1016/j.jtbi.2015.07.038] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/23/2015] [Revised: 07/20/2015] [Accepted: 07/27/2015] [Indexed: 12/11/2022]
13
Wang T, Wu MB, Lin JP, Yang LR. Quantitative structure–activity relationship: promising advances in drug discovery platforms. Expert Opin Drug Discov 2015; 10:1283-300. [DOI: 10.1517/17460441.2015.1083006] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Indexed: 12/13/2022]
14
Speck-Planche A, Cordeiro MNDS. Multitasking models for quantitative structure–biological effect relationships: current status and future perspectives to speed up drug discovery. Expert Opin Drug Discov 2015; 10:245-56. [DOI: 10.1517/17460441.2015.1006195] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 01/24/2023]
15
Medina Marrero R, Marrero-Ponce Y, Barigye SJ, Echeverría Díaz Y, Acevedo-Barrios R, Casañola-Martín GM, García Bernal M, Torrens F, Pérez-Giménez F. QuBiLs-MAS method in early drug discovery and rational drug identification of antifungal agents. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2015; 26:943-58. [PMID: 26567876 DOI: 10.1080/1062936x.2015.1104517] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 05/21/2023]
Abstract
The QuBiLs-MAS approach is used for the in silico modelling of the antifungal activity of organic molecules. To this effect, non-stochastic (NS) and simple-stochastic (SS) atom-based quadratic indices are used to codify chemical information for a comprehensive dataset of 2478 compounds having a great structural variability, with 1087 of them being antifungal agents, covering the broadest antifungal mechanisms of action known so far. The NS and SS index-based antifungal activity classification models obtained using linear discriminant analysis (LDA) yield correct classification percentages of 90.73% and 92.47%, respectively, for the training set. Additionally, these models are able to correctly classify 92.16% and 87.56% of 706 compounds in an external test set. A comparison of the statistical parameters of the QuBiLs-MAS LDA-based models with those for models reported in the literature reveals comparable to superior performance, although the latter were built over much smaller and less diverse datasets, representing fewer mechanisms of action. It may therefore be inferred that the QuBiLs-MAS method constitutes a valuable tool useful in the design and/or selection of new and broad spectrum agents against life-threatening fungal infections.
Affiliation(s)
- R Medina Marrero
- a Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatic Research International Network (CAMD-BIR-IN), Cartagena de Indias, Bolivar, Colombia
- b Department of Microbiology, Chemical Bioactive Center, Central University of Las Villas, Villa Clara, Cuba
- Y Marrero-Ponce
- a Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatic Research International Network (CAMD-BIR-IN), Cartagena de Indias, Bolivar, Colombia
- c Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Cartagena de Indias, Bolívar, Colombia
- d Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain
- h Grupo de Investigación Microbiología y Ambiente (GIMA), Programa de Bacteriología, Facultad Ciencias de la Salud, Universidad de San Buenaventura, Calle Real de Ternera, 130010, Cartagena (Bolivar), Colombia
- S J Barigye
- a Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatic Research International Network (CAMD-BIR-IN), Cartagena de Indias, Bolivar, Colombia
- e Departamento de Química, Universidade Federal de Lavras, Lavras, MG, Brazil
- Y Echeverría Díaz
- a Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatic Research International Network (CAMD-BIR-IN), Cartagena de Indias, Bolivar, Colombia
- R Acevedo-Barrios
- c Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Cartagena de Indias, Bolívar, Colombia
- G M Casañola-Martín
- a Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatic Research International Network (CAMD-BIR-IN), Cartagena de Indias, Bolivar, Colombia
- d Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain
- f Facultad de Ingeniería Ambiental, Universidad Estatal Amazónica, Puyo, Ecuador
- M García Bernal
- b Department of Microbiology, Chemical Bioactive Center, Central University of Las Villas, Villa Clara, Cuba
- F Torrens
- g Institut Universitari de Ciència Molecular, Universitat de València, Valencia, Spain
- F Pérez-Giménez
- d Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain
|
16
|
Markov mean properties for cell death-related protein classification. J Theor Biol 2014; 349:12-21. [DOI: 10.1016/j.jtbi.2014.01.033] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 11/05/2013] [Revised: 01/21/2014] [Accepted: 01/24/2014] [Indexed: 11/18/2022]
|
17
|
Fernandez-Lozano C, Fernández-Blanco E, Dave K, Pedreira N, Gestal M, Dorado J, Munteanu CR. Improving enzyme regulatory protein classification by means of SVM-RFE feature selection. Mol Biosyst 2014; 10:1063-71. [DOI: 10.1039/c3mb70489k] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/23/2023]
|
18
|
Mor S, Pahal P, Narasimhan B. Synthesis, characterization, biological evaluation and QSAR studies of 11-p-substituted phenyl-12-phenyl-11a,12-dihydro-11H-indeno[2,1-c][1,5]benzothiazepines as potential antimicrobial agents. Eur J Med Chem 2012; 57:196-210. [DOI: 10.1016/j.ejmech.2012.09.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/11/2012] [Revised: 08/31/2012] [Accepted: 09/03/2012] [Indexed: 10/27/2022]
|
19
|
Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MNDS. Predicting multiple ecotoxicological profiles in agrochemical fungicides: a multi-species chemoinformatic approach. Ecotoxicol Environ Saf 2012; 80:308-313. [PMID: 22521812 DOI: 10.1016/j.ecoenv.2012.03.018] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 01/23/2012] [Revised: 03/22/2012] [Accepted: 03/23/2012] [Indexed: 05/31/2023]
Abstract
Agriculture must contend with crop losses caused by biotic stresses such as pests. Pesticides have played a vital role, helping to improve crop production and harvest productivity, providing better crop quality and supply, and consequently contributing to the improvement of human health. Fungicides are an important group of these pesticides. However, the use of agrochemical fungicides is an important source of contamination that damages ecosystems. Several studies have assessed the toxicity of agrochemical fungicides, but their principal limitation is the use of structurally related compounds, usually tested against a single indicator species. To overcome this problem, we explore quantitative structure-toxicity relationships (QSTR) in agrochemical fungicides. Here, we developed the first multi-species (ms) chemoinformatic approach for predicting multiple ecotoxicological profiles of fungicides against 20 indicator species and classifying them as toxic or nontoxic. The ms-QSTR discriminant model was based on substructural descriptors and a heterogeneous database of compounds. The percentages of correct classification were higher than 90% for both the training and prediction series. In addition, substructural alerts responsible for toxicity or nontoxicity in fungicides with respect to all ecotoxicological profiles were extracted and analyzed.
Affiliation(s)
- Alejandro Speck-Planche
- REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal.
|
20
|
Ma XH, Shi Z, Tan C, Jiang Y, Go ML, Low BC, Chen YZ. In-silico approaches to multi-target drug discovery: computer aided multi-target drug design, multi-target virtual screening. Pharm Res 2010; 27:739-49. [PMID: 20221898 DOI: 10.1007/s11095-010-0065-2] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.9] [Received: 11/12/2009] [Accepted: 01/08/2010] [Indexed: 01/25/2023]
Abstract
Multi-target drugs against selective multiple targets improve therapeutic efficacy, safety and resistance profiles by collective regulations of a primary therapeutic target together with compensatory elements and resistance activities. Efforts have been made to employ in-silico methods for facilitating the search and design of selective multi-target agents. These methods have shown promising potential in facilitating drug discovery directed at selective multiple targets.
Affiliation(s)
- Xiao Hua Ma
- Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
|
21
|
Multi-target spectral moment QSAR versus ANN for antiparasitic drugs against different parasite species. Bioorg Med Chem 2010; 18:2225-2231. [PMID: 20185316 DOI: 10.1016/j.bmc.2010.01.068] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 08/28/2009] [Revised: 01/22/2010] [Accepted: 01/29/2010] [Indexed: 11/23/2022]
Abstract
There are many pathogenic parasite species with different susceptibility profiles to antiparasitic drugs. Unfortunately, almost all QSAR models predict the biological activity of drugs against only one parasite species. Consequently, predicting the probability with which a drug is active against different species with a single unified model is a goal of major importance. To this end, we use Markov chain theory to calculate new multi-target spectral moments and fit, for the first time, an mt-QSAR model for 500 drugs tested in the literature against 16 parasite species and another 207 drugs not tested in the literature. The data were processed by linear discriminant analysis (LDA), classifying drugs as active or non-active against the different tested parasite species. The model correctly classifies 311 out of 358 active compounds (86.9%) and 2328 out of 2577 non-active compounds (90.3%) in the training series. Overall training performance was 89.9%. Validation of the model was carried out by means of external prediction series. In these series the model correctly classified 157 out of 190 antiparasitic compounds (82.6%) and 1151 out of 1277 non-active compounds (90.1%). Overall predictability performance was 89.2%. In addition, we developed four types of nonlinear artificial neural networks (ANN) and compared them with the mt-QSAR model. The improved ANN model had an overall training performance of 87%. The present work reports the first attempt to calculate, within a unified framework, the probabilities of antiparasitic action of drugs against different parasite species based on spectral moment analysis.
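The percentages quoted in this abstract follow directly from the reported counts. As a minimal sketch (the function name `rates` is illustrative and does not come from the paper), the following Python recomputes the per-class and overall rates for the training and external series, treating active compounds as the positive class:

```python
def rates(true_pos, n_pos, true_neg, n_neg):
    """Return (sensitivity, specificity, accuracy) as percentages.

    true_pos / n_pos: correctly classified actives / all actives
    true_neg / n_neg: correctly classified non-actives / all non-actives
    """
    sensitivity = 100.0 * true_pos / n_pos
    specificity = 100.0 * true_neg / n_neg
    accuracy = 100.0 * (true_pos + true_neg) / (n_pos + n_neg)
    return sensitivity, specificity, accuracy


# Counts as reported in the abstract.
training = rates(311, 358, 2328, 2577)
external = rates(157, 190, 1151, 1277)

for name, result in (("training", training), ("external", external)):
    print(name, [round(value, 1) for value in result])
```

Rounded to one decimal place, the recomputed values agree with the figures given in the abstract for both series.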
|
22
|
Prado-Prado FJ, Ubeira FM, Borges F, González-Díaz H. Unified QSAR & network-based computational chemistry approach to antimicrobials. II. Multiple distance and triadic census analysis of antiparasitic drugs complex networks. J Comput Chem 2010; 31:164-73. [DOI: 10.1002/jcc.21292] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
23
|
Semmar N. A New Mixture Design-Based Approach to Graphical Screening of Potential Interconnections and Variability Processes in Metabolic Systems. Chem Biol Drug Des 2010; 75:91-105. [DOI: 10.1111/j.1747-0285.2009.00912.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
|
24
|
Multi-target spectral moment: QSAR for antiviral drugs vs. different viral species. Anal Chim Acta 2009; 651:159-64. [PMID: 19782806 DOI: 10.1016/j.aca.2009.08.022] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 05/27/2009] [Revised: 08/05/2009] [Accepted: 08/18/2009] [Indexed: 11/23/2022]
Abstract
Current antiviral QSAR models have an important limitation: they predict the biological activity of drugs against only one viral species. This is because most of the currently reported molecular descriptors encode only information about the molecular structure. As a result, predicting the probability with which a drug is active against different viral species with a single unifying model is a goal of major importance. In this work, we use Markov Chain theory to calculate new multi-target spectral moments to fit a QSAR model for drugs active against 40 viral species. The model is based on 500 drugs (including active and non-active compounds) tested as antiviral agents in the recent literature; drugs were evaluated only against the viruses for which experimental values were available, not against all viruses. The database also contains 207 well-known compounds (not as recent as the previous ones) reported in the Merck Index with other activities that do not include antiviral action against any virus species. We used Linear Discriminant Analysis (LDA) to classify all these drugs as active or non-active against the different viral species tested. The model correctly classifies 5129 out of 5594 non-active compounds (91.69%) and 412 out of 422 active compounds (97.63%). Overall training predictability was 92.34%. The validation of the model was carried out by means of external predicting series, in which the model correctly classified 2568 out of 2779 non-active compounds and 224 out of 229 active compounds. Overall predictability in these series was 92.82%. The present work reports the first attempts to calculate within a unified framework the probabilities of antiviral drugs against different virus species based on a spectral moment analysis.
|
25
|
Design of novel antituberculosis compounds using graph-theoretical and substructural approaches. Mol Divers 2009; 13:445-58. [PMID: 19340599 DOI: 10.1007/s11030-009-9129-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 09/15/2008] [Accepted: 02/16/2009] [Indexed: 10/20/2022]
Abstract
The increasing resistance of Mycobacterium tuberculosis to the existing drugs has alarmed the worldwide scientific community. In an attempt to overcome this problem, two models for the design and prediction of new antituberculosis agents were obtained. The first used a mixed approach, containing descriptors based on fragments and the topological substructural molecular design approach (TOPS-MODE) descriptors. The other model used a combination of two-dimensional (2D) and three-dimensional (3D) descriptors. A data set of 167 compounds with great structural variability, 72 of them antituberculosis agents and 95 compounds belonging to other pharmaceutical categories, was analyzed. The first model showed sensitivity, specificity, and accuracy values above 80% and the second one showed values higher than 75% for these statistical indices. Subsequently, 12 imidazole structures not included in this study were designed, taking the two models into account. In both cases accuracy was 100%, showing that the in silico methodology we developed is promising for the rational design of antituberculosis drugs.
|
26
|
García I, Munteanu CR, Fall Y, Gómez G, Uriarte E, González-Díaz H. QSAR and complex network study of the chiral HMGR inhibitor structural diversity. Bioorg Med Chem 2009; 17:165-75. [DOI: 10.1016/j.bmc.2008.11.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/23/2008] [Revised: 10/31/2008] [Accepted: 11/06/2008] [Indexed: 10/21/2022]
|
27
|
Prado-Prado FJ, Martinez de la Vega O, Uriarte E, Ubeira FM, Chou KC, González-Díaz H. Unified QSAR approach to antimicrobials. 4. Multi-target QSAR modeling and comparative multi-distance study of the giant components of antiviral drug-drug complex networks. Bioorg Med Chem 2008; 17:569-75. [PMID: 19112024 DOI: 10.1016/j.bmc.2008.11.075] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 06/27/2008] [Revised: 11/24/2008] [Accepted: 11/28/2008] [Indexed: 11/18/2022]
Abstract
One limitation of almost all antiviral Quantitative Structure-Activity Relationship (QSAR) models is that they predict the biological activity of drugs against only one species of virus. Consequently, the development of multi-tasking QSAR models (mt-QSAR) to predict drug activity against different species of virus is vitally important. These mt-QSARs also offer a good opportunity to construct drug-drug Complex Networks (CNs) that can be used to explore large and complex drug-viral species databases. It is known that in very large CNs the Giant Component (GC) can be used as a representative subset of nodes (drugs), but the drug-drug similarity function selected may strongly determine the final network obtained. In the three previous works of the present series we reported mt-QSAR models to predict the antimicrobial activity against different fungi [Gonzalez-Diaz, H.; Prado-Prado, F. J.; Santana, L.; Uriarte, E. Bioorg. Med. Chem. 2006, 14, 5973], bacteria [Prado-Prado, F. J.; Gonzalez-Diaz, H.; Santana, L.; Uriarte, E. Bioorg. Med. Chem. 2007, 15, 897] or parasite species [Prado-Prado, F. J.; González-Díaz, H.; Martinez de la Vega, O.; Ubeira, F. M.; Chou, K. C. Bioorg. Med. Chem. 2008, 16, 5871]. However, including these works, we did not find any report of mt-QSAR models for antiviral drugs, or a comparative study of the different GCs extracted from drug-drug CNs based on different similarity functions. In this work, we used Linear Discriminant Analysis (LDA) to fit an mt-QSAR model that classifies 600 drugs as active or non-active against the 41 different tested species of virus. The model correctly classifies 143 of 169 active compounds (specificity=84.62%) and 119 of 139 non-active compounds (sensitivity=85.61%) and presents an overall training accuracy of 85.1% (262 of 308 cases). Validation of the model was carried out by means of external predicting series, in which the model correctly classified 466 of 514 compounds (90.7%). To illustrate the performance of the model in practice, we carried out a virtual screening in which the model recognized as active 102 of 110 antiviral compounds (92.7%). These compounds were never used in the training or predicting series. Next, we obtained and compared the topology of the CNs and their respective GCs based on Euclidean, Manhattan, Chebyshev, Pearson and other similarity measures. The GC of the Manhattan network showed the most interesting features for drug-drug similarity search. We also give the procedure for the construction of Back-Projection Maps for the contribution of each drug sub-structure to the antiviral activity against different species.
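The abstract's point that the chosen similarity function shapes the resulting network can be illustrated with a small sketch. The code below is not the authors' procedure: the descriptor values, the cutoff of 1.5, and the rule "link two drugs when their distance is below the cutoff" are all assumptions made for illustration. It builds a network under three of the named distance measures and extracts the largest connected component (the giant component) by breadth-first search.

```python
import math
from collections import deque


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def giant_component(points, distance, cutoff):
    """Link points whose distance is below `cutoff`; return the indices
    of the largest connected component, sorted ascending."""
    n = len(points)
    neighbours = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if distance(points[i], points[j]) < cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)

    seen, best = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(component) > len(best):
            best = component
    return sorted(best)


# Hypothetical two-descriptor profiles for five drugs (illustrative only).
drugs = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (6.0, 6.0), (6.5, 6.0)]
gc_euclid = giant_component(drugs, euclidean, 1.5)
gc_manhattan = giant_component(drugs, manhattan, 1.5)
gc_chebyshev = giant_component(drugs, chebyshev, 1.5)
print(gc_euclid, gc_manhattan, gc_chebyshev)
```

With these toy values and a shared cutoff, the Euclidean and Chebyshev networks have the same giant component (drugs 0, 1 and 2), whereas the Manhattan network fragments and its largest component is the pair (3, 4). The same data therefore yield different representative subsets depending on the distance measure.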
Affiliation(s)
- Francisco J Prado-Prado
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
|
28
|
Cruz-Monteagudo M, Borges F, Cordeiro MNDS. Desirability-based multiobjective optimization for global QSAR studies: application to the design of novel NSAIDs with improved analgesic, antiinflammatory, and ulcerogenic profiles. J Comput Chem 2008; 29:2445-59. [PMID: 18452123 DOI: 10.1002/jcc.20994] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]
Abstract
Up to now, very few reports have been published concerning the application of multiobjective optimization (MOOP) techniques to quantitative structure-activity relationship (QSAR) studies, and none reports the optimization of objectives related directly to the desired pharmaceutical profile of the drug. In this work, we propose for the first time a MOOP method based on Derringer's desirability function that allows global QSAR studies to be conducted while simultaneously considering the pharmacological, pharmacokinetic and toxicological profiles of a set of candidate molecules. The usefulness of the method is demonstrated by applying it to the simultaneous optimization of the analgesic, antiinflammatory, and ulcerogenic properties of a library of fifteen 3-(3-methylphenyl)-2-substituted amino-3H-quinazolin-4-one compounds. The levels of the predictor variables that concurrently produce the best possible compromise between these properties are found and used to design a set of new optimized drug candidates. Our results also suggest that the bulkiness of alkyl substituents at the C-2 position of the quinazoline ring plays a relevant role in the ulcerogenic properties of this family of compounds. Finally, and most importantly, the proposed desirability-based MOOP method is a valuable tool that should aid the future rational design of novel successful drugs.
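For readers unfamiliar with desirability functions, here is a minimal sketch of the one-sided Derringer-Suich transforms and their geometric-mean combination. The property limits and candidate values are hypothetical and are not taken from the paper; the abstract does not give the authors' actual response models or settings.

```python
import math


def desirability_max(y, low, target, weight=1.0):
    """Larger-is-better: 0 at or below `low`, 1 at or above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight


def desirability_min(y, target, high, weight=1.0):
    """Smaller-is-better: 1 at or below `target`, 0 at or above `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** weight


def overall_desirability(values):
    """Geometric mean, so one unacceptable property (d = 0) vetoes the candidate."""
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical candidate: two activities to maximize, one liability to minimize.
d_analgesic = desirability_max(60.0, low=20.0, target=100.0)
d_antiinflammatory = desirability_max(40.0, low=20.0, target=100.0)
d_ulcerogenic = desirability_min(1.0, target=0.5, high=2.5)

overall = overall_desirability([d_analgesic, d_antiinflammatory, d_ulcerogenic])
vetoed = overall_desirability([1.0, 1.0, 0.0])
print(d_analgesic, d_antiinflammatory, d_ulcerogenic, overall, vetoed)
```

The geometric mean is what makes this a compromise measure: a candidate cannot offset an unacceptable ulcerogenic score with very high analgesic activity, because any single zero drives the overall desirability to zero.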
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal.
|
29
|
Cruz-Monteagudo M, Borges F, Cordeiro MNDS, Cagide Fajin JL, Morell C, Ruiz RM, Cañizares-Carmenate Y, Dominguez ER. Desirability-based methods of multiobjective optimization and ranking for global QSAR studies. Filtering safe and potent drug candidates from combinatorial libraries. J Comb Chem 2008; 10:897-913. [PMID: 18855460 DOI: 10.1021/cc800115y] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Up to now, very few applications of multiobjective optimization (MOOP) techniques to quantitative structure-activity relationship (QSAR) studies have been reported in the literature. However, none of them report the optimization of objectives related directly to the final pharmaceutical profile of a drug. In this paper, a MOOP method based on Derringer's desirability function that allows conducting global QSAR studies, simultaneously considering the potency, bioavailability, and safety of a set of drug candidates, is introduced. The results of the desirability-based MOOP (the levels of the predictor variables concurrently producing the best possible compromise between the properties determining an optimal drug candidate) are used for the implementation of a ranking method that is also based on the application of desirability functions. This method allows ranking drug candidates with unknown pharmaceutical properties from combinatorial libraries according to the degree of similarity with the previously determined optimal candidate. Application of this method will make it possible to filter the most promising drug candidates of a library (the best-ranked candidates), which should have the best pharmaceutical profile (the best compromise between potency, safety and bioavailability). In addition, a validation method of the ranking process, as well as a quantitative measure of the quality of a ranking, the ranking quality index (Psi), is proposed. The usefulness of the desirability-based methods of MOOP and ranking is demonstrated by its application to a library of 95 fluoroquinolones, reporting their gram-negative antibacterial activity and mammalian cell cytotoxicity. Finally, the combined use of the desirability-based methods of MOOP and ranking proposed here seems to be a valuable tool for rational drug discovery and development.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, REQUIMTE, Department of Chemistry, and CIQ-UP, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal.
|
30
|
Munteanu CR, González-Díaz H, Magalhães AL. Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices. J Theor Biol 2008; 254:476-82. [PMID: 18606172 DOI: 10.1016/j.jtbi.2008.06.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 03/15/2008] [Revised: 05/15/2008] [Accepted: 06/06/2008] [Indexed: 10/21/2022]
Abstract
The huge number of new proteins that need fast characterization of their enzymatic activity creates a demand for theoretical protein QSAR models. The protein parameters that can be used for enzyme/non-enzyme classification include the simpler indices such as composition, sequence and connectivity, also called topological indices (TIs), and the computationally expensive 3D descriptors. A comparison of the 3D versus lower dimension indices has not been reported with respect to the power of discrimination of proteins according to enzyme action. A set of 966 proteins (enzymes and non-enzymes) whose structural characteristics are provided by PDB/DSSP files was analyzed with Python/Biopython scripts, STATISTICA and Weka. The list of indices includes, but is not restricted to, pure composition indices (residue fractions), DSSP secondary structure protein composition and 3D indices (surface and access). We also used mixed indices such as composition-sequence indices (Chou's pseudo-amino acid compositions or coupling numbers), 3D-composition (surface fractions) and DSSP secondary structure amino acid composition/propensities (obtained with our Prot-2S Web tool). In addition, we extend and test for the first time several classic TIs for Randic's protein sequence Star graphs using our Sequence to Star Graph (S2SG) Python application. All the indices were processed with general discriminant analysis models (GDA), neural networks (NN) and machine learning (ML) methods, and the results are presented versus complexity, average of Shannon's information entropy (Sh) and data/method type. This study compares for the first time all these classes of indices to assess the ratios between model accuracy and indices/model complexity in enzyme/non-enzyme discrimination. The use of different methods and complexity of data shows that one cannot establish a direct relation between the complexity and the accuracy of the model.
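The simplest index class named here, pure composition (residue fractions), and the Shannon entropy measure can be sketched in a few lines. The abstract does not say how Sh was computed or averaged, so the code below only applies the textbook entropy formula to the residue composition of a toy sequence; the sequence and function names are hypothetical.

```python
import math
from collections import Counter


def residue_fractions(sequence):
    """Fraction of each residue type in a one-letter amino acid sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: n / total for residue, n in counts.items()}


def shannon_entropy(fractions):
    """Shannon entropy in bits of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in fractions.values() if p > 0)


# Toy sequence (not a real protein): four residue types, two of each.
toy_sequence = "AAGGKKLL"
fractions = residue_fractions(toy_sequence)
entropy_bits = shannon_entropy(fractions)
print(fractions, entropy_bits)
```

Four equally frequent residue types give an entropy of 2 bits, the maximum for four symbols; a sequence dominated by one residue would score lower.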
Affiliation(s)
- Cristian Robert Munteanu
- REQUIMTE/Faculty of Science, Chemistry Department, University of Porto, Porto 4169-007, Portugal.
|
31
|
Prado-Prado FJ, González-Díaz H, de la Vega OM, Ubeira FM, Chou KC. Unified QSAR approach to antimicrobials. Part 3: first multi-tasking QSAR model for input-coded prediction, structural back-projection, and complex networks clustering of antiprotozoal compounds. Bioorg Med Chem 2008; 16:5871-80. [PMID: 18485714 DOI: 10.1016/j.bmc.2008.04.068] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.5] [Received: 03/12/2008] [Revised: 04/22/2008] [Accepted: 04/25/2008] [Indexed: 10/22/2022]
Abstract
Several pathogen parasite species show different susceptibilities to different antiparasite drugs. Unfortunately, almost all structure-based methods are one-task or one-target Quantitative Structure-Activity Relationships (ot-QSAR) that predict the biological activity of drugs against only one parasite species. Consequently, multi-tasking learning to predict drug activity against different species with a single model (mt-QSAR) is vitally important. In the two previous works of the present series we reported two mt-QSAR models to predict the antimicrobial activity against different fungal (Bioorg. Med. Chem. 2006, 14, 5973-5980) or bacterial species (Bioorg. Med. Chem. 2007, 15, 897-902). These mt-QSARs offer a good opportunity (impractical with ot-QSAR) to construct drug-drug similarity Complex Networks and to map the contribution of sub-structures to function for multiple species. These possibilities were not addressed in our previous works. In the present work, we continue this series in another important direction of chemotherapy (antiparasite drugs) with the development of an mt-QSAR for more than 500 drugs tested in the literature against different parasites. The data were processed by Linear Discriminant Analysis (LDA), classifying drugs as active or non-active against the different tested parasite species. The model correctly classifies 212 out of 244 (87.0%) cases in the training series and 207 out of 243 compounds (85.4%) in the external validation series. To illustrate the performance of the QSAR for the selection of active drugs, we carried out an additional virtual screening of antiparasite compounds not used in the training or predicting series; the model recognized 97 out of 114 (85.1%) of them. We also give the procedures to construct back-projection maps and to calculate sub-structure contributions to the biological activity. Finally, we used the outputs of the QSAR to construct, for the first time, a multi-species Complex Network of antiparasite drugs. The predicted network has 380 nodes (compounds) and 634 edges (pairs of compounds with similar activity). This network allows us to cluster different compounds and identify on average three known compounds similar to a new query compound according to their profile of biological activity. This is the first attempt to calculate probabilities of antiparasitic action of drugs against different parasites.
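One reading consistent with the "on average three known compounds" statement, offered here as an assumption rather than as the authors' stated derivation, is the mean degree of the reported network. In an undirected network each edge contributes to the degree of two nodes, so with N = 380 nodes and E = 634 edges:

```latex
\langle k \rangle \;=\; \frac{2E}{N} \;=\; \frac{2 \times 634}{380} \;=\; \frac{1268}{380} \;\approx\; 3.34
```

That is, each compound is linked on average to roughly three others with a similar activity profile.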
|
32
|
González-Díaz H, Prado-Prado FJ. Unified QSAR and network-based computational chemistry approach to antimicrobials, part 1: Multispecies activity models for antifungals. J Comput Chem 2007; 29:656-67. [DOI: 10.1002/jcc.20826] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Indexed: 01/19/2023]
|
33
|
Cruz-Monteagudo M, Cordeiro MNDS, Borges F. Computational chemistry approach for the early detection of drug-induced idiosyncratic liver toxicity. J Comput Chem 2007; 29:533-49. [PMID: 17705164 DOI: 10.1002/jcc.20812] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]
Abstract
Idiosyncratic drug toxicity (IDT), considered as a toxic host-dependent event, with an apparent lack of dose response relationship, is usually not predictable from early phases of clinical trials, representing a particularly confounding complication in drug development. Albeit a rare event (usually <1/5000), IDT is often life threatening and is one of the major reasons new drugs never reach the market or are withdrawn post marketing. Computational methodologies, like the computer-based approach proposed in the present study, can play an important role in addressing IDT in early drug discovery. We report for the first time a systematic evaluation of classification models to predict idiosyncratic hepatotoxicity based on linear discriminant analysis (LDA), artificial neural networks (ANN), and machine learning algorithms (OneR) in conjunction with a 3D molecular structure representation and feature selection methods. These modeling techniques (LDA, feature selection to prevent over-fitting and multicollinearity, ANN to capture nonlinear relationships in the data, as well as the simple OneR classifier) were found to produce QSTR models with satisfactory internal cross-validation statistics and predictivity on an external subset of chemicals. More specifically, the models reached values of accuracy/sensitivity/specificity over 84%/78%/90%, respectively in the training series along with predictivity values ranging from ca. 78 to 86% of correctly classified drugs. An LDA-based desirability analysis was carried out in order to select the levels of the predictor variables needed to trigger the more desirable drug, i.e. the drug with lower potential for idiosyncratic hepatotoxicity. Finally, two external test sets were used to evaluate the ability of the models in discriminating toxic from nontoxic structurally and pharmacologically related drugs and the ability of the best model (LDA) in detecting potential idiosyncratic hepatotoxic drugs, respectively. 
The computational approach proposed here can be considered as a useful tool in early IDT prognosis.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal
|