1
Peteani G, Huynh MTD, Gerebtzoff G, Rodríguez-Pérez R. Application of machine learning models for property prediction to targeted protein degraders. Nat Commun 2024; 15:5764. [PMID: 38982061 PMCID: PMC11233499 DOI: 10.1038/s41467-024-49979-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 06/21/2024] [Indexed: 07/11/2024]
Abstract
Machine learning (ML) systems can model quantitative structure-property relationships (QSPR) using existing experimental data and make property predictions for new molecules. With the advent of modalities such as targeted protein degraders (TPD), the applicability of QSPR models is questioned and ML usage in TPD-centric projects remains limited. Herein, ML models are developed and evaluated for TPDs' property predictions, including passive permeability, metabolic clearance, cytochrome P450 inhibition, plasma protein binding, and lipophilicity. Interestingly, performance on TPDs is comparable to that of other modalities. Predictions for glues and heterobifunctionals often yield lower and higher errors, respectively. For permeability, CYP3A4 inhibition, and human and rat microsomal clearance, misclassification errors into high and low risk categories are lower than 4% for glues and 15% for heterobifunctionals. For all modalities, misclassification errors range from 0.8% to 8.1%. Investigated transfer learning strategies improve predictions for heterobifunctionals. This is the first comprehensive evaluation of ML for the prediction of absorption, distribution, metabolism, and excretion (ADME) and physicochemical properties of TPD molecules, including heterobifunctional and molecular glue sub-modalities. Taken together, our investigations show that ML-based QSPR models are applicable to TPDs and support ML usage for TPDs' design, to potentially accelerate drug discovery.
Affiliation(s)
- Giulia Peteani
- Novartis Biomedical Research, Novartis Campus, 4002, Basel, Switzerland
2
Fluetsch A, Trunzer M, Gerebtzoff G, Rodríguez-Pérez R. Deep Learning Models Compared to Experimental Variability for the Prediction of CYP3A4 Time-Dependent Inhibition. Chem Res Toxicol 2024; 37:549-560. [PMID: 38501689 DOI: 10.1021/acs.chemrestox.3c00305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Most drugs are mainly metabolized by cytochrome P450 (CYP450), which can lead to drug-drug interactions (DDI). Specifically, time-dependent inhibition (TDI) of CYP3A4 isoenzyme has been associated with clinically relevant DDI. To overcome potential DDI issues, high-throughput in vitro assays were established to assess the TDI of CYP3A4 during the discovery and lead optimization phases. However, in silico machine learning models would enable an earlier and larger-scale assessment of TDI potential liabilities. For CYP inhibition, most modeling efforts have focused on highly imbalanced and small data sets. Moreover, assay variability is rarely considered, which is key to understand the model's quality and suitability for decision-making. In this work, machine learning models were built for the prediction of TDI of CYP3A4, evaluated prospectively, and compared to the variability of the experimental assay. Different modeling strategies were investigated to assess their influence on the model's performance. Through multitask learning, additional data sets were leveraged for model building, coming from public databases, in-house CYP-related assays, or other pharmaceutical companies (federated learning). Apart from the numerical prediction of inactivation rates of CYP3A4 TDI, three-class predictions were carried out, giving a negative (inactivation rate kobs < 0.01 min⁻¹), weak positive (0.01 ≤ kobs ≤ 0.025 min⁻¹), or positive (kobs > 0.025 min⁻¹) output. The final multitask graph neural network model achieved misclassification rates of 8 and 7% for positive and negative TDI, respectively. Importantly, the presented deep learning-based predictions had a similar precision to the reproducibility of in vitro experiments and thus offered great opportunities for drug design, early derisk of DDI potential, and selection of experiments.
To facilitate CYP inhibition modeling efforts in the public domain, the developed model was used to annotate ∼16 000 publicly available structures, and a surrogate data set is shared as Supporting Information.
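The three-class scheme quoted in this abstract can be illustrated with a small thresholding function. This is only a sketch of the class boundaries as stated (kobs in min⁻¹); the function name `classify_tdi` is hypothetical and it says nothing about how the authors' model produces kobs values.

```python
def classify_tdi(kobs_per_min: float) -> str:
    """Map an inactivation rate kobs (in 1/min) to the three classes named in the abstract.

    Thresholds come from the abstract; the function itself is illustrative only.
    """
    if kobs_per_min < 0.01:
        return "negative"          # kobs < 0.01
    if kobs_per_min <= 0.025:
        return "weak positive"     # 0.01 <= kobs <= 0.025
    return "positive"              # kobs > 0.025


# Example: both boundary values fall in the middle class.
labels = [classify_tdi(k) for k in (0.005, 0.01, 0.025, 0.03)]
```

Both boundary values, 0.01 and 0.025, land in the weak positive class because the abstract defines that interval as inclusive.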
Affiliation(s)
- Andrin Fluetsch
- Novartis Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Markus Trunzer
- Novartis Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Grégori Gerebtzoff
- Novartis Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
3
Bassani D, Brigo A, Andrews-Morger A. Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon. Chem Res Toxicol 2023; 36:1503-1517. [PMID: 37584277 PMCID: PMC10523574 DOI: 10.1021/acs.chemrestox.3c00137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Indexed: 08/17/2023]
Abstract
In silico approaches have acquired a towering role in pharmaceutical research and development, allowing laboratories all around the world to design, create, and optimize novel molecular entities with unprecedented efficiency. From a toxicological perspective, computational methods have guided the choices of medicinal chemists toward compounds displaying improved safety profiles. Even if the recent advances in the field are significant, many challenges remain active in the on-target and off-target prediction fields. Machine learning methods have shown their ability to identify molecules with safety concerns. However, they strongly depend on the abundance and diversity of data used for their training. Sharing such information among pharmaceutical companies remains extremely limited due to confidentiality reasons, but in this scenario, a recent concept named "federated learning" can help overcome such concerns. Within this framework, it is possible for companies to contribute to the training of common machine learning algorithms, using, but not sharing, their proprietary data. Very recently, Lhasa Limited organized a hackathon involving several industrial partners in order to assess the performance of their federated learning platform, called "Effiris". In this paper, we share our experience as Roche in participating in such an event, evaluating the performance of the federated algorithms and comparing them with those coming from our in-house-only machine learning models. Our aim is to highlight the advantages of federated learning and its intrinsic limitations and also suggest some points for potential improvements in the method.
Affiliation(s)
- Davide Bassani
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
- Alessandro Brigo
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
- Andrea Andrews-Morger
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
4
Klambauer G, Clevert DA, Shah I, Benfenati E, Tetko IV. Introduction to the Special Issue: AI Meets Toxicology. Chem Res Toxicol 2023; 36:1163-1167. [PMID: 37599584 DOI: 10.1021/acs.chemrestox.3c00217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2023]
Affiliation(s)
- Günter Klambauer
- ELLIS Unit Linz, LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstraße 69, Linz 4040, Austria
- Djork-Arné Clevert
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, Berlin 10785, Germany
- Imran Shah
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Emilio Benfenati
- Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milano 20156, Italy
- Igor V Tetko
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), 85764 Neuherberg, Germany
- BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
5
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 269] [Impact Index Per Article: 89.7] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
6
Plante J, Caine BA, Popelier PLA. Enhancing Carbon Acid pKa Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values. Molecules 2021; 26:1048. [PMID: 33671348 PMCID: PMC7922142 DOI: 10.3390/molecules26041048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/20/2020] [Revised: 02/08/2021] [Accepted: 02/11/2021] [Indexed: 11/25/2022]
Abstract
The prediction of the aqueous pKa of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow for a good global model to be generated. In our computationally efficient pKa prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom, and learn coefficients for those atom-types that show the impact each atom-type has on the pKa of the ionisable centre. In the current work, we augment our dataset with pKa values from a series of high performing local models derived from the Ab Initio Bond Lengths method (AIBL). We find that, in distilling the knowledge available from multiple models into one general model, the prediction error for an external test set is reduced compared to that using literature experimental data alone.
Affiliation(s)
- Jeffrey Plante
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, UK
- Beth A. Caine
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK
- Paul L. A. Popelier
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, UK
- Department of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, UK
7
Withnall M, Lindelöf E, Engkvist O, Chen H. Building attention and edge message passing neural networks for bioactivity and physical-chemical property prediction. J Cheminform 2020; 12:1. [PMID: 33430988 PMCID: PMC6951016 DOI: 10.1186/s13321-019-0407-y] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 09/17/2019] [Accepted: 12/25/2019] [Indexed: 01/01/2023]
Abstract
Neural Message Passing for graphs is a promising and relatively recent approach for applying Machine Learning to networked data. As molecules can be described intrinsically as a molecular graph, it makes sense to apply these techniques to improve molecular property prediction in the field of cheminformatics. We introduce Attention and Edge Memory schemes to the existing message passing neural network framework, and benchmark our approaches against eight different physical–chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our results consistently perform on-par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.
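As a rough illustration of what message passing over a molecular graph means, the sketch below runs one round of neighbour aggregation followed by a sum readout. It is a generic, weight-free toy and not the attention or edge memory schemes the authors introduce; the function names, the one-hot features, and the ethanol example are all assumptions made for illustration. Real networks apply learned transformations at each step.

```python
def message_passing_step(node_feats, neighbours):
    """One round: each atom's new vector is its own vector plus the sum of its neighbours' vectors."""
    updated = []
    for atom, feats in enumerate(node_feats):
        new = list(feats)
        for nb in neighbours[atom]:
            for i, value in enumerate(node_feats[nb]):
                new[i] += value
        updated.append(new)
    return updated


def readout(node_feats):
    """Sum the atom vectors into one molecule-level vector."""
    return [sum(column) for column in zip(*node_feats)]


# Heavy atoms of ethanol (C-C-O) with hypothetical one-hot features [is_carbon, is_oxygen].
features = [[1, 0], [1, 0], [0, 1]]
bonds = {0: [1], 1: [0, 2], 2: [1]}

hidden = message_passing_step(features, bonds)
molecule_vector = readout(hidden)
```

Only graph-derived inputs (atom types and connectivity) enter the computation, which is the sense in which such models avoid precomputed chemical descriptors.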
Affiliation(s)
- M Withnall
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- E Lindelöf
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- O Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- H Chen
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, 190 Kai Yuan Avenue, Science Park, Guangzhou, China
8
Plante J, Werner S. JPlogP: an improved logP predictor trained using predicted data. J Cheminform 2018; 10:61. [PMID: 30552535 PMCID: PMC6755606 DOI: 10.1186/s13321-018-0316-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 08/15/2018] [Accepted: 12/03/2018] [Indexed: 11/30/2022]
Abstract
The partition coefficient between octanol and water (logP) has been an important descriptor in QSAR predictions for many years and therefore the prediction of logP has been examined countless times. One of the best performing models is to predict the logP using multiple methods and average the result. We have used those averaged predictions to develop a training-set which was able to distil the information present across the disparate logP methods into one single model. Our model was built using extendable atom-types, where each atom is distilled down into a 6 digit number, and each individual atom is assumed to have a small additive effect on the overall logP of the molecule. Beyond the simple coefficient model a consensus model is evaluated, which uses known compounds as a starting point in the calculation and modifies the experimental logP using the same coefficients as in the first model. We then test the performance of our models against two different datasets, one where many different models routinely perform well against, and another designed to more represent pharmaceutical space. The true strength of the model is represented in the pharmaceutical benchmark set, where both models perform better than any previously developed models.
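The additive idea described in this abstract, and one possible reading of the consensus variant, can be sketched as follows. The six-digit atom-type codes and the coefficient values are invented placeholders, not values from JPlogP, and the function names are hypothetical. The consensus function is an interpretation of "modifies the experimental logP using the same coefficients", not the authors' published procedure.

```python
def additive_logp(atom_types, coefficients):
    """Sum one small contribution per atom, looked up by its atom-type code."""
    return sum(coefficients[t] for t in atom_types)


def consensus_logp(reference_logp, reference_types, query_types, coefficients):
    """Start from a known compound's experimental logP and shift it by the
    difference in summed atom-type contributions between query and reference."""
    delta = additive_logp(query_types, coefficients) - additive_logp(reference_types, coefficients)
    return reference_logp + delta


# Hypothetical atom-type codes and coefficients, for illustration only.
coefficients = {106410: 0.5, 108210: -0.25}

reference = [106410, 108210]        # known compound, experimental logP assumed to be 1.0
query = [106410, 106410, 108210]    # same compound with one extra atom of type 106410

plain = additive_logp(query, coefficients)
adjusted = consensus_logp(1.0, reference, query, coefficients)
```

In this toy case the query differs from the reference by a single atom, so the consensus estimate moves the experimental value by exactly that atom's coefficient.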
Affiliation(s)
- Jeffrey Plante
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS UK
- Stephane Werner
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS UK
9
Pastor M, Quintana J, Sanz F. Development of an Infrastructure for the Prediction of Biological Endpoints in Industrial Environments. Lessons Learned at the eTOX Project. Front Pharmacol 2018; 9:1147. [PMID: 30364191 PMCID: PMC6193068 DOI: 10.3389/fphar.2018.01147] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/13/2018] [Accepted: 09/21/2018] [Indexed: 11/13/2022]
Abstract
In silico methods are increasingly being used for assessing the chemical safety of substances, as a part of integrated approaches involving in vitro and in vivo experiments. A paradigmatic example of these strategies is the eTOX project http://www.etoxproject.eu, funded by the European Innovative Medicines Initiative (IMI), which aimed at producing high quality predictions of in vivo toxicity of drug candidates and resulted in generating about 200 models for diverse endpoints of toxicological interest. In an industry-oriented project like eTOX, apart from the predictive quality, the models need to meet other quality parameters related to the procedures for their generation and their intended use. For example, when the models are used for predicting the properties of drug candidates, the prediction system must guarantee the complete confidentiality of the compound structures. The interface of the system must be designed to provide non-expert users all the information required to choose the models and appropriately interpret the results. Moreover, procedures like installation, maintenance, documentation, validation and versioning, which are common in software development, must be also implemented for the models and for the prediction platform in which they are implemented. In this article we describe our experience in the eTOX project and the lessons learned after 7 years of close collaboration between industrial and academic partners. We believe that some of the solutions found and the tools developed could be useful for supporting similar initiatives in the future.
Affiliation(s)
- Ferran Sanz
- *Correspondence: Manuel Pastor, Ferran Sanz,
10
Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol Inform 2016; 35:615-621. [PMID: 27464907 PMCID: PMC5129546 DOI: 10.1002/minf.201600073] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 05/19/2016] [Accepted: 07/06/2016] [Indexed: 01/19/2023]
Abstract
The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for “Big Data” in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the “Big Data” using advanced machine‐learning methods, and their applications in polypharmacology prediction and target de‐convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi‐party or multi‐party data sharing. Data sharing is important in the context of the recent trend of “open innovation” in pharmaceutical industry, which has led to not only more information sharing among academics and pharma industries but also the so‐called “precompetitive” collaboration between pharma companies. At the end we highlight the importance of education in “Big Data” for further progress of this area.
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
- BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
- Ola Engkvist
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
- Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn Strasse 15, Dortmund, 44227, Germany
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Hongming Chen
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
11
Carrió P, López O, Sanz F, Pastor M. eTOXlab, an open source modeling framework for implementing predictive models in production environments. J Cheminform 2015; 7:8. [PMID: 25774224 PMCID: PMC4358905 DOI: 10.1186/s13321-015-0058-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/13/2014] [Accepted: 02/24/2015] [Indexed: 11/10/2022]
Abstract
Background Computational models based in Quantitative-Structure Activity Relationship (QSAR) methodologies are widely used tools for predicting the biological properties of new compounds. In many instances, such models are used as a routine in the industry (e.g. food, cosmetic or pharmaceutical industry) for the early assessment of the biological properties of new compounds. However, most of the tools currently available for developing QSAR models are not well suited for supporting the whole QSAR model life cycle in production environments. Results We have developed eTOXlab; an open source modeling framework designed to be used at the core of a self-contained virtual machine that can be easily deployed in production environments, providing predictions as web services. eTOXlab consists on a collection of object-oriented Python modules with methods mapping common tasks of standard modeling workflows. This framework allows building and validating QSAR models as well as predicting the properties of new compounds using either a command line interface or a graphic user interface (GUI). Simple models can be easily generated by setting a few parameters, while more complex models can be implemented by overriding pieces of the original source code. eTOXlab benefits from the object-oriented capabilities of Python for providing high flexibility: any model implemented using eTOXlab inherits the features implemented in the parent model, like common tools and services or the automatic exposure of the models as prediction web services. The particular eTOXlab architecture as a self-contained, portable prediction engine allows building models with confidential information within corporate facilities, which can be safely exported and used for prediction without disclosing the structures of the training series. Conclusions The software presented here provides full support to the specific needs of users that want to develop, use and maintain predictive models in corporate environments. 
The technologies used by eTOXlab (web services, VM, object-oriented programming) provide an elegant solution to common practical issues; the system can be installed easily in heterogeneous environments and integrates well with other software. Moreover, the system provides a simple and safe solution for building models with confidential structures that can be shared without disclosing sensitive information.
Affiliation(s)
- Pau Carrió
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
- Oriol López
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
- Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
- Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader 88, E-08003 Barcelona, Spain
12
Ekins S, Clark AM, Swamidass SJ, Litterman N, Williams AJ. Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 2014; 28:997-1008. [PMID: 24943138 PMCID: PMC4198464 DOI: 10.1007/s10822-014-9762-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/29/2014] [Accepted: 06/09/2014] [Indexed: 12/31/2022]
Abstract
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools as either free websites or software as a service commercial offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as it accumulates from high throughput screening and enables the user to draw insights, enable predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
13
Matlock M, Swamidass SJ. Sharing chemical relationships does not reveal structures. J Chem Inf Model 2013; 54:37-48. [PMID: 24289228 DOI: 10.1021/ci400399a] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023]
Abstract
In this study, we propose a new, secure method of sharing useful chemical information from small-molecule libraries, without revealing the structures of the libraries' molecules. Our method shares the relationship between molecules rather than structural descriptors. This is an important advance because, over the past few years, several groups have developed and published new methods of analyzing small-molecule screening data. These methods include advanced hit-picking protocols, promiscuous active filters, economic optimization algorithms, and screening visualizations, which can identify patterns in the data that might otherwise be overlooked. Application of these methods to private data requires finding strategies for sharing useful chemical data without revealing chemical structures. This problem has been examined in the context of ADME prediction models, with results from information theory suggesting it is impossible to share useful chemical information without revealing structures. In contrast, we present a new strategy for encoding the relationships between molecules instead of their structures, based on anonymized scaffold networks and trees, that safely shares enough chemical information to be useful in analyzing chemical data, while also sufficiently blinding structures from discovery. We present the details of this encoding, an analysis of the usefulness of the information it conveys, and the security of the structures it encodes. This approach makes it possible to share data across institutions, and may securely enable collaborative analysis that can yield insight into both specific projects and screening technology as a whole.
Affiliation(s)
- Matthew Matlock
- Washington University School of Medicine , Department of Pathology and Immunology, St. Louis, Missouri 63110, United States
|
14
|
Abstract
An associative neural network (ASNN) is an ensemble-based method inspired by the function and structure of neural network correlations in the brain. The method operates by simulating the short- and long-term memory of neural networks. The long-term memory is represented by an ensemble of neural network weights, while the short-term memory is stored as a pool of internal neural network representations of the input pattern. This organization allows the ASNN to incorporate new data cases in short-term memory and provides high generalization ability without the need to retrain the neural network weights. The method can be used to estimate the bias and the applicability domain of models. Applications of the ASNN in QSAR and drug design are exemplified. The developed algorithm is available at http://www.vcclab.org.
Affiliation(s)
- Igor V Tetko
- GSF--Institute for Bioinformatics, Neuherberg, Germany
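The memory mechanism described in this abstract can be illustrated with a minimal sketch. It is not the published implementation: the "ensemble" here is three hand-written toy functions standing in for trained networks, neighbours are found by Euclidean distance between ensemble output vectors (an assumption made for simplicity), and the names `asnn_predict`, `memory`, and `k` are hypothetical. The idea shown is that the ensemble average is corrected by the mean residual of the most similar stored cases, so new cases can be added to `memory` without retraining the models.

```python
import math


def ensemble_outputs(models, x):
    """Return the vector of predictions of every ensemble member for input x."""
    return [model(x) for model in models]


def asnn_predict(models, memory, x, k=2):
    """Ensemble mean corrected by the mean residual of the k nearest stored cases.

    memory is a list of (x_j, y_j) pairs acting as the short-term memory.
    Similarity is measured in the space of ensemble outputs (assumed Euclidean).
    """
    out = ensemble_outputs(models, x)
    z = sum(out) / len(out)  # plain ensemble average (long-term memory)
    if not memory:
        return z  # nothing stored yet, so no correction is possible

    scored = []
    for x_j, y_j in memory:
        out_j = ensemble_outputs(models, x_j)
        z_j = sum(out_j) / len(out_j)
        # distance in ensemble-output space, and the ensemble's error on this case
        scored.append((math.dist(out, out_j), y_j - z_j))

    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    correction = sum(residual for _, residual in nearest) / len(nearest)
    return z + correction


# Toy ensemble that systematically underestimates y = 2x + 1 by about 1.
toy_models = [lambda x: 2 * x, lambda x: 2 * x + 0.2, lambda x: 2 * x - 0.2]

# Two new experimental cases are added to memory; the models are not retrained.
toy_memory = [(1.0, 3.0), (2.0, 5.0)]

uncorrected = asnn_predict(toy_models, [], 1.5)
corrected = asnn_predict(toy_models, toy_memory, 1.5)
```

In this toy setting the uncorrected ensemble carries its systematic bias, while the corrected prediction absorbs it from the stored cases. The size of the correction can also serve as a rough bias estimate, in the spirit of the abstract.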
|
15
|
Benigni R, Netzeva TI, Benfenati E, Bossa C, Franke R, Helma C, Hulzebos E, Marchant C, Richard A, Woo YT, Yang C. The expanding role of predictive toxicology: an update on the (Q)SAR models for mutagens and carcinogens. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 2007; 25:53-97. [PMID: 17365342 DOI: 10.1080/10590500701201828] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 05/14/2023]
Abstract
Different regulatory schemes worldwide, and in particular the preparation for the new REACH (Registration, Evaluation and Authorization of CHemicals) legislation in Europe, increase the reliance on estimation methods for predicting potential chemical hazard. To meet the increased expectations, the availability of valid (Q)SARs becomes a critical issue, especially for endpoints that have complex mechanisms of action, are time- and cost-consuming, and require a large number of animals to test. Here, findings from the survey on (Q)SARs for mutagenicity and carcinogenicity, initiated by the European Chemicals Bureau (ECB) and carried out by the Istituto Superiore di Sanità, are summarized, key aspects are discussed, and a broader view towards future needs and perspectives is given.
Affiliation(s)
- Romualdo Benigni
- Istituto Superiore di Sanita, Environment and Health Department, Rome, Italy.
|
16
|
Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI. Can we estimate the accuracy of ADME–Tox predictions? Drug Discov Today 2006; 11:700-7. [PMID: 16846797 DOI: 10.1016/j.drudis.2006.06.013] [Citation(s) in RCA: 162] [Impact Index Per Article: 9.0] [Received: 01/27/2006] [Revised: 04/07/2006] [Accepted: 06/16/2006] [Indexed: 11/26/2022]
Abstract
There have recently been developments in the methods used to assess the accuracy of prediction and the applicability domain of absorption, distribution, metabolism, excretion and toxicity models, and also in the methods used to predict the physicochemical properties of compounds in the early stages of drug development. The methods are classified into two main groups: those based on the analysis of similarity of molecules, and those based on the analysis of calculated properties. An analysis of octanol-water distribution coefficients is used to exemplify the consistency of estimated and calculated accuracy of the ALOGPS program (http://www.vcclab.org) to predict in-house and publicly available datasets.
Affiliation(s)
- Igor V Tetko
- Institute for Bioinformatics, GSF--National Research Centre for Environment and Health, Neuherberg, D-85764, Germany.
|
17
|
Bologa C, Allu TK, Olah M, Kappler MA, Oprea TI. Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures. J Comput Aided Mol Des 2005; 19:625-35. [PMID: 16322910 DOI: 10.1007/s10822-005-9020-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 08/10/2005] [Accepted: 10/06/2005] [Indexed: 10/25/2022]
Abstract
We examined "descriptor collision" for several chemical fingerprint systems (MDL 320, Daylight, SMDL), and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remains around 5%. We systematically increase the "descriptor collision" rate (here termed "descriptor confusion"), in order to design a set of "descriptors to mask chemical structures", DMCS. If effective, a DMCS system would not allow third parties to determine the original chemical structures used to derive the DMCS set (i.e., reverse engineering). Using SMDL keys, the "confusion" rate is increased to 45.6% by eliminating those keys that have a low frequency of occurrence in WOMBAT structures. We applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT, in order to validate the biological relevance of the SMDL descriptors as a potential DMCS set. The "reduced set" of SMDL descriptors has a small loss of modeling power (around 20%) compared to the initial descriptor set, while the collision rate is significantly increased. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
Affiliation(s)
- Cristian Bologa
- Division of Biocomputing, University of New Mexico School of Medicine, MSC11 6145, Albuquerque, NM 87131, USA
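The notions of "descriptor collision" and "descriptor confusion" in this abstract can be made concrete with a small sketch. The definition used here is an assumption, not necessarily the one in the paper: the collision rate is taken as the fraction of molecules whose descriptor vector is identical to that of at least one other molecule. The toy bit vectors and the helper names `collision_rate` and `drop_rare_keys` are hypothetical, and the data are invented for illustration only.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of molecules sharing an identical fingerprint with another molecule."""
    counts = Counter(fingerprints)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(fingerprints)


def drop_rare_keys(fingerprints, min_count):
    """Remove bit positions (keys) that are set in fewer than min_count molecules."""
    n_bits = len(fingerprints[0])
    frequency = [sum(fp[i] for fp in fingerprints) for i in range(n_bits)]
    keep = [i for i in range(n_bits) if frequency[i] >= min_count]
    return [tuple(fp[i] for i in keep) for fp in fingerprints]


# Five toy molecules described by four structural keys (1 = key present).
toy_fps = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),  # differs from the first molecule only in the rare last key
    (1, 0, 1, 0),
    (1, 0, 1, 0),  # already collides with the previous molecule
    (0, 1, 1, 0),
]

rate_before = collision_rate(toy_fps)

# Eliminating keys that occur in fewer than two molecules raises the "confusion".
masked_fps = drop_rare_keys(toy_fps, min_count=2)
rate_after = collision_rate(masked_fps)
```

Dropping the rare key makes the first two molecules indistinguishable, so the collision rate rises from 0.4 to 0.8 in this toy set. This mirrors, in a much simplified form, the trade-off the abstract describes between masking structures and retaining modeling power.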
|