1
Han Z, Xia Z, Xia J, Tetko IV, Wu S. The state-of-the-art machine learning model for Plasma Protein Binding Prediction: computational modeling with OCHEM and experimental validation. Eur J Pharm Sci 2024; 204:106946. [PMID: 39490636 DOI: 10.1016/j.ejps.2024.106946] [Received: 07/30/2024] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/05/2024]
Abstract
Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high PPB compounds, and are most often not experimentally validated. Here, we carried out a strict data curation protocol and applied consensus modeling to obtain a model with coefficients of determination of 0.90 and 0.91 on the training set and the test set, respectively. This model (available on the OCHEM platform https://ochem.eu/article/29) was further retrospectively validated on a set of 63 poly-fluorinated molecules and prospectively validated on a set of 25 highly diverse compounds, and its performance on both sets was superior to that of previously reported models. Furthermore, we identified the physicochemical and structural characteristics of high and low PPB molecules for further structural optimization. Finally, we provide practical and detailed recommendations for structural optimization to decrease the PPB of lead compounds.
Affiliation(s)
- Zunsheng Han
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zhonghua Xia
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Jie Xia
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
  - BIGCHEM GmbH, Valerystr. 49, 85716 Unterschleißheim, Germany
- Song Wu
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
2
Mahjour BA, Coley CW. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries. J Chem Inf Model 2024; 64:2948-2954. [PMID: 38488634 DOI: 10.1021/acs.jcim.4c00138] [Indexed: 04/23/2024]
Abstract
SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.
Affiliation(s)
- Babak A Mahjour
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
3
Sankar S, Vasudevan S, Chandra N. CRD: A de novo design algorithm for the prediction of cognate protein receptors for small molecule ligands. Structure 2024; 32:362-375.e4. [PMID: 38194962 DOI: 10.1016/j.str.2023.12.009] [Received: 06/21/2023] [Revised: 10/20/2023] [Accepted: 12/13/2023] [Indexed: 01/11/2024]
Abstract
While predicting a ligand that binds to a protein is feasible with current methods, the opposite, i.e., the prediction of a receptor for a ligand remains challenging. We present an approach for predicting receptors of a given ligand that uses de novo design and structural bioinformatics. We have developed the algorithm CRD, comprising multiple modules combining fragment-based sub-site finding, a machine learning function to estimate the size of the site, a genetic algorithm that encodes knowledge on protein structures and a physics-based fitness scoring scheme. CRD includes a pseudo-receptor design component followed by a mapping component to identify proteins that might contain these sites. CRD recovers the sites and receptors of several natural ligands. It designs similar sites for similar ligands, yet to some extent can distinguish between closely related ligands. CRD correctly predicts receptor classes for several drugs and might become a valuable tool for drug discovery.
Affiliation(s)
- Santhosh Sankar
  - Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka 560012, India
- Sneha Vasudevan
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka 560012, India
- Nagasuma Chandra
  - Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka 560012, India
  - Department of Bioengineering, Indian Institute of Science, Bangalore, Karnataka 560012, India
4
Raghavan P, Haas BC, Ruos ME, Schleinitz J, Doyle AG, Reisman SE, Sigman MS, Coley CW. Dataset Design for Building Models of Chemical Reactivity. ACS Cent Sci 2023; 9:2196-2204. [PMID: 38161380 PMCID: PMC10755851 DOI: 10.1021/acscentsci.3c01163] [Received: 09/20/2023] [Revised: 11/06/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Affiliation(s)
- Priyanka Raghavan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Brittany C. Haas
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Madeline E. Ruos
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jules Schleinitz
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Abigail G. Doyle
  - Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
- Sarah E. Reisman
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Matthew S. Sigman
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
5
Mukherjee G, Braka A, Wu S. Quantifying Functional-Group-like Structural Fragments in Molecules and Its Applications in Drug Design. J Chem Inf Model 2023; 63:2073-2083. [PMID: 36881497 DOI: 10.1021/acs.jcim.3c00050] [Indexed: 03/08/2023]
Abstract
A functional group in a molecule is a structural fragment consisting of a few atoms or a single atom that imparts reactivity to a molecule. Hence, defining functional groups is crucial in chemistry to predict the properties and reactivities of molecules. However, there is no established method in the literature for defining functional groups based on reactivity parameters. In this work, we addressed this issue by designing a set of predefined structural fragments along with reactivity parameters like electron conjugation and ring strain. This approach uses bond orders and atom connectivities to quantify the presence of these fragments within an organic molecule based on given input molecular coordinates. To assess the effectiveness of this approach, we performed a case study to show the benefits of using these newly designed structural fragments instead of traditional fingerprint-based methods for grouping potential COX1/COX2 inhibitors by screening an approved drug library against the aspirin molecule. The structural fragment-based model for ternary classification of rat oral LD50 of chemicals showed performance similar to the fingerprint-based models. In evaluating regression model performance for aqueous solubility, log(S), our approach outperformed the fingerprint-based model.
Affiliation(s)
- Goutam Mukherjee
  - R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
- Abdennour Braka
  - R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
- Sangwook Wu
  - R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
  - Department of Physics, Pukyong National University, Busan 48513, Republic of Korea
6
MORTAR: a rich client application for in silico molecule fragmentation. J Cheminform 2023; 15:1. [PMID: 36593523 PMCID: PMC9809053 DOI: 10.1186/s13321-022-00674-9] [Received: 10/14/2022] [Accepted: 12/17/2022] [Indexed: 01/03/2023]
Abstract
Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computational workflow with data import, fragmentation algorithm integration, and result visualisation. The described workflow is normally unavailable for a new algorithm and must be set up individually. This work presents an open Java rich client Graphical User Interface (GUI) application to support the development of new in silico molecule fragmentation algorithms and make them readily available upon release. The MORTAR (MOlecule fRagmenTAtion fRamework) application visualises fragmentation results of a set of molecules in various ways and provides basic analysis features. Fragmentation algorithms can be integrated and developed within MORTAR by using a specific wrapper class. In addition, fragmentation pipelines with any combination of the available fragmentation methods can be executed. Upon release, three fragmentation algorithms are already integrated: ErtlFunctionalGroupsFinder, Sugar Removal Utility, and Scaffold Generator. These algorithms, as well as all cheminformatics functionalities in MORTAR, are implemented based on the Chemistry Development Kit (CDK).
7
Ji Z, Shi R, Lu J, Li F, Yang Y. ReLMole: Molecular Representation Learning Based on Two-Level Graph Similarities. J Chem Inf Model 2022; 62:5361-5372. [PMID: 36302249 DOI: 10.1021/acs.jcim.2c00798] [Indexed: 12/13/2022]
Abstract
Molecular representation is a critical part of various prediction tasks for physicochemical properties of molecules and drug design. As graph notations are common in expressing the structural information of chemical compounds, graph neural networks (GNNs) have become the mainstream backbone model for learning molecular representation. However, the scarcity of task-specific labels in the biomedical domain limits the power of GNNs. Recently, self-supervised pretraining for GNNs has been leveraged to deal with this issue, but the existing pretraining methods are mainly designed for graph data in general domains and do not consider the specific data properties of molecules. In this paper, we propose a representation learning method for molecular graphs, called ReLMole, which features hierarchical graph modeling of molecules and a contrastive learning scheme based on two-level graph similarities. We assess the performance of ReLMole on two types of downstream tasks, namely, the prediction of molecular properties (MPs) and of drug-drug interactions (DDIs). ReLMole achieves promising results for all the tasks. It outperforms the baseline models by over 2.6% on ROC-AUC averaged across six MP prediction tasks, and it improves the F1 value by 7-18% in DDI prediction for unseen drugs compared with other self-supervised models.
Affiliation(s)
- Zewei Ji
  - Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Runhan Shi
  - Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Jiarui Lu
  - Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Fang Li
  - Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yang Yang
  - Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
8
Shulga DA, Ivanov NN, Palyulin VA. In Silico Structure-Based Approach for Group Efficiency Estimation in Fragment-Based Drug Design Using Evaluation of Fragment Contributions. Molecules 2022; 27:1985. [PMID: 35335347 PMCID: PMC8951103 DOI: 10.3390/molecules27061985] [Received: 01/31/2022] [Revised: 03/10/2022] [Accepted: 03/15/2022] [Indexed: 12/10/2022]
Abstract
The notion that a specific group contributes to an organic molecule's property and/or activity is common in our thinking, yet it is not strictly correct because free energy is inherently non-additive with respect to the molecular fragments composing a molecule. The fragment-based drug discovery (FBDD) approach has proven to be fruitful in addressing the above notions. The main difficulty of FBDD, however, is its reliance on low-throughput and expensive experimental means of determining the binding of fragment-sized molecules. In this article we propose a way to enhance the throughput and availability of FBDD methods by judiciously using in silico means of assessing the contribution of fragments of the molecule in question to the ligand-receptor binding energy, based on a previously developed in silico Reverse Fragment Based Drug Discovery (R-FBDD) approach. It has been shown that the proposed structure-based drug discovery (SBDD) type of approach fills a vacant niche among the existing in silico approaches, which mainly stem from their ligand-based drug discovery (LBDD) counterparts. In order to illustrate the applicability of the approach, our work retrospectively repeats the findings of an FBDD hit-to-lead project devoted to the experimentally based determination of additive group efficiency (GE), an analog of ligand efficiency (LE) for a group in the molecule, using the Free-Wilson (FW) decomposition. It is shown that by using our in silico approach to evaluate fragment contributions of a ligand and to estimate GE, one can arrive at decisions similar to those made using the experimentally determined activity-based FW decomposition. It is also shown that the approach is rather robust to the choice of the scoring function, provided the latter demonstrates decent scoring power. We argue that the proposed approach of in silico assessment of GE has a wider applicability domain and expect that it will be widely applicable to enhance the net throughput of drug discovery based on the FBDD paradigm.
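The relation between ligand efficiency and group efficiency mentioned in this abstract can be illustrated with a little arithmetic. The sketch below assumes the commonly used conventions LE = -ΔG / N_heavy and GE = -ΔΔG / ΔN_heavy, with ΔG = RT ln K_d; the abstract itself does not state these formulas, and every number, function name, and parameter here is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant (assumed dG = RT ln Kd)."""
    return R_KCAL * temperature_k * math.log(kd_molar)

def ligand_efficiency(delta_g, heavy_atoms):
    """Assumed convention: LE = -dG per heavy atom of the whole ligand."""
    return -delta_g / heavy_atoms

def group_efficiency(delta_g_parent, delta_g_with_group, added_heavy_atoms):
    """Assumed convention: GE = -(change in dG) per heavy atom of the added group."""
    return -(delta_g_with_group - delta_g_parent) / added_heavy_atoms

# Hypothetical example: adding a 4-heavy-atom group improves Kd from 10 uM to 100 nM.
parent_dg = binding_free_energy(1e-5)
analog_dg = binding_free_energy(1e-7)
le_parent = ligand_efficiency(parent_dg, heavy_atoms=15)
ge_group = group_efficiency(parent_dg, analog_dg, added_heavy_atoms=4)
print(f"LE(parent) = {le_parent:.2f}, GE(group) = {ge_group:.2f} kcal/mol per heavy atom")
```

Under these assumptions, a group whose GE exceeds the parent's LE contributes more binding energy per atom than the ligand does on average, which is the kind of comparison a hit-to-lead decision might rest on.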
Affiliation(s)
- Dmitry A. Shulga
  - Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
- Vladimir A. Palyulin
  - Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
9
What Features of Ligands Are Relevant to the Opening of Cryptic Pockets in Drug Targets? Informatics 2022. [DOI: 10.3390/informatics9010008] [Indexed: 11/17/2022]
Abstract
Small-molecule drug design aims to identify inhibitors that can specifically bind to a functionally important region on the target, e.g., an active site of an enzyme. Identification of potential binding pockets is typically based on static three-dimensional structures. However, small molecules may induce and select a dynamic binding pocket that is not visible in the apo protein, which presents a well-recognized challenge for structure-based drug discovery. Here, we assessed whether it is possible to identify features in molecules, which we refer to as inducers, that can induce the opening of cryptic pockets. The volume change between apo and bound protein conformations was used as a metric to differentiate chemical features in inducers vs. non-inducers. Based on the dataset of holo–apo pairs, classification models were built to determine an optimum threshold. The model analysis suggested that inducers tended to be more hydrophobic and aromatic. The impact of sulfur was ambiguous, while phosphorus and halogen atoms were overrepresented in inducers. The fragment analysis showed that small changes in the structures of molecules can strongly affect the potential to induce a cryptic pocket. This analysis and the developed model can be used to design inducers that can potentially open cryptic pockets in undruggable proteins.
10
Ebert A, Goss KU. Screening of 6000 Compounds for Uncoupling Activity: A Comparison Between a Mechanistic Biophysical Model and the Structural Alert Profiler Mitotox. Toxicol Sci 2021; 185:208-219. [PMID: 34865177 DOI: 10.1093/toxsci/kfab139] [Indexed: 11/13/2022]
Abstract
Protonophoric uncoupling of phosphorylation is an important factor when assessing chemicals for their toxicity, and has recently moved into focus in pharmaceutical research with respect to the treatment of diseases such as cancer, diabetes, or obesity. Reliably identifying uncoupling activity is thus a valuable goal. To that end, we screened more than 6000 anionic compounds for in vitro uncoupling activity, using a biophysical model based on ab initio COSMO-RS input parameters with the molecular structure as the only external input. We combined these results with a model for baseline toxicity (narcosis). Our model identified more than 1250 possible uncouplers in the screening dataset, and identified possible new uncoupler classes such as thiophosphoric acids. When tested against 423 known uncouplers and 612 known inactive compounds in the dataset, the model reached a sensitivity of 83% and a specificity of 96%. In a direct comparison, its specificity was similar to that of the structural alert profiler Mitotox (97%), but its sensitivity was much higher than that of Mitotox (47%). The biophysical model thus allows for more accurate screening for uncoupling activity than existing structural alert profilers. We propose to use our model as a complementary tool to screen large datasets for protonophoric uncoupling activity in drug development and toxicity assessment.
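The reported rates can be converted back into approximate confusion-matrix counts. The sketch below is illustrative arithmetic only: it assumes the percentages apply to exactly the 423 known uncouplers and 612 known inactive compounds named in the abstract, and that the Mitotox rates refer to the same compounds. The function names are hypothetical, and because of rounding the counts are estimates, not figures from the paper.

```python
def approximate_counts(n_active, n_inactive, sensitivity, specificity):
    """Back-calculate approximate confusion-matrix counts from reported rates."""
    true_pos = round(n_active * sensitivity)    # known uncouplers correctly flagged
    true_neg = round(n_inactive * specificity)  # known inactives correctly cleared
    return {
        "true_pos": true_pos,
        "false_neg": n_active - true_pos,
        "true_neg": true_neg,
        "false_pos": n_inactive - true_neg,
    }

def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2

# Rates as quoted in the abstract; set sizes assumed to be 423 actives and 612 inactives.
biophysical = approximate_counts(423, 612, sensitivity=0.83, specificity=0.96)
mitotox = approximate_counts(423, 612, sensitivity=0.47, specificity=0.97)

print("biophysical model:", biophysical, balanced_accuracy(0.83, 0.96))
print("Mitotox:          ", mitotox, balanced_accuracy(0.47, 0.97))
```

Under these assumptions the biophysical model would miss roughly 72 of the known uncouplers where Mitotox would miss roughly 224, at the cost of a few more false positives (about 24 versus about 18).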
Affiliation(s)
- Andrea Ebert
  - Analytical Environmental Chemistry, Helmholtz Centre for Environmental Research-UFZ, D-04318 Leipzig, Germany
- Kai-Uwe Goss
  - Analytical Environmental Chemistry, Helmholtz Centre for Environmental Research-UFZ, D-04318 Leipzig, Germany
  - Institute of Chemistry, Martin Luther University, D-06120 Halle, Germany
11
Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV. Highly Accurate Filters to Flag Frequent Hitters in AlphaScreen Assays by Suggesting their Mechanism. Mol Inform 2021; 41:e2100151. [PMID: 34676998 DOI: 10.1002/minf.202100151] [Received: 06/05/2021] [Accepted: 09/29/2021] [Indexed: 11/06/2022]
Abstract
AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters makes the measured HTS data difficult to interpret. Although filters do exist to identify frequent hitters for AlphaScreen, they are frequently based on privileged scaffolds. The development of such filters is time consuming and requires deep domain knowledge. Recently, machine learning and artificial intelligence methods have emerged as important tools to advance drug discovery and chemoinformatics, including their application to the identification of frequent hitters in screening assays. However, the relative performance and complementarity of the machine learning and scaffold-based techniques have not yet been comprehensively compared. In this study, we compared filters based on privileged scaffolds with filters built using machine learning. Our results demonstrate that machine learning methods provide more accurate filters for identification of frequent hitters in AlphaScreen assays than scaffold-based methods and can be easily redeveloped once new data are measured. We present highly accurate models to identify frequent hitters in AlphaScreen assays.
Affiliation(s)
- Dipan Ghosh
  - Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
- Uwe Koch
  - Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227, Dortmund, Germany
- Kamyar Hadian
  - Assay Development and Screening Platform, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
- Michael Sattler
  - Bavarian NMR Center, Department Chemie, Technische Universität München, Ernst-Otto-Fischerstraße 2, D-85747, Garching, Germany
  - Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
- Igor V Tetko
  - Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street 1, 153045, Ivanovo, Russia
  - BIGCHEM GmbH, Valerystr. 49, D-85716, Unterschleißheim, Germany
12
Comprehensive analysis of R-groups in medicinal chemistry. Future Med Chem 2021; 14:5-7. [PMID: 34672719 DOI: 10.4155/fmc-2021-0250] [Indexed: 11/17/2022]
13
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TE, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown J, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021; 129:47013. [PMID: 33929906 PMCID: PMC8086800 DOI: 10.1289/ehp8495] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. OBJECTIVES: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). METHODS: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. RESULTS: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. DISCUSSION: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
Affiliation(s)
- Kamel Mansouri
  - Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
- Agnes L. Karmaus
  - Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Prachi Pradeep
  - Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
  - Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Domenico Alberga
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Timothy E.H. Allen
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Dave Allen
  - Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Vinicius M. Alves
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
  - Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Carolina H. Andrade
  - Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Davide Ballabio
  - Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Shannon Bell
  - Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Emilio Benfenati
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Sudin Bhattacharya
  - Institute for Quantitative Health Science and Engineering, Department of Biomedical Engineering, Michigan State University, East Lansing, Michigan, USA
- Joyce V. Bastos
  - Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Stephen Boyd
  - Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- J.B. Brown
  - Kyoto University Graduate School of Medicine, Kyoto, Japan
- Stephen J. Capuzzi
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Yaroslav Chushak
  - Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
  - Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
- Heather Ciallella
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Alex M. Clark
  - Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
- Viviana Consonni
  - Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
- Sherif Farag
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Maxim Fedorov
  - Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
- Denis Fourches
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- Domenico Gadaleta
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Feng Gao
  - Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Jeffery M. Gearhart
  - Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
  - Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
- Garett Goh
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Jonathan M. Goodman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Francesca Grisoni
  - Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Christopher M. Grulke
  - Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Matthew Hirn
  - Department of Computational Mathematics, Science & Engineering, Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
- Pavel Karpov
  - Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
- Giovanna J. Lavado
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Xinhao Li
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
- Filippo Lunghini
  - Laboratoire de Chemoinformatique, URM7140, Université de Strasbourg, Strasbourg, France
- Giuseppe F. Mangiatordi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Gilles Marcou
  - Laboratoire de Chemoinformatique, URM7140, Université de Strasbourg, Strasbourg, France
- Dan Marsh
  - Underwriters Laboratories, Northbrook, Illinois, USA
- Todd Martin
  - Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Cincinnati, Ohio, USA
- Eugene N. Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
  - Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Dac-Trung Nguyen
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Orazio Nicolotti
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Reine Note
  - L’Oréal Research & Innovation, Aulnay-sous-Bois, France
- Paritosh Pande
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Tyler Peryea
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Robert Rallo
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Alessandra Roncaglioni
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Patricia Ruiz
  - Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Daniel P. Russo
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Ahmed Sayed
  - Rosettastein Consulting UG, Freising, Germany
- Risa Sayre
  - Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
  - Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Timothy Sheils
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Charles Siegel
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Arthur C. Silva
  - Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
- Anton Simeonov
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Sergey Sosnin
  - Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
- Noel Southall
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Judy Strickland
  - Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Brian Teppen
  - Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Igor V. Tetko
  - Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
  - BIGCHEM GmbH, Unterschleissheim, Germany
- Dennis Thomas
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Roberto Todeschini
  - Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- Cosimo Toma
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Ignacio Tripodi
  - Computer Science/Interdisciplinary Quantitative Biology, University of Colorado, Boulder, Colorado, USA
- Daniela Trisciuzzi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Alexandre Varnek
  - Laboratoire de Chemoinformatique, URM7140, Université de Strasbourg, Strasbourg, France
- Kristijan Vukovic
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Zhongyu Wang
  - School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
- Liguo Wang
  - School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
- Andrew J. Wedlake
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Dan Wilson
  - The Dow Chemical Company, Midland, Michigan, USA
- Zijun Xiao
  - School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Gergely Zahoranszky-Kohalmi
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Alexey V. Zakharov
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Zhen Zhang
  - Dow Agrosciences, Indianapolis, Indiana, USA
- Tongan Zhao
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
- Hao Zhu
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Warren Casey
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
- Nicole C. Kleinstreuer
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
14
Wu L, Huang R, Tetko IV, Xia Z, Xu J, Tong W. Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets. Chem Res Toxicol 2021; 34:541-549. [PMID: 33513003 DOI: 10.1021/acs.chemrestox.0c00373] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 11/30/2022]
Abstract
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach (including deep learning) and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance is often also easy to interpret for the Tox21 data set, it was clearly the preferred choice due to its better explainability. Given that each data set had its own error structure for both dependent and independent variables, we strongly recommend conducting a systematic study with a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.
Affiliation(s)
- Leihong Wu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Ruili Huang
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Igor V Tetko
  - Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
  - BIGCHEM GmbH, Valerystraße 49, DE-85716 Unterschleißheim, Germany
- Zhonghua Xia
  - Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
15
Suthers PF, Foster CJ, Sarkar D, Wang L, Maranas CD. Recent advances in constraint and machine learning-based metabolic modeling by leveraging stoichiometric balances, thermodynamic feasibility and kinetic law formalisms. Metab Eng 2020; 63:13-33. [PMID: 33310118 DOI: 10.1016/j.ymben.2020.11.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/21/2020] [Revised: 11/13/2020] [Accepted: 11/27/2020] [Indexed: 12/16/2022]
Abstract
Understanding the governing principles behind organisms' metabolism and growth underpins their effective deployment as bioproduction chassis. A central objective of metabolic modeling is predicting how metabolism and growth are affected by both external environmental factors and internal genotypic perturbations. The fundamental concepts of reaction stoichiometry, thermodynamics, and mass action kinetics have emerged as the foundational principles of many modeling frameworks designed to describe how and why organisms allocate resources towards both growth and bioproduction. This review focuses on the latest algorithmic advancements that have integrated these foundational principles into increasingly sophisticated quantitative frameworks.
Affiliation(s)
- Patrick F Suthers
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA; DOE Center for Advanced Bioenergy and Bioproducts Innovation, The Pennsylvania State University, University Park, PA, USA
- Charles J Foster
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
- Debolina Sarkar
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
- Lin Wang
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
- Costas D Maranas
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA; DOE Center for Advanced Bioenergy and Bioproducts Innovation, The Pennsylvania State University, University Park, PA, USA.
16
Yang ZY, Dong J, Yang ZJ, Yin M, Jiang HL, Lu AP, Chen X, Hou TJ, Cao DS. ChemFLuo: a web-server for structure analysis and identification of fluorescent compounds. Brief Bioinform 2020; 22:5985287. [PMID: 33201188 DOI: 10.1093/bib/bbaa282] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2020] [Revised: 09/12/2020] [Accepted: 09/25/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent appearance of potentially fluorescent compounds has greatly interfered with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. Therefore, a credible screening tool to detect fluorescent compounds in chemical libraries is urgently needed in the early stages of drug discovery. RESULTS In this study, we developed a webserver, ChemFLuo, for fluorescent compound detection, based on two large and high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models based on the combination of three machine learning algorithms and seven types of molecular representations. The best blue fluorescence prediction model achieved a balanced accuracy (BA) of 0.858 and an area under the receiver operating characteristic curve (AUC) of 0.931 for the validation set, and BA = 0.823 and AUC = 0.903 for the test set. The best green fluorescence prediction model achieved BA = 0.810 and AUC = 0.887 for the validation set, and BA = 0.771 and AUC = 0.852 for the test set. In addition to the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for the screening of potential fluorescent compounds. The comparison with other fluorescence detection tools and the application to external validation sets and large molecule libraries have demonstrated the reliability of the prediction models for fluorescent compound detection. CONCLUSION ChemFLuo is a public webserver to filter out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/.
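The BA and AUC figures quoted in this abstract are standard classification metrics that can be computed directly from labels and model outputs. The following is a minimal sketch and not the ChemFLuo implementation: the function names and the toy data are hypothetical, BA is taken as the mean of sensitivity and specificity, and AUC is computed as the fraction of (positive, negative) pairs in which the positive compound receives the higher score, with ties counted as one half.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = fluorescent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 fluorescent and 4 non-fluorescent compounds.
labels = [1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.1, 0.5, 0.2, 0.3]
print(balanced_accuracy(labels, predicted))
print(roc_auc(labels, scores))
```

BA is usually preferred over plain accuracy when the classes are imbalanced, because a model that always predicts the majority class scores only 0.5.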
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P. R. China
- Jie Dong
  - Central South University of Forestry and Technology, Changsha, 410004, P.R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P. R. China
- Mingzhu Yin
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China
- Hong-Li Jiang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
- Xiang Chen
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China
- Ting-Jun Hou
  - Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P. R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
17
Yang ZY, Yang ZJ, Lu AP, Hou TJ, Cao DS. Scopy: an integrated negative design python library for desirable HTS/VS database design. Brief Bioinform 2020; 22:5901981. [PMID: 32892221 DOI: 10.1093/bib/bbaa194] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/11/2020] [Revised: 07/27/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND High-throughput screening (HTS) and virtual screening (VS) have been widely used to identify potential hits from large chemical libraries. However, the frequent occurrence of 'noisy compounds' in the screened libraries, such as compounds with poor drug-likeness, poor selectivity or potential toxicity, has greatly weakened the enrichment capability of HTS and VS campaigns. Therefore, the development of comprehensive and credible tools to detect noisy compounds from chemical libraries is urgently needed in early stages of drug discovery. RESULTS In this study, we developed a freely available integrated python library for negative design, called Scopy, which supports the functions of data preparation, calculation of descriptors, scaffolds and screening filters, and data visualization. The current version of Scopy can calculate 39 basic molecular properties, 3 comprehensive molecular evaluation scores, 2 types of molecular scaffolds, 6 types of substructure descriptors and 2 types of fingerprints. A number of important screening rules are also provided by Scopy, including 15 drug-likeness rules (13 drug-likeness rules and 2 building block rules), 8 frequent hitter rules (four assay interference substructure filters and four promiscuous compound substructure filters), and 11 toxicophore filters (five human-related toxicity substructure filters, three environment-related toxicity substructure filters and three comprehensive toxicity substructure filters). Moreover, this library supports four different visualization functions to help users to gain a better understanding of the screened data, including basic feature radar chart, feature-feature-related scatter diagram, functional group marker gram and cloud gram. CONCLUSION Scopy provides a comprehensive Python package to filter out compounds with undesirable properties or substructures, which will benefit the design of high-quality chemical libraries for drug design and discovery. 
It is freely available at https://github.com/kotori-y/Scopy.
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University (Changsha)
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, China
18
Morger A, Mathea M, Achenbach JH, Wolf A, Buesen R, Schleifer KJ, Landsiedel R, Volkamer A. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J Cheminform 2020; 12:24. [PMID: 33431007 PMCID: PMC7157991 DOI: 10.1186/s13321-020-00422-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/18/2019] [Accepted: 03/09/2020] [Indexed: 02/07/2023] Open
Abstract
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds, i.e. machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, containing after preprocessing a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, comprising an additional calibration step and per definition creating internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified at the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could again be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and de-selecting most likely harmful development-candidate compounds early in the development process.
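The calibration step that makes a conformal predictor valid at a chosen significance level can be illustrated in a few lines. The following is a minimal sketch of class-wise inductive conformal classification, not the KnowTox pipeline: the function names, the class labels, and the toy scores are hypothetical, and the nonconformity score is assumed to be something like one minus the model's predicted probability for the class in question. The normalisation and dataset-balancing adaptations described in the abstract are not shown.

```python
from bisect import bisect_left


def conformal_p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the
    test score, counting the test compound itself (the +1 terms)."""
    ordered = sorted(calibration_scores)
    n_at_least = len(ordered) - bisect_left(ordered, test_score)
    return (n_at_least + 1) / (len(ordered) + 1)


def prediction_set(calibration_by_class, test_score_by_class, significance):
    """Keep every class whose p-value exceeds the significance level."""
    return {
        label
        for label, score in test_score_by_class.items()
        if conformal_p_value(calibration_by_class[label], score) > significance
    }


# Hypothetical calibration scores per class and one query compound.
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.1, 0.2, 0.3, 0.4],
}
query = {"active": 0.25, "inactive": 0.5}
print(prediction_set(calibration, query, significance=0.25))
```

The returned set may contain one class, both classes, or none. Lowering the significance level makes the predictor more cautious, so sets with both classes become more frequent, which is the validity versus efficiency trade-off the abstract refers to.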
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany.
19
Yang ZY, Dong J, Yang ZJ, Lu AP, Hou TJ, Cao DS. Structural Analysis and Identification of False Positive Hits in Luciferase-Based Assays. J Chem Inf Model 2020; 60:2031-2043. [PMID: 32202787 DOI: 10.1021/acs.jcim.9b01188] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/07/2023]
Abstract
Luciferase-based bioluminescence detection techniques are highly favored in high-throughput screening (HTS), in which the firefly luciferase (FLuc) is the most commonly used variant. However, FLuc inhibitors can interfere with the activity of luciferase, which may result in false positive signals in HTS assays. In order to reduce the unnecessary cost of time and money, an in silico prediction model for FLuc inhibitors is highly desirable. In this study, we built an extensive data set consisting of 20 888 FLuc inhibitors and 198 608 noninhibitors, and then developed a group of classification models based on the combination of three machine learning (ML) algorithms and four types of molecular representations. The best prediction model based on XGBoost and ECFP4 and MOE2d descriptors yielded a balanced accuracy (BA) of 0.878 and an area under the receiver operating characteristic curve (AUC) value of 0.958 for the validation set, and a BA of 0.886 and an AUC of 0.947 for the test set. Three external validation sets, including set 1 (3231 FLuc inhibitors and 69 783 noninhibitors), set 2 (695 FLuc inhibitors and 75 913 noninhibitors), and set 3 (1138 FLuc inhibitors and 8155 noninhibitors), were used to verify the predictive ability of our models. The BA values for the three external validation sets given by the best model are 0.864, 0.845, and 0.791, respectively. In addition, the important features or structural fragments related to FLuc inhibitors were recognized by the Shapley additive explanations (SHAP) method along with their influences on predictions, which may provide valuable clues to detecting undesirable luciferase inhibitors. Based on the important and explanatory features, 16 rules were proposed for detecting FLuc inhibitors, which can achieve a correction rate of 70% for FLuc inhibitors. 
Furthermore, a comparison with existing prediction rules and models for FLuc inhibitors used in virtual screening verified the high reliability of the models and rules proposed in this study. We also used the model to screen three curated chemical databases, and almost 10% of the molecules in the evaluated databases were predicted as inhibitors, highlighting the potential risk of false positives in luciferase-based assays. Finally, a public web server called ChemFLuc was developed (http://admet.scbdd.com/chemfluc/index/), and it offers a freely available service to predict potential FLuc inhibitors.
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Jie Dong
  - Central South University of Forestry and Technology, Changsha, 410004, P.R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P.R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
20
Ghiandoni GM, Bodkin MJ, Chen B, Hristozov D, Wallace JEA, Webster J, Gillet VJ. Enhancing reaction-based de novo design using a multi-label reaction class recommender. J Comput Aided Mol Des 2020; 34:783-803. [PMID: 32112286 PMCID: PMC7293200 DOI: 10.1007/s10822-020-00300-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/24/2019] [Accepted: 02/13/2020] [Indexed: 12/31/2022]
Abstract
Reaction-based de novo design refers to the in-silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood of the designed molecules being synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account for structural changes that occur at the core of a reaction only, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design to reduce the combinatorial explosion of in-silico molecules produced while limiting the generated structures to those which are most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.
Affiliation(s)
- Gian Marco Ghiandoni
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
- Michael J Bodkin
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- Beining Chen
  - Chemistry Department, University of Sheffield, Dainton Building, Brook Hill, Sheffield, S3 7HF, UK
- Dimitar Hristozov
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- James E A Wallace
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- James Webster
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
- Valerie J Gillet
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK.
|
21
|
Cheng X, Sun D, Zhang D, Tian Y, Ding S, Cai P, Hu QN. RxnBLAST: molecular scaffold and reactive chemical environment feature extractor for biochemical reactions. Bioinformatics 2020; 36:2946-2947. [DOI: 10.1093/bioinformatics/btaa036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Revised: 12/11/2019] [Accepted: 01/14/2020] [Indexed: 12/28/2022] [Open Access]
Abstract
Motivation
Molecular scaffolds are useful in medicinal chemistry to describe, discuss and visualize series of chemical compounds, biochemical transformations and associated biological properties.
Results
Here, we present RxnBLAST as a web-based tool for analyzing scaffold transformations and reactive chemical environment features in bioreactions. RxnBLAST extracts chemical features from bioreactions including atom–atom mapping, reaction centers, rules and functional groups to help understand chemical compositions and reaction patterns. Core-to-Core is proposed, which can be utilized in scaffold networks and for constructing a reaction space, as well as providing guidance for subsequent biosynthesis efforts.
Availability and implementation
RxnBLAST is available at: http://design.rxnfinder.org/rxnblast/.
Affiliation(s)
- Xingxiang Cheng
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Dandan Sun
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Dachuan Zhang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu Tian
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- University of Chinese Academy of Sciences, Beijing 100864, China
| | - Shaozhen Ding
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
22
|
Müller S. Flexible heuristic algorithm for automatic molecule fragmentation: application to the UNIFAC group contribution model. J Cheminform 2019; 11:57. [PMID: 33430960 PMCID: PMC6701077 DOI: 10.1186/s13321-019-0382-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/25/2019] [Accepted: 08/12/2019] [Indexed: 11/10/2022] [Open Access]
Abstract
A priori calculation of thermophysical properties and predictive thermodynamic models can be very helpful for developing new industrial processes. Group contribution methods link the target property to contributions based on chemical groups or other molecular subunits of a given molecule. However, the fragmentation of the molecule into its subunits is usually done manually, impeding the fast testing and development of new group contribution methods based on large databases of molecules. The aim of this work is to develop strategies to overcome the challenges that arise when attempting to fragment molecules automatically while keeping the definition of the groups as simple as possible. Furthermore, these strategies are implemented in two fragmentation algorithms. The first algorithm finds only one solution while the second algorithm finds all possible fragmentations. Both algorithms are tested to fragment a database of 20,000+ molecules for use with the group contribution model Universal Quasichemical Functional Group Activity Coefficients (UNIFAC). Comparison of the results with a reference database shows that both algorithms are capable of successfully fragmenting all the molecules automatically. Furthermore, when applying them to a larger database, it is shown that the newly developed algorithms are capable of fragmenting structures previously thought not possible to fragment.
Affiliation(s)
- Simon Müller
- Institute of Thermal Separation Processes, Hamburg University of Technology, Eißendorfer Straße 38, 21073, Hamburg, Germany.
|
23
|
Allen CHG, Mervin LH, Mahmoud SY, Bender A. Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity. J Cheminform 2019; 11:36. [PMID: 31152262 PMCID: PMC6544914 DOI: 10.1186/s13321-019-0356-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/03/2018] [Accepted: 05/15/2019] [Indexed: 01/06/2023] [Open Access]
Abstract
Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals assayed across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; and a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.48). We then investigated to what degree toxic and non-toxic compounds could be separated in each of the spaces, to compare their potential contribution to further analyses. Using an LDA projection, we found the largest degree of separation using chemical descriptors (Cohen’s d of 1.95) and the lowest degree of separation between toxicity classes using qHTS descriptors (Cohen’s d of 0.67). To compare the predictivity of the feature spaces for the toxicity endpoint, we next trained Random Forest (RF) acute oral toxicity classifiers on molecular, protein target or qHTS descriptors. RFs trained on molecular and protein target descriptors were most predictive, with ROC AUC values of 0.80–0.92 and 0.70–0.85, respectively, across three test sets. RFs trained on both chemical and protein target descriptors combined exhibited similar predictive performance to the single-domain models (ROC AUC of 0.80–0.91). Model interpretability was improved by the inclusion of protein target descriptors, which allow the identification of specific targets (e.g. Retinal dehydrogenase) with literature links to toxic modes of action (e.g. oxidative stress). The dataset compiled in this study has been made available for future application.
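The two effect-size measures quoted in this abstract (Cohen's d and the odds ratio) can be illustrated with a minimal sketch. The abstract does not say which variant of Cohen's d was used, so the pooled-standard-deviation form below is an assumption, and the function names and any numbers passed to them are hypothetical rather than taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def odds_ratio(flagged_toxic, unflagged_toxic, flagged_nontoxic, unflagged_nontoxic):
    """Odds of a toxicophore flag among toxic compounds relative to non-toxic ones."""
    return (flagged_toxic / unflagged_toxic) / (flagged_nontoxic / unflagged_nontoxic)
```

An odds ratio close to 1 means the alert fires about equally often in both classes, which is why a value of 1.48 is described as only slight enrichment.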
Affiliation(s)
- Chad H G Allen
- Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Lewis H Mervin
- Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Samar Y Mahmoud
- Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, CB2 1EW, UK.
|
24
|
Ayromlou A, Masoudi S, Mirzaie A. Scorzonera calyculata Aerial Part Extract Mediated Synthesis of Silver Nanoparticles: Evaluation of Their Antibacterial, Antioxidant and Anticancer Activities. J Clust Sci 2019. [DOI: 10.1007/s10876-019-01563-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
|
25
|
Belmouhoub M, Chebout I, Iguer-ouada M. Antidiabetic and anti-hypercholesterolemic effects of flavonoid-rich fractions of Rosmarinus officinalis in streptozotocin-induced diabetes in mice. Phytothérapie 2018. [DOI: 10.3166/phyto-2018-0054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
|
26
|
Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV. Luciferase Advisor: High-Accuracy Model To Flag False Positive Hits in Luciferase HTS Assays. J Chem Inf Model 2018; 58:933-942. [PMID: 29667823 DOI: 10.1021/acs.jcim.7b00574] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/26/2023]
Abstract
Firefly luciferase is an enzyme that has found ubiquitous use in biological assays in high-throughput screening (HTS) campaigns. The inhibition of luciferase in such assays could lead to a false positive result. This issue has been known for a long time, and there have been significant efforts to identify luciferase inhibitors in order to enhance recognition of false positives in screening assays. However, although a large amount of publicly accessible luciferase counterscreen data is available, to date little effort has been devoted to building a chemoinformatic model that can identify such molecules in a given data set. In this study we developed models to identify these molecules using various methods, such as molecular docking, SMARTS screening, pharmacophores, and machine learning methods. Among the structure-based methods, the pharmacophore-based method showed promising results, with a balanced accuracy of 74.2%. However, machine-learning approaches using associative neural networks outperformed all of the other methods explored, producing a final model with a balanced accuracy of 89.7%. The high predictive accuracy of this model is expected to be useful for advising which compounds are potential luciferase inhibitors present in luciferase HTS assays. The models developed in this work are freely available at the OCHEM platform at http://ochem.eu .
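Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity. A minimal sketch follows, assuming binary labels where 1 marks a luciferase inhibitor; the function name and the label encoding are illustrative and not taken from the paper or from OCHEM.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true inhibitors recovered
    specificity = tn / (tn + fp)  # fraction of non-inhibitors correctly passed
    return (sensitivity + specificity) / 2
```

The metric suits screening data because inhibitors are usually a small minority: a model that misses half of the inhibitors in a 2-in-10 set can still reach 90% plain accuracy, while its balanced accuracy is 75%.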
Affiliation(s)
- Dipan Ghosh
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn-Straße 15, 44227 Dortmund, Germany
| | - Kamyar Hadian
- Assay Development and Screening Platform, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
| | - Michael Sattler
- Bayerisches NMR-Zentrum, Department of Chemistry, Technical University of Munich, Ernst-Otto-Fischer-Straße 2, 85747 Garching, Germany
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
- BIGCHEM GmbH, Ingolstaedter Landstrasse 1 b. 60w, 85764 Neuherberg, Germany
|
27
|
Withnall M, Chen H, Tetko IV. Matched Molecular Pair Analysis on Large Melting Point Datasets: A Big Data Perspective. ChemMedChem 2018; 13:599-606. [PMID: 28650584 PMCID: PMC5900986 DOI: 10.1002/cmdc.201700303] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/18/2017] [Revised: 06/26/2017] [Indexed: 11/11/2022]
Abstract
A matched molecular pair (MMP) analysis was used to examine the change in melting point (MP) between pairs of similar molecules in a set of ∼275k compounds. We found many cases in which the change in MP (ΔMP) of compounds correlates with changes in functional groups. In line with the results of a previous study, correlations between ΔMP and simple molecular descriptors, such as the number of hydrogen bond donors, were identified. In using a larger dataset, covering a wider chemical space and range of melting points, we observed that this method remains stable and scales well with larger datasets. This MMP-based method could find use as a simple privacy-preserving technique to analyze large proprietary databases and share findings between participating research groups.
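The aggregation step behind such an analysis can be sketched as follows. This is a toy illustration under stated assumptions: the matched pairs are taken as already identified (in practice a cheminformatics toolkit would be needed to find them), each pair is given as a hypothetical tuple of a transformation label and two melting points in °C, and the function name is invented for this example.

```python
from collections import defaultdict
from statistics import mean


def mean_delta_mp(pairs):
    """Average melting point change per transformation.

    `pairs` is an iterable of (transformation, mp_before, mp_after) tuples,
    where the transformation label is a hypothetical string such as "H>>OH".
    """
    deltas = defaultdict(list)
    for transformation, mp_before, mp_after in pairs:
        deltas[transformation].append(mp_after - mp_before)
    return {t: mean(values) for t, values in deltas.items()}
```

Only the per-transformation averages leave the function, not the underlying structures, which is one way to read the abstract's remark about sharing findings from proprietary databases.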
Affiliation(s)
- Michael Withnall
- Helmholtz Zentrum München - German Research Center for Environmental Health, GmbH, Institute of Structural Biology, Neuherberg, Germany
| | - Hongming Chen
- External Sciences, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, Mölndal 43183, Sweden
| | - Igor V. Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health, GmbH, Institute of Structural Biology, Neuherberg, Germany
- BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, 85764 Neuherberg, Germany
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, GmbH, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
|
28
|
Ertl P. An algorithm to identify functional groups in organic molecules. J Cheminform 2017; 9:36. [PMID: 29086048 PMCID: PMC5462667 DOI: 10.1186/s13321-017-0225-z] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 12/20/2016] [Accepted: 05/27/2017] [Indexed: 12/02/2022] [Open Access]
Abstract
Background
The concept of functional groups forms a basis of organic chemistry, medicinal chemistry, toxicity assessment, spectroscopy and also chemical nomenclature. All current software systems to identify functional groups are based on a predefined list of substructures. We are not aware of any program that can identify all functional groups in a molecule automatically. The algorithm presented in this article is an attempt to solve this scientific challenge.
Results
An algorithm to identify functional groups in a molecule based on iterative marching through its atoms is described. The procedure is illustrated by extracting functional groups from the bioactive portion of the ChEMBL database, resulting in identification of 3080 unique functional groups.
Conclusions
A new algorithm to identify all functional groups in organic molecules is presented. The algorithm is relatively simple and full details with examples are provided, therefore implementation in any cheminformatics toolkit should be relatively easy. The new method allows the analysis of functional groups in large chemical databases in a way that was not possible using previous approaches.
Electronic supplementary material
The online version of this article (doi:10.1186/s13321-017-0225-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Peter Ertl
- Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland.
|
29
|
Tetko IV, Maran U, Tropsha A. Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development. Mol Inform 2016; 36. [PMID: 27778468 DOI: 10.1002/minf.201600082] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 06/08/2016] [Accepted: 10/03/2016] [Indexed: 01/08/2023]
Abstract
Thousands of (Quantitative) Structure-Activity Relationship ((Q)SAR) models have been described in peer-reviewed publications; however, this way of sharing seldom makes models available for use by the research community outside of the developer's laboratory. Conversely, on-line models allow broad dissemination and application, representing the most effective way of sharing scientific knowledge. Approaches for sharing and providing on-line access to models range from web services created by individual users and laboratories to integrated modeling environments and model repositories. This emerging transition from the descriptive and informative, but "static" and, for the most part, non-executable print format to interactive, transparent and functional delivery of "living" models is expected to have a transformative effect on modern experimental research in areas of scientific and regulatory use of (Q)SAR models.
Affiliation(s)
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
- BigChem GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764 Neuherberg, Germany
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya St. 18, 420008, Kazan, Russia
|
30
|
Novotarskyi S, Abdelaziz A, Sushko Y, Körner R, Vogt J, Tetko IV. ToxCast EPA in Vitro to in Vivo Challenge: Insight into the Rank-I Model. Chem Res Toxicol 2016; 29:768-75. [PMID: 27120770 PMCID: PMC5413193 DOI: 10.1021/acs.chemrestox.5b00481] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
The ToxCast EPA challenge was managed by TopCoder in Spring 2014. The goal of the challenge was to develop a model to predict the lowest effect level (LEL) concentration based on in vitro measurements and calculated in silico descriptors. This article summarizes the computational steps used to develop the Rank-I model, which calculated the lowest prediction error for the secret test data set of the challenge. The model was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), and it is freely available at http://ochem.eu/article/68104. Surprisingly, this model does not use any in vitro measurements. The logic of the decision steps used to develop the model and the reason to skip inclusion of in vitro measurements are described. We also show that inclusion of in vitro assays would not improve the accuracy of the model.
Affiliation(s)
| | - Ahmed Abdelaziz
- Rosettastein Consulting (UG), D-85354 Freising, Germany
- Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, TUM-Technische Universität München, Freising, Germany
| | - Yurii Sushko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
| | - Robert Körner
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
| | - Joachim Vogt
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
| | - Igor V Tetko
- Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1 b. 60w, D-85764 Neuherberg, Germany
- BigChem GmbH, Ingolstädter Landstraße 1 b. 60w, D-85764 Neuherberg, Germany
|
31
|
Abstract
Chemoinformatics techniques were originally developed for the construction and searching of large archives of chemical structures but they were soon applied to problems in drug discovery and are now playing an increasingly important role in many additional areas of chemistry. This Special Issue contains seven original research articles and four review articles that provide an introduction to several aspects of this rapidly developing field.
Affiliation(s)
- Peter Willett
- Information School, University of Sheffield, 211 Portobello, Sheffield S1 4DP, UK.
|
32
|
Brenke JK, Salmina ES, Ringelstetter L, Dornauer S, Kuzikov M, Rothenaigner I, Schorpp K, Giehler F, Gopalakrishnan J, Kieser A, Gul S, Tetko IV, Hadian K. Identification of Small-Molecule Frequent Hitters of Glutathione S-Transferase–Glutathione Interaction. J Biomol Screen 2016; 21:596-607. [DOI: 10.1177/1087057116639992] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
In high-throughput screening (HTS) campaigns, the binding of glutathione S-transferase (GST) to glutathione (GSH) is used for detection of GST-tagged proteins in protein-protein interactions or enzyme assays. However, many false-positives, so-called frequent hitters (FH), arise that either prevent GST/GSH interaction or interfere with assay signal generation or detection. To identify GST-FH compounds, we analyzed the data of five independent AlphaScreen-based screening campaigns to classify compounds that inhibit the GST/GSH interaction. We identified 53 compounds affecting GST/GSH binding but not influencing His-tag/Ni2+-NTA interaction and general AlphaScreen signals. The structures of these 53 experimentally identified GST-FHs were analyzed in chemoinformatic studies to categorize substructural features that promote interference with GST/GSH binding. Here, we confirmed several existing chemoinformatic filters and more importantly extended them as well as added novel filters that specify compounds with anti–GST/GSH activity. Selected compounds were also tested using different antibody-based GST detection technologies and exhibited no interference clearly demonstrating specificity toward their GST/GSH interaction. Thus, these newly described GST-FH will further contribute to the identification of FH compounds containing promiscuous substructures. The developed filters were uploaded to the OCHEM website ( http://ochem.eu ) and are publicly accessible for analysis of future HTS results.
Affiliation(s)
- Jara K. Brenke
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Assay Development and Screening Platform, Institute of Molecular Toxicology and Pharmacology, Neuherberg, Germany
| | - Elena S. Salmina
- Institute for Organic Chemistry, Technical University Bergakademie Freiberg, Germany
| | - Larissa Ringelstetter
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Assay Development and Screening Platform, Institute of Molecular Toxicology and Pharmacology, Neuherberg, Germany
| | - Scarlett Dornauer
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Assay Development and Screening Platform, Institute of Molecular Toxicology and Pharmacology, Neuherberg, Germany
| | - Maria Kuzikov
- Fraunhofer Institute for Molecular Biology and Applied Ecology, ScreeningPort (Fraunhofer-IME SP), Hamburg, Germany
| | - Ina Rothenaigner
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Assay Development and Screening Platform, Institute of Molecular Toxicology and Pharmacology, Neuherberg, Germany
| | - Kenji Schorpp
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Assay Development and Screening Platform, Institute of Molecular Toxicology and Pharmacology, Neuherberg, Germany
| | - Fabian Giehler
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Research Unit Gene Vectors, Munich, Germany
- German Center for Infection Research (DZIF), Partner Site Munich, Munich, Germany
| | - Jay Gopalakrishnan
- Laboratory for Centrosome and Cytoskeleton Biology, CMMC, Cologne, Germany
| | - Arnd Kieser
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Research Unit Gene Vectors, Munich, Germany
- German Center for Infection Research (DZIF), Partner Site Munich, Munich, Germany
| | - Sheraz Gul
- Fraunhofer Institute for Molecular Biology and Applied Ecology, ScreeningPort (Fraunhofer-IME SP), Hamburg, Germany
| | - Igor V. Tetko
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, Neuherberg, Germany
- BigChem GmbH, Ingolstädter Landstrasse 1, Neuherberg, Germany
| | - Kamyar Hadian
- Helmholtz Zentrum München für Gesundheit und Umwelt (GmbH), Assay Development and Screening Platform, Institute of Molecular Toxicology and Pharmacology, Neuherberg, Germany
|
33
|
Tetko IV, Lowe DM, Williams AJ. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS. J Cheminform 2016; 8:2. [PMID: 26807157 PMCID: PMC4724158 DOI: 10.1186/s13321-016-0113-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 08/31/2015] [Accepted: 01/08/2016] [Indexed: 11/18/2022] [Open Access]
Abstract
Background
Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules while lots more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models.
Results
We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melting, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed.
Conclusions
We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
Affiliation(s)
- Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München für Gesundheit und Umwelt (HMGU), Ingolstädter Landstraße 1, b. 60w, 85764 Neuherberg, Germany
- BigChem GmbH, 85764 Neuherberg, Germany
| | - Daniel M. Lowe
- NextMove Software Limited, Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY UK
|