1
Zhu X, Polyakov VR, Bajjuri K, Hu H, Maderna A, Tovee CA, Ward SC. Building Machine Learning Small Molecule Melting Points and Solubility Models Using CCDC Melting Points Dataset. J Chem Inf Model 2023; 63:2948-2959. [PMID: 37125691 DOI: 10.1021/acs.jcim.3c00308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Predicting solubility of small molecules is a very difficult undertaking due to the lack of reliable and consistent experimental solubility data. It is well known that for a molecule in a crystal lattice to be dissolved, it must, first, dissociate from the lattice and then, second, be solvated. The melting point of a compound is proportional to the lattice energy, and the octanol-water partition coefficient (log P) is a measure of the compound's solvation efficiency. The CCDC's melting point dataset of almost one hundred thousand compounds was utilized to create widely applicable machine learning models of small molecule melting points. Using the general solubility equation, the aqueous thermodynamic solubilities of the same compounds can be predicted. The global model could be easily localized by adding additional melting point measurements for a chemical series of interest.
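For orientation, the general solubility equation mentioned in this abstract is commonly written, following Yalkowsky and co-workers, as log S = 0.5 - 0.01 (MP - 25) - log P, with the melting point MP in °C and S in mol/L. The sketch below illustrates that commonly cited form only; it is not the cited authors' implementation, and the function name `gse_log_solubility` and the example inputs are hypothetical.

```python
def gse_log_solubility(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 of molar aqueous solubility with the commonly cited
    form of the general solubility equation (GSE).

    Assumption: melting point is in degrees Celsius and log_p is the
    octanol-water partition coefficient; both would come from measurement
    or from a predictive model.
    """
    # Crystal-lattice term: compounds that are liquid at 25 °C
    # (melting point at or below 25 °C) contribute nothing here.
    crystal_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    # Solvation term: higher lipophilicity lowers aqueous solubility.
    return 0.5 - crystal_term - log_p


if __name__ == "__main__":
    # Hypothetical compound: melting point 125 °C, log P = 2.0.
    print(gse_log_solubility(125.0, 2.0))
```

In this form, a predicted melting point can be substituted for a measured one, which is the route from a melting point model to a solubility estimate that the abstract describes.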
Affiliation(s)
- Xiangwei Zhu
  - Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Valery R Polyakov
  - Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Krishna Bajjuri
  - Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Huiyong Hu
  - Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Andreas Maderna
  - Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Clare A Tovee
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.
- Suzanna C Ward
  - Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.
2
Syed TA, Ansari KB, Banerjee A, Wood DA, Khan MS, Al Mesfer MK. Machine‐learning predictions of caffeine co‐crystal formation accompanying experimental and molecular validations. J Food Process Eng 2022. [DOI: 10.1111/jfpe.14230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Affiliation(s)
- Tanweer A. Syed
  - Department of Chemical Engineering, Institute of Chemical Technology, Mumbai, Maharashtra, India
- Khursheed B. Ansari
  - Department of Chemical Engineering, Zakir Husain College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
- Arghya Banerjee
  - Department of Chemical Engineering, Indian Institute of Technology Ropar, Punjab, India
- Mohd Shariq Khan
  - Department of Chemical Engineering, College of Engineering, Dhofar University, Salalah, Oman
3
Li Y, Aslam A, Saeed S, Zhang G, Kanwal S. Targeting highly resisted anticancer drugs through topological descriptors using VIKOR multi-criteria decision analysis. European Physical Journal Plus 2022; 137:1245. [PMID: 36405039 PMCID: PMC9667010 DOI: 10.1140/epjp/s13360-022-03469-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/10/2022] [Accepted: 11/06/2022] [Indexed: 06/16/2023]
Abstract
The disease cancer is expanding on high spans in virtually all over the world, and undoubtedly, the research through all the aspects of sciences for each of its perspective is a great cause in reducing its severeness and symptoms. Chemotherapy is itself a cure to cancer as it helps in controlling the formation of cancerous cells but leaving multiple side effects on a human body. In this research work, we targeted 21 anticancer drugs that are taken by the patients in combinations during chemotherapies. We introduce another branch of mathematics named as OR (Operations Research) linking to the chemical graph theory. Chemical graph theory allows us to generate highly resistant research on any structure via quantitative structure property relationship (QSPR) modeling to explore and develop new compounds for drugs. In this research study, we visualized what else the QSPR could provide when it comes to ranking drugs. We visualized the results obtained for boiling points and enthalpy of vaporizations through QSPR as the values of correlation coefficients and the errors generated under unique QSPR modeling. The implementation of VIKOR provides the best ranking for each of anticancer drugs when keeping in concern the specified properties, and the conclusions from this research work show another path to biologist scientists to create best combinations keeping in concern the study generated from QSPR. Supplementary information: the online version contains supplementary material available at 10.1140/epjp/s13360-022-03469-x.
Affiliation(s)
- Yali Li
  - School of Software, Pingdingshan University, Pingdingshan, 467000 China
  - Henan International Joint Laboratory for Multidimensional Topology and Carcinogenic Characteristics Analysis of Atmospheric Particulate Matter PM2.5, Pingdingshan, 467000 China
- Adnan Aslam
  - Department of Natural Sciences and Humanities, University of Engineering and Technology, Lahore (RCET), Pakistan
- Saadia Saeed
  - Department of Mathematics, Lahore College for Women University, Lahore, Pakistan
- Guoping Zhang
  - School of Software, Pingdingshan University, Pingdingshan, 467000 China
  - Henan International Joint Laboratory for Multidimensional Topology and Carcinogenic Characteristics Analysis of Atmospheric Particulate Matter PM2.5, Pingdingshan, 467000 China
- Salma Kanwal
  - Department of Mathematics, Lahore College for Women University, Lahore, Pakistan
4
5
Carrera GVSM. The Melting Point Profile of Organic Molecules: A Chemoinformatic Approach. Advanced Theory and Simulations 2022. [DOI: 10.1002/adts.202200503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Gonçalo V. S. M. Carrera
  - Chemistry Department, LAQV‐REQUIMTE, NOVA School of Science and Technology, Caparica 2829‐516, Portugal
6
Bujak M, Podsiadło M, Katrusiak A. Response to comment on Properties and interactions - melting point of tribromobenzene isomers. Acta Crystallographica Section B, Structural Science, Crystal Engineering and Materials 2022; 78:276-278. [PMID: 35411867 DOI: 10.1107/s2052520622003067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Affiliation(s)
- Maciej Bujak
  - Faculty of Chemistry, University of Opole, Oleska 48, Opole, 45-052, Poland
- Marcin Podsiadło
  - Faculty of Chemistry, Adam Mickiewicz University, Uniwersytetu Poznańskiego 8, Poznań, 61-614, Poland
- Andrzej Katrusiak
  - Faculty of Chemistry, Adam Mickiewicz University, Uniwersytetu Poznańskiego 8, Poznań, 61-614, Poland
7
Makarov D, Fadeeva Y, Shmukler L, Tetko I. Beware of proper validation of models for ionic Liquids! J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.117722] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/03/2023]
8
On prediction of melting points without computer simulation: a focus on energetic molecular crystals. FirePhysChem 2021. [DOI: 10.1016/j.fpc.2021.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
9
Xiang Y, Tang YH, Liu H, Lin G, Sun H. Predicting Single-Substance Phase Diagrams: A Kernel Approach on Graph Representations of Molecules. J Phys Chem A 2021; 125:4488-4497. [PMID: 33999627 DOI: 10.1021/acs.jpca.1c02391] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]
Abstract
This work presents a Gaussian process regression (GPR) model on top of a novel graph representation of chemical molecules that predicts thermodynamic properties of pure substances in single, double, and triple phases. A transferable molecular graph representation is proposed as the input for a marginalized graph kernel, which is the major component of the covariance function in our GPR models. Radial basis function kernels of temperature and pressure are also incorporated into the covariance function when necessary. We predicted three types of representative properties of pure substances in single, double, and triple phases, i.e., critical temperature, vapor-liquid equilibrium (VLE) density, and pressure-temperature density. The accuracy of the models is nearly identical to the precision of the experimental measurements. Moreover, the reliability of our predictions can be quantified on a per-sample basis using the posterior uncertainty of the GPR model. We compare our model against Morgan fingerprints and a graph neural network to further demonstrate the advantage of the proposed method.
Affiliation(s)
- Yan Xiang
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yu-Hang Tang
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Hongyi Liu
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Guang Lin
  - Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Huai Sun
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
10
Toropova AP, Toropov AA, Benfenati E. The self-organizing vector of atom-pairs proportions: use to develop models for melting points. Struct Chem 2021. [DOI: 10.1007/s11224-021-01778-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
11
Sifain AE, Rice BM, Yalkowsky SH, Barnes BC. Machine learning transition temperatures from 2D structure. J Mol Graph Model 2021; 105:107848. [PMID: 33667863 DOI: 10.1016/j.jmgm.2021.107848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/17/2020] [Revised: 01/11/2021] [Accepted: 01/19/2021] [Indexed: 10/22/2022]
Abstract
A priori knowledge of physicochemical properties such as melting and boiling could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy; two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically-inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprised of over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
Affiliation(s)
- Andrew E Sifain
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
- Betsy M Rice
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
- Samuel H Yalkowsky
  - Department of Pharmaceutics, College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
- Brian C Barnes
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
12
Fu L, Yang ZY, Yang ZJ, Yin MZ, Lu AP, Chen X, Liu S, Hou TJ, Cao DS. QSAR-assisted-MMPA to expand chemical transformation space for lead optimization. Brief Bioinform 2021; 22:6071857. [PMID: 33418563 DOI: 10.1093/bib/bbaa374] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/07/2020] [Revised: 10/25/2020] [Accepted: 11/25/2020] [Indexed: 11/13/2022]
Abstract
Matched molecular pairs analysis (MMPA) has become a powerful tool for automatically and systematically identifying medicinal chemistry transformations from compound/property datasets. However, accurate determination of matched molecular pair (MMP) transformations largely depends on the size and quality of existing experimental data. Lack of high-quality experimental data heavily hampers the extraction of more effective medicinal chemistry knowledge. Here, we developed a new strategy called quantitative structure-activity relationship (QSAR)-assisted-MMPA to expand the number of chemical transformations and took the logD7.4 property endpoint as an example to demonstrate the reliability of the new method. A reliable logD7.4 consensus prediction model was first established, and its applicability domain was strictly assessed. By applying the reliable logD7.4 prediction model to screen two chemical databases, we obtained more high-quality logD7.4 data by defining a strict applicability domain threshold. Then, MMPA was performed on the predicted data and experimental data to derive more chemical rules. To validate the reliability of the chemical rules, we compared the magnitude and directionality of the property changes of the predicted rules with those of the measured rules. Then, we compared the novel chemical rules generated by our proposed approach with the published chemical rules, and found that the magnitude and directionality of the property changes were consistent, indicating that the proposed QSAR-assisted-MMPA approach has the potential to enrich the collection of rule types or even identify completely novel rules. Finally, we found that the number of the MMP rules derived from the experimental data could be amplified by the predicted data, which is helpful for us to analyze the medicinal chemical rules in local chemical environment. In summary, the proposed QSAR-assisted-MMPA approach could be regarded as a very promising strategy to expand the chemical transformation space for lead optimization, especially when there are not enough experimental data to support MMPA.
Affiliation(s)
- Li Fu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Ming-Zhu Yin
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
- Xiang Chen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha 410008, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
13
Korshunova M, Ginsburg B, Tropsha A, Isayev O. OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design. J Chem Inf Model 2021; 61:7-13. [PMID: 33393291 DOI: 10.1021/acs.jcim.0c00971] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/17/2022]
Abstract
Deep learning models have demonstrated outstanding results in many data-rich areas of research, such as computer vision and natural language processing. Currently, there is a rise of deep learning in computational chemistry and materials informatics, where deep learning could be effectively applied in modeling the relationship between chemical structures and their properties. With the immense growth of chemical and materials data, deep learning models can begin to outperform conventional machine learning techniques such as random forest, support vector machines, and nearest neighbor. Herein, we introduce OpenChem, a PyTorch-based deep learning toolkit for computational chemistry and drug design. OpenChem offers easy and fast model development, modular software design, and several data preprocessing modules. It is freely available via the GitHub repository.
Affiliation(s)
- Maria Korshunova
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
- Boris Ginsburg
  - NVIDIA Corporation, Santa Clara, California 95050, United States
- Alexander Tropsha
  - UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Olexandr Isayev
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
14
Crystalline tetrazepam as a case study on the volume change on melting of molecular organic compounds. Int J Pharm 2021; 593:120124. [DOI: 10.1016/j.ijpharm.2020.120124] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2020] [Revised: 11/23/2020] [Accepted: 11/23/2020] [Indexed: 11/22/2022]
15
Yuan J, Liu X, Wang S, Chang C, Zeng Q, Song Z, Jin Y, Zeng Q, Sun G, Ruan S, Greenwell C, Abramov YA. Virtual coformer screening by a combined machine learning and physics-based approach. CrystEngComm 2021. [DOI: 10.1039/d1ce00587a] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/22/2023]
Abstract
Cocrystals as a solid form technology for improving physicochemical properties have gained increasing popularity in the pharmaceutical, nutraceutical, and agrochemical industries.
Affiliation(s)
- Jiuchuang Yuan
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Xuetao Liu
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
  - Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogeomics, Peking University Shenzhen Graduate School, Shenzhen, 518055 China
- Simin Wang
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Chao Chang
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Qiao Zeng
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Zhengtian Song
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Yingdi Jin
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Qun Zeng
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Guangxu Sun
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Shigang Ruan
  - XtalPi Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen, 518100 China
- Yuriy A. Abramov
  - XtalPi Inc, Cambridge, Massachusetts 02142, USA
  - Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
16
Boobier S, Hose DRJ, Blacker AJ, Nguyen BN. Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat Commun 2020; 11:5753. [PMID: 33188226 PMCID: PMC7666209 DOI: 10.1038/s41467-020-19594-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 04/23/2020] [Accepted: 10/12/2020] [Indexed: 11/09/2022]
Abstract
Solubility prediction remains a critical challenge in drug development, synthetic route and chemical process design, extraction and crystallisation. Here we report a successful approach to solubility prediction in organic solvents and water using a combination of machine learning (ANN, SVM, RF, ExtraTrees, Bagging and GP) and computational chemistry. Rational interpretation of the dissolution process into a numerical problem led to a small set of selected descriptors and subsequent predictions which are independent of the applied machine learning method. These models gave significantly more accurate predictions compared to benchmarked open-access and commercial tools, achieving accuracy close to the expected level of noise in training data (LogS ± 0.7). Finally, they reproduced the physicochemical relationship between solubility and molecular properties in different solvents, which led to rational approaches to improve the accuracy of each model.
Affiliation(s)
- Samuel Boobier
  - Institute of Process Research & Development, School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK
- David R J Hose
  - Chemical Development, Pharmaceutical Technology and Development, Operations, AstraZeneca, Macclesfield, SK10 2NA, UK
- A John Blacker
  - Institute of Process Research & Development, School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK
- Bao N Nguyen
  - Institute of Process Research & Development, School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK
17
Sivaraman G, Jackson NE, Sanchez-Lengeling B, Vázquez-Mayagoitia Á, Aspuru-Guzik A, Vishwanath V, de Pablo JJ. A machine learning workflow for molecular analysis: application to melting points. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/ab8aa3] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/17/2022]
Abstract
Computational tools encompassing integrated molecular prediction, analysis, and generation are key for molecular design in a variety of critical applications. In this work, we develop a workflow for molecular analysis (MOLAN) that integrates an ensemble of supervised and unsupervised machine learning techniques to analyze molecular data sets. The MOLAN workflow combines molecular featurization, clustering algorithms, uncertainty analysis, low-bias dataset construction, high-performance regression models, graph-based molecular embeddings and attribution, and a semi-supervised variational autoencoder based on the novel SELFIES representation to enable molecular design. We demonstrate the utility of the MOLAN workflow in the context of a challenging multi-molecule property prediction problem: the determination of melting points solely from single molecule structure. This application serves as a case study for how to employ the MOLAN workflow in the context of molecular property prediction.
18
Tinworth CP, Young RJ. Facts, Patterns, and Principles in Drug Discovery: Appraising the Rule of 5 with Measured Physicochemical Data. J Med Chem 2020; 63:10091-10108. [PMID: 32324397 DOI: 10.1021/acs.jmedchem.9b01596] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Indexed: 12/19/2022]
Abstract
The rule of 5 was designed to estimate the likelihood of poor absorption or permeation, noting the impact of poor solubility. This Perspective explores the impact of various physicochemical descriptors and contemporary lipophilicity measurements on permeability and solubility, showing that the distribution coefficient log D7.4 (rather than log P) is the most impactful parameter. Molecular weight, almost invariably the defining characteristic of "beyond the rule of 5" compounds, has little impact on solubility when log D7.4 measurements and aromaticity are considered. Predicting permeation is more complex, given passive and carrier transport mechanisms; however, notable patterns of behavior are apparent, giving insight even "beyond the rule of 5". Recommended best practices should involve using the facts (measurements) and the patterns they reveal to establish informative principles rather than fastidious rules.
Affiliation(s)
- Christopher P Tinworth
  - Medicinal Sciences and Technology, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Robert J Young
  - Medicinal Sciences and Technology, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
  - Blue Burgundy Ltd., Bedford, Bedfordshire MK45 2AD, U.K.
19
Mathieu D. QSPR versus fragment-based methods to predict octanol-air partition coefficients: Revisiting a recent comparison of both approaches. Chemosphere 2020; 245:125584. [PMID: 31864054 DOI: 10.1016/j.chemosphere.2019.125584] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/22/2019] [Revised: 12/04/2019] [Accepted: 12/07/2019] [Indexed: 06/10/2023]
Abstract
The octanol-air partition coefficient (KOA) is useful to assess the fate of organic chemicals in the environment. Very recently, an interesting comparison of current methods to predict this property (Chemosphere 148 (2016) 118-125) highlighted a newly introduced Quantitative Structure-Property Relationship (QSPR), as a group-contribution (GC) method and a quantum chemical solvation model were reported to yield significantly less accurate results. Based on the observation that the so-called GC method investigated in this earlier study was inconsistent with the temperature dependence of KOA, the previously recommended QSPR is presently compared to the geometrical fragment (GF) additivity scheme. In addition to providing some improvement in terms of accuracy, this fragment-based procedure exhibits many advantages in terms of simplicity, interpretability, applicability and availability.
20
Karpov P, Godin G, Tetko IV. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform 2020; 12:17. [PMID: 33431004 PMCID: PMC7079452 DOI: 10.1186/s13321-020-00423-w] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 10/12/2019] [Accepted: 03/09/2020] [Indexed: 01/03/2023]
Abstract
We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, and thus the prognosis is based on an internal consensus. That both the augmentation and transfer learning are based on embeddings allows the method to provide good results for small datasets. We discuss the reasons for such effectiveness and draft future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available on https://github.com/bigchem/transformer-cnn. The repository also has a standalone program for QSAR prognosis which calculates individual atoms contributions, thus interpreting the model’s result. OCHEM [3] environment (https://ochem.eu) hosts the on-line implementation of the method proposed.
Affiliation(s)
- Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany. .,BIGCHEM GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
| | - Guillaume Godin
- Firmenich International SA, Digital Lab, Geneva, Lausanne, Switzerland
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.,BIGCHEM GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| |
|
21
|
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022] Open
Abstract
Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods are facing significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool to foster breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure-property/activity relationship for material property prediction can be established more accurately and efficiently. In addition, materials design can be revolutionized and accelerated much faster than ever, through ML-enabled molecular generation and inverse molecular design. In this perspective, we review the recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the relevant challenges to solve in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which serve as a tutorial for researchers who have little prior experience with ML and want to apply it in various applications. Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future, to meet the tremendous demand for new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Zhiqiang Shen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Akshay Iyer
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Umar Farooq Ghumman
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Shan Tang
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China
| | - Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
| |
|
22
|
Bunally SB, Luscombe CN, Young RJ. Using Physicochemical Measurements to Influence Better Compound Design. SLAS DISCOVERY 2019; 24:791-801. [PMID: 31429385 DOI: 10.1177/2472555219859845] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023]
Abstract
During the past decade, the physicochemical quality of molecules under investigation at all stages of the drug discovery process has come under particular scrutiny. The issues associated with excessive lipophilicity and poor solubility in particular are many and varied, ranging from poor outcomes in screening campaigns to promiscuity, limited and/or poorly predictable pharmacokinetic exposure, and, ultimately, greater chances of clinical failure. In this review, contemporary methods to secure key measurements are described along with their relevance to understanding the behavior of molecules in environments pertinent to pharmacological activity. Together, the various measurements contribute to predictive models of both the physicochemical properties themselves and the outcomes they influence.
Affiliation(s)
| | | | - Robert J Young
- GlaxoSmithKline Medicines Research Centre, Stevenage, UK
| |
|
23
|
Dalavitsou A, Vasiliadis A, Mordos MD, Kouskoura MG, Markopoulou CK. Analytes’ Structure and Signal Response in Evaporating Light Scattering Detector (ELSD). CURR ANAL CHEM 2019. [DOI: 10.2174/1573411014666180330161557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
With an Evaporative Light Scattering Detector (ELSD), the target components are converted by a nebulizer to a suspension of particles in a gas phase and heated while the mobile phase is evaporated. The incident light is then directed at the remaining particles, and the scattered light is detected.
Methods:
The signal response of an ELS detector is studied through the correlation of the signal intensity of 65 compounds (at 30, 45 and 80°C) with their structural and physicochemical characteristics. To this end, 67 physicochemical properties and structural features of the analytes were used as X variables and studied in correlation with the signal intensity (Y variable).
Results:
The collected data were statistically processed using the partial least squares method. The results showed that several properties mainly affected the signal intensity, either increasing or decreasing the response.
Conclusion:
The derived results showed that properties related to vapor pressure, size, density, and melting and boiling points of the analytes were responsible for changes in the signal intensity. The light detected was also affected, but to a lesser extent, by properties relevant to the ability of a molecule to form hydrogen bonds (HBA and HBD) and by its polarizability or refractivity.
Affiliation(s)
- Antonia Dalavitsou
- Laboratory of Pharmaceutical Analysis, Department of Pharmaceutical Technology, School of Pharmacy, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Alexandros Vasiliadis
- Laboratory of Pharmaceutical Analysis, Department of Pharmaceutical Technology, School of Pharmacy, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Michail D. Mordos
- Laboratory of Pharmaceutical Analysis, Department of Pharmaceutical Technology, School of Pharmacy, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Maria G. Kouskoura
- Laboratory of Pharmaceutical Analysis, Department of Pharmaceutical Technology, School of Pharmacy, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Catherine K. Markopoulou
- Laboratory of Pharmaceutical Analysis, Department of Pharmaceutical Technology, School of Pharmacy, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| |
|
24
|
Affiliation(s)
- Tong Deng
- College of Physical and Electronics Engineering, Sichuan Normal University, Chengdu, People’s Republic of China
| | - Guo-zhu Jia
- College of Physical and Electronics Engineering, Sichuan Normal University, Chengdu, People’s Republic of China
| |
|
25
|
Palmblad M. Visual and Semantic Enrichment of Analytical Chemistry Literature Searches by Combining Text Mining and Computational Chemistry. Anal Chem 2019; 91:4312-4316. [PMID: 30835438 PMCID: PMC6448173 DOI: 10.1021/acs.analchem.8b05818] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
The open-access scientific literature contains a wealth of information for meaningful text mining. However, this information is not always easy to retrieve. This technical note addresses the problem by a new flexible method combining in a single workflow existing resources for literature searches, text mining, and large-scale prediction of physicochemical and biological properties. The results are visualized as virtual mass spectra, chromatograms, or images in styles new to text mining but familiar to analytical chemistry. The method is demonstrated on comparisons of analytical-chemistry techniques and semantically enriched searches for proteins and their activities, but it may also be of general utility in experimental design, drug discovery, chemical syntheses, business intelligence, and historical studies. The method is realized in shareable scientific workflows using only freely available data, services, and software that scale to millions of publications and named chemical entities in the literature.
Affiliation(s)
- Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Postzone S3-P, Postbus 9600, 2300 RC Leiden, The Netherlands
| |
|
26
|
Abstract
De novo drug design aims to generate novel chemical compounds with desirable chemical and pharmacological properties from scratch using computer-based methods. Recently, deep generative neural networks have become a very active research frontier in de novo drug discovery, both in theoretical and in experimental evidence, shedding light on a promising new direction of automatic molecular generation and optimization. In this review, we discussed recent development of deep learning models for molecular generation and summarized them as four different generative architectures with four different optimization strategies. We also discussed future directions of deep generative models for de novo drug design.
|
27
|
Brown TN, Armitage JM, Arnot JA. Application of an Iterative Fragment Selection (IFS) Method to Estimate Entropies of Fusion and Melting Points of Organic Chemicals. Mol Inform 2019; 38:e1800160. [PMID: 30816634 DOI: 10.1002/minf.201800160] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/25/2018] [Accepted: 02/10/2019] [Indexed: 11/09/2022]
Abstract
The main objective of this study is to develop and evaluate novel Quantitative Structure-Property Relationships (QSPRs) for predicting entropy of fusion (ΔSM) and melting point (TM) of organic chemicals from chemical structure. The QSPRs are developed using the Iterative Fragment Selection (IFS) method that requires only 2D structural information from the user (SMILES codes) for property prediction. The QSPRs also provide information on the applicability domain for each calculation and uncertainty estimates for the predictions. The root mean square errors (RMSEs) for the external validation sets are 11.8 J mol⁻¹ K⁻¹ and 46.9 K for the ΔSM and TM QSPRs, respectively. The performance of the new QSPRs is comparable to other predictive methods but has advantages with respect to availability and ease of use as well as the guidance on applicability domain for each prediction. Limitations of the new QSPRs are discussed. The QSPRs are coded as a user-friendly, freely available tool.
Affiliation(s)
| | - James M Armitage
- AES Armitage Environmental Sciences, Inc., Ottawa ON, Canada, K1L 8C3
| | - Jon A Arnot
- ARC Arnot Research and Consulting, Inc., Toronto ON, Canada, M4M 1W4; Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto ON, Canada, M1C 1A4; Department of Pharmacology and Toxicology, University of Toronto, Toronto ON, Canada, M5S 1A8
| |
|
28
|
Sakkiah S, Guo W, Pan B, Kusko R, Tong W, Hong H. Computational prediction models for assessing endocrine disrupting potential of chemicals. JOURNAL OF ENVIRONMENTAL SCIENCE AND HEALTH. PART C, ENVIRONMENTAL CARCINOGENESIS & ECOTOXICOLOGY REVIEWS 2019; 36:192-218. [PMID: 30633647 DOI: 10.1080/10590501.2018.1537132] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/09/2023]
Abstract
Endocrine disrupting chemicals (EDCs) mimic natural hormones and disrupt endocrine function. Exposure of humans and wildlife to EDCs might alter endocrine functions through various mechanisms and lead to adverse effects. Hence, identifying EDCs is important for protecting the ecosystem and promoting public health. Identifying potential EDCs through in vitro and in vivo experiments is time consuming and expensive. Hence, quantitative structure-activity relationship modeling is applied to screen for potential EDCs. Here, we summarize the predictive models developed using various algorithms to forecast the binding activity of chemicals to the estrogen and androgen receptors, alpha-fetoprotein, and sex hormone binding globulin.
Affiliation(s)
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Wenjing Guo
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Bohu Pan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Rebecca Kusko
- Immuneering Corporation, Cambridge, Massachusetts, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| |
|
29
|
Liu R, Wallqvist A. Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds. J Chem Inf Model 2018; 59:181-189. [DOI: 10.1021/acs.jcim.8b00597] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Affiliation(s)
- Ruifeng Liu
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
| | - Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States
| |
|
30
|
Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. SCIENCE ADVANCES 2018; 4:eaap7885. [PMID: 30050984 PMCID: PMC6059760 DOI: 10.1126/sciadv.aap7885] [Citation(s) in RCA: 499] [Impact Index Per Article: 83.2] [Received: 08/26/2017] [Accepted: 06/13/2018] [Indexed: 05/20/2023]
Abstract
We have devised and implemented a novel computational strategy for de novo design of molecules with desired properties termed ReLeaSE (Reinforcement Learning for Structural Evolution). On the basis of deep and reinforcement learning (RL) approaches, ReLeaSE integrates two deep neural networks (generative and predictive) that are trained separately but are used jointly to generate novel targeted chemical libraries. ReLeaSE uses a simple representation of molecules by their simplified molecular-input line-entry system (SMILES) strings only. Generative models are trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo-generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the RL approach to bias the generation of new chemical structures toward those with the desired physical and/or biological properties. In the proof-of-concept study, we have used the ReLeaSE method to design chemical libraries with a bias toward structural complexity or toward compounds with maximal, minimal, or specific range of physical properties, such as melting point or hydrophobicity, or toward compounds with inhibitory activity against Janus protein kinase 2. The approach proposed herein can find a general use for generating targeted chemical libraries of novel compounds optimized for either a single desired property or multiple properties.
Affiliation(s)
- Mariya Popova
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow 141700, Russia
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
| | - Olexandr Isayev
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Corresponding author.
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Corresponding author.
| |
|
31
|
Withnall M, Chen H, Tetko IV. Matched Molecular Pair Analysis on Large Melting Point Datasets: A Big Data Perspective. ChemMedChem 2018; 13:599-606. [PMID: 28650584 PMCID: PMC5900986 DOI: 10.1002/cmdc.201700303] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/18/2017] [Revised: 06/26/2017] [Indexed: 11/11/2022]
Abstract
A matched molecular pair (MMP) analysis was used to examine the change in melting point (MP) between pairs of similar molecules in a set of ∼275k compounds. We found many cases in which the change in MP (ΔMP) of compounds correlates with changes in functional groups. In line with the results of a previous study, correlations between ΔMP and simple molecular descriptors, such as the number of hydrogen bond donors, were identified. In using a larger dataset, covering a wider chemical space and range of melting points, we observed that this method remains stable and scales well with larger datasets. This MMP-based method could find use as a simple privacy-preserving technique to analyze large proprietary databases and share findings between participating research groups.
Affiliation(s)
- Michael Withnall
- Helmholtz Zentrum München - German Research Center for Environmental Health, GmbH, Institute of Structural Biology, Neuherberg, Germany
| | - Hongming Chen
- External Sciences, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, Mölndal 43183, Sweden
| | - Igor V. Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health, GmbH, Institute of Structural Biology, Neuherberg, Germany
- BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, 85764 Neuherberg, Germany
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, GmbH, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
| |
|
32
|
Tebes-Stevens C, Patel JM, Koopmans M, Olmstead J, Hilal SH, Pope N, Weber EJ, Wolfe K. Demonstration of a consensus approach for the calculation of physicochemical properties required for environmental fate assessments. CHEMOSPHERE 2018; 194:94-106. [PMID: 29197820 PMCID: PMC6146973 DOI: 10.1016/j.chemosphere.2017.11.137] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/05/2017] [Revised: 11/21/2017] [Accepted: 11/22/2017] [Indexed: 05/21/2023]
Abstract
Eight software applications are compared for their performance in estimating the octanol-water partition coefficient (Kow), melting point, vapor pressure and water solubility for a dataset of polychlorinated biphenyls, polybrominated diphenyl ethers, polychlorinated dibenzodioxins, and polycyclic aromatic hydrocarbons. The predicted property values are compared against a curated dataset of measured property values compiled from the scientific literature with careful consideration given to the analytical methods used for property measurements of these hydrophobic chemicals. The variability in the predicted values from different calculators generally increases for higher values of Kow and melting point and for lower values of water solubility and vapor pressure. For each property, no individual calculator outperforms the others for all four of the chemical classes included in the analysis. Because calculator performance varies based on chemical class and property value, the geometric mean and the median of the calculated values from multiple calculators that use different estimation algorithms are recommended as more reliable estimates of the property value than the value from any single calculator.
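The consensus recommendation in this abstract (geometric mean and median across calculators) can be illustrated with a minimal sketch. The function name `consensus_estimates` and the four solubility values are hypothetical and are not taken from the paper; the sketch only assumes that each calculator returns a strictly positive value in the same units.

```python
import math
import statistics


def consensus_estimates(values):
    """Return (geometric_mean, median) for positive property estimates.

    `values` holds one estimate per calculator, all in the same units.
    """
    if not values:
        raise ValueError("need at least one estimate")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    # Geometric mean computed as the exponential of the mean of the logs.
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean), statistics.median(values)


# Hypothetical water-solubility estimates (mg/L) from four calculators,
# spread over three orders of magnitude.
estimates_mg_per_l = [0.01, 0.1, 1.0, 10.0]
geo_mean, median = consensus_estimates(estimates_mg_per_l)
arith_mean = statistics.mean(estimates_mg_per_l)
print(f"geometric mean: {geo_mean:.4f}, median: {median:.4f}, "
      f"arithmetic mean: {arith_mean:.4f}")
```

For estimates that span orders of magnitude, the arithmetic mean is dominated by the largest value, whereas the geometric mean and the median stay closer to the middle of the range. That is one plausible reading of why these two statistics are recommended.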
Affiliation(s)
- Caroline Tebes-Stevens
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States.
| | - Jay M Patel
- ORISE Fellow, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Michaela Koopmans
- ORAU, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - John Olmstead
- ORAU, U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Said H Hilal
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Nick Pope
- Independent Contractor, Hildebran, NC, United States
| | - Eric J Weber
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| | - Kurt Wolfe
- U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, GA 30605, United States
| |
|
33
|
Mathieu D. Atom Pair Contribution Method: Fast and General Procedure To Predict Molecular Formation Enthalpies. J Chem Inf Model 2018; 58:12-26. [DOI: 10.1021/acs.jcim.7b00613] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
|
34
|
Mathieu D. Solubility of organic compounds in octanol: Improved predictions based on the geometrical fragment approach. CHEMOSPHERE 2017; 182:399-405. [PMID: 28511135 DOI: 10.1016/j.chemosphere.2017.05.045] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/07/2017] [Revised: 05/05/2017] [Accepted: 05/08/2017] [Indexed: 06/07/2023]
Abstract
Two new models are introduced to predict the solubility of chemicals in octanol (Soct), taking advantage of the extensive character of log(Soct) through a decomposition of molecules into so-called geometrical fragments (GF). They are extensively validated and their compliance with regulatory requirements is demonstrated. The first model requires just a molecular formula as input. Despite its extreme simplicity, it performs as well as an advanced random forest model involving 86 descriptors, with a root mean square error (RMSE) of 0.64 log units for an external test set of 100 molecules. For the second model, which requires the melting point Tm as input, introducing GF descriptors reduces the RMSE from about 0.7 to <0.5 log units, a performance that could previously be obtained only through the use of Abraham descriptors. A script is provided for easy application of the models, taking into account the limits of their applicability domains.
|
35
|
Coley CW, Barzilay R, Green WH, Jaakkola TS, Jensen KF. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J Chem Inf Model 2017; 57:1757-1772. [PMID: 28696688 DOI: 10.1021/acs.jcim.6b00601] [Citation(s) in RCA: 220] [Impact Index Per Article: 31.4] [Indexed: 01/29/2023]
Abstract
The task of learning an expressive molecular representation is central to developing quantitative structure-activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Tommi S Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
|
36
|
Mathieu D, Bouteloup R. Reliable and Versatile Model for the Density of Liquids Based on Additive Volume Increments. Ind Eng Chem Res 2016. [DOI: 10.1021/acs.iecr.6b03809] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/19/2023]
|
37
|
Tetko IV, Maran U, Tropsha A. Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development. Mol Inform 2016; 36. [PMID: 27778468 DOI: 10.1002/minf.201600082] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 06/08/2016] [Accepted: 10/03/2016] [Indexed: 01/08/2023]
Abstract
Thousands of (Quantitative) Structure-Activity Relationships (Q)SAR models have been described in peer-reviewed publications; however, this way of sharing seldom makes models available for the use by the research community outside of the developer's laboratory. Conversely, on-line models allow broad dissemination and application representing the most effective way of sharing the scientific knowledge. Approaches for sharing and providing on-line access to models range from web services created by individual users and laboratories to integrated modeling environments and model repositories. This emerging transition from the descriptive and informative, but "static", and for the most part, non-executable print format to interactive, transparent and functional delivery of "living" models is expected to have a transformative effect on modern experimental research in areas of scientific and regulatory use of (Q)SAR models.
Affiliation(s)
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany; BigChem GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA; Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya St. 18, 420008, Kazan, Russia
| |
|
38
|
Does ‘Big Data’ exist in medicinal chemistry, and if so, how can it be harnessed? Future Med Chem 2016; 8:1801-1806. [DOI: 10.4155/fmc-2016-0163] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 01/18/2023] Open
|
39
|
Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol Inform 2016; 35:615-621. [PMID: 27464907 PMCID: PMC5129546 DOI: 10.1002/minf.201600073] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 05/19/2016] [Accepted: 07/06/2016] [Indexed: 01/19/2023]
Abstract
The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for “Big Data” in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the “Big Data” using advanced machine‐learning methods, and their applications in polypharmacology prediction and target de‐convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi‐party or multi‐party data sharing. Data sharing is important in the context of the recent trend of “open innovation” in pharmaceutical industry, which has led to not only more information sharing among academics and pharma industries but also the so‐called “precompetitive” collaboration between pharma companies. At the end we highlight the importance of education in “Big Data” for further progress of this area.
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
| | - Ola Engkvist
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn Strasse 15, Dortmund, 44227, Germany
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
|
40
|
Abstract
INTRODUCTION Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. AREAS COVERED In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed, including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. EXPERT OPINION Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia; A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
| | - David Winkler
- CSIRO Manufacturing, Clayton, VIC, Australia; Monash Institute for Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia; Latrobe Institute for Molecular Science, Bundoora, VIC, Australia; School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA, Australia
| | - Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Neuherberg, Germany; BigChem GmbH, Neuherberg, Germany
|
41
|
Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 2016; 124:1023-33. [PMID: 26908244 PMCID: PMC4937869 DOI: 10.1289/ehp.1510267] [Citation(s) in RCA: 222] [Impact Index Per Article: 27.8] [Received: 05/27/2015] [Revised: 10/05/2015] [Accepted: 02/08/2016] [Indexed: 05/18/2023]
Abstract
BACKGROUND Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. OBJECTIVES We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. METHODS CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. RESULTS Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. CONCLUSION This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. 
This concept will be applied in future projects related to other end points.
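The consensus step described in the METHODS can be pictured with a small sketch. The abstract states only that models were weighted by scores based on their evaluated accuracies. The normalized weighted vote below, the 0.5 cutoff, the model names, and the example calls are all assumptions made for illustration and should not be read as the CERAPP procedure itself.

```python
from typing import Dict, List


def weighted_consensus(
    predictions: Dict[str, List[int]],
    scores: Dict[str, float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine binary active/inactive calls from several models.

    Each model's vote is weighted by its evaluation score; a chemical is
    called active (1) when the normalized weighted vote reaches `threshold`.
    The weighting scheme and threshold are illustrative assumptions.
    """
    total_weight = sum(scores[name] for name in predictions)
    n_chemicals = len(next(iter(predictions.values())))
    consensus = []
    for i in range(n_chemicals):
        vote = sum(scores[name] * predictions[name][i] for name in predictions)
        consensus.append(1 if vote / total_weight >= threshold else 0)
    return consensus


# Hypothetical models with scores inside the reported 0.69 to 0.85 range.
example_scores = {"model_a": 0.69, "model_b": 0.77, "model_c": 0.85}
example_calls = {
    "model_a": [1, 1, 0, 0],
    "model_b": [0, 1, 0, 0],
    "model_c": [0, 1, 1, 0],
}
print(weighted_consensus(example_calls, example_scores))
```

A single dissenting model, even the highest-scoring one, does not carry a call on its own here, which is the kind of limitation of single models that a consensus is meant to smooth out.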
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee, USA
| | - Ahmed Abdelaziz
- Institute of Structural Biology, Helmholtz Zentrum Muenchen-German Research Center for Environmental Health (GmbH), Neuherberg, Germany
| | | | - Alessandra Roncaglioni
- Environmental Chemistry and Toxicology Laboratory, IRCCS (Istituto di Ricovero e Cura a Carattere Scientifico)-Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chemoinformatique, University of Strasbourg, Strasbourg, France
| | - Alexey Zakharov
- National Cancer Institute, National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Bethesda, Maryland, USA
| | - Andrew Worth
- Institute for Health and Consumer Protection (IHCP), Joint Research Centre of the European Commission in Ispra, Ispra, Italy
| | - Ann M. Richard
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Christopher M. Grulke
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | | | - Denis Fourches
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Dragos Horvath
- Laboratoire de Chemoinformatique, University of Strasbourg, Strasbourg, France
| | - Emilio Benfenati
- Environmental Chemistry and Toxicology Laboratory, IRCCS (Istituto di Ricovero e Cura a Carattere Scientifico)-Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
| | - Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Eva Bay Wedebye
- Division of Toxicology and Risk Assessment, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| | | | - Giuseppina M. Incisivo
- Environmental Chemistry and Toxicology Laboratory, IRCCS (Istituto di Ricovero e Cura a Carattere Scientifico)-Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration (FDA), Jefferson, Arkansas, USA
| | - Hui W. Ng
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration (FDA), Jefferson, Arkansas, USA
| | - Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum Muenchen-German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- BigChem GmbH, Neuherberg, Germany
| | - Ilya Balabin
- High Performance Computing, Lockheed Martin, Research Triangle Park, North Carolina, USA
| | - Jayaram Kancherla
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Jie Shen
- Research Institute for Fragrance Materials, Inc., Woodcliff Lake, New Jersey, USA
| | - Julien Burton
- Institute for Health and Consumer Protection (IHCP), Joint Research Centre of the European Commission in Ispra, Ispra, Italy
| | - Marc Nicklaus
- National Cancer Institute, National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Bethesda, Maryland, USA
| | - Matteo Cassotti
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| | - Nikolai G. Nikolov
- Division of Toxicology and Risk Assessment, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | | | - Qingda Zang
- Integrated Laboratory Systems, Inc., Research Triangle Park, North Carolina, USA
| | - Regina Politi
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Richard D. Beger
- Division of Systems Biology, National Center for Toxicological Research, FDA, Jefferson, Arkansas, USA
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
| | - Ruili Huang
- National Center for Advancing Translational Sciences, NIH, DHHS, Bethesda, Maryland, USA
| | - Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Sine A. Rosenberg
- Division of Toxicology and Risk Assessment, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Svetoslav Slavov
- Integrated Laboratory Systems, Inc., Research Triangle Park, North Carolina, USA
| | - Xin Hu
- National Center for Advancing Translational Sciences, NIH, DHHS, Bethesda, Maryland, USA
| | - Richard S. Judson
- National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Address correspondence to R.S. Judson, U.S. EPA, National Center for Computational Toxicology, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA. Telephone: (919) 541-3085.
|
42
|
Mathieu D. Physics-Based Modeling of Chemical Hazards in a Regulatory Framework: Comparison with Quantitative Structure–Property Relationship (QSPR) Methods for Impact Sensitivities. Ind Eng Chem Res 2016. [DOI: 10.1021/acs.iecr.6b01536] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
|
43
|
Novotarskyi S, Abdelaziz A, Sushko Y, Körner R, Vogt J, Tetko IV. ToxCast EPA in Vitro to in Vivo Challenge: Insight into the Rank-I Model. Chem Res Toxicol 2016; 29:768-75. [PMID: 27120770 PMCID: PMC5413193 DOI: 10.1021/acs.chemrestox.5b00481] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
The ToxCast EPA challenge was managed by TopCoder in Spring 2014. The goal of the challenge was to develop a model to predict the lowest effect level (LEL) concentration based on in vitro measurements and calculated in silico descriptors. This article summarizes the computational steps used to develop the Rank-I model, which achieved the lowest prediction error for the secret test data set of the challenge. The model was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), and it is freely available at http://ochem.eu/article/68104. Surprisingly, this model does not use any in vitro measurements. The logic of the decision steps used to develop the model and the reason for skipping the inclusion of in vitro measurements are described. We also show that inclusion of in vitro assays would not improve the accuracy of the model.
Affiliation(s)
| | - Ahmed Abdelaziz
- Rosettastein Consulting (UG), D-85354 Freising, Germany; Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, TUM-Technische Universität München, Freising, Germany
| | - Yurii Sushko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
| | - Robert Körner
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
| | - Joachim Vogt
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
| | - Igor V Tetko
- Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1 b. 60w, D-85764 Neuherberg, Germany; BigChem GmbH, Ingolstädter Landstraße 1 b. 60w, D-85764 Neuherberg, Germany
|
44
|
Buchholz H, Emel'yanenko VN, Lorenz H, Verevkin SP. An Examination of the Phase Transition Thermodynamics of (S)- and (RS)-Naproxen as a Basis for the Design of Enantioselective Crystallization Processes. J Pharm Sci 2016; 105:1676-1683. [PMID: 27056629 DOI: 10.1016/j.xphs.2016.02.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/01/2015] [Revised: 02/03/2016] [Accepted: 02/16/2016] [Indexed: 10/22/2022]
Abstract
A detailed experimental analysis of the phase transition thermodynamics of (S)-naproxen and (RS)-naproxen is reported. Vapor pressures were determined experimentally via the transpiration method. Sublimation enthalpies were obtained from the vapor pressures and from independent TGA measurements. Thermodynamics of fusion which have been well-studied in the literature were systematically remeasured by DSC. Both sublimation and fusion enthalpies were adjusted to one reference temperature, T = 298 K, using measured heat capacities of the solid and the melt phase by DSC. Average values from the measurements and from literature data were suggested for the sublimation and fusion enthalpies. In order to prove consistency of the proposed values the vaporization enthalpies obtained by combination of both were compared to vaporization enthalpies obtained by the group-additivity method and the correlation-gas chromatography method. The importance of reliable and precise phase transition data for thermochemical calculations such as the prediction of solid/liquid phase behaviour of chiral compounds is highlighted.
Affiliation(s)
- Hannes Buchholz
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Saxony-Anhalt, Germany.
| | | | - Heike Lorenz
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Saxony-Anhalt, Germany
| | - Sergey P Verevkin
- Department of Physical Chemistry, Kazan Federal University, Kazan, Tatarstan, Russia; Department of Physical Chemistry and Department "Science and Technology of Life, Light and Matter," University of Rostock, Rostock, Mecklenburg-Vorpommern, Germany
|
45
|
Tetko IV, Varbanov HP, Galanski MS, Talmaciu M, Platts JA, Ravera M, Gabano E. Prediction of logP for Pt(II) and Pt(IV) complexes: Comparison of statistical and quantum-chemistry based approaches. J Inorg Biochem 2016; 156:1-13. [PMID: 26717258 DOI: 10.1016/j.jinorgbio.2015.12.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 09/23/2015] [Revised: 11/19/2015] [Accepted: 12/09/2015] [Indexed: 01/31/2023]
Abstract
The octanol/water partition coefficient, logP, is one of the most important physico-chemical parameters for the development of new metal-based anticancer drugs with improved pharmacokinetic properties. This study addresses an issue with the absence of publicly available models to predict logP of Pt(IV) complexes. Following data collection and subsequent development of models based on 187 complexes from literature, we validate new and previously published models on a new set of 11 Pt(II) and 35 Pt(IV) complexes, which were kept blind during the model development step. The error of the consensus model, 0.65 for Pt(IV) and 0.37 for Pt(II) complexes, indicates its good accuracy of predictions. The lower accuracy for Pt(IV) complexes was attributed to experimental difficulties with logP measurements for some poorly-soluble compounds. This model was developed using general-purpose descriptors such as extended functional groups, molecular fragments and E-state indices. Surprisingly, models based on quantum-chemistry calculations provided lower prediction accuracy. We also found that all the developed models strongly overestimate logP values for the three complexes measured in the presence of DMSO. Considering that DMSO is frequently used as a solvent to store chemicals, its effect should not be overlooked when logP measurements by means of the shake flask method are performed. The final models are freely available at http://ochem.eu/article/76903.
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstaedter Landstrasse 1, b. 60w, D-85764 Neuherberg, Germany; BigChem GmbH, Ingolstaedter Landstrasse 1, b. 60w, D-85764 Neuherberg, Germany.
| | - Hristo P Varbanov
- Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland; Institute of Inorganic Chemistry, University of Vienna, Waehringer Strasse 42, A-1090 Vienna, Austria
| | - Mathea S Galanski
- Institute of Inorganic Chemistry, University of Vienna, Waehringer Strasse 42, A-1090 Vienna, Austria
| | - Mona Talmaciu
- School of Chemistry, Cardiff University, Park Place, Cardiff CF10 3AT, UK; «Iuliu Haţieganu» University of Medicine and Pharmacy, Faculty of Pharmacy, Analytical Chemistry Department, Cluj-Napoca, Romania
| | - James A Platts
- School of Chemistry, Cardiff University, Park Place, Cardiff CF10 3AT, UK
| | - Mauro Ravera
- Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, Viale Teresa Michel 11, 15121 Alessandria, Italy
| | - Elisabetta Gabano
- Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, Viale Teresa Michel 11, 15121 Alessandria, Italy
|
46
|
Tetko IV, Lowe DM, Williams AJ. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS. J Cheminform 2016; 8:2. [PMID: 26807157 PMCID: PMC4724158 DOI: 10.1186/s13321-016-0113-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 08/31/2015] [Accepted: 01/08/2016] [Indexed: 11/18/2022]
Abstract
BACKGROUND Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules, while much more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models. RESULTS We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melting, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
Affiliation(s)
- Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München für Gesundheit und Umwelt (HMGU), Ingolstädter Landstraße 1, b. 60w, 85764 Neuherberg, Germany
- BigChem GmbH, 85764 Neuherberg, Germany
| | - Daniel M. Lowe
- NextMove Software Limited, Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY UK
|
47
|
Salmina ES, Haider N, Tetko IV. Extended Functional Groups (EFG): An Efficient Set for Chemical Characterization and Structure-Activity Relationship Studies of Chemical Compounds. Molecules 2015; 21:E1. [PMID: 26703557 PMCID: PMC6273096 DOI: 10.3390/molecules21010001] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 10/29/2015] [Revised: 12/09/2015] [Accepted: 12/15/2015] [Indexed: 11/16/2022]
Abstract
The article describes a classification system termed “extended functional groups” (EFG), an extension of the set previously used by the CheckMol software that additionally covers heterocyclic compound classes and periodic table groups. The functional groups are defined as SMARTS patterns and are available as part of the ToxAlerts tool (http://ochem.eu/alerts) of the On-line CHEmical database and Modeling (OCHEM) environment platform. The article describes the motivation and the main ideas behind this extension and demonstrates that EFG can be efficiently used to develop and interpret structure-activity relationship models.
Affiliation(s)
- Elena S Salmina
- Institute for Organic Chemistry, Technical University Bergakademie Freiberg, Leipziger Str. 29, Freiberg D-09596, Germany.
| | - Norbert Haider
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, Vienna A-1090, Austria.
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, b. 60w, Neuherberg D-85764, Germany.
- BigChem GmbH, Ingolstädter Landstraße 1, b. 60w, Neuherberg D-85764, Germany.
|
48
|
Tales from the war on error: the art and science of curating QSAR data. J Comput Aided Mol Des 2015; 29:897-910. [PMID: 26290258 DOI: 10.1007/s10822-015-9865-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/16/2015] [Accepted: 08/07/2015] [Indexed: 10/23/2022]
Abstract
Curating the data underlying quantitative structure-activity relationship models is a never-ending struggle. Some curation can now be automated but much cannot, especially where data as complex as those pertaining to molecular absorption, distribution, metabolism, excretion, and toxicity are concerned (vide infra). The authors discuss some particularly challenging problem areas in terms of specific examples involving experimental context, incompleteness of data, confusion of units, problematic nomenclature, tautomerism, and misapplication of automated structure recognition tools.
|