1
|
Piir G, Sild S, Maran U. Interpretable machine learning for the identification of estrogen receptor agonists, antagonists, and binders. Chemosphere 2024; 347:140671. [PMID: 37951393 DOI: 10.1016/j.chemosphere.2023.140671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/14/2023]
Abstract
An abnormal hormonal activity or exposure to endocrine-disrupting chemicals (EDCs) can cause endocrine system malfunction. Among the many interactions EDCs can affect is the disruption of estrogen signalling, which can lead to adverse health effects such as cancer, osteoporosis, neurodegenerative diseases, cardiovascular disease, insulin resistance, and obesity. Knowing which chemical can act as an EDC is a significant advantage and a practical necessity. New Approach Methodologies (NAM) computational models offer a quick and cost-effective solution for preliminary hazard assessment of chemicals without animal testing. Therefore, a machine learning approach was used to investigate the relationships between estrogen receptor (ER) activity and chemical structure to identify chemicals that can interact with ER. For this purpose, the consolidated in vitro assay data from ToxCast/Tox21 projects was used for developing Random Forest classification models for ER binding, agonists, and antagonists. The overall classification prediction accuracy reaches up to 82%, depending on whether the model predicted agonists, antagonists, or compounds that bind to the active site. Given the imbalance in endocrine disruption data, the derived models are good candidates for deprioritising chemicals and reducing animal testing. The interpretation of theoretical molecular descriptors of the models was consistent with the molecular interactions known in the ligand binding pocket. The estimated class probabilities enabled the analysis of the applicability domain of the developed models and the assessment of the predictions' reliability, followed by the guidelines for interpreting prediction results. The models are openly accessible and useable at QsarDB.org (http://dx.doi.org/10.15152/QDB.259) according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
Affiliation(s)
- Geven Piir
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia.
| |
|
2
|
Oja M, Sild S, Piir G, Maran U. Intrinsic Aqueous Solubility: Mechanistically Transparent Data-Driven Modeling of Drug Substances. Pharmaceutics 2022; 14:2248. [PMID: 36297685 PMCID: PMC9611068 DOI: 10.3390/pharmaceutics14102248] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2022] [Revised: 10/12/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022] Open
Abstract
Intrinsic aqueous solubility is a foundational property for understanding the chemical, technological, pharmaceutical, and environmental behavior of drug substances. Despite years of solubility research, molecular structure-based prediction of the intrinsic aqueous solubility of drug substances is still under active investigation. This paper describes the authors’ systematic data-driven modelling in which two fit-for-purpose training data sets for intrinsic aqueous solubility were collected and curated, and three quantitative structure–property relationships were derived to make predictions for the most recent solubility challenge. All three models perform well individually, while being mechanistically transparent and easy to understand. Molecular descriptors involved in the models are related to the following key steps in the solubility process: dissociation of the molecule from the crystal, formation of a cavity in the solvent, and insertion of the molecule into the solvent. A consensus modeling approach with these models remarkably improved prediction capability and reduced the number of strong outliers by more than two times. The performance and outliers of the second solubility challenge predictions were analyzed retrospectively. All developed models have been published in the QsarDB.org repository according to FAIR principles and can be used without restrictions for exploring, downloading, and making predictions.
Affiliation(s)
| | | | | | - Uko Maran
- Correspondence: ; Tel.: +372-7-375-254; Fax: +372-7-375-264
| |
|
3
|
Toots KM, Sild S, Leis J, Acree WE, Maran U. Machine Learning Quantitative Structure–Property Relationships as a Function of Ionic Liquid Cations for the Gas-Ionic Liquid Partition Coefficient of Hydrocarbons. Int J Mol Sci 2022; 23:7534. [PMID: 35886881 PMCID: PMC9323540 DOI: 10.3390/ijms23147534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 02/01/2023] Open
Abstract
Ionic liquids (ILs) are known for their unique characteristics as solvents and electrolytes. Therefore, new ILs are being developed and adapted as innovative chemical environments for different applications in which their properties need to be understood on a molecular level. Computational data-driven methods provide a means of understanding properties at the molecular level, and quantitative structure–property relationships (QSPRs) provide the framework for this. This framework is commonly used to study the properties of molecules in ILs as an environment. The opposite situation, where the property is considered as a function of the ionic liquid, has not been studied. The aim of the present study was to supplement this perspective with new knowledge and to develop QSPRs that would allow the understanding of molecular interactions in ionic liquids based on the structure of the cationic moiety. A wide range of applications in electrochemistry, separation and extraction chemistry depends on the partitioning of solutes between the ionic liquid and the surrounding environment, which is characterized by the gas-ionic liquid partition coefficient. To model this property as a function of the structure of a cationic counterpart, a series of ionic liquids was selected with a common bis-(trifluoromethylsulfonyl)-imide anion, [Tf2N]−, for benzene, hexane and cyclohexane. MLR, SVR and GPR machine learning approaches were used to derive data-driven models and their performance was compared. The cross-validation coefficients of determination in the range 0.71–0.93, along with other performance statistics, indicated good accuracy of the models for all data series and machine learning methods. The analysis and interpretation of descriptors revealed that generally higher lipophilicity and dispersion interaction capability, and lower polarity in the cations induce a higher partition coefficient for benzene, hexane, cyclohexane and hydrocarbons in general. The applicability domain analysis of the models concluded that there were no highly influential outliers and that the models are applicable to a wide selection of cation families with variable size, polarity and aliphatic or aromatic nature.
Affiliation(s)
- Karl Marti Toots
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia; (K.M.T.); (S.S.); (J.L.)
| | - Sulev Sild
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia; (K.M.T.); (S.S.); (J.L.)
| | - Jaan Leis
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia; (K.M.T.); (S.S.); (J.L.)
| | - William E. Acree
- Department of Chemistry, University of North Texas, 1155 Union Circle Drive #305070, Denton, TX 76203, USA;
| | - Uko Maran
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia; (K.M.T.); (S.S.); (J.L.)
- Correspondence:
| |
|
4
|
Toots KM, Sild S, Leis J, Acree Jr. WE, Maran U. The quantitative structure-property relationships for the gas-ionic liquid partition coefficient of a large variety of organic compounds in three ionic liquids. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.117573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
|
5
|
Piir G, Sild S, Maran U. Binary and multi-class classification for androgen receptor agonists, antagonists and binders. Chemosphere 2021; 262:128313. [PMID: 33182081 DOI: 10.1016/j.chemosphere.2020.128313] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/12/2020] [Revised: 08/24/2020] [Accepted: 09/10/2020] [Indexed: 06/11/2023]
Abstract
Androgens and the androgen receptor regulate a variety of biological effects in the human body. Impaired functioning of the androgen receptor may have adverse health effects ranging from cancer to infertility. Therefore, it is important to determine whether new chemicals have any binding activity and act as androgen agonists or antagonists before commercial use. Because a large number of chemicals require experimental testing, computational methods are a viable alternative. The aim of the present study was therefore to develop predictive QSAR models for classifying compounds according to their activity at the androgen receptor. A large data set of chemicals from the CoMPARA project was used for this purpose, and random forest classification models were developed for androgen binding, agonistic, and antagonistic activity. In addition, a multi-class approach was developed that discriminates between inactive compounds, agonists and antagonists simultaneously. For the evaluation set, the classification models predicted agonists with 80% accuracy; for antagonists and binders the respective values were 72% and 78%. Combining agonists, antagonists and inactive compounds into a multi-class approach added complexity to the modelling task and resulted in 64% prediction accuracy for the evaluation set. Considering the size of the training data sets and their imbalance, the achieved evaluation accuracy is very good. The final classification models are available for exploring and predicting at the QsarDB repository (https://doi.org/10.15152/QDB.236).
Affiliation(s)
- Geven Piir
- University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
| | - Sulev Sild
- University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
| | - Uko Maran
- University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia.
| |
|
6
|
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environ Health Perspect 2020; 128:27002. [PMID: 32074470 DOI: 10.23645/epacomptox.5176876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling.
OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).
METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays.
RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Ahmed M Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Vinicius M Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Carolina H Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
| | - Fang Bai
- School of Pharmacy, Lanzhou University, China
| | - Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Emilio Benfenati
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
| | - Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Chris M Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Dragos Horvath
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Jiazhong Li
- School of Pharmacy, Lanzhou University, China
| | - Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | | | - Serena Manganelli
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
| | | | - Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Gilles Marcou
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
| | - Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Nikolai G Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
| | - Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Ann M Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | | | - Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
| | - Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
| | - Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Igor V Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | | | - Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Eva B Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
| | - Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| |
|
7
|
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environ Health Perspect 2020; 128:27002. [PMID: 32074470 PMCID: PMC7064318 DOI: 10.1289/ehp5580] [Citation(s) in RCA: 92] [Impact Index Per Article: 23.0] [Received: 05/06/2019] [Revised: 11/27/2019] [Accepted: 12/05/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling.
OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).
METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays.
RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set.
DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Ahmed M. Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Vinicius M. Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Carolina H. Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
| | - Fang Bai
- School of Pharmacy, Lanzhou University, China
| | - Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Emilio Benfenati
- Istituto di Ricerche Farmacologiche “Mario Negri”, IRCCS, Milan, Italy
| | - Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Chris M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Dragos Horvath
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Jiazhong Li
- School of Pharmacy, Lanzhou University, China
| | - Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | | | - Serena Manganelli
- Istituto di Ricerche Farmacologiche “Mario Negri”, IRCCS, Milan, Italy
| | | | - Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Gilles Marcou
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
| | - Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Nikolai G. Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)–INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
| | - Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | | | - Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
| | - Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)–INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
| | - Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Igor V. Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen – German Research Center for Environmental Health (GmbH), Neuherberg, Germany
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | | | - Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Eva B. Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
| | - Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
|
8
|
Oja M, Sild S, Maran U. Logistic Classification Models for pH–Permeability Profile: Predicting Permeability Classes for the Biopharmaceutical Classification System. J Chem Inf Model 2019; 59:2442-2455. [DOI: 10.1021/acs.jcim.8b00833] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Affiliation(s)
- Mare Oja
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia
|
9
|
Sild S, Piir G, Neagu D, Maran U. Chapter 6. Storing and Using Qualitative and Quantitative Structure–Activity Relationships in the Era of Toxicological and Chemical Data Expansion. Issues in Toxicology 2019. [DOI: 10.1039/9781782623656-00185] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/11/2023]
|
10
|
Piir G, Kahn I, García-Sosa AT, Sild S, Ahte P, Maran U. Best Practices for QSAR Model Reporting: Physical and Chemical Properties, Ecotoxicity, Environmental Fate, Human Health, and Toxicokinetics Endpoints. Environ Health Perspect 2018; 126:126001. [PMID: 30561225 PMCID: PMC6371683 DOI: 10.1289/ehp3264] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 12/19/2017] [Revised: 10/19/2018] [Accepted: 11/07/2018] [Indexed: 05/31/2023]
Abstract
BACKGROUND Quantitative and qualitative structure–activity relationships (QSARs) have been used to understand chemical behavior for almost a century. The main source of QSAR models is the scientific literature, but the open question is how well these models are documented. OBJECTIVES The main aim of this study was to critically analyze the publication practices of QSARs with regard to transparency, potential reproducibility, and independent verification. The focus was on the level of technical completeness of the published QSARs. METHODS A total of 1,533 QSAR articles reporting 79 individual endpoints, mostly in environmental and health science, were reviewed. The QSAR parameters required for technical completeness were grouped into five categories: chemical structures, experimental endpoint values, descriptor values, mathematical representation of the model, and predicted endpoint values. The data were summarized and discussed using Circos plots. RESULTS Altogether, 42.5% of the reviewed articles were found to be potentially reproducible. The potential reproducibility for different endpoint groups varied; the respective rates were 39% for physical and chemical properties, 52% for ecotoxicity, 56% for environmental fate, 30% for human health, and 32% for toxicokinetics. The reproducibility of QSARs is discussed and placed in the context of the reproducibility of the experimental methods. Included are 65 references to open QSAR datasets as examples of models restored from scientific articles. DISCUSSION Strikingly poor documentation of QSARs was observed, which reduces the transparency, availability, and consequently, the application of research results in scientific, industrial, and regulatory areas. A list of the components needed to ensure the best practices for QSAR reporting is provided, allowing long-term use and preservation of the models. 
This list also allows an assessment of the reproducibility of models by interested parties such as journal editors, reviewers, regulators, evaluators, and potential users. https://doi.org/10.1289/EHP3264.
Affiliation(s)
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Iiris Kahn
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
| | | | - Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Priit Ahte
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
|
11
|
Ruusmann V, Sild S, Maran U. QSAR DataBank repository: open and linked qualitative and quantitative structure-activity relationship models. J Cheminform 2015; 7:32. [PMID: 26110025 PMCID: PMC4479250 DOI: 10.1186/s13321-015-0082-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 03/03/2015] [Accepted: 06/08/2015] [Indexed: 11/29/2022] [Open access]
Abstract
Background Structure–activity relationship models have been used to gain insight into chemical and physical processes in biomedicine, toxicology, biotechnology, etc. for almost a century. They have been recognized as valuable tools in decision support workflows for qualitative and quantitative predictions. The main obstacle preventing broader adoption of quantitative structure–activity relationships [(Q)SARs] is that published models are still relatively difficult to discover, retrieve and redeploy in a modern computer-oriented environment. This publication describes a digital repository that makes in silico (Q)SAR-type descriptive and predictive models archivable, citable and usable in a novel way for most common research and applied science purposes. Description The QSAR DataBank (QsarDB) repository aims to make the processes and outcomes of in silico modelling work transparent, reproducible and accessible. Briefly, the models are represented in the QsarDB data format and stored in a content-aware repository (a.k.a. smart repository). Content awareness has two dimensions. First, models are organized into collections and then into collection hierarchies based on their metadata. Second, the repository is not only an environment for browsing and downloading models (the QDB archive) but also offers integrated services, such as model analysis and visualization and prediction making. Conclusions The QsarDB repository unlocks the potential of descriptive and predictive in silico (Q)SAR-type models by allowing new and different types of collaboration between model developers and model users. The key enabling factor is the representation of (Q)SAR models in the QsarDB data format, which makes it easy to preserve and share all relevant data, information and knowledge. Model developers can become more productive by effectively reusing prior art. Model users can make more confident decisions by relying on supporting information that is larger and more diverse than before. 
Furthermore, the smart repository automates most of the mundane work (e.g., collecting, systematizing, and reporting data), thereby reducing the time to decision.
Affiliation(s)
- V Ruusmann
- Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
| | - S Sild
- Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
| | - U Maran
- Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
|
12
|
Takkis K, García-Sosa AT, Sild S. Virtual Screening for HIV Protease Inhibitors Using a Novel Database Filtering Procedure. Mol Inform 2015; 34:485-92. [PMID: 27490392 DOI: 10.1002/minf.201400170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/17/2014] [Accepted: 05/06/2015] [Indexed: 11/06/2022]
Abstract
A virtual screening to find novel inhibitors for HIV protease was performed on the ZINC database. A critical part of virtual screening and associated techniques is preliminary database filtering and size reduction, and for that purpose a novel feature matrix matching procedure was used. The reduction of ∼14 million available ligands to a subset of 14,299 ligands was achieved with a structure-based approach in which analysis of the 3D structure of the protease active site produced a graph with hydrogen bond donor, hydrogen bond acceptor and hydrophobic subsites represented as graph nodes. A similar treatment was applied to the compound database content, and the comparison of binding site and ligand graphs was used to preselect potentially active ligands. The resulting set was then subjected to docking. The algorithm found several novel ligands as well as previously known and experimentally tested ones, demonstrating the validity of the approach.
Affiliation(s)
- Kalev Takkis
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu, 50411, Estonia
| | | | - Sulev Sild
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu, 50411, Estonia.
|
13
|
Piir G, Sild S, Maran U. Classifying bio-concentration factor with random forest algorithm, influence of the bio-accumulative vs. non-bio-accumulative compound ratio to modelling result, and applicability domain for random forest model. SAR QSAR Environ Res 2014; 25:967-81. [PMID: 25482723 DOI: 10.1080/1062936x.2014.969310] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/10/2014] [Accepted: 08/03/2014] [Indexed: 05/27/2023]
Abstract
In environmental risk assessment, the bio-concentration factor (BCF) is a widely used parameter in the estimation of the bio-accumulation potential of chemicals. BCF data often have an uneven distribution of classes (bio-accumulative vs. non-bio-accumulative), which could severely bias the classification results towards the prevailing class. The present study focuses on the influence of the uneven distribution of classes in the training phase of Random Forest (RF) classification models. Three different training set designs were used, and descriptors were selected for the models based on their occurrence frequency in RF trees and on the mechanistic aspects they reflect. Models were compared and their classification performance was analysed, indicating good predictive characteristics (sensitivity = 0.90 and specificity = 0.83) for the balanced set; the imbalanced sets also have their strengths in certain application scenarios. The confidence of classifications was assessed with a new schema for the applicability domain that makes use of the RF proximity matrix by analysing the similarity between the predicted compound and the training set of the model. All developed models were made available in a transparent, accessible and reproducible way in the QsarDB repository (http://dx.doi.org/10.15152/QDB.116).
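The effect of class imbalance on the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `sensitivity_specificity` and the toy labels are assumptions made for illustration, with 1 standing for the bio-accumulative class. A classifier that always predicts the prevailing class on a 2:8 set is correct for 8 of 10 compounds yet finds none of the positives, which is why sensitivity and specificity are reported separately.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Compute (sensitivity, specificity) from paired true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical imbalanced set: 2 bio-accumulative (1), 8 non-bio-accumulative (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that always predicts the prevailing class.
y_pred = [0] * 10

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```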
Affiliation(s)
- G Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
|
14
|
Ruusmann V, Sild S, Maran U. QSAR DataBank - an approach for the digital organization and archiving of QSAR model information. J Cheminform 2014; 6:25. [PMID: 24910716 PMCID: PMC4047268 DOI: 10.1186/1758-2946-6-25] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 12/30/2013] [Accepted: 03/11/2014] [Indexed: 12/02/2022] [Open access]
Abstract
Background Research efforts in the field of descriptive and predictive Quantitative Structure-Activity Relationships or Quantitative Structure–Property Relationships produce around one thousand scientific publications annually. The materials and results are mainly communicated using printed media. Printed media in their present form have obvious limitations when it comes to effectively representing mathematical models, including complex and non-linear ones, and large bodies of associated numerical chemical data. They do not support secondary information extraction or reuse efforts, while in silico studies pose additional requirements for accessibility, transparency and reproducibility of the research. This gap can and should be bridged by introducing domain-specific digital data exchange standards and tools. The current publication presents a formal specification of the quantitative structure-activity relationship data organization and archival format called the QSAR DataBank (QsarDB for shorter, or QDB for shortest). Results The article describes the QsarDB data schema, which formalizes QSAR concepts (objects and relationships between them), and the QsarDB data format, which formalizes their presentation for computer systems. The utility and benefits of QsarDB have been thoroughly tested by solving everyday QSAR and predictive modeling problems, with examples in the field of predictive toxicology, and the format can be applied to a wide variety of other endpoints. The work is accompanied by an open source reference implementation and tools. Conclusions The proposed open data, open source, and open standards design is open to public and proprietary extensions on many levels. Selected use cases exemplify the benefits of the proposed QsarDB data format. General ideas for future development are discussed.
Affiliation(s)
- Villu Ruusmann
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu 50411, Estonia
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu 50411, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu 50411, Estonia
|
15
|
Piir G, Sild S, Maran U. Comparative analysis of local and consensus quantitative structure-activity relationship approaches for the prediction of bioconcentration factor. SAR QSAR Environ Res 2013; 24:175-199. [PMID: 23410132 DOI: 10.1080/1062936x.2012.762426] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/01/2023]
Abstract
Quantitative structure-activity relationships (QSARs) are broadly classified as global or local, depending on their molecular constitution. Global models use large and diverse training sets covering a wide range of chemical space. Local models focus on smaller structurally or chemically similar subsets that are conventionally selected by human experts or alternatively using clustering analysis. The current study focuses on the comparative analysis of different clustering algorithms (expectation-maximization, K-means and hierarchical) for seven different descriptor sets as structural characteristics and two rule-based approaches to select subsets for designing local QSAR models. A total of 111 local QSAR models were developed for predicting bioconcentration factor. Predictions from local models were compared with corresponding predictions from the global model. The comparison of coefficients of determination (r²) and standard deviations for local models with similar subsets from the global model shows improved prediction quality in 97% of cases. The descriptor content of the derived QSARs is discussed and analyzed. Local QSAR models were further consolidated within the framework of a consensus approach. All consensus approaches increased performance over the global and local models. The consensus approach reduced the number of strongly deviating predictions by evening out prediction errors that were produced by some local QSARs.
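How a consensus can even out a strongly deviating local prediction can be sketched in a few lines of Python. The abstract does not specify which consensus schemes were used, so the plain mean and median below are assumptions chosen for illustration, as are the function name `consensus` and the example values.

```python
from statistics import mean, median


def consensus(predictions):
    """Return (mean, median) of per-model predictions for a single compound."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return mean(predictions), median(predictions)


# Hypothetical log BCF predictions from four local models for one compound;
# the last model deviates strongly from the others.
local_predictions = [2.1, 2.3, 2.2, 4.0]

avg, med = consensus(local_predictions)
print(f"mean consensus={avg:.2f}, median consensus={med:.2f}")
```

In this toy case both consensus values lie much closer to the three agreeing models than the outlying prediction does, and the median is less affected by the outlier than the mean.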
Affiliation(s)
- G Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
|
16
|
García-Sosa AT, Sild S, Takkis K, Maran U. Combined Approach Using Ligand Efficiency, Cross-Docking, and Antitarget Hits for Wild-Type and Drug-Resistant Y181C HIV-1 Reverse Transcriptase. J Chem Inf Model 2011; 51:2595-611. [DOI: 10.1021/ci200203h] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 02/08/2023]
Affiliation(s)
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu 50411, Estonia
| | - Kalev Takkis
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu 50411, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14a, Tartu 50411, Estonia
|
17
|
Piir G, Sild S, Roncaglioni A, Benfenati E, Maran U. QSAR model for the prediction of bio-concentration factor using aqueous solubility and descriptors considering various electronic effects. SAR QSAR Environ Res 2010; 21:711-729. [PMID: 21120758 DOI: 10.1080/1062936x.2010.528596] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/27/2023]
Abstract
The in silico modelling of bio-concentration factor (BCF) is of considerable interest in environmental sciences, because it is an accepted indicator for the accumulation potential of chemicals in organisms. Numerous QSAR models have been developed for the BCF, and the majority utilize the octanol/water partition coefficient (log P) to account for the penetration characteristics of the chemicals. The present work used descriptors from a variety of software packages for the development of a multi-linear regression model to estimate BCF. The modelled data set of 473 diverse compounds covers a wide range of log BCF values. In the proposed QSAR model, most of the variation is described by the calculated solubility in water. Other contributing descriptors describe, for instance, hydrophobic surface area, hydrogen bonding and other electronic effects. The model was validated internally by using a variety of statistical approaches. Two external validations were also performed. For the first external validation, a subset from the same data source was used. The second external validation was based on an independent data set collected from different resources. All validations showed the consistency of the model. The applicability domain of the model was described and discussed, and a thorough outlier analysis was performed.
Affiliation(s)
- G Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
|
18
|
Maran U, Sild S, Tulp I, Takkis K, Moosus M. Chapter 6. Molecular Descriptors from Two-Dimensional Chemical Structure. In Silico Toxicology 2010. [DOI: 10.1039/9781849732093-00148] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023]
|
19
|
Tulp I, Sild S, Maran U. Relationship Between Structure and Permeability in Artificial Membranes: Theoretical Whole Molecule Descriptors in Development of QSAR Models. QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200860160] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
|
20
|
|
21
|
|
22
|
Kipper K, Hetényi C, Sild S, Remme J, Liiv A. Ribosomal Intersubunit Bridge B2a Is Involved in Factor-Dependent Translation Initiation and Translational Processivity. J Mol Biol 2009; 385:405-22. [DOI: 10.1016/j.jmb.2008.10.065] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 06/10/2008] [Revised: 10/14/2008] [Accepted: 10/15/2008] [Indexed: 10/21/2022]
|
23
|
|
24
|
García-Sosa AT, Sild S, Maran U. Design of multi-binding-site inhibitors, ligand efficiency, and consensus screening of avian influenza H5N1 wild-type neuraminidase and of the oseltamivir-resistant H274Y variant. J Chem Inf Model 2008; 48:2074-80. [PMID: 18847186 DOI: 10.1021/ci800242z] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022]
Abstract
The binding sites of wild-type avian influenza A H5N1 neuraminidase, as well as those of the Tamiflu (oseltamivir)-resistant H274Y variant, were explored computationally to design inhibitors that simultaneously target several adjacent binding sites of the open conformation of the virus protein. The compounds with the best computed free energies of binding, as agreed on by two docking methods, consensus scoring, and ligand efficiency values, suggest that mimicking a polysaccharide, beta-lactam, and other structures, including known drugs, could be routes for multi-binding-site inhibitor design. A new virtual screening method based on consensus scoring and ligand efficiency indices is introduced, which allows the combination of pharmacodynamic and pharmacokinetic properties into unique measures.
|
25
|
Schuller B, Demuth B, Mix H, Rasch K, Romberg M, Sild S, Maran U, Bała P, del Grosso E, Casalegno M, Piclin N, Pintore M, Sudholt W, Baldridge KK. Chemomentum - UNICORE 6 Based Infrastructure for Complex Applications in Science and Technology. 2008. [DOI: 10.1007/978-3-540-78474-6_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/21/2023]
|
26
|
Kahn I, Sild S, Maran U. Modeling the Toxicity of Chemicals to Tetrahymena pyriformis Using Heuristic Multilinear Regression and Heuristic Back-Propagation Neural Networks. J Chem Inf Model 2007; 47:2271-9. [DOI: 10.1021/ci700231c] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 01/22/2023]
Affiliation(s)
- Iiris Kahn
- Institute of Chemistry, University of Tartu, 2 Jakobi Str., Tartu 51014, Estonia
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, 2 Jakobi Str., Tartu 51014, Estonia
| | - Uko Maran
- Institute of Chemistry, University of Tartu, 2 Jakobi Str., Tartu 51014, Estonia
|
27
|
Abstract
The solubility of polyaromatic hydrocarbons (PAHs) and carbon nanostructures is important from both the technical and environmental points of view. In the present work, two general quantitative structure-property relationship (QSPR) models were developed, describing the solubility of PAHs and fullerene (C60) in two different condensed media (1-octanol and n-heptane). Statistically good QSPR models were obtained by using forward selection techniques from a large space of theoretical molecular descriptors. The physical meaning of the models is discussed and analyzed.
Affiliation(s)
- Dana Martin
- Department of Chemistry, University of Tartu, 2 Jakobi Street, Tartu 51014, Estonia
|
28
|
|
29
|
Abstract
Grid is an emerging infrastructure for distributed computing that provides secure and scalable mechanisms for discovering and accessing remote software and data resources. Applications built on this infrastructure have great potential for addressing and solving large-scale chemical, pharmaceutical, and material science problems. The article describes the concept behind grid computing and presents the OpenMolGRID system, an open computing grid for molecular science and engineering. This system provides grid-enabled components, such as a data warehouse for chemical data, software for building QSPR/QSAR models, and molecular engineering tools for generating compounds with predefined chemical properties or biological activities. The article also provides an overview of the availability of chemical applications in the grid.
Affiliation(s)
- Sulev Sild
- Department of Chemistry, University of Tartu, Tartu, Jakobi 2, 51014 Estonia.
|
30
|
Abstract
Multilinear regression and neural network methods have been used to develop QSPR models for the prediction of the dielectric constant (ε) and Kirkwood function (ε − 1)/(2ε + 1) of organic liquids. Both methods can provide acceptable models for the prediction of these properties. The QSPR models developed from the training set of 155 diverse compounds use theoretical molecular descriptors encoding electronic properties of the molecule and the intermolecular interaction between molecules. The QSPR models for the Kirkwood function appear to be more reliable than the models for the dielectric constant. The average prediction error of the best model for the dielectric constant is 27.0%. The average prediction error of the best model for the Kirkwood function is 4.1%.
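The Kirkwood function named in the abstract is simple to evaluate, and a short Python sketch shows how it compresses the dielectric constant into a bounded range (from 0 at ε = 1 towards 0.5 as ε grows). The helper names `kirkwood` and `relative_change` and the example values are assumptions for illustration. The sketch only shows the shape of the function and is not the authors' explanation of the difference in prediction errors.

```python
def kirkwood(eps):
    """Kirkwood function (eps - 1) / (2*eps + 1) for a dielectric constant eps >= 1."""
    if eps < 1.0:
        raise ValueError("dielectric constant is expected to be >= 1")
    return (eps - 1.0) / (2.0 * eps + 1.0)


def relative_change(eps, rel_error):
    """Relative change of the Kirkwood function when eps is off by rel_error (eps > 1)."""
    f_true = kirkwood(eps)
    if f_true == 0.0:
        raise ValueError("relative change is undefined at eps == 1")
    f_off = kirkwood(eps * (1.0 + rel_error))
    return abs(f_off - f_true) / f_true


# Illustrative values only: a weakly polar and a strongly polar liquid.
for eps in (2.0, 80.0):
    print(f"eps={eps:5.1f}  kirkwood={kirkwood(eps):.4f}  "
          f"change for +27% in eps={relative_change(eps, 0.27):.2%}")
```

For strongly polar liquids, a large relative deviation in ε changes the Kirkwood function only slightly, because the function saturates at high ε.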
Affiliation(s)
- Sulev Sild
- Institute of Chemical Physics, University of Tartu, Jakobi Street 2, Tartu, 51014, Estonia
|
31
|
Affiliation(s)
- Alan R. Katritzky
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, and Department of Chemistry, University of Tartu, 2 Jacobi Str., Tartu 51014, Estonia
- Tarmo Tamm
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, and Department of Chemistry, University of Tartu, 2 Jacobi Str., Tartu 51014, Estonia
- Yilin Wang
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, and Department of Chemistry, University of Tartu, 2 Jacobi Str., Tartu 51014, Estonia
- Sulev Sild
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, and Department of Chemistry, University of Tartu, 2 Jacobi Str., Tartu 51014, Estonia
- Mati Karelson
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, and Department of Chemistry, University of Tartu, 2 Jacobi Str., Tartu 51014, Estonia
32
Lučić B, Trinajstić N, Sild S, Karelson M, Katritzky AR. A New Efficient Approach for Variable Selection Based on Multiregression: Prediction of Gas Chromatographic Retention Times and Response Factors. J Chem Inf Comput Sci 1999. [DOI: 10.1021/ci980161a] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
33
Affiliation(s)
- Alan R. Katritzky
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE 2400 Tartu, Estonia
- Sulev Sild
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE 2400 Tartu, Estonia
- Mati Karelson
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE 2400 Tartu, Estonia
34
Katritzky AR, Sild S, Karelson M. General Quantitative Structure−Property Relationship Treatment of the Refractive Index of Organic Compounds. J Chem Inf Comput Sci 1998. [DOI: 10.1021/ci980028i] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Affiliation(s)
- Alan R. Katritzky
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE 2400 Tartu, Estonia
- Sulev Sild
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE 2400 Tartu, Estonia
- Mati Karelson
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE 2400 Tartu, Estonia
35
Katritzky AR, Wang Y, Sild S, Tamm T, Karelson M. QSPR Studies on Vapor Pressure, Aqueous Solubility, and the Prediction of Water−Air Partition Coefficients. J Chem Inf Comput Sci 1998. [DOI: 10.1021/ci980022t] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.0] [Indexed: 10/17/2022]
Affiliation(s)
- Alan R. Katritzky
- Center of Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Yilin Wang
- Center of Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Sulev Sild
- Center of Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Tarmo Tamm
- Center of Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Mati Karelson
- Department of Chemistry, University of Tartu, 2 Jakobi Str., Tartu EE 2400, Estonia
36
Katritzky AR, Karelson M, Sild S, Krygowski TM, Jug K. Aromaticity as a Quantitative Concept. 7. Aromaticity Reaffirmed as a Multidimensional Characteristic. J Org Chem 1998. [DOI: 10.1021/jo970939b] [Citation(s) in RCA: 245] [Impact Index Per Article: 9.4] [Indexed: 11/28/2022]
Affiliation(s)
- Alan R. Katritzky
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia, Department of Chemistry, University of Warsaw, ul. L. Pasteura 1, 02093 Warszawa, Poland, and Theoretische Chemie, Universität Hannover, D-30167 Hannover, Am Kleinen Felde 30, Germany
- Mati Karelson
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia, Department of Chemistry, University of Warsaw, ul. L. Pasteura 1, 02093 Warszawa, Poland, and Theoretische Chemie, Universität Hannover, D-30167 Hannover, Am Kleinen Felde 30, Germany
- Sulev Sild
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia, Department of Chemistry, University of Warsaw, ul. L. Pasteura 1, 02093 Warszawa, Poland, and Theoretische Chemie, Universität Hannover, D-30167 Hannover, Am Kleinen Felde 30, Germany
- T. Marek Krygowski
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia, Department of Chemistry, University of Warsaw, ul. L. Pasteura 1, 02093 Warszawa, Poland, and Theoretische Chemie, Universität Hannover, D-30167 Hannover, Am Kleinen Felde 30, Germany
- Karl Jug
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia, Department of Chemistry, University of Warsaw, ul. L. Pasteura 1, 02093 Warszawa, Poland, and Theoretische Chemie, Universität Hannover, D-30167 Hannover, Am Kleinen Felde 30, Germany
37
Katritzky AR, Sild S, Lobanov V, Karelson M. Quantitative Structure−Property Relationship (QSPR) Correlation of Glass Transition Temperatures of High Molecular Weight Polymers. J Chem Inf Comput Sci 1998. [DOI: 10.1021/ci9700687] [Citation(s) in RCA: 99] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Affiliation(s)
- Alan R. Katritzky
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia
- Sulev Sild
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia
- Victor Lobanov
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia
- Mati Karelson
- Center for Heterocyclic Compounds, University of Florida, Gainesville, Florida 32611-7200, and Institute of Chemical Physics, University of Tartu, 2 Jakobi Street, EE2400 Tartu, Estonia