1
Zhai S, Tan Y, Zhu C, Zhang C, Gao Y, Mao Q, Zhang Y, Duan H, Yin Y. PepExplainer: An explainable deep learning model for selection-based macrocyclic peptide bioactivity prediction and optimization. Eur J Med Chem 2024; 275:116628. [PMID: 38944933 DOI: 10.1016/j.ejmech.2024.116628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Macrocyclic peptides possess unique features, making them highly promising as a drug modality. However, evaluating their bioactivity through wet lab experiments is generally resource-intensive and time-consuming. Despite advancements in artificial intelligence (AI) for bioactivity prediction, challenges remain due to limited data availability and the interpretability issues in deep learning models, often leading to less-than-ideal predictions. To address these challenges, we developed PepExplainer, an explainable graph neural network based on substructure mask explanation (SME). This model excels at deciphering amino acid substructures, translating macrocyclic peptides into detailed molecular graphs at the atomic level, and efficiently handling non-canonical amino acids and complex macrocyclic peptide structures. PepExplainer's effectiveness is enhanced by utilizing the correlation between peptide enrichment data from a selection-based focused library and bioactivity data, and by employing transfer learning to improve bioactivity predictions of macrocyclic peptides against the IL-17C/IL-17RE interaction. Additionally, PepExplainer underwent further validation for bioactivity prediction using an additional set of thirteen newly synthesized macrocyclic peptides. Moreover, it enabled the optimization of the IC50 of a macrocyclic peptide, reducing it from 15 nM to 5.6 nM based on the contribution score provided by PepExplainer. This achievement underscores PepExplainer's skill in deciphering complex molecular patterns, highlighting its potential to accelerate the discovery and optimization of macrocyclic peptides.
Affiliation(s)
- Silong Zhai
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yahong Tan
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Cheng Zhu
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Chengyun Zhang
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yan Gao
  - Qilu Institute of Technology, Jinan, 250200, China
- Qingyi Mao
  - School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Youming Zhang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Yizhen Yin
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China; Shandong Research Institute of Industrial Technology, Jinan, 250101, China
2
Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environmental Science & Technology 2024; 58:2817-2829. [PMID: 38291630 DOI: 10.1021/acs.est.3c09779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Over the past few decades, extensive research has indicated that exposure to bisphenol A (BPA) increases the health risks in humans. Toxicological studies have demonstrated that BPA can bind to the androgen receptor (AR), resulting in endocrine-disrupting effects. In recent investigations, many alternatives to BPA have been detected in various environmental media as major pollutants. However, related experimental evaluations of BPA alternatives have not been systematically implemented for the assessment of chemical safety and the effects of structural characteristics on the antagonistic activity of the AR. To promote the green development of BPA alternatives, high-throughput toxicological screening is fundamental for prioritizing chemical tests. Therefore, we proposed a hybrid deep learning architecture that combines molecular descriptors and molecular graphs to predict AR antagonistic activity. Compared to previous models, this hybrid architecture can extract substantial chemical information from various molecular representations to improve the model's generalization ability for BPA alternatives. Our predictions suggest that lignin-derivable bisguaiacols, as alternatives to BPA, are likely to be nonantagonist for AR compared to bisphenol analogues. Additionally, molecular dynamics (MD) simulations identified the dihydrotestosterone-bound pocket, rather than the surface, as the major binding site of bisphenol analogues. The conformational changes of key helix H12 from an agonistic to an antagonistic conformation can be evaluated qualitatively by accelerated MD simulations to explain the underlying mechanism. Overall, our computational study is helpful for toxicological screening of BPA alternatives and the design of environmentally friendly BPA alternatives.
Affiliation(s)
- Zeguo Yang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ying Yang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
3
Collins SP, Mailloux B, Kulkarni S, Gagné M, Long AS, Barton-Maclaren TS. Development and application of consensus in silico models for advancing high-throughput toxicological predictions. Front Pharmacol 2024; 15:1307905. [PMID: 38333007 PMCID: PMC10850302 DOI: 10.3389/fphar.2024.1307905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 01/02/2024] [Indexed: 02/10/2024] Open
Abstract
Computational toxicology models have been successfully implemented to prioritize and screen chemicals. There are numerous in silico (quantitative) structure-activity relationship ([Q]SAR) models for the prediction of a range of human-relevant toxicological endpoints, but for a given endpoint and chemical, not all predictions are identical due to differences in their training sets, algorithms, and methodology. This poses an issue for high-throughput screening of a large chemical inventory as it necessitates several models to cover diverse chemistries but will then generate data conflicts. To address this challenge, we developed a consensus modeling strategy to combine predictions obtained from different existing in silico (Q)SAR models into a single predictive value while also expanding chemical space coverage. This study developed consensus models for nine toxicological endpoints relating to estrogen receptor (ER) and androgen receptor (AR) interactions (i.e., binding, agonism, and antagonism) and genotoxicity (i.e., bacterial mutation, in vitro chromosomal aberration, and in vivo micronucleus). Consensus models were created by combining different (Q)SAR models using various weighting schemes. As a multi-objective optimization problem, there is no single best consensus model, and therefore, Pareto fronts were determined for each endpoint to identify the consensus models that optimize the multiple-criterion decisions simultaneously. Accordingly, this work presents sets of solutions for each endpoint that contain the optimal combination, regardless of the trade-off, with the results demonstrating that the consensus models improved both the predictive power and chemical space coverage. These solutions were further analyzed to find trends between the best consensus models and their components. 
Here, we demonstrate the development of a flexible and adaptable approach for in silico consensus modeling and its application across nine toxicological endpoints related to ER activity, AR activity, and genotoxicity. These consensus models are developed to be integrated into a larger multi-tier NAM-based framework to prioritize chemicals for further investigation and support the transition to a non-animal approach to risk assessment in Canada.
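The Pareto-front selection described in this abstract can be illustrated with a minimal Python sketch. It assumes only two criteria, balanced accuracy and chemical space coverage, both to be maximized; the candidate names and scores are invented for illustration and are not taken from the study, whose actual criteria and weighting schemes may differ.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str                 # hypothetical label for a weighting scheme
    balanced_accuracy: float  # higher is better
    coverage: float           # fraction of chemicals predicted, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both criteria and strictly better on one."""
    no_worse = a.balanced_accuracy >= b.balanced_accuracy and a.coverage >= b.coverage
    better = a.balanced_accuracy > b.balanced_accuracy or a.coverage > b.coverage
    return no_worse and better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only.
models = [
    Candidate("equal_weights", 0.80, 0.90),
    Candidate("accuracy_weighted", 0.85, 0.85),
    Candidate("strict_agreement", 0.90, 0.60),
    Candidate("single_model", 0.78, 0.70),
]
front = pareto_front(models)
print([c.name for c in front])
```

In this toy data the last candidate is dominated, while the other three each represent a different trade-off between accuracy and coverage, which is why no single best model emerges.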
Affiliation(s)
- Sean P. Collins
  - Existing Substances Risk Assessment Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
4
Ghaemi Z, Asadollahi-Baboli M. Developing reliable classification of dual IDO1/TDO inhibitors using data fusion and majority voting. J Biomol Struct Dyn 2023:1-9. [PMID: 37921776 DOI: 10.1080/07391102.2023.2278079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 10/25/2023] [Indexed: 11/04/2023]
Abstract
Indoleamine 2,3-dioxygenase 1 (IDO1) and tryptophan 2,3-dioxygenase (TDO) are promising targets for dual inhibition in the treatment of cancer and neurodegenerative diseases. Data fusion of receptor-based and ligand-based information on dual IDO1/TDO inhibitors was employed to improve active/inactive classification performance. A reliable decision-making procedure was used to identify active/inactive dual IDO1/TDO inhibitors using the majority voting method and pools of individual classifications instead of individual models. All classification models were validated using a prediction set, cross-validation and y-scrambling tests. The classification outcomes indicate that the sensitivity, specificity, precision, accuracy, G-mean and F1 score values increase up to ∼90% using data fusion and the majority voting method. Compared to individual classification models with a single prediction point, the majority voting method gives more reliable results due to the integration of the pool of individual classification models. This classification strategy may lead to more reliable identification of active/inactive dual-targeting inhibitors in cancer immunotherapy. Communicated by Ramaswamy H. Sarma.
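As a rough illustration of majority voting over a pool of individual classifications, the Python sketch below tallies the labels assigned to each compound. The compound names, the votes, and the rule that a tie falls back to "inactive" are all assumptions made for this example, not details reported in the abstract.

```python
from collections import Counter


def majority_vote(votes: list[str]) -> str:
    """Return the most common label in the pool.

    A tie between the two leading labels falls back to 'inactive';
    this tie rule is an assumed, cautious choice for the sketch.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "inactive"
    return ranked[0][0]


# Hypothetical pool of individual model outputs per compound.
pool_predictions = {
    "compound_A": ["active", "active", "inactive", "active", "active"],
    "compound_B": ["inactive", "active", "inactive", "inactive", "inactive"],
    "compound_C": ["active", "inactive"],
}
consensus = {name: majority_vote(votes) for name, votes in pool_predictions.items()}
print(consensus)
```

Using an odd number of classifiers in the pool avoids ties for a binary label, so the fallback rule only matters for pools of even size.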
Affiliation(s)
- Zahra Ghaemi
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Mazandaran, Iran
- M Asadollahi-Baboli
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Mazandaran, Iran
5
Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023; 15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open
Abstract
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize the predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize and evaluate gradient boosting models for virtual screening and QSAR applications.
Affiliation(s)
- Davide Boldini
  - Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
- Francesca Grisoni
  - Department of Biomedical Engineering, Institute for Complex Molecular Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/E, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Stephan A Sieber
  - Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
6
Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform 2023; 15:50. [PMID: 37149650 PMCID: PMC10163717 DOI: 10.1186/s13321-023-00721-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/26/2022] [Accepted: 04/08/2023] [Indexed: 05/08/2023] Open
Abstract
Drug resistance represents a major obstacle to therapeutic innovations and is a prevalent feature in prostate cancer (PCa). Androgen receptors (ARs) are the hallmark therapeutic target for prostate cancer modulation and AR antagonists have achieved great success. However, rapid emergence of resistance contributing to PCa progression is the ultimate burden of their long-term usage. Hence, the discovery and development of AR antagonists with the capability to combat resistance remains an avenue for further exploration. Therefore, this study proposes a novel deep learning (DL)-based hybrid framework, named DeepAR, to accurately and rapidly identify AR antagonists by using only the SMILES notation. Specifically, DeepAR is capable of extracting and learning the key information embedded in AR antagonists. Firstly, we established a benchmark dataset by collecting active and inactive compounds against AR from the ChEMBL database. Based on this dataset, we developed and optimized a collection of baseline models by using a comprehensive set of well-known molecular descriptors and machine learning algorithms. Then, these baseline models were utilized for creating probabilistic features. Finally, these probabilistic features were combined and used for the construction of a meta-model based on a one-dimensional convolutional neural network. Experimental results indicated that DeepAR is a more accurate and stable approach for identifying AR antagonists in terms of the independent test dataset, achieving an accuracy of 0.911 and an MCC of 0.823. In addition, our proposed framework is able to provide feature importance information by leveraging a popular computational approach, named SHapley Additive exPlanations (SHAP). Meanwhile, the characterization and analysis of potential AR antagonist candidates were achieved through the SHAP waterfall plot and molecular docking. 
The analysis inferred that N-heterocyclic moieties, halogenated substituents, and a cyano functional group were significant determinants of potential AR antagonists. Lastly, we implemented an online web server by using DeepAR (at http://pmlabstack.pythonanywhere.com/DeepAR ). We anticipate that DeepAR could be a useful computational tool for community-wide facilitation of AR candidates from a large number of uncharacterized compounds.
Affiliation(s)
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nuttapat Anuwongcharoen
  - Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
7
Kim C, Jeong J, Choi J. Effects of Class Imbalance and Data Scarcity on the Performance of Binary Classification Machine Learning Models Developed Based on ToxCast/Tox21 Assay Data. Chem Res Toxicol 2022; 35:2219-2226. [PMID: 36475638 DOI: 10.1021/acs.chemrestox.2c00189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
The development of toxicity classification models using the ToxCast database has been extensively studied. Machine learning approaches are effective in identifying the bioactivity of untested chemicals. However, ToxCast assays differ in the amount of data and degree of class imbalance (CI). Therefore, the resampling algorithm employed should vary depending on the data distribution to achieve optimal classification performance. In this study, the effects of CI and data scarcity (DS) on the performance of binary classification models were investigated using ToxCast bioassay data. An assay matrix based on CI and DS was prepared for 335 assays with biologically intended target information, and 28 CI assays and 3 DS assays were selected. Thirty models established by combining five molecular fingerprints (i.e., Morgan, MACCS, RDKit, Pattern, and Layered) and six algorithms [i.e., gradient boosting tree, random forest (RF), multi-layered perceptron, k-nearest neighbor, logistic regression, and naive Bayes] were trained using the selected assay data set. Of the 30 trained models, MACCS-RF showed the best performance and thus was selected for analyses of the effects of CI and DS. Results showed that recall and F1 were significantly lower when training with the CI assays than with the DS assays. In addition, hyperparameter tuning of the RF algorithm significantly improved F1 on CI assays. This study provided a basis for developing a toxicity classification model with improved performance by evaluating the effects of data set characteristics. This study also emphasized the importance of using appropriate evaluation metrics and tuning hyperparameters in model development.
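The abstract's point about choosing appropriate evaluation metrics can be made concrete with a small Python sketch. The data below are invented (5 actives among 100 chemicals) and do not come from ToxCast; the sketch only shows how accuracy can look strong on an imbalanced assay while recall and F1 stay low.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return accuracy, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Invented imbalanced assay: 5 actives, 95 inactives.
y_true = [1] * 5 + [0] * 95
# A classifier that finds one active, misses four, and raises one false alarm.
y_pred = [1, 0, 0, 0, 0] + [1] + [0] * 94

scores = metrics(y_true, y_pred)
print(scores)
```

Here 95 of 100 predictions are correct, yet only one of the five actives is recovered, which is the kind of gap that accuracy alone hides.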
Affiliation(s)
- Changhun Kim
  - Chemical Bigdata Research Center, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea; School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jaeseong Jeong
  - Chemical Bigdata Research Center, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea; School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
- Jinhee Choi
  - Chemical Bigdata Research Center, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea; School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
8
Houssein EH, Hosney ME, Mohamed WM, Ali AA, Younis EMG. Fuzzy-based hunger games search algorithm for global optimization and feature selection using medical data. Neural Comput Appl 2022; 35:5251-5275. [PMID: 36340595 PMCID: PMC9628476 DOI: 10.1007/s00521-022-07916-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/10/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Feature selection (FS) is one of the basic data preprocessing steps in data mining and machine learning. It is used to reduce feature size and increase model generalization. In addition to minimizing feature dimensionality, it also enhances classification accuracy and reduces model complexity, which are essential in several applications. Traditional methods for feature selection often fail to find the global optimum because of the large search space. Many hybrid techniques have been proposed that merge several search strategies previously used individually to solve the FS problem. This study proposes a modified hunger games search algorithm (mHGS) for solving optimization and FS problems. The main advantages of the proposed mHGS are that it resolves the following drawbacks of the original HGS: (1) avoiding the local search, (2) solving the problem of premature convergence, and (3) balancing between the exploitation and exploration phases. The mHGS has been evaluated using the IEEE Congress on Evolutionary Computation 2020 (CEC'20) test suite for optimization and ten medical and chemical datasets. The data have dimensions of up to 20000 features or more. The results of the proposed algorithm have been compared to a variety of well-known optimization methods, including the improved multi-operator differential evolution algorithm (IMODE), gravitational search algorithm, grey wolf optimization, Harris Hawks optimization, whale optimization algorithm, slime mould algorithm and hunger games search. The experimental results suggest that the proposed mHGS can generate effective search results without increasing the computational cost while improving the convergence speed. It has also improved the SVM classification performance.
Affiliation(s)
- Essam H. Houssein
  - Faculty of Computers and Information, Minia University, Minia, Egypt
- Mosa E. Hosney
  - Faculty of Computers and Information, Luxor University, Luxor, Egypt
- Waleed M. Mohamed
  - Faculty of Computers and Information, Minia University, Minia, Egypt
- Abdelmgeid A. Ali
  - Faculty of Computers and Information, Minia University, Minia, Egypt
- Eman M. G. Younis
  - Faculty of Computers and Information, Minia University, Minia, Egypt
9
Islam MT, Mustafa HA. Multi-Layer Hybrid (MLH) balancing technique: A combined approach to remove data imbalance. Data Knowl Eng 2022. [DOI: 10.1016/j.datak.2022.102105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
10
Rezaie H, Asadollahi-Baboli M, Hassaninejad-Darzi SK. Hybrid consensus and k-nearest neighbours (kNN) strategies to classify dual BRD4/PLK1 inhibitors. SAR and QSAR in Environmental Research 2022; 33:779-792. [PMID: 36330747 DOI: 10.1080/1062936x.2022.2139292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 10/17/2022] [Indexed: 06/16/2023]
Abstract
A novel decision-making procedure is proposed here for the first time to identify active/inactive and selective/non-selective dual inhibitors using consensus approaches and pools of k-nearest neighbours (kNN) classifications instead of individual models. Dual BRD4/PLK1 inhibition with adequate selectivity is a potential therapeutic strategy for targeting tumour cells in high-risk patients. We report a unique way to identify both active and selective dual BRD4/PLK1 inhibitors using consensus and kNN strategies together with two sources of information, receptor-based and ligand-based, namely the ranked binding energies of residues and important molecular features, respectively. The results of consensus approaches were compared with the results of individual kNN models. The chemical space similarity was measured using three different distance functions to increase the reliability. All activity and selectivity classification models were validated using cross-validation and y-randomization tests. The outcomes show that consensus approaches can increase the reliability and accuracy of active/inactive or selective/non-selective detections up to 90%. Consensus approaches also reached more balanced values of sensitivity and specificity compared to the individual kNN models because of the compensation in the integration of diverse sources of information.
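A consensus over kNN classifications built with several distance functions might look like the Python sketch below. The abstract does not name the three distance functions, so Euclidean, Manhattan and Chebyshev are assumed here; the two-dimensional feature vectors and labels are invented and stand in for whatever receptor-based and ligand-based descriptors the study actually used.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_label(query, train, distance, k=3):
    """Label the query by majority among its k nearest training points."""
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def consensus_label(query, train, k=3):
    """Majority vote across kNN models that differ only in distance function."""
    votes = [knn_label(query, train, d, k) for d in (euclidean, manhattan, chebyshev)]
    return Counter(votes).most_common(1)[0][0]


# Invented training set: (feature vector, label).
train = [
    ((1.0, 1.0), "active"), ((1.2, 0.8), "active"), ((0.9, 1.1), "active"),
    ((3.0, 3.0), "inactive"), ((3.2, 2.9), "inactive"), ((2.8, 3.1), "inactive"),
]
print(consensus_label((1.1, 1.0), train))
print(consensus_label((2.9, 3.0), train))
```

With an odd k and an odd number of distance functions, a binary label cannot tie at either voting stage, which keeps the sketch free of tie-breaking rules.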
Affiliation(s)
- H Rezaie
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran
- M Asadollahi-Baboli
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran
- S K Hassaninejad-Darzi
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran
11
Collins SP, Barton-Maclaren TS. Novel machine learning models to predict endocrine disruption activity for high-throughput chemical screening. Frontiers in Toxicology 2022; 4:981928. [PMID: 36204696 PMCID: PMC9530987 DOI: 10.3389/ftox.2022.981928] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/29/2022] [Accepted: 09/02/2022] [Indexed: 11/13/2022] Open
Abstract
An area of ongoing concern in toxicology and chemical risk assessment is endocrine disrupting chemicals (EDCs). However, thousands of legacy chemicals lack the toxicity testing required to assess their respective EDC potential, and this is where computational toxicology can play a crucial role. The US (United States) Environmental Protection Agency (EPA) has run two programs, the Collaborative Estrogen Receptor Activity Project (CERAPP) and the Collaborative Modeling Project for Receptor Activity (CoMPARA), which aim to predict estrogen and androgen activity, respectively. The US EPA solicited research groups from around the world to provide endocrine receptor activity Qualitative (or Quantitative) Structure Activity Relationship ([Q]SAR) models and then combined them to create consensus models for different toxicity endpoints. Random Forest (RF) models were developed to cover a broader range of substances with high predictive capabilities using large datasets from CERAPP and CoMPARA for estrogen and androgen activity, respectively. By utilizing simple descriptors from open-source software and large training datasets, RF models were created to expand the domain of applicability for predicting endocrine disrupting activity and help in the screening and prioritization of extensive chemical inventories. In addition, RFs were trained to conservatively predict the activity, meaning models are more likely to make false-positive predictions to minimize the number of false negatives. This work presents twelve binary and multi-class RF models to predict binding, agonism, and antagonism for estrogen and androgen receptors. The RF models were found to have high predictive capabilities compared to other in silico models, with some models reaching balanced accuracies of 93% while having coverage of 89%. 
These models are intended to be incorporated into evolving priority-setting workflows and integrated strategies to support the screening and selection of chemicals for further testing and assessment by identifying potential endocrine-disrupting substances.
12
Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases. Comput Biol Chem 2022; 97:107619. [DOI: 10.1016/j.compbiolchem.2021.107619] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/21/2021] [Revised: 08/23/2021] [Accepted: 12/17/2021] [Indexed: 12/14/2022]
13
Sellami A, Réau M, Montes M, Lagarde N. Review of in silico studies dedicated to the nuclear receptor family: Therapeutic prospects and toxicological concerns. Front Endocrinol (Lausanne) 2022; 13:986016. [PMID: 36176461 PMCID: PMC9513233 DOI: 10.3389/fendo.2022.986016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022] Open
Abstract
Being at the center of both therapeutic and toxicological concerns, nuclear receptors (NRs) are widely studied for drug discovery applications but also to unravel the potential toxicity of environmental compounds such as pesticides, cosmetics or additives. High-throughput screening (HTS) campaigns are largely used to detect compounds able to interact with this protein family for both therapeutic and toxicological purposes. These methods lead to a large amount of data requiring the use of computational approaches for a robust and correct analysis and interpretation. The output data can be used to build predictive models to forecast the behavior of new chemicals based on their in vitro activities. This article is a review of the studies published in the last decade and dedicated to NR ligands in silico prediction for both therapeutic and toxicological purposes. Over 100 articles concerning 14 NR subfamilies were carefully read and analyzed in order to retrieve the most commonly used computational methods to develop predictive models, to retrieve the databases deployed in the model building process and to pinpoint some of the limitations they faced.
14
Stanfield Z, Addington CK, Dionisio KL, Lyons D, Tornero-Velez R, Phillips KA, Buckley TJ, Isaacs KK. Mining of Consumer Product Ingredient and Purchasing Data to Identify Potential Chemical Coexposures. Environmental Health Perspectives 2021; 129:67006. [PMID: 34160298 PMCID: PMC8221370 DOI: 10.1289/ehp8610] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 05/13/2023]
Abstract
BACKGROUND Chemicals in consumer products are a major contributor to human chemical coexposures. Consumers purchase and use a wide variety of products containing potentially thousands of chemicals. There is a need to identify potential real-world chemical coexposures to prioritize in vitro toxicity screening. However, due to the vast number of potential chemical combinations, this identification has been a major challenge. OBJECTIVES We aimed to develop and implement a data-driven procedure for identifying prevalent chemical combinations to which humans are exposed through purchase and use of consumer products. METHODS We applied frequent itemset mining to an integrated data set linking consumer product chemical ingredient data with product purchasing data from 60,000 households to identify chemical combinations resulting from co-use of consumer products. RESULTS We identified co-occurrence patterns of chemicals over all households as well as those specific to demographic groups based on race/ethnicity, income, education, and family composition. We also identified chemicals with the highest potential for aggregate exposure by identifying chemicals occurring in multiple products used by the same household. Last, a case study of chemicals active in estrogen and androgen receptor in silico models revealed priority chemical combinations co-targeting receptors involved in important biological signaling pathways. DISCUSSION Integration and comprehensive analysis of household purchasing data and product-chemical information provided a means to assess human near-field exposure and inform selection of chemical combinations for high-throughput screening in in vitro assays. https://doi.org/10.1289/EHP8610.
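The frequent itemset mining step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it uses brute-force enumeration rather than an optimized algorithm such as Apriori, and the function name `frequent_itemsets`, the `min_support` threshold, and the toy household data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(households, min_support, max_size=2):
    """Return chemical combinations found in at least `min_support` of households.

    `households` is a list of sets of chemical names (hypothetical input format).
    The result maps each combination (a sorted tuple) to its support, i.e. the
    fraction of households whose products contain every chemical in it.
    """
    counts = Counter()
    for chemicals in households:
        unique = sorted(set(chemicals))
        # Count every combination of 1..max_size chemicals in this household.
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    n = len(households)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


# Made-up example data, not taken from the study.
households = [
    {"limonene", "linalool", "glycerin"},
    {"limonene", "linalool"},
    {"glycerin", "limonene"},
    {"linalool"},
]
result = frequent_itemsets(households, min_support=0.5)
for items, support in sorted(result.items()):
    print(items, support)
```

Brute-force counting is only practical for small itemset sizes; at the scale described in the abstract a dedicated frequent itemset algorithm would be needed.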
Affiliation(s)
- Zachary Stanfield
  - Oak Ridge Associated Universities (ORAU), Oak Ridge, Tennessee, USA
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Cody K Addington
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
  - Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, Tennessee, USA
- Kathie L Dionisio
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- David Lyons
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Rogelio Tornero-Velez
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Katherine A Phillips
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Timothy J Buckley
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- Kristin K Isaacs
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
|
15
|
Masoudi-Sobhanzadeh Y, Motieghader H, Omidi Y, Masoudi-Nejad A. A machine learning method based on the genetic and world competitive contests algorithms for selecting genes or features in biological applications. Sci Rep 2021; 11:3349. [PMID: 33558580 PMCID: PMC7870651 DOI: 10.1038/s41598-021-82796-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/09/2020] [Accepted: 01/25/2021] [Indexed: 01/30/2023]
Abstract
Gene/feature selection is an essential preprocessing step for creating models using machine learning techniques. It also plays a critical role in different biological applications such as the identification of biomarkers. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as parameter tuning or low level of performance. To tackle such limitations, in this study, a universal wrapper approach is introduced based on our introduced optimization algorithm and the genetic algorithm (GA). In the proposed approach, candidate solutions have variable lengths, and a support vector machine scores them. To show the usefulness of the method, thirteen classification and regression-based datasets with different properties were chosen from various biological scopes, including drug discovery, cancer diagnostics, clinical applications, etc. Our findings confirmed that the proposed method outperforms most of the other currently used approaches and can also free the users from difficulties related to the tuning of various parameters. As a result, users may optimize their biological applications such as obtaining a biomarker diagnostic kit with the minimum number of genes and maximum separability power.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
  - Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran (grid.412888.f; 0000 0001 2174 8913)
- Habib Motieghader
  - Department of Bioinformatics, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran (grid.459617.8; 0000 0004 0494 2783)
  - Department of Basic Sciences, Gowgan Educational Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran (grid.459617.8; 0000 0004 0494 2783)
- Yadollah Omidi
  - Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, Florida, 33328 USA (grid.261241.2; 0000 0001 2168 8324)
- Ali Masoudi-Nejad
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran (grid.46072.37; 0000 0004 0612 7950)
|
16
|
Abstract
Molecular descriptors encode a variety of molecular representations for computer-assisted drug discovery. Here, we focus on the Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors, which were originally designed for scaffold hopping from natural products to synthetic molecules. WHALES descriptors capture molecular shape and partial charges simultaneously. We introduce the key aspects of the WHALES concept and provide a step-by-step guide on how to use these descriptors for virtual compound screening and scaffold hopping. The results presented can be reproduced by using the code freely available from github.com/ETHmodlab/scaffold_hopping_whales.
Affiliation(s)
- Francesca Grisoni
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Zurich, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Zurich, Switzerland
|
17
|
Piir G, Sild S, Maran U. Binary and multi-class classification for androgen receptor agonists, antagonists and binders. CHEMOSPHERE 2021; 262:128313. [PMID: 33182081 DOI: 10.1016/j.chemosphere.2020.128313] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/12/2020] [Revised: 08/24/2020] [Accepted: 09/10/2020] [Indexed: 06/11/2023]
Abstract
Androgens and the androgen receptor regulate a variety of biological effects in the human body. Impaired functioning of the androgen receptor may have adverse health effects ranging from cancer to infertility. Therefore, it is important to determine whether new chemicals have any binding activity and act as androgen agonists or antagonists before commercial use. Because of the large number of chemicals that require experimental testing, computational methods are a viable alternative. The aim of the present study was therefore to develop predictive QSAR models for classifying compounds according to their activity at the androgen receptor. A large data set of chemicals from the CoMPARA project was used for this purpose, and random forest classification models were developed for androgen binding, agonistic, and antagonistic activity. In addition, a unique effort has been made to build a multi-class approach that discriminates between inactive compounds, agonists and antagonists simultaneously. For the evaluation set, the classification models predicted agonists with 80% accuracy, and the respective accuracies for antagonists and binders were 72% and 78%. Combining agonists, antagonists and inactive compounds into a multi-class approach added complexity to the modelling task and resulted in 64% prediction accuracy for the evaluation set. Considering the size of the training data sets and their imbalance, the achieved evaluation accuracy is very good. The final classification models are available for exploring and predicting at the QsarDB repository (https://doi.org/10.15152/QDB.236).
Affiliation(s)
- Geven Piir
  - University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
- Sulev Sild
  - University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
- Uko Maran
  - University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
|
18
|
Zorn KM, Foil DH, Lane TR, Hillwalker W, Feifarek DJ, Jones F, Klaren WD, Brinkman AM, Ekins S. Comparing Machine Learning Models for Aromatase (P450 19A1). ENVIRONMENTAL SCIENCE & TECHNOLOGY 2020; 54:15546-15555. [PMID: 33207874 PMCID: PMC8194505 DOI: 10.1021/acs.est.0c05771] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/12/2023]
Abstract
Aromatase, or cytochrome P450 19A1, catalyzes the aromatization of androgens to estrogens within the body. Changes in the activity of this enzyme can produce hormonal imbalances that can be detrimental to sexual and skeletal development. Inhibition of this enzyme can occur with drugs and natural products as well as environmental chemicals. Therefore, predicting potential endocrine disruption via exogenous chemicals requires that aromatase inhibition be considered in addition to androgen and estrogen pathway interference. Bayesian machine learning methods can be used for prospective prediction from the molecular structure without the need for experimental data. Herein, the generation and evaluation of multiple machine learning models utilizing different sources of aromatase inhibition data are described. These models are applied to two test sets for external validation with molecules relevant to drug discovery from the public domain. In addition, the performance of multiple machine learning algorithms was evaluated by comparing internal five-fold cross-validation statistics of the training data. These methods to predict aromatase inhibition from molecular structure, when used in concert with estrogen and androgen machine learning models, allow for a more holistic assessment of endocrine-disrupting potential of chemicals with limited empirical data and enable the reduction of the use of hazardous substances.
Affiliation(s)
- Kimberley M. Zorn
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
- Daniel H. Foil
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
- Thomas R. Lane
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
- Wendy Hillwalker
  - Global Product Safety, SC Johnson and Son, Inc., Racine, WI, USA
- Frank Jones
  - Global Product Safety, SC Johnson and Son, Inc., Racine, WI, USA
- Sean Ekins
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
|
19
|
Zorn KM, Foil DH, Lane TR, Hillwalker W, Feifarek DJ, Jones F, Klaren WD, Brinkman AM, Ekins S. Comparison of Machine Learning Models for the Androgen Receptor. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2020; 54:13690-13700. [PMID: 33085465 PMCID: PMC8243727 DOI: 10.1021/acs.est.0c03984] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 05/08/2023]
Abstract
The androgen receptor (AR) is a target of interest for endocrine disruption research, as altered signaling can affect normal reproductive and neurological development for generations. In an effort to prioritize compounds with alternative methodologies, the U.S. Environmental Protection Agency (EPA) used in vitro data from 11 assays to construct models of AR agonist and antagonist signaling pathways. While these EPA ToxCast AR models require in vitro data to assign a bioactivity score, Bayesian machine learning methods can be used for prospective prediction from molecule structure alone. This approach was applied to multiple types of data corresponding to the EPA's AR signaling pathway with proprietary software, Assay Central. The training performance of all machine learning models, including six other algorithms, was evaluated by internal 5-fold cross-validation statistics. Bayesian machine learning models were also evaluated with external predictions of reference chemicals to compare prediction accuracies to published results from the EPA. The machine learning model group selected for further studies of endocrine disruption consisted of continuous AC50 data from the February 2019 release of ToxCast/Tox21. These efforts demonstrate how machine learning can be used to predict AR-mediated bioactivity and can also be applied to other targets of endocrine disruption.
Affiliation(s)
- Kimberley M. Zorn
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
- Daniel H. Foil
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
- Thomas R. Lane
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
- Wendy Hillwalker
  - Global Product Safety, SC Johnson and Son, Inc., Racine, WI, USA
- Frank Jones
  - Global Product Safety, SC Johnson and Son, Inc., Racine, WI, USA
- Sean Ekins
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, USA
|
20
|
Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. NAT MACH INTELL 2020. [DOI: 10.1038/s42256-020-00236-4] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Indexed: 12/17/2022]
|
21
|
Falchetti M, Prediger RD, Zanotto-Filho A. Classification algorithms applied to blood-based transcriptome meta-analysis to predict idiopathic Parkinson's disease. Comput Biol Med 2020; 124:103925. [PMID: 32889300 DOI: 10.1016/j.compbiomed.2020.103925] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/19/2020] [Accepted: 07/19/2020] [Indexed: 11/18/2022]
Abstract
Diagnosis of Parkinson's disease (PD) remains a challenge in clinical practice, mostly due to lack of peripheral blood markers. Transcriptomic analysis of blood samples has emerged as a potential means to identify biomarkers and gene signatures of PD. In this context, classification algorithms can assist in detecting data patterns such as phenotypes and transcriptional signatures with potential diagnostic application. In this study, we performed gene expression meta-analysis of blood transcriptome from PD and control patients in order to identify a gene-set capable of predicting PD using classification algorithms. We examined microarray data from public repositories and, after systematic review, 4 independent cohorts (GSE6613, GSE57475, GSE72267 and GSE99039) comprising 711 samples (388 idiopathic PD and 323 healthy individuals) were selected. Initially, analysis of differentially expressed genes resulted in minimal overlap among datasets. To circumvent this, we carried out meta-analysis of 17,712 genes across datasets, and calculated weighted mean Hedges' g effect sizes. From the top 100 positive and negative gene effect sizes, algorithms of collinearity recognition and recursive feature elimination were used to generate a 59-gene signature of idiopathic PD. This signature was evaluated by 9 classification algorithms and 4 sample size-adjusted training groups to create 36 models. Of these, 33 showed accuracy higher than the non-information rate, and 2 models built on Support Vector Machine Regression gave the best accuracy in predicting PD and healthy control samples. In summary, the gene meta-analysis followed by machine learning methodology employed herein identified a gene-set capable of accurately predicting idiopathic PD in blood samples.
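As a rough illustration of the effect-size step mentioned in the abstract, the sketch below computes Hedges' g for one gene in one cohort and then pools per-cohort values with a weighted mean. The abstract does not state the weighting scheme, so inverse-variance weights are an assumption here, and the function names and input numbers are hypothetical.

```python
import math


def hedges_g(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference with the usual small-sample correction."""
    pooled_sd = math.sqrt(
        ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    )
    cohens_d = (mean_a - mean_b) / pooled_sd
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)  # approximate bias correction
    return cohens_d * correction


def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean (assumed scheme, not confirmed by the abstract)."""
    weights = [1 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)


# Hypothetical numbers for a single gene in a single cohort.
g = hedges_g(mean_a=1.0, mean_b=0.0, sd_a=1.0, sd_b=1.0, n_a=10, n_b=10)
pooled = weighted_mean_effect([1.0, 3.0], [1.0, 3.0])
print(g, pooled)
```

A cohort with a smaller variance pulls the pooled estimate toward its own effect size, which is why larger cohorts tend to dominate the combined value.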
Affiliation(s)
- Marcelo Falchetti
  - Laboratório Experimental de Doenças Neurodegenerativas, Departamento de Farmacologia, Universidade Federal de Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil
  - Laboratório de Farmacologia Bioquímica e Molecular, Departamento de Farmacologia, Universidade Federal de Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil
- Rui Daniel Prediger
  - Laboratório Experimental de Doenças Neurodegenerativas, Departamento de Farmacologia, Universidade Federal de Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil
- Alfeu Zanotto-Filho
  - Laboratório de Farmacologia Bioquímica e Molecular, Departamento de Farmacologia, Universidade Federal de Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil
|
22
|
Valsecchi C, Grisoni F, Consonni V, Ballabio D. Consensus versus Individual QSARs in Classification: Comparison on a Large-Scale Case Study. J Chem Inf Model 2020; 60:1215-1223. [PMID: 32073844 PMCID: PMC7997107 DOI: 10.1021/acs.jcim.9b01057] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/29/2023]
Abstract
Consensus strategies have been widely applied in many different scientific fields, based on the assumption that the fusion of several sources of information increases the outcome reliability. Despite the widespread application of consensus approaches, their advantages in quantitative structure–activity relationship (QSAR) modeling have not been thoroughly evaluated, mainly due to the lack of appropriate large-scale data sets. In this study, we evaluated the advantages and drawbacks of consensus approaches compared to single classification QSAR models. To this end, we used a data set of three properties (androgen receptor binding, agonism, and antagonism) for approximately 4000 molecules with predictions performed by more than 20 QSAR models, made available in a large-scale collaborative project. The individual QSAR models were compared with two consensus approaches, majority voting and the Bayes consensus with discrete probability distributions, in both protective and nonprotective forms. Consensus strategies proved to be more accurate and to better cover the analyzed chemical space than individual QSARs on average, thus motivating their widespread application for property prediction. Scripts and data to reproduce the results of this study are available for download.
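Majority voting, one of the two consensus approaches named in the abstract, can be sketched in a few lines of Python. This is a generic illustration rather than the authors' scripts: treating `None` as "model gave no prediction" and abstaining on ties are assumptions, and the protective and nonprotective variants from the study are not reproduced.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent class label, or None when no consensus exists.

    `predictions` holds one label per model; None marks a model that made no
    prediction for this molecule (an assumed convention for this sketch).
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None  # no model predicted anything
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two leading classes
    return ranked[0][0]


# Hypothetical predictions from four models for one molecule.
consensus = majority_vote(["active", "inactive", "active", None])
print(consensus)
```

Returning `None` instead of forcing a label keeps ambiguous molecules out of the accuracy calculation, at the cost of lower coverage.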
Affiliation(s)
- Cecile Valsecchi
  - Milano Chemometrics and QSAR Research Group, University of Milano Bicocca, P.za della Scienza 1, 20126 Milano, Italy
- Francesca Grisoni
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8049 Zurich, Switzerland
- Viviana Consonni
  - Milano Chemometrics and QSAR Research Group, University of Milano Bicocca, P.za della Scienza 1, 20126 Milano, Italy
- Davide Ballabio
  - Milano Chemometrics and QSAR Research Group, University of Milano Bicocca, P.za della Scienza 1, 20126 Milano, Italy
|
23
|
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. ENVIRONMENTAL HEALTH PERSPECTIVES 2020; 128:27002. [PMID: 32074470 DOI: 10.23645/epacomptox.5176876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
BACKGROUND Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. OBJECTIVES In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). METHODS The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. RESULTS The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set. 
DISCUSSION The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
Collapse
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Ahmed M Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Vinicius M Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Carolina H Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
| | - Fang Bai
- School of Pharmacy, Lanzhou University, China
| | - Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Emilio Benfenati
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
| | - Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Chris M Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Dragos Horvath
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Jiazhong Li
- School of Pharmacy, Lanzhou University, China
| | - Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | | | - Serena Manganelli
- Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy
| | | | - Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Gilles Marcou
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
| | - Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Nikolai G Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
| | - Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Ann M Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | | | - Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
| | - Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Universite de Paris, Paris, France
| | - Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Igor V Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | | | - Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Eva B Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Antony J Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
| | - Richard S Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
|
24
|
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environ Health Perspect 2020; 128:27002. [PMID: 32074470 PMCID: PMC7064318 DOI: 10.1289/ehp5580] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 05/06/2019] [Revised: 11/27/2019] [Accepted: 12/05/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) effort, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set. DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ~875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
Affiliation(s)
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
- ScitoVation LLC, Research Triangle Park, North Carolina, USA
- Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
| | - Ahmed M. Abdelaziz
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Domenico Alberga
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Vinicius M. Alves
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Carolina H. Andrade
- Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil
| | - Fang Bai
- School of Pharmacy, Lanzhou University, China
| | - Ilya Balabin
- Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Emilio Benfenati
- Istituto di Ricerche Farmacologiche “Mario Negri”, IRCCS, Milan, Italy
| | - Barun Bhhatarai
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Scott Boyer
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Jingwen Chen
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Sherif Farag
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Paola Gramatica
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Francesca Grisoni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Chris M. Grulke
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Dragos Horvath
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Xin Hu
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ruili Huang
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Jiazhong Li
- School of Pharmacy, Lanzhou University, China
| | - Xuehua Li
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | | | - Serena Manganelli
- Istituto di Ricerche Farmacologiche “Mario Negri”, IRCCS, Milan, Italy
| | | | - Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Gilles Marcou
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Todd Martin
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
| | - Eugene Muratov
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Orazio Nicolotti
- Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy
| | - Nikolai G. Nikolov
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Ulf Norinder
- Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden
| | - Ester Papa
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Michel Petitjean
- Computational Modeling of Protein-Ligand Interactions (CMPLI)–INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France
| | - Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Pavel Pogodin
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia
| | - Xianliang Qiao
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Ann M. Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | | | - Patricia Ruiz
- Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Chetan Rupakheti
- National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA
| | - Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Alessandro Sangion
- QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Karl-Werner Schramm
- Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany
| | - Chandrabose Selvaraj
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | - Imran Shah
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
| | - Lixia Sun
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Olivier Taboureau
- Computational Modeling of Protein-Ligand Interactions (CMPLI)–INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France
| | - Yun Tang
- Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Igor V. Tetko
- BIGCHEM GmbH, Neuherberg, Germany
- Helmholtz Zentrum Muenchen – German Research Center for Environmental Health (GmbH), Neuherberg, Germany
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicology Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
| | | | - Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - George Van Den Driessche
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique—UMR7140, University of Strasbourg/CNRS, Strasbourg, France
| | - Zhongyu Wang
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Eva B. Wedebye
- Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark
| | - Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
| | - Hongbin Xie
- School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Ziye Zheng
- Chemistry Department, Umeå University, Umeå, Sweden
| | - Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA
|
25
|
Schneider M, Pons JL, Labesse G, Bourguet W. In Silico Predictions of Endocrine Disruptors Properties. Endocrinology 2019; 160:2709-2716. [PMID: 31265055 PMCID: PMC6804484 DOI: 10.1210/en.2019-00382] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/20/2019] [Accepted: 06/26/2019] [Indexed: 01/12/2023]
Abstract
Endocrine-disrupting chemicals (EDCs) are a broad class of molecules present in our environment that are suspected to cause adverse effects in the endocrine system by interfering with the synthesis, transport, degradation, or action of endogenous ligands. The characterization of the harmful interaction between environmental compounds and their potential cellular targets and the development of robust in vivo, in vitro, and in silico screening methods are important for assessment of the toxic potential of large numbers of chemicals. In this context, computer-aided technologies that will allow for activity prediction of endocrine disruptors and environmental risk assessments are being developed. These technologies must be able to cope with diverse data and connect chemistry at the atomic level with the biological activity at the cellular, organ, and organism levels. Quantitative structure-activity relationship methods became popular for toxicity issues. They correlate the chemical structure of compounds with biological activity through a number of molecular descriptors (e.g., molecular weight and parameters to account for hydrophobicity, topology, or electronic properties). Chemical structure analysis is a first step; however, modeling intermolecular interactions and cellular behavior will also be essential. The increasing number of three-dimensional crystal structures of EDCs' targets has provided a wealth of structural information that can be used to predict their interactions with EDCs using docking and scoring procedures. In the present review, we have described the various computer-assisted approaches that use ligands and targets properties to predict endocrine disruptor activities.
Affiliation(s)
- Melanie Schneider
- Centre de Biochimie Structurale, CNRS, INSERM, Université de Montpellier, Montpellier, France
| | - Jean-Luc Pons
- Centre de Biochimie Structurale, CNRS, INSERM, Université de Montpellier, Montpellier, France
| | - Gilles Labesse
- Centre de Biochimie Structurale, CNRS, INSERM, Université de Montpellier, Montpellier, France
- Correspondence: Gilles Labesse, PhD, or William Bourguet, PhD, Centre de Biochimie Structurale, 29 rue de Navacelles, 34090 Montpellier, France.
| | - William Bourguet
- Centre de Biochimie Structurale, CNRS, INSERM, Université de Montpellier, Montpellier, France
- Correspondence: Gilles Labesse, PhD, or William Bourguet, PhD, Centre de Biochimie Structurale, 29 rue de Navacelles, 34090 Montpellier, France.
|
26
|
Wahab HA, Amaro RE, Cournia Z. A Celebration of Women in Computational Chemistry. J Chem Inf Model 2019; 59:1683-1692. [DOI: 10.1021/acs.jcim.9b00368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Affiliation(s)
| | - Rommie E. Amaro
- Department of Chemistry and Biochemistry, University of California, San Diego, 3234 Urey Hall, #0340, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
|
27
|
Sun L, Yang H, Cai Y, Li W, Liu G, Tang Y. In Silico Prediction of Endocrine Disrupting Chemicals Using Single-Label and Multilabel Models. J Chem Inf Model 2019; 59:973-982. [PMID: 30807141 DOI: 10.1021/acs.jcim.8b00551] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/24/2023]
Abstract
Endocrine disruption (ED) has become a serious public health issue and also poses a significant threat to the ecosystem. Due to complex mechanisms of ED, traditional in silico models focusing on only one mechanism are insufficient for detection of endocrine disrupting chemicals (EDCs), let alone offering an overview of possible action mechanisms for a known EDC. To remove these limitations, in this study both single-label and multilabel models were constructed across six ED targets, namely, AR (androgen receptor), ER (estrogen receptor alpha), TR (thyroid receptor), GR (glucocorticoid receptor), PPARg (peroxisome proliferator-activated receptor gamma), and aromatase. Two machine learning methods were used to build the single-label models, with multiple random under-sampling combining voting classification to overcome the challenge of data imbalance. Four methods were explored to construct the multilabel models that can predict the interaction of one EDC against multiple targets simultaneously. The single-label models of all the six targets have achieved reasonable performance with balanced accuracy (BA) values from 0.742 to 0.816. Each top single-label model was then joined to predict the multilabel test set with BA values from 0.586 to 0.711. The multilabel models could offer a significant boost over the single-label baselines with BA values for the multilabel test set from 0.659 to 0.832. Therefore, we concluded that single-label models could be employed for identification of potential EDCs, while multilabel ones are preferable for prediction of possible mechanisms of known EDCs.
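As a rough illustration of two ideas named in this abstract, the sketch below shows how balanced accuracy (BA) can be computed and how repeated random under-sampling can be combined with majority voting. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `balanced_accuracy`, `undersample`, and `majority_vote` are hypothetical, the base classifiers are left out, and only the Python standard library is used.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for binary labels this is
    (sensitivity + specificity) / 2."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def undersample(labels, rng):
    """Return sorted indices of a class-balanced subset: every class
    is randomly cut down to the size of the rarest class."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return sorted(chosen)


def majority_vote(predictions):
    """Combine per-model predictions (one list per model) into a single
    label per sample. Use an odd number of models to avoid ties."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]


# Hypothetical usage: draw several balanced subsets from imbalanced labels.
# Each subset would train one base model (omitted here), and the models'
# predictions would then be merged with majority_vote.
rng = random.Random(0)
labels = [0] * 8 + [1] * 2
subsets = [undersample(labels, rng) for _ in range(3)]
```

Balanced accuracy is used instead of plain accuracy because, on imbalanced data, a model that always predicts the majority class scores high accuracy but only 0.5 BA in the binary case.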
Affiliation(s)
- Lixia Sun
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yingchun Cai
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|