1
Zhou Y, Wang Z, Huang Z, Li W, Chen Y, Yu X, Tang Y, Liu G. In silico prediction of ocular toxicity of compounds using explainable machine learning and deep learning approaches. J Appl Toxicol 2024; 44:892-907. [PMID: 38329145 DOI: 10.1002/jat.4586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 02/09/2024]
Abstract
The accurate identification of chemicals with ocular toxicity is of paramount importance in health hazard assessment. In contemporary chemical toxicology, there is a growing emphasis on refining, reducing, and replacing animal testing in safety evaluations. Therefore, the development of robust computational tools is crucial for regulatory applications. The performance of predictive models is heavily reliant on the quality and quantity of data. In this investigation, we amalgamated the most extensive dataset (4901 compounds) sourced from governmental GHS-compliant databases and literature to develop binary classification models of chemical ocular toxicity. We employed 12 molecular representations in conjunction with six machine learning algorithms and two deep learning algorithms to create a series of binary classification models. The findings indicated that the deep learning method GCN outperformed the machine learning models in cross-validation, achieving an impressive AUC of 0.915. However, the top-performing machine learning model (RF-Descriptor) demonstrated excellent performance with an AUC of 0.869 on the test set and was therefore selected as the best model. To enhance model interpretability, we conducted the SHAP method and attention weights analysis. The two approaches offered visual depictions of the relevance of key descriptors and substructures in predicting ocular toxicity of chemicals. Thus, we successfully struck a delicate balance between data quality and model interpretability, rendering our model valuable for predicting and comprehending potential ocular-toxic compounds in the early stages of drug discovery.
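The AUC values quoted in this abstract can be read as the probability that a randomly chosen toxic compound receives a higher predicted score than a randomly chosen non-toxic one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = toxic, an illustrative convention).
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the paper).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic loop is fine for small examples; for datasets of several thousand compounds a rank-based implementation from an established library would be the more practical choice.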
Affiliation(s)
- Yiqing Zhou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Ze Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Zejun Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yuanting Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
2
Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. Environmental Science & Technology 2023; 57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]
Abstract
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models to evaluate chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that utilizes machine learning (ML) and deep learning (DL) techniques to predict the toxicity potentials of chemicals. Although the applications of ML- and DL-based computational models in chemical toxicity predictions are attractive, many toxicity models are "black boxes" in nature and difficult to interpret by toxicologists, which hampers the chemical risk assessments using these models. The recent progress of interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focused on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review can encourage efforts in developing interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.
Affiliation(s)
- Xuelian Jia
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Tong Wang
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
3
Chung E, Russo DP, Ciallella HL, Wang YT, Wu M, Aleksunes LM, Zhu H. Data-Driven Quantitative Structure-Activity Relationship Modeling for Human Carcinogenicity by Chronic Oral Exposure. Environmental Science & Technology 2023; 57:6573-6588. [PMID: 37040559 PMCID: PMC10134506 DOI: 10.1021/acs.est.3c00648] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/24/2023] [Revised: 03/28/2023] [Accepted: 03/29/2023] [Indexed: 06/19/2023]
Abstract
Traditional methodologies for assessing chemical toxicity are expensive and time-consuming. Computational modeling approaches have emerged as low-cost alternatives, especially those used to develop quantitative structure-activity relationship (QSAR) models. However, conventional QSAR models have limited training data, leading to low predictivity for new compounds. We developed a data-driven modeling approach for constructing carcinogenicity-related models and used these models to identify potential new human carcinogens. To this goal, we used a probe carcinogen dataset from the US Environmental Protection Agency's Integrated Risk Information System (IRIS) to identify relevant PubChem bioassays. Responses of 25 PubChem assays were significantly relevant to carcinogenicity. Eight assays inferred carcinogenicity predictivity and were selected for QSAR model training. Using 5 machine learning algorithms and 3 types of chemical fingerprints, 15 QSAR models were developed for each PubChem assay dataset. These models showed acceptable predictivity during 5-fold cross-validation (average CCR = 0.71). Using our QSAR models, we can correctly predict and rank 342 IRIS compounds' carcinogenic potentials (PPV = 0.72). The models predicted potential new carcinogens, which were validated by a literature search. This study portends an automated technique that can be applied to prioritize potential toxicants using validated QSAR models based on extensive training sets from public data resources.
Affiliation(s)
- Elena Chung
- Department of Chemistry and Biochemistry, Rowan University, 201 Mullica Hill Road, Glassboro, New Jersey 08028, United States
- Daniel P. Russo
- Department of Chemistry and Biochemistry, Rowan University, 201 Mullica Hill Road, Glassboro, New Jersey 08028, United States
- Heather L. Ciallella
- Department of Toxicology, Cuyahoga County Medical Examiner’s Office, 11001 Cedar Avenue, Cleveland, Ohio 44106, United States
- Yu-Tang Wang
- Institute of Agro-Products Processing Science and Technology, Chinese Academy of Agricultural Sciences/Key Laboratory of Agro-Products Processing, Ministry of Agriculture, Beijing 100193, China
- Min Wu
- School of Life Science and Technology, China Pharmaceutical University, No. 24, Tong Jia Xiang, Nanjing 210009, China
- Lauren M. Aleksunes
- Department of Pharmacology and Toxicology, Rutgers University, Ernest Mario School of Pharmacy, 170 Frelinghuysen Road, Piscataway, New Jersey 08854, United States
- Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, 201 Mullica Hill Road, Glassboro, New Jersey 08028, United States
4
Di P, Zheng M, Yang T, Chen G, Ren J, Li X, Jiang H. Prediction of serious eye damage or eye irritation potential of compounds via consensus labelling models and active learning models based on uncertainty strategies. Food Chem Toxicol 2022; 169:113420. [PMID: 36108981 DOI: 10.1016/j.fct.2022.113420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 08/24/2022] [Accepted: 09/06/2022] [Indexed: 12/01/2022]
Abstract
Serious eye damage and eye irritation have been authenticated to be significant human health issues in various fields such as ophthalmic pharmaceuticals. Due to the shortcomings of traditional animal testing methods, in silico methods have advanced to study eye toxicity. The models for predicting serious eye damage and eye irritation potential of compounds were developed using 2299 and 5214 compounds, respectively. The 40 global single models and 40 local models were developed by combining 5 molecular description methods and 4 machine learning methods. The 40 active learning models were developed by adopting uncertainty-based active learning strategies and taking local models as initial models. The 110 global consensus models based on 40 global single models were developed using a consensus strategy. Active learning models and global consensus models performed high prediction accuracy. The test accuracy of the best serious eye damage model and eye irritation model reached 0.972 and 0.959, respectively. The applicability domains for all models were calculated to verify the rationality of prediction effect. In addition, 8 structural alerts probably causing serious eye damage or eye irritation were sought out. The prediction models and structural alerts contributed to providing hazard identification and assessing chemical safety.
Affiliation(s)
- Peiwen Di
- School of Pharmacology Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Mingyue Zheng
- School of Pharmacology Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Tianbiao Yang
- School of Pharmacology Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Geng Chen
- School of Pharmacology Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Jianan Ren
- School of Pharmacology Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Hualiang Jiang
- School of Pharmacology Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
5
Ciallella HL, Russo DP, Sharma S, Li Y, Sloter E, Sweet L, Huang H, Zhu H. Predicting Prenatal Developmental Toxicity Based On the Combination of Chemical Structures and Biological Data. Environmental Science & Technology 2022; 56:5984-5998. [PMID: 35451820 PMCID: PMC9191745 DOI: 10.1021/acs.est.2c01040] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 05/05/2023]
Abstract
For hazard identification, classification, and labeling purposes, animal testing guidelines are required by law to evaluate the developmental toxicity potential of new and existing chemical products. However, guideline developmental toxicity studies are costly, time-consuming, and require many laboratory animals. Computational modeling has emerged as a promising, animal-sparing, and cost-effective method for evaluating the developmental toxicity potential of chemicals, such as endocrine disruptors, without the use of animals. We aimed to develop a predictive and explainable computational model for developmental toxicants. To this end, a comprehensive dataset of 1244 chemicals with developmental toxicity classifications was curated from public repositories and literature sources. Data from 2140 toxicological high-throughput screening assays were extracted from PubChem and the ToxCast program for this dataset and combined with information about 834 chemical fragments to group assays based on their chemical-mechanistic relationships. This effort revealed two assay clusters containing 83 and 76 assays, respectively, with high positive predictive rates for developmental toxicants identified with animal testing guidelines (PPV = 72.4 and 77.3% during cross-validation). These two assay clusters can be used as developmental toxicity models and were applied to predict new chemicals for external validation. This study provides a new strategy for constructing alternative chemical developmental toxicity evaluations that can be replicated for other toxicity modeling studies.
Affiliation(s)
- Heather L. Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08103, USA
- Daniel P. Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08103, USA
- Department of Chemistry, Rutgers University, Camden, NJ, 08102, USA
- Swati Sharma
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08103, USA
- Yafan Li
- The Lubrizol Corporation, Wickliffe, OH, 44092, USA
- Eddie Sloter
- The Lubrizol Corporation, Wickliffe, OH, 44092, USA
- Len Sweet
- The Lubrizol Corporation, Wickliffe, OH, 44092, USA
- Heng Huang
- Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08103, USA
- Department of Chemistry, Rutgers University, Camden, NJ, 08102, USA
- Corresponding Author: Hao Zhu, 201 South Broadway, Joint Health Sciences Center, Rutgers University, Camden, New Jersey 08103; Telephone: (856) 225-6781
6
Kang Y, Jeong B, Lim DH, Lee D, Lim KM. In silico prediction of the full United Nations Globally Harmonized System eye irritation categories of liquid chemicals by IATA-like bottom-up approach of random forest method. Journal of Toxicology and Environmental Health, Part A 2021; 84:960-972. [PMID: 34328061 DOI: 10.1080/15287394.2021.1956661] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
As an alternative to in vivo Draize rabbit eye irritation test, this study aimed to construct an in silico model to predict the complete United Nations (UN) Globally Harmonized System (GHS) for classification and labeling of chemicals for eye irritation category [eye damage (Category 1), irritating to eye (Category 2) and nonirritating (No category)] of liquid chemicals with Integrated approaches to testing and assessment (IATA)-like two-stage random forest approach. Liquid chemicals (n = 219) with 34 physicochemical descriptors and quality in vivo data were collected with no missing values. Seven machine learning algorithms (Naive Bayes, Logistic Regression, First Large Margin, Neural Net, Random Forest (RF), Gradient Boosted Tree, and Support Vector Machine) were examined for the ternary categorization of eye irritation potential at a single run through 10-fold cross-validation. RF, which performed best, was further improved by applying the 'Bottom-up approach' concept of IATA, namely, separating No category first, and discriminating Category 1 from 2, thereafter. The best performing training dataset achieved an overall accuracy of 73% and the correct prediction for Category 1, 2, and No category was 80%, 50%, and 77%, respectively for the test dataset. This prediction model was further validated with an external dataset of 28 chemicals, for which an overall accuracy of 71% was achieved.
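The "bottom-up" two-stage scheme described in this abstract (separate No category first, then discriminate Category 1 from Category 2) can be sketched as a simple cascade. This is an illustrative outline under stated assumptions, not the authors' model: the function `bottom_up_classify`, the two threshold-based stand-in classifiers, and the descriptor names `irritation_score` and `damage_score` are all hypothetical placeholders for trained random forest models and real physicochemical descriptors.

```python
def bottom_up_classify(chemical, is_irritant, is_category1):
    """Two-stage cascade over GHS eye irritation categories.

    is_irritant:  stage-1 classifier, True if the chemical is NOT 'No category'.
    is_category1: stage-2 classifier, True if an irritant is Category 1.
    Both are passed in as callables so any trained model can be plugged in.
    """
    # Stage 1: filter out non-irritating chemicals first.
    if not is_irritant(chemical):
        return "No category"
    # Stage 2: among the remaining chemicals, split Category 1 from Category 2.
    return "Category 1" if is_category1(chemical) else "Category 2"


# Hypothetical stand-ins for the two trained stage models.
def stage1(c):
    return c["irritation_score"] >= 0.5


def stage2(c):
    return c["damage_score"] >= 0.5


# Toy chemicals with made-up descriptor values.
chemicals = [
    {"name": "A", "irritation_score": 0.1, "damage_score": 0.9},
    {"name": "B", "irritation_score": 0.7, "damage_score": 0.2},
    {"name": "C", "irritation_score": 0.8, "damage_score": 0.6},
]
predictions = {c["name"]: bottom_up_classify(c, stage1, stage2) for c in chemicals}
print(predictions)
```

Note that chemical A is assigned No category despite its high stage-2 score, because the second stage is never consulted once stage 1 rules a chemical out.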
Affiliation(s)
- Yeonsoo Kang
- College of Pharmacy, Ewha Womans University, Seoul, Republic of Korea
- Boram Jeong
- Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Donghwan Lee
- Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Kyung-Min Lim
- College of Pharmacy, Ewha Womans University, Seoul, Republic of Korea
7
Silva AC, Borba JV, Alves VM, Hall SU, Furnham N, Kleinstreuer N, Muratov E, Tropsha A, Andrade CH. Novel computational models offer alternatives to animal testing for assessing eye irritation and corrosion potential of chemicals. Artificial Intelligence in the Life Sciences 2021; 1. [PMID: 35935266 PMCID: PMC9355119 DOI: 10.1016/j.ailsci.2021.100028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method to identify compounds that cause eye irritation or corrosion. Yet, there is growing pressure on the part of regulatory agencies both in the USA and abroad to develop New Approach Methodologies (NAMs) that help reduce the need for animal testing and address the unmet need to modernize safety evaluation of chemical hazards. In furthering the development and applications of computational NAMs in chemical safety assessment, in this study we have collected the largest expertly curated dataset of compounds tested for eye irritation and corrosion, and employed these data to build and validate binary and multi-classification Quantitative Structure-Activity Relationships (QSAR) models that can reliably assess eye irritation/corrosion potential of novel untested compounds. QSAR models were generated with Random Forest (RF) and Multi-Descriptor Read Across (MuDRA) machine learning (ML) methods, and validated using a 5-fold external cross-validation protocol. These models demonstrated high balanced accuracy (CCR of 0.68–0.88), sensitivity (SE of 0.61–0.84), positive predictive value (PPV of 0.65–0.90), specificity (SP of 0.56–0.91), and negative predictive value (NPV of 0.68–0.85). Overall, MuDRA models outperformed RF models and were applied to predict compounds’ irritation/corrosion potential from the Inactive Ingredient Database, which contains components present in FDA-approved drug products, and from the Cosmetic Ingredient Database, the European Commission source of information on cosmetic substances. All models built and validated in this study are publicly available at the STopTox web portal (https://stoptox.mml.unc.edu/). These models can be employed as reliable tools for identifying potential eye irritant/corrosive compounds.
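The five statistics reported in this abstract (CCR, SE, SP, PPV, NPV) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are assumptions for illustration and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification statistics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives, false negatives.
    """
    se = tp / (tp + fn)    # sensitivity: fraction of actual positives found
    sp = tn / (tn + fp)    # specificity: fraction of actual negatives found
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    ccr = (se + sp) / 2    # correct classification rate (balanced accuracy)
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv, "CCR": ccr}


# Hypothetical counts for a 100-compound test fold.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because CCR averages sensitivity and specificity, it stays informative when irritants and non-irritants are unevenly represented, which plain accuracy does not.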
8
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H. Revealing Adverse Outcome Pathways from Public High-Throughput Screening Data to Evaluate New Toxicants by a Knowledge-Based Deep Neural Network Approach. Environmental Science & Technology 2021; 55:10875-10887. [PMID: 34304572 PMCID: PMC8713073 DOI: 10.1021/acs.est.1c02656] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 05/16/2023]
Abstract
Traditional experimental testing to identify endocrine disruptors that enhance estrogenic signaling relies on expensive and labor-intensive experiments. We sought to design a knowledge-based deep neural network (k-DNN) approach to reveal and organize public high-throughput screening data for compounds with nuclear estrogen receptor α and β (ERα and ERβ) binding potentials. The target activity was rodent uterotrophic bioactivity driven by ERα/ERβ activations. After training, the resultant network successfully inferred critical relationships among ERα/ERβ target bioassays, shown as weights of 6521 edges between 1071 neurons. The resultant network uses an adverse outcome pathway (AOP) framework to mimic the signaling pathway initiated by ERα and identify compounds that mimic endogenous estrogens (i.e., estrogen mimetics). The k-DNN can predict estrogen mimetics by activating neurons representing several events in the ERα/ERβ signaling pathway. Therefore, this virtual pathway model, starting from a compound's chemistry initiating ERα activation and ending with rodent uterotrophic bioactivity, can efficiently and accurately prioritize new estrogen mimetics (AUC = 0.864-0.927). This k-DNN method is a potential universal computational toxicology strategy to utilize public high-throughput screening data to characterize hazards and prioritize potentially toxic compounds.
Affiliation(s)
- Heather L Ciallella
- Center for Computational and Integrative Biology, Rutgers University Camden, Camden, New Jersey 08103, United States
- Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University Camden, Camden, New Jersey 08103, United States
- Department of Chemistry, Rutgers University Camden, Camden, New Jersey 08102, United States
- Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey 08854, United States
- Fabian A Grimm
- ExxonMobil Biomedical Sciences, Inc., Annandale, New Jersey 08801, United States
- Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University Camden, Camden, New Jersey 08103, United States
- Department of Chemistry, Rutgers University Camden, Camden, New Jersey 08102, United States
9
Zhao L, Russo DP, Wang W, Aleksunes LM, Zhu H. Mechanism-Driven Read-Across of Chemical Hepatotoxicants Based on Chemical Structures and Biological Data. Toxicol Sci 2021; 174:178-188. [PMID: 32073637 DOI: 10.1093/toxsci/kfaa005] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 02/05/2023]
Abstract
Hepatotoxicity is a leading cause of attrition in the drug development process. Traditional preclinical and clinical studies to evaluate hepatotoxicity liabilities are expensive and time consuming. With the advent of critical advancements in high-throughput screening, there has been a rapid accumulation of in vitro toxicity data available to inform the risk assessment of new pharmaceuticals and chemicals. To this end, we curated and merged all available in vivo hepatotoxicity data obtained from the literature and public resources, which yielded a comprehensive database of 4089 compounds that includes hepatotoxicity classifications. After dividing the original database of chemicals into modeling and test sets, PubChem assay data were automatically extracted using an in-house data mining tool and clustered based on relationships between structural fragments and cellular responses in in vitro assays. The resultant PubChem assay clusters were further investigated. During the cross-validation procedure, the biological data obtained from several assay clusters exhibited high predictivity of hepatotoxicity and these assays were selected to evaluate the test set compounds. The read-across results indicated that if a new compound contained specific identified chemical fragments (ie, Molecular Initiating Event) and showed active responses in the relevant selected PubChem assays, there was potential for the chemical to be hepatotoxic in vivo. Furthermore, several mechanisms that might contribute to toxicity were derived from the modeling results including alterations in nuclear receptor signaling and inhibition of DNA repair. This modeling strategy can be further applied to the investigation of other complex chemical toxicity phenomena (eg, developmental and reproductive toxicities) as well as drug efficacy.
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey
- Daniel P Russo
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey
- Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey
- Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey
- Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey; Department of Chemistry, Rutgers University, Camden, New Jersey
10
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H. Predictive modeling of estrogen receptor agonism, antagonism, and binding activities using machine- and deep-learning approaches. J Transl Med 2021; 101:490-502. [PMID: 32778734 PMCID: PMC7873171 DOI: 10.1038/s41374-020-00477-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/03/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 11/23/2022]
Abstract
As defined by the World Health Organization, an endocrine disruptor is an exogenous substance or mixture that alters function(s) of the endocrine system and consequently causes adverse health effects in an intact organism, its progeny, or (sub)populations. Traditional experimental testing regimens to identify toxicants that induce endocrine disruption can be expensive and time-consuming. Computational modeling has emerged as a promising and cost-effective alternative method for screening and prioritizing potentially endocrine-active compounds. The efficient identification of suitable chemical descriptors and machine-learning algorithms, including deep learning, is a considerable challenge for computational toxicology studies. Here, we sought to apply classic machine-learning algorithms and deep-learning approaches to a panel of over 7500 compounds tested against 18 Toxicity Forecaster assays related to nuclear estrogen receptor (ERα and ERβ) activity. Three binary fingerprints (Extended Connectivity FingerPrints, Functional Connectivity FingerPrints, and Molecular ACCess System) were used as chemical descriptors in this study. Each descriptor was combined with four machine-learning and two deep-learning (normal and multitask neural networks) approaches to construct models for all 18 ER assays. The resulting model performance was evaluated using the area under the receiver-operating curve (AUC) values obtained from a fivefold cross-validation procedure. The results showed that individual models have AUC values that range from 0.56 to 0.86. External validation was conducted using two additional sets of compounds (n = 592 and n = 966) with established interactions with nuclear ER demonstrated through experimentation. An agonist, antagonist, or binding score was determined for each compound by averaging its predicted probabilities in relevant assay models as an external validation, yielding AUC values ranging from 0.63 to 0.91. The results suggest that multitask neural networks offer advantages when modeling mechanistically related endpoints. Consensus predictions based on the average values of individual models remain the best modeling strategy for computational toxicity evaluations.
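The consensus scoring mentioned in this abstract, where a compound's agonist, antagonist, or binding score is the average of its predicted probabilities across the relevant assay models, can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the function `consensus_score`, the assay model names, and the probability values are hypothetical.

```python
from statistics import mean


def consensus_score(probabilities_by_model, relevant_models):
    """Average a compound's predicted probabilities over the relevant models.

    probabilities_by_model: dict mapping model name -> predicted probability.
    relevant_models: names of the models that belong to one endpoint group
                     (for example, all agonist assay models).
    """
    values = [
        probabilities_by_model[name]
        for name in relevant_models
        if name in probabilities_by_model
    ]
    if not values:
        raise ValueError("no predictions available for the requested models")
    return mean(values)


# Hypothetical predictions for one compound from three assay models.
predictions = {"agonist_assay_1": 0.8, "agonist_assay_2": 0.6, "antagonist_assay_1": 0.1}
agonist_models = ["agonist_assay_1", "agonist_assay_2"]
agonist_score = consensus_score(predictions, agonist_models)
print(f"agonist score = {agonist_score:.2f}")
```

Averaging treats every assay model as equally trustworthy; weighting by cross-validated performance would be a different design choice that the abstract does not describe.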
Affiliation(s)
- Heather L Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
- Fabian A Grimm
- ExxonMobil Biomedical Sciences, Inc., Annandale, NJ, USA
- Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Department of Chemistry, Rutgers University, Camden, NJ, USA
11
Development of In Vitro Corneal Models: Opportunity for Pharmacological Testing. Methods Protoc 2020; 3:mps3040074. [PMID: 33147693 PMCID: PMC7711486 DOI: 10.3390/mps3040074] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/24/2020] [Accepted: 10/30/2020] [Indexed: 12/12/2022]
Abstract
The human eye is a specialized organ with a complex anatomy and physiology, because it is characterized by different cell types with specific physiological functions. Given the complexity of the eye, ocular tissues are finely organized and orchestrated. In the last few years, many in vitro models have been developed in order to meet the 3Rs principle (Replacement, Reduction and Refinement) for eye toxicity testing. This procedure is highly necessary to ensure that the risks associated with ophthalmic products meet appropriate safety criteria. In vitro preclinical testing is now a well-established practice of significant importance for evaluating the efficacy and safety of cosmetic, pharmaceutical, and nutraceutical products. Along with in vitro testing, also computational procedures, herein described, for evaluating the pharmacological profile of potential ocular drug candidates including their toxicity, are in rapid expansion. In this review, the ocular cell types and functionality are described, providing an overview about the scientific challenge for the development of three-dimensional (3D) in vitro models.
|
12
|
Wang YT, Russo DP, Liu C, Zhou Q, Zhu H, Zhang YH. Predictive Modeling of Angiotensin I-Converting Enzyme Inhibitory Peptides Using Various Machine Learning Approaches. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2020; 68:12132-12140. [PMID: 32915574 DOI: 10.1021/acs.jafc.0c04624] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Food-derived angiotensin I-converting enzyme (ACE) inhibitory peptides could potentially be used as safe supportive therapeutic products for high blood pressure. Theoretical approaches are promising because they explore the relationships between peptide structures and their bioactivities. In this study, peptides with ACE inhibitory activity were collected and curated. Quantitative structure-activity relationship (QSAR) models were developed by combining various machine learning approaches and chemical descriptors. The resultant models revealed several structural features accounting for ACE inhibition. Fourteen new dipeptides predicted to lower blood pressure by inhibiting ACE were selected. Molecular docking indicated that these dipeptides formed hydrogen bonds with ACE. Five of these dipeptides were synthesized for experimental testing. The QSAR models developed proved able to design and propose novel ACE inhibitory peptides. Machine learning algorithms and properly selected chemical descriptors can be promising modeling approaches for the rational design of natural functional food components.
Affiliation(s)
- Yu-Tang Wang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Daniel P Russo
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Chang Liu
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Qian Zhou
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Department of Chemistry, Rutgers University, Camden, New Jersey 08102, United States
| | - Ying-Hua Zhang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| |
|
13
|
Chakravarti SK. Reason Vectors: Abstract Representation of Chemistry–Biology Interaction Outcomes, for Reasoning and Prediction. J Chem Inf Model 2020; 60:4614-4628. [DOI: 10.1021/acs.jcim.0c00601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Suman K. Chakravarti
- MultiCASE Inc., 23811 Chagrin Blvd., Suite 305, Beachwood, Ohio 44122, United States
| |
|
14
|
Liu G, Yan X, Sedykh A, Pan X, Zhao X, Yan B, Zhu H. Analysis of model PM2.5-induced inflammation and cytotoxicity by the combination of a virtual carbon nanoparticle library and computational modeling. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2020; 191:110216. [PMID: 31972454 PMCID: PMC7018436 DOI: 10.1016/j.ecoenv.2020.110216] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/19/2019] [Revised: 12/04/2019] [Accepted: 01/13/2020] [Indexed: 05/02/2023]
Abstract
Health risks induced by PM2.5 have become one of the major concerns among living populations, especially in regions facing serious pollution such as China and India. Furthermore, the composition of PM2.5 is complex and varies with time and location. To facilitate our understanding of PM2.5-induced toxicity, a predictive modeling framework was developed in the present study. The core of this study was 1) to construct a virtual carbon nanoparticle library based on experimental data to simulate the PM2.5 structures; 2) to quantify the nanoparticle structures by novel nanodescriptors; and 3) to perform computational modeling for critical toxicity endpoints. The virtual carbon nanoparticle library was developed to represent the nanostructures of 20 carbon nanoparticles, which were synthesized to simulate PM2.5 structures and tested for potential health risks. Based on the nanodescriptors calculated from the virtual carbon nanoparticles, quantitative nanostructure-activity relationship (QNAR) models were developed to predict cytotoxicity and four different inflammatory responses induced by model PM2.5. The high predictability (R² > 0.65 for leave-one-out validations) of the resulting consensus models indicated that this approach could be a universal tool to predict and analyze the potential toxicity of model PM2.5, ultimately helping to understand and evaluate ambient PM2.5-induced toxicity.
Affiliation(s)
- Guohong Liu
- School of Chemistry and Chemical Engineering, Shandong University, Jinan, 250100, China
| | - Xiliang Yan
- School of Chemistry and Chemical Engineering, Shandong University, Jinan, 250100, China; The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Alexander Sedykh
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA; Sciome, Research Triangle Park, NC, 27709, USA
| | - Xiujiao Pan
- School of Chemistry and Chemical Engineering, Shandong University, Jinan, 250100, China
| | - Xiaoli Zhao
- Department of Physiological Science, Eastern Virginia Medical School, Norfolk, VA, 23507, USA
| | - Bing Yan
- Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Institute of Environmental Research at Greater Bay, Guangzhou University, Guangzhou, 510006, China; School of Environmental Science and Engineering, Shandong University, Jinan, 250100, China.
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ, 08102, USA.
| |
|
15
|
Toxicity Prediction Method Based on Multi-Channel Convolutional Neural Network. Molecules 2019; 24:molecules24183383. [PMID: 31533341 PMCID: PMC6766985 DOI: 10.3390/molecules24183383] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/15/2019] [Revised: 09/03/2019] [Accepted: 09/13/2019] [Indexed: 02/08/2023]
Abstract
Molecular toxicity prediction is one of the key studies in drug design. In this paper, a deep learning network based on a two-dimension grid of molecules is proposed to predict toxicity. At first, the van der Waals force and hydrogen bond were calculated according to different descriptors of molecules, and multi-channel grids were generated, which could discover more detail and helpful molecular information for toxicity prediction. The generated grids were fed into a convolutional neural network to obtain the result. A Tox21 dataset was used for the evaluation. This dataset contains more than 12,000 molecules. It can be seen from the experiment that the proposed method performs better compared to other traditional deep learning and machine learning methods.
|
16
|
Abstract
Due to the massive data sets available for drug candidates, modern drug discovery has advanced to the big data era. Central to this shift is the development of artificial intelligence approaches to implementing innovative modeling based on the dynamic, heterogeneous, and large nature of drug data sets. As a result, recently developed artificial intelligence approaches such as deep learning and relevant modeling studies provide new solutions to efficacy and safety evaluations of drug candidates based on big data modeling and analysis. The resulting models provided deep insights into the continuum from chemical structure to in vitro, in vivo, and clinical outcomes. The relevant novel data mining, curation, and management techniques provided critical support to recent modeling studies. In summary, the new advancement of artificial intelligence in the big data era has paved the road to future rational drug development and optimization, which will have a significant impact on drug discovery procedures and, eventually, public health.
Affiliation(s)
- Hao Zhu
- Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
| |
|
17
|
Guo Y, Zhao L, Zhang X, Zhu H. Using a hybrid read-across method to evaluate chemical toxicity based on chemical structure and biological data. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2019; 178:178-187. [PMID: 31004930 PMCID: PMC6508079 DOI: 10.1016/j.ecoenv.2019.04.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/08/2019] [Revised: 04/05/2019] [Accepted: 04/07/2019] [Indexed: 05/08/2023]
Abstract
Read-across has become a primary approach to fill data gaps for chemical safety assessments. Chemical similarity based on structure, reactivity, and physicochemical property information is the traditional approach applied in read-across toxicity studies. However, toxicity mechanisms are usually complicated in a biological system, so using chemical similarity alone to perform read-across for new compounds is not satisfactory for most toxicity endpoints, especially when chemically similar compounds show dissimilar toxicities. This study aims to develop an enhanced read-across method for chemical toxicity predictions. To this end, we used two large toxicity datasets for read-across purposes. One consists of 3979 compounds with Ames mutagenicity data, and the other contains 7332 compounds with rat acute oral toxicity data. First, biological data for all compounds in these two datasets were obtained by querying thousands of PubChem bioassays. The PubChem bioassays with at least five compounds from either of these two datasets showing active responses were selected to generate comprehensive bioprofiles. The read-across studies were performed by using chemical similarity search only and also by using a hybrid similarity search based on both chemical descriptors and bioprofiles. Compared to traditional read-across based on chemical similarity, the hybrid read-across approach showed improved accuracy of predictions for both Ames mutagenicity and acute oral toxicity. Furthermore, we could illustrate potential toxicity mechanisms by analyzing the bioprofiles used for this hybrid read-across study. The results of this study indicate that the new hybrid read-across approach could be an applicable computational tool for chemical toxicity predictions. In this way, the bottleneck of traditional read-across studies can be overcome by introducing public biological data into the traditional process.
The incorporation of bioprofiles generated from the additional biological data for compounds can partially solve the "activity cliff" issue and reveal their potential toxicity mechanisms. This study leads to a promising direction to utilize data-driven approaches for computational toxicology studies in the big data era.
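The hybrid similarity search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the set-based fingerprints, the Tanimoto measure, the `weight` parameter that blends chemical and biological similarity, and the toy compounds are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hybrid_similarity(query: Dict, other: Dict, weight: float) -> float:
    """Blend chemical and bioprofile similarity.

    weight=1.0 uses chemical similarity only (traditional read-across);
    lower values give the bioprofile more influence (hypothetical scheme).
    """
    chem = tanimoto(query["chem"], other["chem"])
    bio = tanimoto(query["bio"], other["bio"])
    return weight * chem + (1.0 - weight) * bio


def read_across(query: Dict, training: List[Dict], k: int, weight: float) -> int:
    """Predict toxicity (1/0) by majority vote of the k most similar compounds."""
    ranked = sorted(
        training,
        key=lambda c: hybrid_similarity(query, c, weight),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = sum(c["toxic"] for c in neighbours)
    return 1 if votes * 2 > len(neighbours) else 0


# Toy "activity cliff": the query looks like non-toxic compound A chemically,
# but its bioassay responses match toxic compound B.
training = [
    {"name": "A", "chem": {"f1", "f2", "f3"}, "bio": {"assay1"}, "toxic": 0},
    {"name": "B", "chem": {"f1", "f2", "f4"}, "bio": {"assay2", "assay3"}, "toxic": 1},
]
query = {"chem": {"f1", "f2", "f3", "f5"}, "bio": {"assay2", "assay3"}}

chemical_only = read_across(query, training, k=1, weight=1.0)
hybrid = read_across(query, training, k=1, weight=0.5)
print(chemical_only, hybrid)
```

In this toy case the chemical-only search picks compound A as the nearest neighbour, while the hybrid search picks compound B, which is the kind of disagreement the abstract attributes to activity cliffs.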
Affiliation(s)
- Yajie Guo
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China
| | - Linlin Zhao
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Xiaoyi Zhang
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China.
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA; Department of Chemistry, Rutgers University, Camden, NJ, USA.
| |
|
18
|
Yang S, Shen Y, Lu W, Yang Y, Wang H, Li L, Wu C, Du G. Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription. BIOMED RESEARCH INTERNATIONAL 2019; 2019:6847685. [PMID: 31360720 PMCID: PMC6652039 DOI: 10.1155/2019/6847685] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/16/2019] [Revised: 05/13/2019] [Accepted: 05/26/2019] [Indexed: 12/18/2022]
Abstract
Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used as a therapeutic in the treatment of stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combination Bayesian models were feasible to predict neuroprotective compounds and to preliminarily demonstrate the pharmacological mechanisms of TCM.
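For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in this abstract, the following sketch computes it from a binary confusion matrix using the standard definition. The counts in the example are invented for illustration and are not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only: 45 true positives, 45 true negatives,
# 5 false positives, 5 false negatives.
example_mcc = matthews_corrcoef(tp=45, tn=45, fp=5, fn=5)
print(example_mcc)
```

MCC ranges from -1 to 1, with 1 indicating perfect agreement, so values such as 0.968 and 0.993 correspond to nearly error-free cross-validation predictions.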
Affiliation(s)
- Shilun Yang
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, No. 103, Wen hua Road, Shenyang 110016, China
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| | - Yanjia Shen
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| | - Wendan Lu
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| | - Yinglin Yang
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| | - Haigang Wang
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| | - Li Li
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| | - Chunfu Wu
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, No. 103, Wen hua Road, Shenyang 110016, China
| | - Guanhua Du
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, No. 103, Wen hua Road, Shenyang 110016, China
- Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
| |
|
19
|
Russo DP, Strickland J, Karmaus AL, Wang W, Shende S, Hartung T, Aleksunes LM, Zhu H. Nonanimal Models for Acute Toxicity Evaluations: Applying Data-Driven Profiling and Read-Across. ENVIRONMENTAL HEALTH PERSPECTIVES 2019; 127:47001. [PMID: 30933541 PMCID: PMC6785238 DOI: 10.1289/ehp3614] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 05/03/2023]
Abstract
BACKGROUND: Low-cost, high-throughput in vitro bioassays have potential as alternatives to animal models for toxicity testing. However, incorporating in vitro bioassays into chemical toxicity evaluations such as read-across requires significant data curation and analysis based on knowledge of relevant toxicity mechanisms, lowering the enthusiasm for using the massive amount of unstructured public data. OBJECTIVE: We aimed to develop a computational method to automatically extract useful bioassay data from a public repository (i.e., PubChem) and assess its ability to predict animal toxicity using a novel bioprofile-based read-across approach. METHODS: A training database containing 7,385 compounds with diverse rat acute oral toxicity data was searched against PubChem to establish in vitro bioprofiles. Using a novel subspace clustering algorithm, bioassay groups that may inform on relevant toxicity mechanisms underlying acute oral toxicity were identified. These bioassay groups were used to predict animal acute oral toxicity using read-across through a cross-validation process. Finally, an external test set of over 600 new compounds was used to validate the resulting model predictivity. RESULTS: Several bioassay clusters showed high predictivity for acute oral toxicity (positive prediction rates ranging from 62% to 100%) through cross-validation. After incorporating individual clusters into an ensemble model, chemical toxicants in the external test set were evaluated for putative acute toxicity (positive prediction rate equal to 76%). Additionally, chemical fragment-in vitro-in vivo relationships were identified to illustrate new animal toxicity mechanisms. CONCLUSIONS: The in vitro bioassay data-driven profiling strategy developed in this study meets the urgent needs of computational toxicology in the current big data era and can be extended to develop predictive models for other complex toxicity end points. https://doi.org/10.1289/EHP3614.
Affiliation(s)
- Daniel P. Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | - Judy Strickland
- Integrated Laboratory Systems (ILS), Research Triangle Park, North Carolina, USA
| | - Agnes L. Karmaus
- Integrated Laboratory Systems (ILS), Research Triangle Park, North Carolina, USA
| | - Wenyi Wang
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | - Sunil Shende
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Department of Computer Science, Rutgers University, Camden, New Jersey, USA
| | - Thomas Hartung
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, Maryland, USA
- University of Konstanz, CAAT-Europe, Konstanz, Germany
| | - Lauren M. Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey, USA
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Department of Chemistry, Rutgers University, Camden, New Jersey, USA
| |
|
20
|
Zorn KM, Lane TR, Russo DP, Clark AM, Makarov V, Ekins S. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019; 16:1620-1632. [PMID: 30779585 DOI: 10.1021/acs.molpharmaceut.8b01297] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 02/06/2023]
Abstract
The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved were nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors have become susceptible to drug resistant strains of HIV, and hence, alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning to generate models with public data to identify new compounds for testing against different disease targets. The current study has used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning studies. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. Compounds from this database with ≤1 μM HIV-1 RT DNA polymerase activity inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks as well as consensus approaches) and then their predictive abilities were compared. Our comparison of different machine learning methods demonstrated that support vector classification, deep learning, and a consensus were generally comparable and not significantly different from each other using 5-fold cross validation and using 24 training and test set combinations. This study demonstrates findings in line with our previous studies for various targets that training and testing with multiple data sets does not demonstrate a significant difference between support vector machine and deep neural networks.
Affiliation(s)
- Kimberley M Zorn
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Thomas R Lane
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Daniel P Russo
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States; The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Alex M Clark
- Molecular Materials Informatics, Inc., 2234 Duvernay Street, Montreal, Quebec H3J2Y3, Canada
| | - Vadim Makarov
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Leninsky Prospekt 33-2, Moscow 119071, Russia
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| |
|
21
|
Wang W, Sedykh A, Sun H, Zhao L, Russo DP, Zhou H, Yan B, Zhu H. Predicting Nano-Bio Interactions by Integrating Nanoparticle Libraries and Quantitative Nanostructure Activity Relationship Modeling. ACS NANO 2017; 11:12641-12649. [PMID: 29149552 PMCID: PMC5772766 DOI: 10.1021/acsnano.7b07093] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 05/05/2023]
Abstract
The discovery of biocompatible or bioactive nanoparticles for medicinal applications is an expensive and time-consuming process that may be significantly facilitated by incorporating more rational approaches combining both experimental and computational methods. However, it is currently hindered by two limitations: (1) the lack of high-quality comprehensive data for computational modeling and (2) the lack of an effective modeling method for the complex nanomaterial structures. In this study, we tackled both issues by first synthesizing a large library of nanoparticles and obtained comprehensive data on their characterizations and bioactivities. Meanwhile, we virtually simulated each individual nanoparticle in this library by calculating their nanostructural characteristics and built models that correlate their nanostructure diversity to the corresponding biological activities. The resulting models were then used to predict and design nanoparticles with desired bioactivities. The experimental testing results of the designed nanoparticles were consistent with the model predictions. These findings demonstrate that rational design approaches combining high-quality nanoparticle libraries, big experimental data sets, and intelligent computational models can significantly reduce the efforts and costs of nanomaterial discovery.
Affiliation(s)
- Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Alexander Sedykh
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Sciome, Research Triangle Park, North Carolina 27709, United States
| | - Hainan Sun
- School of Environmental Science and Engineering, Shandong University, Jinan 250100, China
| | - Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Daniel P. Russo
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Hongyu Zhou
- School of Environment, Jinan University, Guangzhou 510632, China
| | - Bing Yan
- School of Environmental Science and Engineering, Shandong University, Jinan 250100, China
- Corresponding author
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Department of Chemistry, Rutgers University, Camden, New Jersey 08102, United States
- Corresponding author
| |
|
22
|
Zhang L, Tan J, Han D, Zhu H. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today 2017; 22:1680-1685. [PMID: 28881183 DOI: 10.1016/j.drudis.2017.08.010] [Citation(s) in RCA: 275] [Impact Index Per Article: 39.3] [Received: 11/21/2016] [Revised: 07/13/2017] [Accepted: 08/30/2017] [Indexed: 01/29/2023]
Abstract
Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era.
Affiliation(s)
- Lu Zhang
- College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China
| | - Jianjun Tan
- College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China.
| | - Dan Han
- College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China
| | - Hao Zhu
- College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA; The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA.
| |
|
23
|
Zhao L, Wang W, Sedykh A, Zhu H. Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do. ACS OMEGA 2017; 2:2805-2812. [PMID: 28691113 PMCID: PMC5494643 DOI: 10.1021/acsomega.7b00274] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/08/2017] [Accepted: 04/27/2017] [Indexed: 05/04/2023]
Abstract
Numerous chemical data sets have become available for quantitative structure-activity relationship (QSAR) modeling studies. However, the quality of different data sources may be different based on the nature of experimental protocols. Therefore, potential experimental errors in the modeling sets may lead to the development of poor QSAR models and further affect the predictions of new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, which was obtained by simulating experimental errors, and the QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints) that have been extensively curated both in-house and by our collaborators to create over 1800 various QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., randomizing the activities of part of the compounds) in the modeling process. A fivefold cross-validation process was used to evaluate the modeling performance, which deteriorates when the ratio of experimental errors increases. All of the resulting models were also used to predict external sets of new compounds, which were excluded at the beginning of the modeling process. The modeling results showed that the compounds with relatively large prediction errors in cross-validation processes are likely to be those with simulated experimental errors. However, after removing a certain number of compounds with large prediction errors in the cross-validation process, the external predictions of new compounds did not show improvement. Our conclusion is that the QSAR predictions, especially consensus predictions, can identify compounds with potential experimental errors. But removing those compounds by the cross-validation procedure is not a reasonable means to improve model predictivity due to overfitting.
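The error-simulation step described in this abstract (duplicating a data set and randomizing the activities of part of the compounds) can be sketched as below for a categorical endpoint. This is an assumption-laden illustration: flipping binary labels stands in for "randomizing", and the function name, ratios, and seed are hypothetical rather than taken from the paper.

```python
import random
from typing import Dict, List


def inject_label_noise(labels: List[int], ratio: float, seed: int = 0) -> List[int]:
    """Return a copy of binary labels with a given fraction flipped.

    Flipping guarantees that exactly round(ratio * n) compounds carry a
    simulated experimental error (a simplification of "randomizing").
    """
    rng = random.Random(seed)
    n_noisy = round(ratio * len(labels))
    noisy_idx = rng.sample(range(len(labels)), n_noisy)
    noisy = list(labels)
    for i in noisy_idx:
        noisy[i] = 1 - noisy[i]
    return noisy


def n_changed(original: List[int], modified: List[int]) -> int:
    """Count the compounds whose activity label differs between two sets."""
    return sum(a != b for a, b in zip(original, modified))


# A balanced toy data set of 100 compounds (50 inactive, 50 active).
clean = [0] * 50 + [1] * 50

# One modeling set per simulated error ratio, as in the study design.
modeling_sets: Dict[float, List[int]] = {
    ratio: inject_label_noise(clean, ratio, seed=42)
    for ratio in (0.0, 0.1, 0.2, 0.4)
}

for ratio, labels in modeling_sets.items():
    print(ratio, n_changed(clean, labels))
```

Each resulting modeling set would then be passed through the same fivefold cross-validation, so that performance can be compared across error ratios.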
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
| | - Alexander Sedykh
- Sciome LLC, Durham, North Carolina 27709, United States
- E-mail: (A.S.)
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Department of Chemistry, Rutgers University, Camden, New Jersey 08102, United States
- E-mail: . Tel: (856) 225-6781 (H.Z.)
| |
|
24
|
A three-tier QSAR modeling strategy for estimating eye irritation potential of diverse chemicals in rabbit for regulatory purposes. Regul Toxicol Pharmacol 2016; 77:282-91. [DOI: 10.1016/j.yrtph.2016.03.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/23/2015] [Revised: 02/22/2016] [Accepted: 03/18/2016] [Indexed: 01/08/2023]
|
25
|
Casman EA, Gernand JM. Nanotoxicology: Seeing the trees for the forest. Nature Nanotechnology 2016; 11:405-407. [PMID: 26925825 DOI: 10.1038/nnano.2016.5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Affiliation(s)
- Elizabeth A Casman
- Department of Engineering and Public Policy at Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Jeremy M Gernand
- Department of Energy and Mineral Engineering, Environmental Health and Safety Engineering, at Pennsylvania State University, State College, Pennsylvania 16801, USA
| |
|
26
|
Ribay K, Kim MT, Wang W, Pinolini D, Zhu H. Predictive Modeling of Estrogen Receptor Binding Agents Using Advanced Cheminformatics Tools and Massive Public Data. Frontiers in Environmental Science 2016; 4:12. [PMID: 27642585 PMCID: PMC5023020 DOI: 10.3389/fenvs.2016.00012] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 05/21/2023]
Abstract
Estrogen receptors (ERα) are a critical target for drug design as well as a potential source of toxicity when activated unintentionally. Thus, evaluating potential ERα binding agents is critical in both drug discovery and chemical toxicity areas. Computational tools, e.g., Quantitative Structure-Activity Relationship (QSAR) models, can predict potential ERα binding agents before chemical synthesis. The purpose of this project was to develop enhanced predictive models of ERα binding agents by utilizing advanced cheminformatics tools that can integrate publicly available bioassay data. The initial ERα binding agent data set, consisting of 446 binders and 8307 non-binders, was obtained from the Tox21 Challenge project organized by the NIH Chemical Genomics Center (NCGC). After removing the duplicates and inorganic compounds, this data set was used to create a training set (259 binders and 259 non-binders). This training set was used to develop QSAR models using chemical descriptors. The resulting models were then used to predict the binding activity of 264 external compounds, which were available to us after the models were developed. The cross-validation results of the training set [Correct Classification Rate (CCR) = 0.72] were much higher than the external predictivity of the unknown compounds (CCR = 0.59). To improve the conventional QSAR models, all compounds in the training set were used to search PubChem and generate a profile of their biological responses across thousands of bioassays. The most important bioassays were prioritized to generate a similarity index that was used to calculate the biosimilarity score between each pair of compounds. The nearest neighbors for each compound within the set were then identified and its ERα binding potential was predicted by its nearest neighbors in the training set. The hybrid model performance (CCR = 0.94 for cross validation; CCR = 0.68 for external prediction) showed significant improvement over the original QSAR models, particularly for the activity cliffs that induce prediction errors. The results of this study indicate that the response profile of chemicals from public data provides useful information for modeling and evaluation purposes. The public big data resources should be considered along with chemical structure information when predicting new compounds, such as unknown ERα binding agents.
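The abstract does not define CCR. As a hedged illustration, the sketch below uses the definition common in the QSAR literature, the mean of sensitivity and specificity (equivalent to balanced accuracy for a binary endpoint). The function name and the toy labels are hypothetical.

```python
def correct_classification_rate(y_true, y_pred):
    """CCR as the mean of sensitivity and specificity.

    Assumption: labels are coded 1 (binder) and 0 (non-binder).
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute CCR")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = true_pos / n_pos
    specificity = true_neg / n_neg
    return 0.5 * (sensitivity + specificity)


# Toy example: 3 of 4 binders and 2 of 4 non-binders predicted correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
ccr = correct_classification_rate(y_true, y_pred)
```

Because both classes contribute equally, this metric is not inflated by a majority class, which matters for data such as the original 446 binders versus 8307 non-binders.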
Affiliation(s)
- Kathryn Ribay
- Department of Chemistry, Rutgers University, Camden, NJ, USA
| | - Marlene T. Kim
- Department of Chemistry, Rutgers University, Camden, NJ, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, USA
| | - Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, USA
| | - Daniel Pinolini
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, USA
| | - Hao Zhu
- Department of Chemistry, Rutgers University, Camden, NJ, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, USA
- Correspondence: Hao Zhu
| |
|
27
|
Zhu H, Bouhifd M, Donley E, Egnash L, Kleinstreuer N, Kroese ED, Liu Z, Luechtefeld T, Palmer J, Pamies D, Shen J, Strauss V, Wu S, Hartung T. Supporting read-across using biological data. ALTEX 2016; 33:167-82. [PMID: 26863516 PMCID: PMC4834201 DOI: 10.14573/altex.1601252] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 01/25/2016] [Accepted: 02/09/2016] [Indexed: 01/08/2023]
Abstract
Read-across, i.e. filling toxicological data gaps by relating to similar chemicals, for which test data are available, is usually done based on chemical similarity. Besides structure and physico-chemical properties, however, biological similarity based on biological data adds extra strength to this process. In the context of developing Good Read-Across Practice guidance, a number of case studies were evaluated to demonstrate the use of biological data to enrich read-across. In the simplest case, chemically similar substances also show similar test results in relevant in vitro assays. This is a well-established method for the read-across of e.g. genotoxicity assays. Larger datasets of biological and toxicological properties of hundreds and thousands of substances become increasingly available enabling big data approaches in read-across studies. Several case studies using various big data sources are described in this paper. An example is given for the US EPA's ToxCast dataset allowing read-across for high quality uterotrophic assays for estrogenic endocrine disruption. Similarly, an example for REACH registration data enhancing read-across for acute toxicity studies is given. A different approach is taken using omics data to establish biological similarity: Examples are given for stem cell models in vitro and short-term repeated dose studies in rats in vivo to support read-across and category formation. These preliminary biological data-driven read-across studies highlight the road to the new generation of read-across approaches that can be applied in chemical safety assessment.
Affiliation(s)
- Hao Zhu
- Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Mounir Bouhifd
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
| | | | - Laura Egnash
- Stemina Biomarker Discovery Inc., Madison, WI, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
| | - E Dinant Kroese
- Risk Analysis for Products in Development, TNO Zeist, The Netherlands
| | | | - Thomas Luechtefeld
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
| | | | - David Pamies
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
| | - Jie Shen
- Research Institute for Fragrance Materials, Inc. Woodcliff Lake, New Jersey, USA
| | - Volker Strauss
- BASF Aktiengesellschaft, Experimental Toxicology and Ecology, Ludwigshafen, Germany
| | | | - Thomas Hartung
- Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- University of Konstanz, CAAT-Europe, Konstanz, Germany
| |
|
28
|
Lei T, Li Y, Song Y, Li D, Sun H, Hou T. ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform 2016; 8:6. [PMID: 26839598 PMCID: PMC4736633 DOI: 10.1186/s13321-016-0117-7] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 11/05/2015] [Accepted: 01/20/2016] [Indexed: 01/31/2023]
Abstract
Background
Determination of acute toxicity, expressed as median lethal dose (LD50), is one of the most important steps in the drug discovery pipeline. Because in vivo assays for oral acute toxicity in mammals are time-consuming and costly, there is an urgent need to develop in silico prediction models of oral acute toxicity.
Results
In this study, based on a comprehensive data set containing 7314 diverse chemicals with rat oral LD50 values, the relevance vector machine (RVM) technique was employed to build regression models for the prediction of oral acute toxicity in rats, which were compared with those built using six other machine learning approaches, including k-nearest-neighbor regression, random forest (RF), support vector machine, local approximate Gaussian process, multilayer perceptron ensemble, and eXtreme gradient boosting. A subset of the original molecular descriptors and structural fingerprints (PubChem or SubFP) was chosen by the Chi squared statistics. The prediction capabilities of individual QSAR models, measured by q²ext for the test set containing 2376 molecules, ranged from 0.572 to 0.659.
Conclusion
Considering the overall prediction accuracy for the test set, RVM with Laplacian kernel and RF were recommended to build in silico models with better predictivity for rat oral acute toxicity. By combining the predictions from individual models, four consensus models were developed, yielding better prediction capabilities for the test set (q²ext = 0.669–0.689). Finally, some essential descriptors and substructures relevant to oral acute toxicity were identified and analyzed, and they may serve as property or substructure alerts to avoid toxicity. We believe that the best consensus model with high prediction accuracy can be used as a reliable virtual screening tool to filter out compounds with high rat oral acute toxicity.
[Figure: Workflow of combinatorial QSAR modelling to predict rat oral acute toxicity]
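The consensus step and the external q² statistic mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the consensus is taken as the unweighted mean of the individual model predictions, and the external q² is computed as 1 minus the ratio of the squared prediction errors on the test set to the squared deviations of the test values from the training-set mean. All names and numbers are hypothetical.

```python
from statistics import fmean


def consensus_predict(model_predictions):
    """Average per-compound predictions across models (one row per model)."""
    return [fmean(column) for column in zip(*model_predictions)]


def q2_ext(y_test, y_pred, y_train_mean):
    """External q2: 1 - PRESS / SS, with SS taken about the training-set mean."""
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss


# Hypothetical test-set values and predictions from two individual models.
y_test = [2.0, 3.0, 4.0]
train_mean = 3.0
model_a = [2.6, 3.4, 3.6]
model_b = [1.6, 2.8, 4.4]

consensus = consensus_predict([model_a, model_b])
q2_a = q2_ext(y_test, model_a, train_mean)
q2_b = q2_ext(y_test, model_b, train_mean)
q2_consensus = q2_ext(y_test, consensus, train_mean)
```

In this toy case the two models err in opposite directions, so averaging cancels part of the error. Real consensus models improve on individual ones only to the extent that their errors are not strongly correlated.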
Affiliation(s)
- Tailong Lei
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
| | - Youyong Li
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, 215123 Jiangsu People's Republic of China
| | - Yunlong Song
- Department of Medicinal Chemistry, School of Pharmacy, Second Military Medical University, Shanghai, 200433 People's Republic of China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
| | - Huiyong Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
| |
|
29
|
Wang W, Kim MT, Sedykh A, Zhu H. Developing Enhanced Blood-Brain Barrier Permeability Models: Integrating External Bio-Assay Data in QSAR Modeling. Pharm Res 2015; 32:3055-65. [PMID: 25862462 DOI: 10.1007/s11095-015-1687-1] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 12/19/2014] [Accepted: 03/20/2015] [Indexed: 02/02/2023]
Abstract
PURPOSE Experimental Blood-Brain Barrier (BBB) permeability models for drug molecules are expensive and time-consuming. As alternative methods, several traditional Quantitative Structure-Activity Relationship (QSAR) models have been developed previously. In this study, we aimed to improve the predictivity of traditional QSAR BBB permeability models by employing relevant public bio-assay data in the modeling process. METHODS We compiled a BBB permeability database consisting of 439 unique compounds from various resources. The database was split into a modeling set of 341 compounds and a validation set of 98 compounds. A consensus QSAR modeling workflow was employed on the modeling set to develop various QSAR models. A five-fold cross-validation approach was used to validate the developed models, and the resulting models were used to predict the external validation set compounds. Furthermore, we used previously published membrane transporter models to generate relevant transporter profiles for target compounds. The transporter profiles were used as additional biological descriptors to develop hybrid QSAR BBB models. RESULTS The consensus QSAR models have R² = 0.638 for five-fold cross-validation and R² = 0.504 for external validation. The consensus model developed by pooling chemical and transporter descriptors showed better predictivity (R² = 0.646 for five-fold cross-validation and R² = 0.526 for external validation). Moreover, several external bio-assays that correlate with BBB permeability were identified using our automatic profiling tool. CONCLUSIONS The BBB permeability models developed in this study can be useful for early evaluation of new compounds (e.g., new drug candidates). The combination of chemical and biological descriptors shows a promising direction to improve the current traditional QSAR models.
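The five-fold cross-validation mentioned in the METHODS can be sketched as an index-splitting routine. This is a generic illustration, assuming a simple random partition of the 341 modeling-set compounds. The abstract does not say how the folds were formed, and the function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs from a random partition."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    all_indices = set(indices)
    return [(sorted(all_indices - set(fold)), sorted(fold)) for fold in folds]


# 341 is the modeling-set size quoted in the abstract; the split itself is illustrative.
splits = k_fold_indices(341, k=5)
```

Each compound appears in exactly one test fold, so every modeling-set compound receives one cross-validated prediction, while the 98 validation compounds stay outside this loop entirely.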
Affiliation(s)
- Wenyi Wang
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey, 08102, USA
| | | | | | | |
|
30
|
Lampa E, Lind L, Lind PM, Bornefalk-Hermansson A. The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees. Environ Health 2014; 13:57. [PMID: 24993424 PMCID: PMC4120739 DOI: 10.1186/1476-069x-13-57] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 10/10/2013] [Accepted: 06/28/2014] [Indexed: 05/29/2023]
Abstract
BACKGROUND There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. METHODS We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method's ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data. RESULTS The method succeeded in identifying the true interactions in all scenarios except where the association was weakest. Some spurious interactions were also found, however. The method was also able to identify interactions in the real data set. CONCLUSIONS We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies.
Affiliation(s)
- Erik Lampa
- Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala Sweden
| | - Lars Lind
- Department of Medical Sciences, Cardiovascular Epidemiology, Uppsala University, 75185 Uppsala Sweden
| | - P Monica Lind
- Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala Sweden
| | | |
|
31
|
Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers. J Comput Aided Mol Des 2014; 28:631-46. [PMID: 24840854 DOI: 10.1007/s10822-014-9748-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/12/2013] [Accepted: 05/05/2014] [Indexed: 10/25/2022]
Abstract
Compared to the current knowledge on cancer chemotherapeutic agents, only limited information is available on the ability of organic compounds, such as drugs and/or natural products, to prevent or delay the onset of cancer. In order to evaluate chemical chemopreventive potentials and design novel chemopreventive agents with low to no toxicity, we developed predictive computational models for chemopreventive agents in this study. First, we curated a database containing over 400 organic compounds with known chemoprevention activities. Based on this database, various random forest and support vector machine binary classifiers were developed. All of the resulting models were validated by cross validation procedures. Then, the validated models were applied to virtually screen a chemical library containing around 23,000 natural products and derivatives. We selected a list of 148 novel chemopreventive compounds based on the consensus prediction of all validated models. We further analyzed the predicted active compounds by their ease of organic synthesis. Finally, 18 compounds were synthesized and experimentally validated for their chemopreventive activity. The experimental validation results paralleled the cross validation results, demonstrating the utility of the developed models. The predictive models developed in this study can be applied to virtually screen other chemical libraries to identify novel lead compounds for the chemoprevention of cancers.
|
32
|
Kar S, Roy K. Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. Comput Biol Med 2014; 48:102-8. [DOI: 10.1016/j.compbiomed.2014.02.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/05/2013] [Revised: 02/20/2014] [Accepted: 02/24/2014] [Indexed: 11/28/2022]
|
33
|
Gernand JM, Casman EA. A meta-analysis of carbon nanotube pulmonary toxicity studies: how physical dimensions and impurities affect the toxicity of carbon nanotubes. Risk Analysis 2014; 34:583-597. [PMID: 24024907 DOI: 10.1111/risa.12109] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 06/02/2023]
Abstract
This article presents a regression-tree-based meta-analysis of rodent pulmonary toxicity studies of uncoated, nonfunctionalized carbon nanotube (CNT) exposure. The resulting analysis provides quantitative estimates of the contribution of CNT attributes (impurities, physical dimensions, and aggregation) to pulmonary toxicity indicators in bronchoalveolar lavage fluid: neutrophil and macrophage count, and lactate dehydrogenase and total protein concentrations. The method employs classification and regression tree (CART) models, techniques that are relatively insensitive to data defects that impair other types of regression analysis: high dimensionality, nonlinearity, correlated variables, and significant quantities of missing values. Three types of analysis are presented: the regression tree (RT), the random forest (RF), and a random-forest-based dose-response model. The RT shows the best single model supported by all the data and typically contains a small number of variables. The RF shows how much variance reduction is associated with every variable in the data set. The dose-response model is used to isolate the effects of CNT attributes from the CNT dose, showing the shift in the dose-response caused by the attribute across the measured range of CNT doses. It was found that the CNT attributes that contribute the most to pulmonary toxicity were metallic impurities (cobalt significantly increased observed toxicity, while other impurities had mixed effects), CNT length (negatively correlated with most toxicity indicators), CNT diameter (significantly positively associated with toxicity), and aggregate size (negatively correlated with cell damage indicators and positively correlated with immune response indicators). Increasing CNT N2-BET-specific surface area decreased toxicity indicators.
Affiliation(s)
- Jeremy M Gernand
- Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, PA, USA
| | | |
|
34
|
Ekins S. Progress in computational toxicology. J Pharmacol Toxicol Methods 2013; 69:115-40. [PMID: 24361690 DOI: 10.1016/j.vascn.2013.12.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 10/08/2013] [Accepted: 12/08/2013] [Indexed: 01/02/2023]
Abstract
INTRODUCTION Computational methods have been widely applied to toxicology across pharmaceutical, consumer product and environmental fields over the past decade. Progress in computational toxicology is now reviewed. METHODS A literature review was performed on computational models for hepatotoxicity (e.g. for drug-induced liver injury (DILI)), cardiotoxicity, renal toxicity and genotoxicity. In addition various publications have been highlighted that use machine learning methods. Several computational toxicology model datasets from past publications were used to compare Bayesian and Support Vector Machine (SVM) learning methods. RESULTS The increasing amounts of data for defined toxicology endpoints have enabled machine learning models that have been increasingly used for predictions. It is shown that across many different models Bayesian and SVM perform similarly based on cross validation data. DISCUSSION Considerable progress has been made in computational toxicology in a decade in both model development and availability of larger scale or 'big data' models. The future efforts in toxicology data generation will likely provide us with hundreds of thousands of compounds that are readily accessible for machine learning models. These models will cover relevant chemistry space for pharmaceutical, consumer product and environmental applications.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay Varina, NC 27526, USA; Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, MD 21201, USA; Department of Pharmacology, Rutgers University-Robert Wood Johnson Medical School, 675 Hoes Lane, Piscataway, NJ 08854, USA; Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599-7355, USA.
| |
|
35
|
Exploring QSTR modeling and toxicophore mapping for identification of important molecular features contributing to the chemical toxicity in Escherichia coli. Toxicol In Vitro 2013; 28:265-72. [PMID: 24246193 DOI: 10.1016/j.tiv.2013.11.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/17/2013] [Revised: 10/31/2013] [Accepted: 11/04/2013] [Indexed: 11/24/2022]
Abstract
Biodiversity deprivation can affect functions and services of the ecosystem. Changes in biodiversity alter ecosystem processes and change the resilience of ecosystems to ecological changes. Bacterial communities are the main form of biomass in the ecosystem and one of the largest populations on the planet. Bacterial communities provide important services to biodiversity. They break down pollutants, municipal waste and ingested food, and they are the primary route for recycling of organic matter to plants and other autotrophs, conversion of inorganic matter into new biological tissue using sunlight, and management of the energy crisis through use of biofuel. In the present study, computational chemistry and statistical modeling have been used to develop mathematical equations which can be applied to calculate toxicity of new/unknown chemicals/biofuels/metabolites in Escherichia coli. 2D and 3D descriptors were generated from the molecular structure of compounds, and mathematical models have been developed using the genetic function approximation followed by multiple linear regression (GFA-MLR) method. Model validity was checked through defined internal (R² = 0.751 and Q² = 0.711) and external (R²pred = 0.773) statistical parameters. Molecular features responsible for toxicity were also assessed through 3D toxicophore study. The toxicophore-based model was validated (R = 0.785) using qualitative statistical metrics and randomization test (Fischer validation).
|
36
|
Ekins S, Freundlich JS, Reynolds RC. Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model 2013; 53:3054-63. [PMID: 24144044 DOI: 10.1021/ci400480s] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
Abstract
The search for new tuberculosis treatments continues as we need to find molecules that can act more quickly, be accommodated in multidrug regimens, and overcome ever increasing levels of drug resistance. Multiple large scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) were used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in external validation receiver operator curve (ROC) of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test set dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning construction as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
| | | | | |
|