1
Yu X, Chen Y, Chen L, Li W, Wang Y, Tang Y, Liu G. GCLmf: A Novel Molecular Graph Contrastive Learning Framework Based on Hard Negatives and Application in Toxicity Prediction. Mol Inform 2024:e202400169. [PMID: 39421969 DOI: 10.1002/minf.202400169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 09/23/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024]
Abstract
In silico methods for predicting chemical toxicity can decrease the cost and increase the efficiency of early-stage drug discovery. However, because sufficient and reliable toxicity data are scarce, constructing robust and accurate prediction models is challenging. Contrastive learning, a type of self-supervised learning, leverages large unlabeled data sets to obtain more expressive molecular representations, which can boost prediction performance on downstream tasks. While molecular graph contrastive learning has attracted growing attention, current models neglect the quality of the negative set. Here, we proposed a self-supervised pretraining deep learning framework named GCLmf. We first utilized molecular fragments that meet specific conditions as hard negative samples to improve the quality of the negative set and thus increase the difficulty of the proxy tasks during pretraining, so that the model learns informative representations. GCLmf has shown excellent predictive power on various molecular property benchmarks and demonstrates high performance on 33 toxicity tasks in comparison with multiple baselines. In addition, we investigated the necessity of introducing hard negatives in model building and the impact of the proportion of hard negatives on the model.
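The role of hard negatives in a contrastive objective can be illustrated with a minimal sketch. It assumes a generic InfoNCE-style loss over cosine similarities, which the abstract does not confirm as the GCLmf objective; the embeddings, the `temperature` value, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's stated loss)."""
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


# Hypothetical 2-D embeddings, chosen only for illustration.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]
hard_negatives = [[0.8, 0.6]]  # e.g. a fragment embedding close to the anchor

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, easy_negatives + hard_negatives)
print(f"easy negatives only: {loss_easy:.4f}")
print(f"with a hard negative: {loss_hard:.4f}")
```

A negative that lies close to the anchor contributes a large term to the denominator, so the loss rises and the proxy task becomes harder, which is the effect the abstract attributes to fragment-based hard negatives.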
Affiliation(s)
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yuanting Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Long Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yuhao Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
2
Lotfi S, Ahmadi S, Azimi A, Kumar P. In silico aquatic toxicity prediction of chemicals towards Daphnia magna and fathead minnow using Monte Carlo approaches. Toxicol Mech Methods 2024:1-21. [PMID: 39397353 DOI: 10.1080/15376516.2024.2416226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 09/05/2024] [Accepted: 10/08/2024] [Indexed: 10/15/2024]
Abstract
The fast-increasing use of chemicals has led to large numbers of chemical compounds entering the aquatic environment, raising concerns about their potential effects on ecosystems. Therefore, assessing the ecotoxicological effects of organic compounds on aquatic organisms is very important. Daphnia magna and fathead minnow are two aquatic species commonly used as standard test organisms for aquatic risk assessment and are typically chosen as biological models for ecotoxicological investigations of chemical pollutants. Herein, global quantitative structure-toxicity relationship (QSTR) models have been developed to predict the toxicity (pEC(LC)50) of a large dataset comprising 2106 chemicals towards Daphnia magna and fathead minnow. The optimal descriptor of correlation weights (DCW) is calculated from the simplified molecular input-line entry system (SMILES) notation and is used to construct the QSTR models. Three target functions, TF1, TF2, and TF3, are used to generate 12 QSTR models from four splits, and their statistical characteristics are compared. The QSTR models are validated using both internal and external validation criteria and are found to be reliable, robust, and highly predictive. Among the models, those generated using TF3 demonstrate the best statistical quality, with R² values ranging from 0.9467 to 0.9607, Q² values ranging from 0.9462 to 0.9603, and RMSE values ranging from 0.3764 to 0.4413 for the validation set. The applicability domain and the mechanistic interpretation of the generated models are also discussed.
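The validation statistics quoted above follow standard definitions, sketched here for reference. The observed and predicted values are made-up illustration data, not results from the paper, and the function names are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error of the predictions."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(ss_res / len(y_true))


# Made-up pEC(LC)50-like values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"R2   = {r_squared(observed, predicted):.4f}")
print(f"RMSE = {rmse(observed, predicted):.4f}")
```

Q² is commonly computed with the same formula applied to cross-validated predictions; the abstract does not state which cross-validation scheme the authors used.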
Affiliation(s)
- Shahram Lotfi
- Department of Chemistry, Payame Noor University (PNU), 19395-4697 Tehran, Iran
- Shahin Ahmadi
- Department of Pharmaceutical Chemistry, Faculty of Pharmaceutical Chemistry, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Ali Azimi
- Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Parvin Kumar
- Department of Chemistry, Kurukshetra University, Kurukshetra, Haryana, 136119, India
3
Afzal M, Qais FA, Abduh NA, Christy M, Ayub R, Alarifi A. Identification of bioactive compounds of Zanthoxylum armatum as potential inhibitor of pyruvate kinase M2 (PKM2): Computational and virtual screening approaches. Heliyon 2024; 10:e27361. [PMID: 38495183 PMCID: PMC10943388 DOI: 10.1016/j.heliyon.2024.e27361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 02/26/2024] [Accepted: 02/28/2024] [Indexed: 03/19/2024] Open
Abstract
PKM2 (pyruvate kinase M2) is the isoform of pyruvate kinase known to catalyse the last step of glycolysis, which is responsible for energy production. This isoform is known to be highly expressed in certain cancers. Considering the role of this protein in various cancers, we used PKM2 as a target protein to identify potential compounds against it. In this study, we examined 96 compounds of Zanthoxylum armatum using an array of computational and in silico tools. The compounds were assessed for toxicity, and then their anticancer potential was predicted. Virtual screening was done by molecular docking, followed by a detailed examination using molecular dynamics simulation. The majority of the compounds showed a higher probability of being antineoplastic. Based on toxicity, predicted anticancer potential, binding affinity, and binding site, three compounds (nevadensin, asarinin, and kaempferol) were selected as hit compounds. The binding energy of these compounds with PKM2 ranged from -7.7 to -8.3 kcal/mol, and all hit compounds interact at the active site of the protein. The selected hit compounds formed a stable complex with PKM2 when simulated under physiological conditions. The dynamic analysis showed that these compounds remained attached to the active site until the completion of the molecular simulation. MM-PBSA analysis showed that nevadensin exhibited a higher affinity towards PKM2 than asarinin and kaempferol. These compounds need to be assessed in vitro and in vivo to validate their efficacy.
Affiliation(s)
- Mohd Afzal
- Department of Chemistry, College of Science, King Saud University, Riyadh, 11451, Saudi Arabia
- Faizan Abul Qais
- Department of Agricultural Microbiology, Faculty of Agricultural Sciences, Aligarh Muslim University, Aligarh, UP, 202002, India
- Naaser A.Y. Abduh
- Department of Chemistry, College of Science, King Saud University, Riyadh, 11451, Saudi Arabia
- Maria Christy
- Department of Energy Engineering, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, South Korea
- Rashid Ayub
- Department of Science Technology and Innovation, King Saud University, Riyadh, 11451, Saudi Arabia
- Abdullah Alarifi
- Department of Chemistry, College of Science, King Saud University, Riyadh, 11451, Saudi Arabia
4
Gao YY, Zhao W, Huang YQ, Kumar V, Zhang X, Hao GF. In silico environmental risk assessment improves efficiency for pesticide safety management. The Science of the Total Environment 2024; 908:167878. [PMID: 37858821 DOI: 10.1016/j.scitotenv.2023.167878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 10/09/2023] [Accepted: 10/14/2023] [Indexed: 10/21/2023]
Abstract
Pesticides are indispensable for maintaining crop quality and food production worldwide, but their use also poses environmental risks. Pesticide risk assessment involves a series of complex, expensive and time-consuming toxicity tests. To improve the efficiency and accuracy of assessing the environmental impact of pesticides, numerous computational tools have been developed. However, there is a notable lack of critical analysis or systematic summaries of environmental risk assessment tools and their applicable contexts. Here, many of the current approaches and tools for assessing environmental risks posed by pesticides are reviewed, and the question of whether these tools are fit for use in complex multicomponent scenarios is discussed. We analyze the adaptations of these tools to aquatic and terrestrial ecosystems and then provide resources for predicting pesticide concentrations in environmental media, including air, soil and water. The successful application of computational tools for risk assessment and the interpretation of predicted results are also discussed. This assessment serves as a valuable resource, enabling scientists to use suitable models to enhance the robustness of pesticide risk assessments.
Affiliation(s)
- Yang-Yang Gao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Wei Zhao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Yuan-Qin Huang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Vinit Kumar
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Xiao Zhang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China; National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan 430079, PR China.
5
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report, in addition to the prediction itself, an estimator that describes how certain a model is about that prediction. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost predictive performance by reducing errors arising from variance. Despite the development of novel methods, ensembles are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods for obtaining ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a broad and diversified study, ensembles are evaluated on 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes of up to 200 members, the predictive performance as well as the applicability as an uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations are derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is more important for the task at hand.
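The idea of a subsampling-based ensemble whose disagreement serves as an uncertainty estimate can be sketched as follows. This is a toy illustration under stated assumptions: a one-dimensional least-squares line stands in for the modeling techniques, folds are assigned by index modulo k, and the data are synthetic. None of these choices come from the paper.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kfold_ensemble(xs, ys, k=5):
    """Train one member per fold, each on the remaining k - 1 folds."""
    members = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        train_x, train_y = zip(*train)
        members.append(fit_line(train_x, train_y))
    return members


def predict_with_uncertainty(members, x):
    """Ensemble mean as prediction, standard deviation as uncertainty."""
    predictions = [slope * x + intercept for slope, intercept in members]
    return statistics.fmean(predictions), statistics.stdev(predictions)


# Synthetic data: a line with a small deterministic perturbation.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 3 == 0 else -0.3) for i, x in enumerate(xs)]

members = kfold_ensemble(xs, ys, k=5)
mean_pred, std_pred = predict_with_uncertainty(members, 10.0)
print(f"prediction = {mean_pred:.3f}, uncertainty = {std_pred:.4f}")
```

Larger ensembles can be obtained by repeating the k-fold split with different shuffles; the abstract reports results for up to 200 members but does not describe the splitting procedure in detail.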
Affiliation(s)
- Thomas-Martin Dutschmann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
- Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany.
6
Khan K, Kumar V, Colombo E, Lombardo A, Benfenati E, Roy K. Intelligent consensus predictions of bioconcentration factor of pharmaceuticals using 2D and fragment-based descriptors. Environment International 2022; 170:107625. [PMID: 36375281 DOI: 10.1016/j.envint.2022.107625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 10/30/2022] [Accepted: 11/09/2022] [Indexed: 06/16/2023]
Abstract
Bioconcentration factors (BCFs) are markers of chemical substance accumulation in organisms, and they play a significant role in determining the environmental risk of various chemicals. Experiments to obtain BCFs are expensive and time-consuming; therefore, it is better to estimate BCF early in the chemical development process. The current research aims to evaluate the ecotoxicity potential of 122 pharmaceuticals and identify possible important structural attributes, using BCF as the determining feature against a group of fish species. We calculated theoretical 2D descriptors using the OCHEM platform and the SiRMS descriptor calculation software. Regression-based quantitative structure-property relationship (QSPR) modeling was used to identify the chemical features responsible for acute fish bioconcentration. Multiple models were combined with the "intelligent consensus" algorithm for the regression-based approach, improving the predictive ability of the models. To ensure the robustness and interpretability of the developed models, rigorous validation was performed using various statistical internal and external validation metrics. The developed models indicate that the presence of large lipophilic and electronegative moieties greatly enhances the bioaccumulative potential of pharmaceuticals, whereas hydrophilic characteristics have a negative impact on BCF. Furthermore, the developed models were employed to screen the DrugBank database (https://go.drugbank.com/) to assess the BCF properties of the entire database. The evidence acquired from the modeled descriptors might be used for aquatic risk assessment in the future, with the added benefit of providing an early warning of their probable negative impact on aquatic ecosystems for regulatory purposes.
Affiliation(s)
- Kabiruddin Khan
- Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India; QSAR Lab, ul. Trzy Lipy 3, Gdańsk, Poland
- Vinay Kumar
- Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India
- Erika Colombo
- Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCSS, via Mario Negri 2, 20156 Milano, Italy
- Anna Lombardo
- Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCSS, via Mario Negri 2, 20156 Milano, Italy
- Emilio Benfenati
- Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCSS, via Mario Negri 2, 20156 Milano, Italy.
- Kunal Roy
- Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032 Kolkata, India.
7
Boros BV, Dascalu D, Ostafe V, Isvoran A. Assessment of the Effects of Chitosan, Chitooligosaccharides and Their Derivatives on Lemna minor. Molecules 2022; 27:6123. [PMID: 36144862 PMCID: PMC9502776 DOI: 10.3390/molecules27186123] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/17/2022] [Revised: 09/08/2022] [Accepted: 09/12/2022] [Indexed: 11/16/2022] Open
Abstract
The production and use of chitosan, chitooligosaccharides and their derivatives in many fields may result in their release to the environment, possibly affecting aquatic organisms. Both an experimental and a computational approach were used to evaluate the effects of these compounds on Lemna minor. Based on the determined EC50 values against L. minor, only D-glucosamine hydrochloride (EC50 = 11.55 mg/L) was considered “slightly toxic” for aquatic environments, while all the other investigated compounds, having EC50 > 100 mg/L, were considered “practically non-toxic”. The experimental results were in good agreement with the predictions obtained using the admetSAR2.0 computational tool, which indicated that the investigated compounds were not considered toxic for crustaceans, fish and the aquatic microorganism Tetrahymena pyriformis. The ADMETLab2.0 computational tool predicted the IGC50 values for Tetrahymena pyriformis and the LC50 values for fathead minnow and Daphnia magna, with the lowest values of these parameters found for totally acetylated chitooligosaccharides, in correlation with their lowest solubility. The effects of the chitooligosaccharides and chitosan on L. minor decreased with increasing molecular weight, increased with the degree of deacetylation, and depended on acetylation patterns. Furthermore, solubility was the main influence on the effects in the aqueous environment, with higher solubility leading to lower toxicity.
Affiliation(s)
- Bianca-Vanesa Boros
- Department of Biology-Chemistry, Faculty of Chemistry, Biology, Geography, West University of Timisoara, 16 Pestalozzi, 300115 Timisoara, Romania
- Advanced Environmental Research Laboratories (AERL), 4 Oituz, 300086 Timisoara, Romania
- Daniela Dascalu
- Department of Biology-Chemistry, Faculty of Chemistry, Biology, Geography, West University of Timisoara, 16 Pestalozzi, 300115 Timisoara, Romania
- Advanced Environmental Research Laboratories (AERL), 4 Oituz, 300086 Timisoara, Romania
- Vasile Ostafe
- Department of Biology-Chemistry, Faculty of Chemistry, Biology, Geography, West University of Timisoara, 16 Pestalozzi, 300115 Timisoara, Romania
- Advanced Environmental Research Laboratories (AERL), 4 Oituz, 300086 Timisoara, Romania
- Adriana Isvoran
- Department of Biology-Chemistry, Faculty of Chemistry, Biology, Geography, West University of Timisoara, 16 Pestalozzi, 300115 Timisoara, Romania
- Advanced Environmental Research Laboratories (AERL), 4 Oituz, 300086 Timisoara, Romania
8
Qais FA, Alomar SY, Imran MA, Hashmi MA. In-Silico Analysis of Phytocompounds of Olea europaea as Potential Anti-Cancer Agents to Target PKM2 Protein. Molecules 2022; 27:5793. [PMID: 36144527 PMCID: PMC9503632 DOI: 10.3390/molecules27185793] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/08/2022] [Revised: 08/07/2022] [Accepted: 08/26/2022] [Indexed: 11/30/2022] Open
Abstract
Globally, cancer is the second leading cause of mortality and morbidity. The growth and development of cancer are extremely complex: they are driven by a variety of pathways and involve various types of enzymes. Pyruvate kinase M2 (PKM2) is an isoform of pyruvate kinase that catalyses the last steps of glycolysis to produce energy. PKM2 is more highly expressed in tumour cells, where it tends to exist in a dimeric form. Various medicinal plants contain a variety of micronutrients that help combat different cancers. The phytocompounds of olive tree (Olea europaea) leaves play an important role in inhibiting the proliferation of several cancers. In this study, the phytocompounds of olive leaf extract (OLE) were studied using various in silico tools, such as pkCSM software to predict ADMET properties and PASS Online software to predict anticancer activity. In addition, the molecular docking study provided the binding energies and inhibition constants and confirmed the interaction between PKM2 and the ligands. The dynamic behaviour, conformational changes, and stability of the complexes between PKM2 and the top three hit compounds (Verbascoside (Ver), Rutin (Rut), and Luteolin_7_O_glucoside (Lut)) were studied by MD simulations.
Affiliation(s)
- Faizan Abul Qais
- Department of Agricultural Microbiology, Faculty of Agricultural Sciences, Aligarh Muslim University, Aligarh UP-202002, India
- Correspondence: Tel.: +91-571-2703516
- Suliman Yousef Alomar
- Department of Zoology, College of Science, King Saud University, Riyadh 11451, Saudi Arabia
- Mohammad Azhar Imran
- Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Korea
- Md Amiruddin Hashmi
- Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh UP-202002, India
9
Zhang R, Guo H, Hua Y, Cui X, Shi Y, Li X. Modeling and insights into the structural basis of chemical acute aquatic toxicity. Ecotoxicology and Environmental Safety 2022; 242:113940. [PMID: 35999760 DOI: 10.1016/j.ecoenv.2022.113940] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/05/2022] [Revised: 07/16/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
It has become a top global regulatory priority to prevent and control pollution from the release of synthetic chemicals, which continues to affect aquatic communities. In the past decades, computational tools have been widely used to significantly reduce the budget and time cost of chemical acute aquatic toxicity assessment. However, the structural basis of toxic compounds has rarely been analyzed. In the present study, we collected 1438, 485 and 961 chemicals with acute toxicity data records for three representative aquatic species: Tetrahymena pyriformis, Daphnia magna, and fathead minnow, respectively. A series of artificial intelligence models were developed using OCHEM tools. For each aquatic toxicity endpoint, a consensus model was developed based on the top-performing individual models. The consensus models performed well on external validation sets, with total accuracy values of 96.88%, 90.63%, and 84.90% for Tetrahymena pyriformis toxicity (TPT), Daphnia magna toxicity (DMT), and fathead minnow toxicity (FMT), respectively. The models can be freely accessed via https://ochem.eu/article/146910. Moreover, the analysis of physical-chemical properties suggested that several key molecular properties of aquatic toxic compounds were significantly different from those of non-toxic compounds. Thus, these descriptors may be associated with chemical acute aquatic toxicity and may be useful for understanding chemical aquatic toxicity. In addition, structural alerts for aquatic toxicity were detected using f-score and frequency ratio analysis of predefined substructures. A total of 112, 58 and 33 structural alerts were identified as responsible for TPT, DMT, and FMT, respectively. These structural alerts could provide useful information on the mechanisms of chemical aquatic toxicity and visual alerts for environmental assessment. All the structural alerts were integrated into the web server SApredictor (www.sapredictor.cn).
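A frequency ratio for a candidate structural alert can be sketched as below. The definition used here, the toxic fraction among compounds containing the substructure divided by the toxic fraction in the whole set, is an assumed common formulation; the abstract does not give the authors' exact formula, and the f-score step is not shown. The presence flags and labels are made-up illustration data.

```python
def frequency_ratio(has_substructure, is_toxic):
    """Toxic fraction among substructure hits relative to the overall toxic fraction.

    Assumed definition, not confirmed by the source:
        (n_hits_toxic / n_hits) / (n_toxic / n_total)
    Values above 1 suggest the substructure is enriched among toxic compounds.
    """
    n_total = len(is_toxic)
    n_toxic = sum(is_toxic)
    n_hits = sum(has_substructure)
    n_hits_toxic = sum(1 for h, t in zip(has_substructure, is_toxic) if h and t)
    if n_hits == 0 or n_toxic == 0:
        return 0.0
    return (n_hits_toxic / n_hits) / (n_toxic / n_total)


# Made-up data: 8 compounds, 1 = present / toxic, 0 = absent / non-toxic.
has_substructure = [1, 1, 1, 0, 0, 0, 0, 0]
is_toxic = [1, 1, 0, 1, 0, 0, 0, 0]

ratio = frequency_ratio(has_substructure, is_toxic)
print(f"frequency ratio = {ratio:.3f}")
```

In practice the presence flags would come from substructure matching with a cheminformatics toolkit, and a threshold on the ratio, together with a minimum number of hits, would decide which substructures are kept as alerts.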
Affiliation(s)
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Yuqing Hua
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xueyan Cui
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Yinping Shi
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China; Department of Clinical Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan 250014, China.
10
Hua Y, Cui X, Liu B, Shi Y, Guo H, Zhang R, Li X. SApredictor: An Expert System for Screening Chemicals Against Structural Alerts. Front Chem 2022; 10:916614. [PMID: 35910729 PMCID: PMC9326022 DOI: 10.3389/fchem.2022.916614] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/09/2022] [Accepted: 06/20/2022] [Indexed: 11/13/2022] Open
Abstract
The rapid and accurate evaluation of chemical toxicity is of great significance for estimating chemical safety. In the past decades, a great number of excellent computational models have been developed for chemical toxicity prediction. However, most machine learning models tend to be “black boxes”, which results in poor interpretability. In the present study, we focused on the identification and collection of structural alerts (SAs) responsible for a series of important toxicity endpoints. We then stored these structural alerts and developed a web server named SApredictor (www.sapredictor.cn) for screening chemicals against them. Users can quickly estimate the toxicity of chemicals with SApredictor, and the specific key substructures that cause the chemical toxicity are displayed intuitively to provide valuable information for structural optimization by medicinal chemists.
Affiliation(s)
- Yuqing Hua
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, China
- Xueyan Cui
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, China
- Bo Liu
- Institute of Materia Medica, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Yinping Shi
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, China
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan, China
- Department of Clinical Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
- *Correspondence: Xiao Li, orcid.org/0000-0002-1148-9898
11
Toma C, Cappelli CI, Manganaro A, Lombardo A, Arning J, Benfenati E. New Models to Predict the Acute and Chronic Toxicities of Representative Species of the Main Trophic Levels of Aquatic Environments. Molecules 2021; 26:6983. [PMID: 34834075 PMCID: PMC8618112 DOI: 10.3390/molecules26226983] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/29/2021] [Revised: 11/12/2021] [Accepted: 11/16/2021] [Indexed: 11/17/2022]
Abstract
To assess the impact of chemicals on an aquatic environment, toxicological data for three trophic levels are needed to address the chronic and acute toxicities. The use of non-testing methods, such as predictive computational models, was proposed to avoid or reduce the need for animal models and speed up the process when there are many substances to be tested. We developed predictive models for Raphidocelis subcapitata, Daphnia magna, and fish for acute and chronic toxicities. The random forest machine learning approach gave the best results. The models gave good statistical quality for all endpoints. These models are freely available for use as individual models in the VEGA platform and for prioritization in JANUS software.
Affiliation(s)
- Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (C.T.); (C.I.C.); (E.B.)
- Claudia I. Cappelli
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (C.T.); (C.I.C.); (E.B.)
- Anna Lombardo
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (C.T.); (C.I.C.); (E.B.)
- Jürgen Arning
- Umweltbundesamt-German Federal Environment Agency, Wörlitzer Platz 1, 06844 Dessau-Roßlau, Germany;
- Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (C.T.); (C.I.C.); (E.B.)
12
Dutschmann TM, Baumann K. Evaluating High-Variance Leaves as Uncertainty Measure for Random Forest Regression. Molecules 2021; 26:6514. [PMID: 34770921 PMCID: PMC8588039 DOI: 10.3390/molecules26216514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2021] [Revised: 10/19/2021] [Accepted: 10/22/2021] [Indexed: 01/31/2023]
Abstract
Uncertainty measures estimate the reliability of a predictive model. Especially in the field of molecular property prediction as part of drug design, model reliability is crucial. Besides other techniques, Random Forests have a long tradition in machine learning related to chemoinformatics and are widely used. Random Forests consist of an ensemble of individual regression models, namely, decision trees, and therefore provide an uncertainty measure by construction. Regarding the disagreement of single-model predictions, a narrower distribution of predictions is interpreted as higher reliability. The standard deviation of the decision tree ensemble predictions is the default uncertainty measure for Random Forests. Due to the increasing application of machine learning in drug design, there is a constant search for novel uncertainty measures that, ideally, outperform classical uncertainty criteria. When analyzing Random Forests, it appears natural to consider the variance of the dependent variables within each terminal decision tree leaf to obtain predictive uncertainties. Here, predictions that arise from more leaves of high variance are considered less reliable. As expected, the number of such high-variance leaves yields a reasonable uncertainty measure. Depending on the dataset, it can also outperform ensemble uncertainties. However, small-scale comparisons, i.e., those considering only a few datasets, are insufficient, since they are more prone to chance correlations. Therefore, large-scale estimations are required to make general claims about the performance of uncertainty measures. On several chemoinformatic regression datasets, high-variance leaves are compared to the standard deviation of ensemble predictions. It turns out that high-variance leaf uncertainty is meaningful but not superior to the default ensemble standard deviation. A brief possible explanation is offered.
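As a minimal illustration of the two uncertainty measures compared in this abstract, the sketch below computes the ensemble standard deviation and a high-variance leaf count for a single sample. It assumes the per-tree predictions and the variances of the leaves reached by the sample have already been extracted from a fitted forest. The function names, the variance threshold, and the use of the population standard deviation are hypothetical choices for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def ensemble_uncertainty(tree_predictions):
    """Return (prediction, uncertainty) for one sample.

    The prediction is the mean of the per-tree outputs; the uncertainty
    is their (population) standard deviation across the ensemble.
    """
    return mean(tree_predictions), pstdev(tree_predictions)


def count_high_variance_leaves(leaf_variances, threshold):
    """Count the trees whose terminal leaf variance exceeds the threshold.

    leaf_variances holds, for each tree, the variance of the training
    targets in the leaf that the sample falls into (assumed precomputed).
    """
    return sum(1 for variance in leaf_variances if variance > threshold)


# Hypothetical per-tree predictions for two samples.
agreeing = [2.0, 2.1, 1.9, 2.0]      # trees agree: narrow distribution
disagreeing = [0.5, 3.5, 1.0, 3.0]   # trees disagree: wide distribution

print(ensemble_uncertainty(agreeing))
print(ensemble_uncertainty(disagreeing))
print(count_high_variance_leaves([0.1, 0.9, 0.4, 1.2], threshold=0.5))
```

Both samples have the same mean prediction, but the second has the larger standard deviation and would therefore be treated as less reliable under the default measure.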
13
Méndez A, Rivera-Valentín EG, Schulze-Makuch D, Filiberto J, Ramírez RM, Wood TE, Dávila A, McKay C, Ceballos KNO, Jusino-Maldonado M, Torres-Santiago NJ, Nery G, Heller R, Byrne PK, Malaska MJ, Nathan E, Simões MF, Antunes A, Martínez-Frías J, Carone L, Izenberg NR, Atri D, Chitty HIC, Nowajewski-Barra P, Rivera-Hernández F, Brown CY, Lynch KL, Catling D, Zuluaga JI, Salazar JF, Chen H, González G, Jagadeesh MK, Haqq-Misra J. Habitability Models for Astrobiology. ASTROBIOLOGY 2021; 21:1017-1027. [PMID: 34382857 DOI: 10.1089/ast.2020.2342] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
Habitability has been generally defined as the capability of an environment to support life. Ecologists have been using Habitat Suitability Models (HSMs) for more than four decades to study the habitability of Earth from local to global scales. Astrobiologists have been proposing different habitability models for some time, with little integration and consistency among them, being different in function to those used by ecologists. Habitability models are not only used to determine whether environments are habitable, but they also are used to characterize what key factors are responsible for the gradual transition from low to high habitability states. Here we review and compare some of the different models used by ecologists and astrobiologists and suggest how they could be integrated into new habitability standards. Such standards will help improve the comparison and characterization of potentially habitable environments, prioritize target selections, and study correlations between habitability and biosignatures. Habitability models are the foundation of planetary habitability science, and the synergy between ecologists and astrobiologists is necessary to expand our understanding of the habitability of Earth, the Solar System, and extrasolar planets.
Affiliation(s)
- Abel Méndez
- Planetary Habitability Laboratory, University of Puerto Rico at Arecibo, Puerto Rico, USA
- Dirk Schulze-Makuch
- Center for Astronomy and Astrophysics, Technische Universität Berlin, Berlin, Germany; German Research Centre for Geosciences, Section Geomicrobiology, Potsdam, Germany; Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Stechlin, Germany
- Ramses M Ramírez
- University of Central Florida, Department of Physics, Orlando, Florida, USA; Space Science Institute, Boulder, Colorado, USA
- Tana E Wood
- USDA Forest Service International Institute of Tropical Forestry, San Juan, Puerto Rico, USA
- Alfonso Dávila
- NASA Ames Research Center, Moffett Field, California, USA
- Chris McKay
- NASA Ames Research Center, Moffett Field, California, USA
- Kevin N Ortiz Ceballos
- Planetary Habitability Laboratory, University of Puerto Rico at Arecibo, Puerto Rico, USA
- René Heller
- Max Planck Institute for Solar System Research; Institute for Astrophysics, University of Göttingen, Germany
- Paul K Byrne
- North Carolina State University, Raleigh, North Carolina, USA
- Michael J Malaska
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
- Erica Nathan
- Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, Rhode Island, USA
- Marta Filipa Simões
- State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Taipa, Macau SAR, China
- André Antunes
- State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Taipa, Macau SAR, China
- Noam R Izenberg
- Johns Hopkins Applied Physics Laboratory, Laurel, Maryland, USA
- Dimitra Atri
- Center for Space Science, New York University Abu Dhabi, United Arab Emirates
- Kennda L Lynch
- Lunar and Planetary Institute, USRA, Houston, Texas, USA
- Jorge I Zuluaga
- Institute of Physics / FCEN - Universidad de Antioquia, Medellín, Colombia
- Juan F Salazar
- GIGA, Escuela Ambiental, Facultad de Ingeniería, Universidad de Antioquia, Medellín, Colombia
- Howard Chen
- Northwestern University, Evanston, Illinois, USA
- Grizelle González
- USDA Forest Service International Institute of Tropical Forestry, San Juan, Puerto Rico, USA
- Jacob Haqq-Misra
- Blue Marble Space Institute of Science, Seattle, Washington, USA
14
Tinkov OV, Grigorev VY, Grigoreva LD. QSAR analysis of the acute toxicity of avermectins towards Tetrahymena pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2021; 32:541-571. [PMID: 34157880 DOI: 10.1080/1062936x.2021.1932583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
Avermectins have been effectively used in medicine, veterinary medicine, and agriculture as antiparasitic agents for many years. However, there are still no reliable data on the main ecotoxicological characteristics of most individual avermectins. Although many QSAR models have been proposed to describe the acute toxicity of organic compounds towards Tetrahymena pyriformis (T. pyriformis), avermectins are outside the applicability domain of these models. The influence of the molecular structures of various organic compounds on the acute toxicity towards T. pyriformis was studied using the OCHEM web platform (https://ochem.eu). A data set of 1792 toxicants was used to create models. The QSAR (Quantitative Structure-Activity Relationship) models were developed using the molecular descriptors Dragon, ISIDA, CDK, PyDescriptor, alvaDesc, and SIRMS and machine learning methods, such as Least Squares Support Vector Machine and Transformer Convolutional Neural Network. The HYBOT descriptors and Random Forest were used for a comparative QSAR investigation. Since the best predictive ability was demonstrated by the Transformer Convolutional Neural Network model, it was used to predict the toxicity of individual avermectins towards T. pyriformis. During a structural interpretation of the developed QSAR model, we determined the significant molecular transformations that increase and decrease the acute toxicity of organic compounds.
Affiliation(s)
- O V Tinkov
- Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Shevchenko Transnistria State University, Tiraspol, Moldova
- Department of Computer Science, Military Institute of the Ministry of Defense, Tiraspol, Moldova
- V Y Grigorev
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russia
- L D Grigoreva
- Department of Fundamental Physicochemical Engineering, Moscow State University, Moscow, Russia
15
Shi H, Pan Y, Yang F, Cao J, Tan X, Yuan B, Jiang J. Nano-SAR Modeling for Predicting the Cytotoxicity of Metal Oxide Nanoparticles to PaCa2. Molecules 2021; 26:2188. [PMID: 33920258 PMCID: PMC8069170 DOI: 10.3390/molecules26082188] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/04/2021] [Revised: 04/03/2021] [Accepted: 04/06/2021] [Indexed: 11/16/2022]
Abstract
The impact of engineered nanoparticles (NPs) on human health and the environment has attracted widespread attention. It is essential to assess and predict the biological activity, toxicity, and physicochemical properties of NPs. Computation-based methods have been developed as efficient alternatives for understanding the negative effects of nanoparticles on the environment and human health. Here, a classification-based structure-activity relationship model for nanoparticles (nano-SAR) was developed to predict the cellular uptake of 109 functionalized magneto-fluorescent nanoparticles by pancreatic cancer cells (PaCa2). Norm index descriptors were employed to describe the structural characteristics of the nanoparticles. The random forest (RF) algorithm, combined with Recursive Feature Elimination (RFE), was employed to develop the nano-SAR model. The resulting model showed satisfactory statistical performance, with an accuracy (ACC) of 0.950 on the test set and 0.966 on the training set, demonstrating a satisfactory classification effect. The model was rigorously verified and extensively compared with models in the literature. The proposed model could reasonably be expected to predict the cellular uptake of nanoparticles and provide some guidance for the design and manufacture of safer nanomaterials.
Affiliation(s)
- Haihua Shi
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- Yong Pan
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- Correspondence: Tel.: +86-25-581-398-73
- Fan Yang
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- Jiakai Cao
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- Xinlong Tan
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- Beilei Yuan
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- Juncheng Jiang
- Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China; (H.S.); (F.Y.); (J.C.); (X.T.); (B.Y.); (J.J.)
- School of Environment & Safety Engineering, Changzhou University, Changzhou 213164, China
16
Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021; 61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/01/2023]
Abstract
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
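The abstract does not say how the three multitask models are combined into a consensus. As a sketch under the assumption that the consensus is a simple per-endpoint average, the example below merges predictions from several models, skipping endpoints that a given model does not cover. The function name, endpoint labels, and values are hypothetical.

```python
def consensus_prediction(model_outputs):
    """Average per-endpoint predictions across several models.

    model_outputs is a list of dicts mapping an endpoint name to that
    model's predicted value. An endpoint missing from a model is simply
    averaged over the models that do provide it.
    """
    totals, counts = {}, {}
    for output in model_outputs:
        for endpoint, value in output.items():
            totals[endpoint] = totals.get(endpoint, 0.0) + value
            counts[endpoint] = counts.get(endpoint, 0) + 1
    return {endpoint: totals[endpoint] / counts[endpoint] for endpoint in totals}


# Hypothetical predictions from three multitask models for one compound.
outputs = [
    {"endpoint_a": 1.0, "endpoint_b": 2.0},
    {"endpoint_a": 2.0, "endpoint_b": 4.0},
    {"endpoint_a": 3.0},  # this model has no head for endpoint_b
]
print(consensus_prediction(outputs))
```

Weighted averaging or stacking would be equally plausible alternatives; plain averaging is used here only because it is the simplest consensus rule.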
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Vinicius M Alves
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Nicole Kleinstreuer
- Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States; National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
- Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Marc C Nicklaus
- Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
- Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
17
Rodrigues JF, Florea L, de Oliveira MCF, Diamond D, Oliveira ON. Big data and machine learning for materials science. DISCOVER MATERIALS 2021; 1:12. [PMID: 33899049 PMCID: PMC8054236 DOI: 10.1007/s43939-021-00012-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 05/11/2023]
Abstract
Herein, we review aspects of leading-edge research and innovation in materials science that exploit big data and machine learning (ML), two computer science concepts that combine to yield computational intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. However, the potential benefits of ML come at the cost of big data production; that is, the algorithms demand large volumes of data of various natures and from different sources, from material properties to sensor data. In the survey, we propose a roadmap for future developments with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to materials science, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
Affiliation(s)
- Jose F. Rodrigues
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP Brazil
- Larisa Florea
- SFI Research Centre for Advanced Materials and BioEngineering Research Trinity College Dublin, The University of Dublin, Dublin, Ireland
- Maria C. F. de Oliveira
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP Brazil
- Dermot Diamond
- Insight Centre for Data Analytics, National Centre for Sensor Research, Dublin City University, Dublin 9, Dublin, Ireland
- Osvaldo N. Oliveira
- São Carlos Institute of Physics, University of São Paulo (USP), São Carlos, SP Brazil
18
Pereira JC, Daher SS, Zorn KM, Sherwood M, Russo R, Perryman AL, Wang X, Freundlich MJ, Ekins S, Freundlich JS. Machine Learning Platform to Discover Novel Growth Inhibitors of Neisseria gonorrhoeae. Pharm Res 2020; 37:141. [PMID: 32661900 DOI: 10.1007/s11095-020-02876-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/03/2020] [Accepted: 07/06/2020] [Indexed: 12/17/2022]
Abstract
PURPOSE: To advance fundamental biological and translational research with the bacterium Neisseria gonorrhoeae through the prediction of novel small molecule growth inhibitors via naïve Bayesian modeling methodology. METHODS: Inspection and curation of data from the publicly available ChEMBL web site for small molecule growth inhibition data of the bacterium Neisseria gonorrhoeae resulted in a training set for the construction of machine learning models. A naïve Bayesian model for bacterial growth inhibition was utilized in a workflow to predict novel antibacterial agents against this bacterium of global health relevance from a commercial library of >10^5 drug-like small molecules. Follow-up efforts involved empirical assessment of the predictions and validation of the hits. RESULTS: Specifically, two small molecules were found that exhibited promising activity profiles and represent novel chemotypes for agents against N. gonorrhoeae. CONCLUSIONS: This represents, to the best of our knowledge, the first machine learning approach to successfully predict novel growth inhibitors of this bacterium. To assist the chemical tool and drug discovery fields, we have made our curated training set available as part of the Supplementary Material and the Bayesian model is accessible via the web.
Affiliation(s)
- Janaina Cruz Pereira
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Samer S Daher
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Kimberley M Zorn
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Matthew Sherwood
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Riccardo Russo
- Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Alexander L Perryman
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA; Repare Therapeutics, 7210 Rue Frederick-Banting Suite 100, Montreal, QC, H4S 2A1, Canada
- Xin Wang
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Madeleine J Freundlich
- Stuart Country Day School of the Sacred Heart, 1200 Stuart Road, Princeton, NJ, 08540, USA
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA; Collaborations in Chemistry, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
- Joel S Freundlich
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA; Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
19
Takata M, Lin BL, Xue M, Zushi Y, Terada A, Hosomi M. Predicting the acute ecotoxicity of chemical substances by machine learning using graph theory. CHEMOSPHERE 2020; 238:124604. [PMID: 31450113 DOI: 10.1016/j.chemosphere.2019.124604] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/21/2019] [Revised: 08/13/2019] [Accepted: 08/15/2019] [Indexed: 06/10/2023]
Abstract
Accurate in silico prediction of chemical substance ecotoxicity has become an important issue in recent years. Most conventional methods, such as the Ecological Structure-Activity Relationship (ECOSAR) model, cluster chemical substances empirically based on structural information and then predict toxicity by employing a log P linear regression model. Because the classification is empirical, the prediction accuracy does not improve even if new ecotoxicity test data are added. In addition, most of the conventional methods are not appropriate for predicting the ecotoxicity of inorganic and/or ionized compounds. Furthermore, a user faces difficulty in handling multiple Quantitative Structure-Activity Relationship (QSAR) formulas for one chemical substance. To overcome the flaws of the conventional methods, in this study a new method was developed that applies unsupervised machine learning and graph theory to predict acute ecotoxicity. The proposed machine learning technique is based on the large ecotoxicity test dataset of AIST-MeRAM, a software program developed by the National Institute of Advanced Industrial Science and Technology for Multi-purpose Ecological Risk Assessment and Management, and on the Molecular ACCess System (MACCS) keys, which vectorize a chemical structure into 166 bits of binary information. The acute toxicity to fish, daphnids, and algae can be predicted with good accuracy, without requiring the log P and linear regression models used in existing methods. Results from the new method were cross-validated and compared with ECOSAR predictions, and they show that the new method provides better accuracy for a wider range of chemical substances, including inorganic and ionized compounds.
Affiliation(s)
- Michiyoshi Takata
- Department of Chemical Engineering, Tokyo University of Agriculture and Technology, Japan
- Bin-Le Lin
- Research Institute of Science for Safety and Sustainability, National Institute of Advanced Industrial Science and Technology (AIST), Japan
- Mianqiang Xue
- Research Institute of Science for Safety and Sustainability, National Institute of Advanced Industrial Science and Technology (AIST), Japan
- Yasuyuki Zushi
- Research Institute of Science for Safety and Sustainability, National Institute of Advanced Industrial Science and Technology (AIST), Japan
- Akihiko Terada
- Department of Chemical Engineering, Tokyo University of Agriculture and Technology, Japan
- Masaaki Hosomi
- Department of Chemical Engineering, Tokyo University of Agriculture and Technology, Japan
20
Zhang Y, Han Z, Gao Q, Bai X, Zhang C, Hou H. Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches. Curr Pharm Des 2019; 25:4296-4302. [PMID: 31696803 DOI: 10.2174/1381612825666191107092214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane decreases the ability of red blood cells to deform, leading to their massive destruction in the spleen and to a group of hereditary haemolytic diseases. METHODS: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. RESULTS: The overall accuracies (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. CONCLUSION: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.
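The abstract reports a 10-fold cross-validation on 117 inhibitors and 190 non-inhibitors but does not describe how the folds were formed. The sketch below shows one possible scheme, a deterministic stratified split that keeps the class ratio roughly equal in every fold. Whether the authors stratified their folds is an assumption, and the function name is hypothetical.

```python
def stratified_folds(labels, k=10):
    """Assign each sample index to one of k folds, class by class.

    Indices of each class are dealt out round-robin, so every fold gets
    a near-equal share of each class. Returns a list of k index lists.
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    offset = 0  # continue dealing where the previous class stopped
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


# Class sizes taken from the abstract: 117 inhibitors (1), 190 non-inhibitors (0).
labels = [1] * 117 + [0] * 190
folds = stratified_folds(labels, k=10)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)
```

In each round, one fold serves as the held-out test set and the remaining nine are used for training; the reported cross-validation accuracy is then computed over the held-out predictions.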
Affiliation(s)
- Yuan Zhang
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
- Zhenyan Han
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
- Qian Gao
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Xiaoyi Bai
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| | - Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong, China; University of Auckland, Auckland, New Zealand
| | - Hongying Hou
- Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| |
|
21
|
Ai H, Wu X, Zhang L, Qi M, Zhao Y, Zhao Q, Zhao J, Liu H. QSAR modelling study of the bioconcentration factor and toxicity of organic compounds to aquatic organisms using machine learning and ensemble methods. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2019; 179:71-78. [PMID: 31026752 DOI: 10.1016/j.ecoenv.2019.04.035] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 12/26/2018] [Revised: 03/27/2019] [Accepted: 04/11/2019] [Indexed: 06/09/2023]
Abstract
Bioconcentration factors and median lethal concentrations (LC50s) are important when assessing risks posed by organic pollutants to aquatic ecosystems. Various quantitative structure-activity relationship models have been developed to predict bioconcentration factors and classify acute toxicity. In this study, we developed a regression model using the Recursive Feature Elimination (RFE) method combined with the Support Vector Machine (SVM) algorithm. We calculated 2D molecular descriptors from a dataset containing 450 diverse chemicals for our regression model. We then built three ensemble models using three machine learning algorithms and calculated 12 molecular fingerprints from a dataset containing 400 diverse chemicals for our classification models. For the regression model, R² and R²pred were 0.860 and 0.757, respectively. Other parameters indicated that the regression model made good predictions and could efficiently predict a new set of compounds following standards set by Golbraikh, Tropsha, and Roy. Among the classification models, the ensemble-SVM classification model gave an overall accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of 92.2, 95.1, 86.0, and 0.965, respectively, in a five-fold cross-validation and of 87.3, 92.6, 76.0, and 0.940, respectively, in an external validation. These parameters indicated that our ensemble-SVM model was more stable and gave more accurate predictions than previous models. The model could therefore be used to effectively predict aquatic toxicity and assess risks posed to aquatic ecosystems. We identified several structures most relevant to acute aquatic toxicity through predictions made by the two types of models, and this information may be important to aquatic toxicology experiments and aquatic system management.
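The accuracy, sensitivity, and specificity figures quoted in this abstract follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the paper, and the function name is chosen only for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions in [0, 1]."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # share of all predictions that are correct
    sensitivity = tp / (tp + fn)      # true-positive rate (recall on toxic class)
    specificity = tn / (tn + fp)      # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts for illustration only.
acc, sens, spec = classification_metrics(tp=90, fn=10, tn=40, fp=10)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

When the classes are imbalanced, as in this made-up example, accuracy sits between sensitivity and specificity and is pulled toward the larger class, which is why the abstract reports all three.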
Affiliation(s)
- Haixin Ai
- Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China; School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Xuewei Wu
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Li Zhang
- Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China; School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Mengyuan Qi
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Ying Zhao
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Qi Zhao
- School of Mathematics, Liaoning University, Shenyang, 110036, China
| | - Jian Zhao
- School of Life Science, Liaoning University, Shenyang, 110036, China
| | - Hongsheng Liu
- Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China; School of Life Science, Liaoning University, Shenyang, 110036, China.
| |
|
22
|
He L, Xiao K, Zhou C, Li G, Yang H, Li Z, Cheng J. Insights into pesticide toxicity against aquatic organism: QSTR models on Daphnia Magna. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2019; 173:285-292. [PMID: 30776561 DOI: 10.1016/j.ecoenv.2019.02.014] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 10/27/2018] [Revised: 01/30/2019] [Accepted: 02/04/2019] [Indexed: 06/09/2023]
Abstract
The toxicities of agrochemicals to non-target aquatic organisms are key items in chemical ecological risk assessment. However, there is still an urgent need to develop new tools to assess agrochemical aquatic toxicity efficiently and accurately. In this work, QSTR studies were performed on a data set containing 639 diverse pesticides with measured EC50 toxicity against Daphnia magna, using five machine learning methods combined with seven fingerprints and a set of molecular descriptors. The imbalance problem of the data set was successfully solved by clustering analysis. The top-10 QSTR models displayed greater predictive abilities than ECOSAR. The optimal model, Ext-SVM, showed the best performance in 10-fold cross validation (Qhigh=0.807, Qmoderate=0.806, Qlow=0.755, Qtotal=0.794), and also in the test set verification (Qhigh=0.865, Qmoderate=0.783, Qlow=0.931, Qtotal=0.848). The relevance of the key physical-chemical properties to the toxicity was also investigated: the MW, a_np, logP(o/w), GCUT_SLOGP_1, chilv and SMR_VSA7 values displayed positive correlation with Daphnia magna toxicity, whereas logS and a_don showed negative correlation. The robust QSTR models provide efficient tools for assessing agrochemical aquatic toxicity, and the differences in physical-chemical properties revealed between the high and low toxic compounds might be useful in the discovery and design of pesticides with low aquatic toxicity.
Affiliation(s)
- Lujue He
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Keya Xiao
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Cong Zhou
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Guanglong Li
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Zhong Li
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Jiagao Cheng
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
| |
|
23
|
Peterson LE. Small Molecule Docking of DNA Repair Proteins Associated with Cancer Survival Following PCNA Metagene Adjustment: A Potential Novel Class of Repair Inhibitors. Molecules 2019; 24:E645. [PMID: 30759820 PMCID: PMC6384788 DOI: 10.3390/molecules24030645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/05/2019] [Revised: 02/05/2019] [Accepted: 02/11/2019] [Indexed: 11/16/2022]
Abstract
Natural and synthetic small molecules from the NCI Developmental Therapeutics Program (DTP) were employed in molecular dynamics-based docking with DNA repair proteins whose RNA-Seq based expression was associated with overall cancer survival (OS) after adjustment for the PCNA metagene. The compounds employed were required to elicit a sensitive response (vs. resistance) in more than half of the cell lines tested for each cancer. Methodological approaches included peptide sequence alignments and homology modeling for 3D protein structure determination, ligand preparation, docking, toxicity and ADME prediction. Docking was performed for unique lists of DNA repair proteins which predict OS for AML, cancers of the breast, lung, colon, and ovaries, GBM, melanoma, and renal papillary cancer. Results indicate hundreds of drug-like and lead-like ligands with best-pose binding energies less than -6 kcal/mol. Ligand solubility for the top 20 drug-like hits approached lower bounds, while lipophilicity was acceptable. Most ligands were also blood-brain barrier permeable with high intestinal absorption rates. While the majority of ligands lacked positive prediction for HERG channel blockage and Ames carcinogenicity, there was a considerable variation for predicted fathead minnow, honey bee, and Tetrahymena pyriformis toxicity. The computational results suggest the potential for new targets and mechanisms of repair inhibition and can be directly employed for in vitro and in vivo confirmatory laboratory experiments to identify new targets of therapy for cancer survival.
Affiliation(s)
- Leif E Peterson
- Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, New York City, NY 10065, USA.
- Center for Biostatistics, Institute for Academic Medicine, Houston Methodist Research Institute, 6565 Fannin Street, Houston, TX 77030, USA.
| |
|
24
|
Affiliation(s)
- María E. Elguero
- Facultad de Farmacia y Bioquímica, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Nanobiotecnología (NANOBIOTEC), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Clara B. Nudel
- Facultad de Farmacia y Bioquímica, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Nanobiotecnología (NANOBIOTEC), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Alejandro D. Nusblat
- Facultad de Farmacia y Bioquímica, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Nanobiotecnología (NANOBIOTEC), Universidad de Buenos Aires, Buenos Aires, Argentina
| |
|
25
|
Yang H, Sun L, Li W, Liu G, Tang Y. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts. Front Chem 2018; 6:30. [PMID: 29515993 PMCID: PMC5826228 DOI: 10.3389/fchem.2018.00030] [Citation(s) in RCA: 108] [Impact Index Per Article: 18.0] [Received: 01/11/2018] [Accepted: 02/05/2018] [Indexed: 12/17/2022]
Abstract
During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in the prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of the article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also listed. Though the methods and models are very helpful for drug design, some challenges and limitations remain to be addressed for drug safety assessment in the future.
Affiliation(s)
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| |
|
26
|
Li X, Chen Y, Song X, Zhang Y, Li H, Zhao Y. The development and application of in silico models for drug induced liver injury. RSC Adv 2018; 8:8101-8111. [PMID: 35542036 PMCID: PMC9078522 DOI: 10.1039/c7ra12957b] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/01/2017] [Accepted: 02/09/2018] [Indexed: 11/23/2022]
Abstract
Drug-induced liver injury (DILI), caused by drugs, herbal agents or nutritional supplements, is a major issue for patients and the pharmaceutical industry. It has been a leading cause of clinical trial failure and withdrawal of FDA approval. In this research, we focused on in silico estimation of chemical DILI potential in humans based on structurally diverse organic chemicals. We developed a series of binary classification models using five different machine learning methods and eight different feature reduction methods. The model developed with the support vector machine (SVM) and the MACCS fingerprint performed best on both the test set and external validation. It achieved a prediction accuracy of 80.39% on the test set and 82.78% on external validation. We made this model available at http://opensource.vslead.com/, where users can freely predict the DILI potential of molecules. Furthermore, we analyzed the differences in the distributions of 12 key physical-chemical properties between DILI-positive and DILI-negative compounds, and 20 privileged substructures responsible for DILI were identified from the Klekota-Roth fingerprint. Moreover, since traditional Chinese medicine (TCM)-induced liver injury is also one of the major concerns among toxic effects, we evaluated the DILI potential of TCM ingredients using the MACCS_SVM model developed in this study. We hope the model and privileged substructures could be useful complementary tools for chemical DILI evaluation.
Affiliation(s)
- Xiao Li
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd. 7 Fengxian road Beijing 100094 China +86-10-5934-1890
- Beijing Key Laboratory of Cloud Computing Key Technology and Application, Beijing Computing Center, Beijing Academy of Science and Technology 7 Fengxian road Beijing 100094 China +86-10-5934-1855 +86-10-5934-1764
| | - Yaojie Chen
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd. 7 Fengxian road Beijing 100094 China +86-10-5934-1890
| | - Xinrui Song
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd. 7 Fengxian road Beijing 100094 China +86-10-5934-1890
| | - Yuan Zhang
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd. 7 Fengxian road Beijing 100094 China +86-10-5934-1890
| | - Huanhuan Li
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd. 7 Fengxian road Beijing 100094 China +86-10-5934-1890
| | - Yong Zhao
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd. 7 Fengxian road Beijing 100094 China +86-10-5934-1890
- Beijing Key Laboratory of Cloud Computing Key Technology and Application, Beijing Computing Center, Beijing Academy of Science and Technology 7 Fengxian road Beijing 100094 China +86-10-5934-1855 +86-10-5934-1764
| |
|
27
|
Li X, Zhang Y, Chen H, Li H, Zhao Y. Insights into the Molecular Basis of the Acute Contact Toxicity of Diverse Organic Chemicals in the Honey Bee. J Chem Inf Model 2017; 57:2948-2957. [PMID: 29161513 DOI: 10.1021/acs.jcim.7b00476] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/22/2022]
Abstract
Use of chemical pollutants, including pesticides and other industrial chemicals, has resulted in significant risks to the whole ecosystem. Therefore, ecological risk assessment of chemicals is vital and necessary. Since the honey bee (Apis mellifera) is probably among the most exposed species to the polluting chemicals, we focused on the in silico estimation of honey bee toxicity (HBT) of chemicals and the analysis of the relevance of chemical HBT and several key physical-chemical properties and structural characteristics. A total of 40 classification models were developed by combination of five machine learning methods along with seven kinds of fingerprints and a set of molecular descriptors. After 5-fold cross validation and external validation, several models showed good predictive power. The relevance of 12 key physical-chemical properties and chemical HBT was also investigated. Five properties, including AlogP, logD, molecular weight (MW), molecular surface area (MSA), and the number of rotatable bonds (nRTB), indicated positive correlation coefficients with HBT, while molecular solubility (logS) and the number of hydrogen bond donors (nHBD) indicated negative correlation coefficients. Finally, seven privileged substructures responsible for chemical HBT were identified from KRFP and SubFP fingerprints. The results of this study should provide critical information and useful tools for chemical HBT estimation in environmental risk assessment.
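The count of 40 models in this abstract is consistent with crossing five learning methods with eight feature sets (seven fingerprints plus one descriptor set). A small sketch of that enumeration follows; the method and fingerprint labels are placeholders, since the abstract names only KRFP and SubFP and does not list the learning methods.

```python
from itertools import product

# Placeholder labels: the abstract does not name the five methods
# or all seven fingerprints.
methods = [f"method_{i}" for i in range(1, 6)]
feature_sets = [f"fingerprint_{i}" for i in range(1, 8)]
feature_sets.append("molecular_descriptors")

# One model per (method, feature set) pair.
model_grid = list(product(methods, feature_sets))
print(len(model_grid))  # 5 methods x 8 feature sets = 40
```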
Affiliation(s)
- Xiao Li
- Beijing Computing Center, Beijing Academy of Science and Technology, 7 Fengxian road, Beijing 100094, China; Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd., 7 Fengxian road, Beijing 100094, China
| | - Yuan Zhang
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd., 7 Fengxian road, Beijing 100094, China
| | - Hongna Chen
- Tigermed Consulting Co., Ltd., 20 Chaowai Street, Beijing 100020, China
| | - Huanhuan Li
- Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd., 7 Fengxian road, Beijing 100094, China
| | - Yong Zhao
- Beijing Computing Center, Beijing Academy of Science and Technology, 7 Fengxian road, Beijing 100094, China; Beijing Beike Deyuan Bio-Pharm Technology Co. Ltd., 7 Fengxian road, Beijing 100094, China
| |
|
28
|
Li F, Fan D, Wang H, Yang H, Li W, Tang Y, Liu G. In silico prediction of pesticide aquatic toxicity with chemical category approaches. Toxicol Res (Camb) 2017; 6:831-842. [PMID: 30090546 PMCID: PMC6062408 DOI: 10.1039/c7tx00144d] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/20/2017] [Accepted: 07/27/2017] [Indexed: 01/03/2023]
Abstract
Aquatic toxicity is an important issue in pesticide development. In this study, using nine molecular fingerprints to describe pesticides, binary and ternary classification models were constructed to predict aquatic toxicity of pesticides via six machine learning methods: Naïve Bayes (NB), Artificial Neural Network (ANN), k-Nearest Neighbor (kNN), Classification Tree (CT), Random Forest (RF) and Support Vector Machine (SVM). For the binary models, local models were obtained with 829 pesticides on rainbow trout (RT) and 151 pesticides on lepomis (LP), and global models were constructed on the basis of 1258 diverse pesticides on RT and LP and 278 on other fish species. Analysis of the local binary models showed that fish species influenced accuracy. Considering the data size and predictive range, the 1258 pesticides were also used to build global ternary models. The best local binary models were Maccs_ANN for RT and Maccs_SVM for LP, which exhibited accuracies of 0.90 and 0.90, respectively. For global binary models, the best model was Graph_SVM with an accuracy of 0.89. Accuracy of the best global ternary model, Graph_SVM, was 0.81, slightly lower than that of the best global binary model. In addition, several substructural alerts were identified, including nitrobenzene, chloroalkene and nitrile, which could significantly correlate with pesticide aquatic toxicity. This study provides a useful tool for early evaluation of pesticide aquatic toxicity in environmental risk assessment.
Affiliation(s)
- Fuxing Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| | - Defang Fan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| | - Hao Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| | - Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Tel: +86-21-64250811
| |
|
29
|
Abbasitabar F, Zare-Shahabadi V. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach. CHEMOSPHERE 2017; 172:249-259. [PMID: 28081509 DOI: 10.1016/j.chemosphere.2016.12.095] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/03/2016] [Revised: 11/29/2016] [Accepted: 12/19/2016] [Indexed: 05/27/2023]
Abstract
Risk assessment of chemicals is an important issue in environmental protection; however, experimental data are lacking for a large number of end-points. The experimental determination of the toxicity of chemicals is costly and time-consuming. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named the Memorized_ACO algorithm. The training and test R² of the model were 0.72 and 0.68, respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned into several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the training and test R² of the resultant QSTR model improved to 0.83 and 0.75, respectively. A genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (training and test R² were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615.
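The two statistics reported in this abstract, R² and mean absolute error, can be computed from observed and predicted values as sketched below. The sketch assumes R² means the coefficient of determination (1 minus residual over total sum of squares); some QSAR papers instead report a squared correlation coefficient, and the abstract does not say which was used. The toxicity values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical toxicity values for illustration only.
observed = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]
r2 = r_squared(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"R2={r2:.3f} MAE={mae:.3f}")
```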
Affiliation(s)
- Fatemeh Abbasitabar
- Department of Chemistry, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran.
| | - Vahid Zare-Shahabadi
- Department of Chemistry, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran
| |
|
30
|
Trifunović J, Borčić V, Vukmirović S, Vasović V, Mikov M. Bile acids and their oxo derivatives: environmentally safe materials for drug design and delivery. Drug Chem Toxicol 2016; 40:397-405. [DOI: 10.1080/01480545.2016.1244680] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
Affiliation(s)
- Jovana Trifunović
- Department of Pharmacology, Toxicology and Clinical Pharmacology, Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia
| | - Vladan Borčić
- Department of Pharmacology, Toxicology and Clinical Pharmacology, Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia
| | - Saša Vukmirović
- Department of Pharmacology, Toxicology and Clinical Pharmacology, Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia
| | - Velibor Vasović
- Department of Pharmacology, Toxicology and Clinical Pharmacology, Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia
| | - Momir Mikov
- Department of Pharmacology, Toxicology and Clinical Pharmacology, Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia
| |
|
31
|
Yin Y, Xu C, Gu S, Li W, Liu G, Tang Y. Quantitative Regression Models for the Prediction of Chemical Properties by an Efficient Workflow. Mol Inform 2016; 34:679-88. [PMID: 27490968 DOI: 10.1002/minf.201400119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/31/2014] [Accepted: 03/10/2015] [Indexed: 11/08/2022]
Abstract
Rapid safety assessment is increasingly needed by both chemical industries and regulators around the world as the number of chemicals grows. Traditional experimental methods can no longer meet the current demand. With the development of information technology and the growth of experimental data, in silico modeling has become a practical and rapid alternative for the assessment of chemical properties, especially for the toxicity prediction of organic chemicals. In this study, a quantitative regression workflow was built in KNIME to predict chemical properties. With this regression workflow, quantitative values of chemical properties can be obtained, unlike binary-classification or multi-classification models, which can only give qualitative results. To illustrate the usage of the workflow, two predictive models were constructed based on datasets of Tetrahymena pyriformis toxicity and aqueous solubility. The q² values for 5-fold cross-validation (q²cv) and external validation (q²test) were greater than 0.7 for both models, which implies that our models are robust and reliable, and that the workflow is convenient and efficient for predicting various chemical properties.
Affiliation(s)
- Yongmin Yin
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, P.R. China tel: +86-21-64250811; fax: +86-21-64251033
| | - Congying Xu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, P.R. China tel: +86-21-64250811; fax: +86-21-64251033
| | - Shikai Gu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, P.R. China tel: +86-21-64250811; fax: +86-21-64251033
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, P.R. China tel: +86-21-64250811; fax: +86-21-64251033
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, P.R. China tel: +86-21-64250811; fax: +86-21-64251033.
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, P.R. China tel: +86-21-64250811; fax: +86-21-64251033.
| |
|
32
|
A three-tier QSAR modeling strategy for estimating eye irritation potential of diverse chemicals in rabbit for regulatory purposes. Regul Toxicol Pharmacol 2016; 77:282-91. [DOI: 10.1016/j.yrtph.2016.03.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/23/2015] [Revised: 02/22/2016] [Accepted: 03/18/2016] [Indexed: 01/08/2023]
|
33
|
Zhang Y, Wong YS, Deng J, Anton C, Gabos S, Zhang W, Huang DY, Jin C. Machine learning algorithms for mode-of-action classification in toxicity assessment. BioData Min 2016; 9:19. [PMID: 27182283 PMCID: PMC4866020 DOI: 10.1186/s13040-016-0098-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/13/2015] [Accepted: 04/30/2016] [Indexed: 12/29/2022]
Abstract
Background: Real Time Cell Analysis (RTCA) technology is used to monitor cellular changes continuously over the entire exposure period. Combined with different testing concentrations, the profiles have potential for probing the mode of action (MOA) of the tested substances. Results: In this paper, we present machine learning approaches for MOA assessment. Computational tools based on artificial neural network (ANN) and support vector machine (SVM) are developed to analyze the time-concentration response curves (TCRCs) of human cell lines responding to tested chemicals. The techniques are capable of learning from given TCRCs with known MOA information and then making MOA classifications for unknown toxicity. A novel data processing step based on wavelet transform is introduced to extract important features from the original TCRC data. From the dose response curves, the time interval leading to a higher classification success rate can be selected as input to enhance the performance of the machine learning algorithm. This is particularly helpful when handling cases with limited and imbalanced data. The proposed method is validated by applying the supervised learning algorithm to the exposure data of the HepG2 cell line to 63 chemicals with 11 concentrations in each test case. Classification success rates in the range of 85 to 95% are obtained using SVM for MOA classification with two clusters up to four clusters. Conclusions: Wavelet transform is capable of capturing important features of TCRCs for MOA classification. The proposed SVM scheme combined with wavelet transform has great potential for large scale MOA classification and high-throughput chemical screening.
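To make the idea of wavelet-based feature extraction concrete, here is a minimal sketch of one level of a discrete wavelet transform applied to a short response curve. It uses the Haar wavelet as an assumption for simplicity; the abstract does not say which wavelet family the authors used, and the sample curve is invented.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in pairs]  # smoothed trend
    detail = [(signal[i] - signal[i + 1]) / scale for i in pairs]  # local change
    return approx, detail

# Invented response curve, for illustration only.
curve = [4.0, 4.0, 2.0, 0.0]
approx, detail = haar_step(curve)
print(approx, detail)
```

The approximation coefficients summarize the overall shape of the curve and the detail coefficients capture abrupt changes; either set, or a subset, could serve as a compact feature vector for a classifier such as an SVM.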
Affiliation(s)
- Yile Zhang
- Department of Mathematical and Statistical Science, University of Alberta, T6G 2G1, Edmonton, Canada
| | - Yau Shu Wong
- Department of Mathematical and Statistical Science, University of Alberta, T6G 2G1, Edmonton, Canada
| | - Jian Deng
- Department of Mathematical and Statistical Science, University of Alberta, T6G 2G1, Edmonton, Canada
| | - Cristina Anton
- Department of Mathematics and Statistics, Grant MacEwan University, T5P 2P7, Edmonton, Canada
| | - Stephan Gabos
- Department of Laboratory Medicine and Pathology, University of Alberta, T6G 2B7, Edmonton, Canada
| | | | - Dorothy Yu Huang
- Alberta Centre for Toxicology, University of Calgary, T2N 4N1, Calgary, Canada
| | - Can Jin
- AACEA Biosciences Inc, San Diego, 92121 USA
| |
Collapse
|
34
|
Predicting the acute neurotoxicity of diverse organic solvents using probabilistic neural networks based QSTR modeling approaches. Neurotoxicology 2016; 53:45-52. [DOI: 10.1016/j.neuro.2015.12.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/22/2015] [Revised: 12/17/2015] [Accepted: 12/17/2015] [Indexed: 12/23/2022]
|
35
|
Toropova AP, Schultz TW, Toropov AA. Building up a QSAR model for toxicity toward Tetrahymena pyriformis by the Monte Carlo method: A case of benzene derivatives. ENVIRONMENTAL TOXICOLOGY AND PHARMACOLOGY 2016; 42:135-145. [PMID: 26851376 DOI: 10.1016/j.etap.2016.01.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/23/2015] [Revised: 01/12/2016] [Accepted: 01/14/2016] [Indexed: 06/05/2023]
Abstract
Data on toxicity toward Tetrahymena pyriformis are an indicator of the applicability of a substance in ecological and pharmaceutical contexts. Quantitative structure-activity relationships (QSARs) between the molecular structure of benzene derivatives and toxicity toward T. pyriformis (expressed as the negative logarithm of the population growth inhibition dose, mmol/L) are established. The available data were randomly distributed three times into visible training and calibration sets and invisible validation sets. The statistical characteristics for the validation set are as follows: r(2)=0.8179 and s=0.338 (first distribution); r(2)=0.8682 and s=0.341 (second distribution); r(2)=0.8435 and s=0.323 (third distribution). These models are built using only information on the molecular structure: no data on physicochemical parameters, 3D features of the molecular structure, or quantum mechanical descriptors are involved in the modeling process.
Affiliation(s)
- Alla P Toropova
- IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, Milano, Italy
- Terry W Schultz
- College of Veterinary Medicine, The University of Tennessee, 2407 River Drive, Knoxville, TN 37996-4543, United States
- Andrey A Toropov
- IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, Milano, Italy
|
36
|
Zhang C, Zhou Y, Gu S, Wu Z, Wu W, Liu C, Wang K, Liu G, Li W, Lee PW, Tang Y. In silico prediction of hERG potassium channel blockage by chemical category approaches. Toxicol Res (Camb) 2016; 5:570-582. [PMID: 30090371 DOI: 10.1039/c5tx00294j] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 08/19/2015] [Accepted: 01/13/2016] [Indexed: 12/18/2022]
Abstract
The human ether-a-go-go related gene (hERG) plays an important role in cardiac action potential. It encodes an ion channel protein named Kv11.1, which is related to long QT syndrome and may cause avoidable sudden cardiac death. Therefore, it is important to assess the hERG channel blockage of lead compounds in an early drug discovery process. In this study, we collected a large data set containing 1163 diverse compounds with IC50 values determined by the patch clamp method on mammalian cell lines. The whole data set was divided into 80% as the training set and 20% as the test set. Then, five machine learning methods were applied to build a series of binary classification models based on 13 molecular descriptors, five fingerprints and molecular descriptors combining fingerprints at four IC50 thresholds to discriminate hERG blockers from nonblockers, respectively. Models built by molecular descriptors combining fingerprints were validated by using an external validation set containing 407 compounds collected from the hERGCentral database. The performance indicated that the model built by molecular descriptors combining fingerprints yielded the best results and each threshold had its best suitable method, which means that hERG blockage assessment might depend on threshold values. Meanwhile, kNN and SVM methods were better than the others for model building. Furthermore, six privileged substructures were identified using information gain and frequency analysis methods, which could be regarded as structural alerts of cardiac toxicity mediated by hERG channel blockage.
Affiliation(s)
- Chen Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Yuan Zhou
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Shikai Gu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Zengrui Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Wenjie Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Changming Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Kaidong Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Philip W Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
|
37
|
Gupta S, Basant N, Singh KP. Three-Tier Strategy for Screening High-Energy Molecules Using Structure–Property Relationship Modeling Approaches. Ind Eng Chem Res 2016. [DOI: 10.1021/acs.iecr.5b03575] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/15/2023]
Affiliation(s)
- Shikha Gupta
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
- Kunwar P. Singh
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
|
38
|
Basant N, Gupta S, Singh KP. Predicting binding affinities of diverse pharmaceutical chemicals to human serum plasma proteins using QSPR modelling approaches. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2016; 27:67-85. [PMID: 26854728 DOI: 10.1080/1062936x.2015.1133700] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
The prediction of the plasma protein binding (PPB) affinity of chemicals is of paramount significance in the drug development process. In this study, ensemble machine learning-based QSPR models have been established for a four-category classification and PPB affinity prediction of diverse compounds using a large PPB dataset of 930 compounds and in accordance with the OECD guidelines. The structural diversity of the chemicals was tested by the Tanimoto similarity index. The external predictive power of the developed QSPR models was evaluated through internal and external validations. In the QSPR models, XLogP was the most important descriptor. In the test data, the classification QSPR models rendered an accuracy of >93%, while the regression QSPR models yielded r(2) of >0.920 between the measured and predicted PPB affinities, with the root mean squared error <9.77. Values of statistical coefficients derived for the test data were above their threshold limits, thus put a high confidence in this analysis. The QSPR models in this study performed better than any of the previous studies. The results suggest that the developed QSPR models are reliable for predicting the PPB affinity of structurally diverse chemicals. They can be useful for initial screening of candidate molecules in the drug development process.
Affiliation(s)
- N Basant
- ETRC, Gomtinagar, Lucknow, India
- S Gupta
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Lucknow, India
- K P Singh
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Lucknow, India
|
39
|
Gupta S, Basant N, Rai P, Singh KP. Modeling the binding affinity of structurally diverse industrial chemicals to carbon using the artificial intelligence approaches. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2015; 22:17810-17827. [PMID: 26160122 DOI: 10.1007/s11356-015-4965-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2015] [Accepted: 06/25/2015] [Indexed: 06/04/2023]
Abstract
Binding affinity of chemical to carbon is an important characteristic as it finds vast industrial applications. Experimental determination of the adsorption capacity of diverse chemicals onto carbon is both time and resource intensive, and development of computational approaches has widely been advocated. In this study, artificial intelligence (AI)-based ten different qualitative and quantitative structure-property relationship (QSPR) models (MLPN, RBFN, PNN/GRNN, CCN, SVM, GEP, GMDH, SDT, DTF, DTB) were established for the prediction of the adsorption capacity of structurally diverse chemicals to activated carbon following the OECD guidelines. Structural diversity of the chemicals and nonlinear dependence in the data were evaluated using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. The generalization and prediction abilities of the constructed models were established through rigorous internal and external validation procedures performed employing a wide series of statistical checks. In complete dataset, the qualitative models rendered classification accuracies between 97.04 and 99.93%, while the quantitative models yielded correlation (R(2)) values of 0.877-0.977 between the measured and the predicted endpoint values. The quantitative prediction accuracies for the higher molecular weight (MW) compounds (class 4) were relatively better than those for the low MW compounds. Both in the qualitative and quantitative models, the Polarizability was the most influential descriptor. Structural alerts responsible for the extreme adsorption behavior of the compounds were identified. Higher number of carbon and presence of higher halogens in a molecule rendered higher binding affinity. Proposed QSPR models performed well and outperformed the previous reports. A relatively better performance of the ensemble learning models (DTF, DTB) may be attributed to the strengths of the bagging and boosting algorithms which enhance the predictive accuracies. 
The proposed AI models can be useful tools in screening the chemicals for their binding affinities toward carbon for their safe management.
Affiliation(s)
- Shikha Gupta
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow, 226 001, India
- Nikita Basant
- KanbanSystems Pvt. Ltd., Laxmi Nagar, Delhi, 110092, India
- Premanjali Rai
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow, 226 001, India
- Kunwar P Singh
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow, 226 001, India
|
40
|
Zhang C, Hong H, Mendrick DL, Tang Y, Cheng F. Biomarker-based drug safety assessment in the age of systems pharmacology: from foundational to regulatory science. Biomark Med 2015; 9:1241-52. [PMID: 26506997 DOI: 10.2217/bmm.15.81] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 02/05/2023]
Abstract
Improved biomarker-based assessment of drug safety is needed in drug discovery and development as well as regulatory evaluation. However, identifying drug safety-related biomarkers such as genes, proteins, miRNA and single-nucleotide polymorphisms remains a big challenge. The advances of 'omics' and computational technologies such as genomics, transcriptomics, metabolomics, proteomics, systems biology, network biology and systems pharmacology enable us to explore drug actions at the organ and organismal levels. Computational and experimental systems pharmacology approaches could be utilized to facilitate biomarker-based drug safety assessment for drug discovery and development and to inform better regulatory decisions. In this article, we review the current status and advances of systems pharmacology approaches for the development of predictive models to identify biomarkers for drug safety assessment.
Affiliation(s)
- Chen Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, 130 Meilong Road, Shanghai 200237, China
- Huixiao Hong
- National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Donna L Mendrick
- National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, 130 Meilong Road, Shanghai 200237, China
- Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, 130 Meilong Road, Shanghai 200237, China; State Key Laboratory of Biotherapy/Collaborative Innovation Center for Biotherapy, West China Hospital, West China Medical School, Sichuan University, Chengdu 610041, Sichuan, China
|
41
|
Pires DEV, Blundell TL, Ascher DB. pkCSM: Predicting Small-Molecule Pharmacokinetic and Toxicity Properties Using Graph-Based Signatures. J Med Chem 2015; 58:4066-72. [PMID: 25860834 PMCID: PMC4434528 DOI: 10.1021/acs.jmedchem.5b00104] [Citation(s) in RCA: 2026] [Impact Index Per Article: 225.1] [Indexed: 12/15/2022]
Abstract
Drug development has a high attrition rate, with poor pharmacokinetic and safety properties a significant hurdle. Computational approaches may help minimize these risks. We have developed a novel approach (pkCSM) which uses graph-based signatures to develop predictive models of central ADMET properties for drug development. pkCSM performs as well or better than current methods. A freely accessible web server (http://structure.bioc.cam.ac.uk/pkcsm), which retains no information submitted to it, provides an integrated platform to rapidly evaluate pharmacokinetic and toxicity properties.
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, U.K.; Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, U.K.
- David B Ascher
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, U.K.
|
42
|
Zhang C, Cheng F, Sun L, Zhuang S, Li W, Liu G, Lee PW, Tang Y. In silico prediction of chemical toxicity on avian species using chemical category approaches. CHEMOSPHERE 2015; 122:280-287. [PMID: 25532772 DOI: 10.1016/j.chemosphere.2014.12.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/06/2014] [Revised: 11/28/2014] [Accepted: 12/01/2014] [Indexed: 06/04/2023]
Abstract
Avian species are sensitive to pesticides and industrial chemicals, and hence are used as model species in the evaluation of chemical toxicity. In the present study, we assessed the toxicity of more than 663 diverse chemicals on 17 avian species. All the chemicals were classified into three categories, i.e. highly toxic, slightly toxic and non-toxic, based on the toxicity classification criteria of the United States Environmental Protection Agency (EPA). To evaluate these chemicals, toxicity prediction models were built using chemical category approaches with molecular descriptors and five commonly used fingerprints, in which five machine learning methods were applied to two standard test species: the aquatic mallard duck and the terrestrial northern bobwhite quail. The support vector machine (SVM) method with the PubChem fingerprint performed best, as revealed by 5-fold cross-validation and an external validation set on Japanese quail. No species difference existed in our database, despite several chemicals showing different toxicity on some avian species. The best model had an overall accuracy of 0.851 for the prediction of toxicity on avian species, which outperformed the work of Mazzatorta et al. Furthermore, several representative substructures for characterizing avian toxicity were identified via the information gain (IG) method. This study provides a new tool for chemical safety assessment.
Affiliation(s)
- Chen Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Lu Sun
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Shulin Zhuang
- Institute of Environmental Sciences, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Philip W Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
43
|
Gupta S, Basant N, Singh KP. Qualitative and quantitative structure-activity relationship modelling for predicting blood-brain barrier permeability of structurally diverse chemicals. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2015; 26:95-124. [PMID: 25629764 DOI: 10.1080/1062936x.2014.994562] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]
Abstract
In this study, structure-activity relationship (SAR) models have been established for qualitative and quantitative prediction of the blood-brain barrier (BBB) permeability of chemicals. The structural diversity of the chemicals and nonlinear structure in the data were tested. The predictive and generalization ability of the developed SAR models were tested through internal and external validation procedures. In complete data, the QSAR models rendered ternary classification accuracy of >98.15%, while the quantitative SAR models yielded correlation (r(2)) of >0.926 between the measured and the predicted BBB permeability values with the mean squared error (MSE) <0.045. The proposed models were also applied to an external new in vitro data and yielded classification accuracy of >82.7% and r(2) > 0.905 (MSE < 0.019). The sensitivity analysis revealed that topological polar surface area (TPSA) has the highest effect in qualitative and quantitative models for predicting the BBB permeability of chemicals. Moreover, these models showed predictive performance superior to those reported earlier in the literature. This demonstrates the appropriateness of the developed SAR models to reliably predict the BBB permeability of new chemicals, which can be used for initial screening of the molecules in the drug development process.
Affiliation(s)
- S Gupta
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, New Delhi, India
|
44
|
Gupta S, Basant N, Singh KP. Predicting aquatic toxicities of benzene derivatives in multiple test species using local, global and interspecies QSTR modeling approaches. RSC Adv 2015. [DOI: 10.1039/c5ra12825k] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
A flow diagram showing QSTR modeling strategy for aquatic toxicity prediction of benzene derivatives in multiple test species.
Affiliation(s)
- Shikha Gupta
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Lucknow-226001, India
- Kunwar P. Singh
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Lucknow-226001, India
|
45
|
Sun L, Zhang C, Chen Y, Li X, Zhuang S, Li W, Liu G, Lee PW, Tang Y. In silico prediction of chemical aquatic toxicity with chemical category approaches and substructural alerts. Toxicol Res (Camb) 2015. [DOI: 10.1039/c4tx00174e] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/21/2022]
Abstract
Aquatic toxicity is an important endpoint in the evaluation of chemically adverse effects on ecosystems.
Affiliation(s)
- Lu Sun
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Chen Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yingjie Chen
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Xiao Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Shulin Zhuang
- College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Philip W. Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
46
|
Singh KP, Gupta S, Basant N, Mohan D. QSTR Modeling for Qualitative and Quantitative Toxicity Predictions of Diverse Chemical Pesticides in Honey Bee for Regulatory Purposes. Chem Res Toxicol 2014; 27:1504-15. [DOI: 10.1021/tx500100m] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 02/08/2023]
Affiliation(s)
- Kunwar P. Singh
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi-110 001, India
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow-226 001, India
- Shikha Gupta
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi-110 001, India
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow-226 001, India
- Nikita Basant
- Kanban Systems Pvt. Ltd., Laxmi Nagar, Delhi-110092, India
- Dinesh Mohan
- School of Environmental Sciences, Jawaharlal Nehru University, New Delhi-110067, India
|
47
|
Singh KP, Gupta S, Rai P. Investigating hydrochemistry of groundwater in Indo-Gangetic alluvial plain using multivariate chemometric approaches. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2014; 21:6001-6015. [PMID: 24464077 DOI: 10.1007/s11356-014-2517-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/26/2013] [Accepted: 01/05/2014] [Indexed: 06/03/2023]
Abstract
Groundwater hydrochemistry of an urban industrial region in the Indo-Gangetic plains of north India was investigated. Groundwater samples were collected from both the industrial and non-industrial areas of Kanpur. The hydrochemical data were analyzed using various water quality indices and nonparametric statistical methods. Principal components analysis (PCA) was performed to identify the factors responsible for groundwater contamination. Ensemble learning-based decision treeboost (DTB) models were constructed to develop discriminating and regression functions to differentiate the groundwater hydrochemistry of the three different areas, to identify the responsible factors, and to predict the groundwater quality using selected measured variables. The results indicated non-normal distribution and wide variability of water quality variables in all the study areas, suggesting a nonhomogeneous distribution of sources in the region. PCA results showed contaminants of industrial origin dominating in the region. The DTB classification model identified pH, redox potential, total-Cr, and λ 254 as the discriminating variables in water quality of the three areas, with an average accuracy of 99.51% in complete data. The regression model predicted the groundwater chemical oxygen demand values, exhibiting high correlation with measured values (0.962 in training; 0.918 in test) and respective low root mean-squared errors of 2.24 and 2.01 in the training and test arrays. The statistical and chemometric approaches used here suggest that groundwater hydrochemistry differs in the three areas and is dominated by different variables. The proposed methods can be used as effective tools in groundwater management.
Affiliation(s)
- Kunwar P Singh
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi, 110 001, India
|
48
|
Singh KP, Gupta S, Kumar A, Mohan D. Multispecies QSAR modeling for predicting the aquatic toxicity of diverse organic chemicals for regulatory toxicology. Chem Res Toxicol 2014; 27:741-53. [PMID: 24738471 DOI: 10.1021/tx400371w] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 02/06/2023]
Abstract
The research aims to develop multispecies quantitative structure-activity relationship (QSAR) modeling tools capable of predicting the acute toxicity of diverse chemicals in various Organization for Economic Co-operation and Development (OECD) recommended test species of different trophic levels for regulatory toxicology. Accordingly, ensemble learning (EL) based classification and regression QSAR models, namely decision treeboost (DTB) and decision tree forest (DTF), implementing stochastic gradient boosting and bagging algorithms, were developed using experimental toxicity data for chemicals in algae (P. subcapitata). The EL-QSAR models were successfully applied to predict toxicities of wide groups of chemicals in other test species including algae (S. obliquus), daphnia, fish, and bacteria. Structural diversity of the selected chemicals and that of the end-point toxicity data of the five different test species were tested using the Tanimoto similarity index and Kruskal-Wallis (K-W) statistics. Predictive and generalization abilities of the constructed QSAR models were compared using statistical parameters. The developed classification QSAR models (DTB and DTF) yielded considerably high classification accuracy in the complete data of the model building (algae) species (97.82%, 99.01%), with accuracies ranging between 92.50%-94.26% and 92.14%-94.12%, respectively, in the four test species. The regression QSAR models (DTB and DTF) rendered high correlation (R(2)) between the measured and model predicted toxicity end-point values and low mean-squared error in the model building (algae) species (0.918, 0.15; 0.905, 0.21), with values ranging between 0.575-0.672, 0.18-0.51 and 0.605-0.689, 0.20-0.45 in the four different test species. The developed QSAR models exhibited good predictive and generalization abilities in different test species of varied trophic levels and can be used for predicting the toxicities of new chemicals for screening and prioritization of chemicals for regulation.
Affiliation(s)
- Kunwar P Singh
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi-110 001, India
|
49
|
Cheng F, Zhao Z. Machine learning-based prediction of drug-drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inform Assoc 2014; 21:e278-86. [PMID: 24644270 DOI: 10.1136/amiajnl-2013-002512] [Citation(s) in RCA: 181] [Impact Index Per Article: 18.1] [Indexed: 12/31/2022]
Abstract
OBJECTIVE Drug-drug interactions (DDIs) are an important consideration in both drug development and clinical application, especially for co-administered medications. While it is necessary to identify all possible DDIs during clinical trials, DDIs are frequently reported after the drugs are approved for clinical use, and they are a common cause of adverse drug reactions (ADR) and increasing healthcare costs. Computational prediction may assist in identifying potential DDIs during clinical trials. METHODS Here we propose a heterogeneous network-assisted inference (HNAI) framework to assist with the prediction of DDIs. First, we constructed a comprehensive DDI network that contained 6946 unique DDI pairs connecting 721 approved drugs based on DrugBank data. Next, we calculated drug-drug pair similarities using four features: phenotypic similarity based on a comprehensive drug-ADR network, therapeutic similarity based on the drug Anatomical Therapeutic Chemical classification system, chemical structural similarity from SMILES data, and genomic similarity based on a large drug-target interaction network built using the DrugBank and Therapeutic Target Database. Finally, we applied five predictive models in the HNAI framework: naive Bayes, decision tree, k-nearest neighbor, logistic regression, and support vector machine, respectively. RESULTS The area under the receiver operating characteristic curve of the HNAI models is 0.67 as evaluated using fivefold cross-validation. Using antipsychotic drugs as an example, several HNAI-predicted DDIs that involve weight gain and cytochrome P450 inhibition were supported by literature resources. CONCLUSIONS Through machine learning-based integration of drug phenotypic, therapeutic, structural, and genomic similarities, we demonstrated that HNAI is promising for uncovering DDIs in drug development and postmarketing surveillance.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
50
|
Singh KP, Gupta S. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches. Toxicol Appl Pharmacol 2014; 275:198-212. [PMID: 24463095 DOI: 10.1016/j.taap.2014.01.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/05/2013] [Revised: 01/04/2014] [Accepted: 01/13/2014] [Indexed: 02/03/2023]
Abstract
Decision treeboost (DTB) and decision tree forest (DTF) models based on an ensemble learning approach are introduced to establish a quantitative structure-toxicity relationship (QSTR) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. The structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented with stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the predictive ability and robustness of the models, which were investigated in both external and 10-fold cross-validation processes. In complete data, the optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals.
Affiliation(s)
- Kunwar P Singh
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India.
- Shikha Gupta
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
|