1. Saifi I, Bhat BA, Hamdani SS, Bhat UY, Lobato-Tapia CA, Mir MA, Dar TUH, Ganie SA. Artificial intelligence and cheminformatics tools: a contribution to the drug development and chemical science. J Biomol Struct Dyn 2024; 42:6523-6541. [PMID: 37434311 DOI: 10.1080/07391102.2023.2234039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 07/03/2023] [Indexed: 07/13/2023]
Abstract
In the ever-evolving field of drug discovery, the integration of Artificial Intelligence (AI) and Machine Learning (ML) with cheminformatics has proven to be a powerful combination. Cheminformatics, which combines the principles of computer science and chemistry, is used to extract chemical information and search compound databases, while the application of AI and ML allows for the identification of potential hit compounds, optimization of synthesis routes, and prediction of drug efficacy and toxicity. This collaborative approach has led to the discovery, preclinical evaluations and approval of over 70 drugs in recent years. To aid researchers in the pursuit of new drugs, this article presents a comprehensive list of databases, datasets, predictive and generative models, scoring functions and web platforms that have been launched between 2021 and 2022. These resources provide a wealth of information and tools for computer-assisted drug development, and are a valuable asset for those working in the field of cheminformatics. Overall, the integration of AI, ML and cheminformatics has greatly advanced the drug discovery process and continues to hold great potential for the future. As new resources and technologies become available, we can expect to see even more groundbreaking discoveries and advancements in these fields. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ifra Saifi
  - Chaudhary Charan Singh University, Meerut, Uttar Pradesh, India
- Basharat Ahmad Bhat
  - Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Syed Suhail Hamdani
  - Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Umar Yousuf Bhat
  - Department of Zoology, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Mushtaq Ahmad Mir
  - Department of Clinical Laboratory Sciences, College of Applied Medical Science, King Khalid University, KSA, Saudi Arabia
- Tanvir Ul Hasan Dar
  - Department of Biotechnology, School of Biosciences and Biotechnology, BGSB University, Rajouri, India
- Showkat Ahmad Ganie
  - Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
2. Schuh M, Boldini D, Sieber SA. Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery. J Chem Inf Model 2024; 64:4640-4650. [PMID: 38836773 PMCID: PMC11200265 DOI: 10.1021/acs.jcim.4c00765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 05/23/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]
Abstract
The precise prediction of molecular properties can greatly accelerate the development of new drugs. However, in silico molecular property prediction approaches have been limited so far to assays for which large amounts of data are available. In this study, we develop a new computational approach leveraging both the textual description of the assay of interest and the chemical structure of target compounds. By combining these two sources of information via self-supervised learning, our tool can provide accurate predictions for assays where no measurements are available. Remarkably, our approach achieves state-of-the-art performance on the FS-Mol benchmark for zero-shot prediction, outperforming a wide variety of deep learning approaches. Additionally, we demonstrate how our tool can be used for tailoring screening libraries for the assay of interest, showing promising performance in a retrospective case study on a high-throughput screening campaign. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to streamline the identification of novel therapeutics.
Affiliation(s)
- Maximilian G. Schuh
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Davide Boldini
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Stephan A. Sieber
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
3. Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Affiliation(s)
- Ana M. B. Amorim
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PURR.AI, Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
- Luiz F. Piochi
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Ana T. Gaspar
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- António J. Preto
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- Irina S. Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
4. Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
In recent years, alternatives to animal testing, such as computational and machine learning approaches, have become increasingly crucial for toxicity testing. However, the complexity and scarcity of available biomedical data challenge the development of predictive models. Combining nonlinear machine learning with multicondition descriptors offers a solution for using data from various assays to create a robust model. This work applies multicondition descriptors (MCDs) to develop a QSTR (Quantitative Structure-Toxicity Relationship) model based on a large toxicity data set comprising more than 80,000 compounds and 59 different end points (122,572 data points). The prediction capabilities of the developed single-task multi-end point machine learning models, as well as a novel data analysis approach using Convolutional Neural Networks (CNN), are discussed. The results show that using MCDs significantly improves the model and that using them with CNN-1D yields the best result (R² (train) = 0.93, R² (ext) = 0.70). Several structural features showed a high level of contribution to the toxicity, including van der Waals surface area (VSA), number of nitrogen-containing fragments (nN+), presence of S-P fragments, ionization potential, and presence of C-N fragments. The developed models can be very useful tools to predict the toxicity of various compounds under different conditions, enabling quick toxicity assessment of new compounds.
Affiliation(s)
- Amirreza Daghighi
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
  - Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Gerardo M Casanola-Martin
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Kweeni Iduoku
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
  - Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Hrvoje Kusic
  - Faculty of Chemical Engineering and Technology, University of Zagreb, Marulicev Trg 19, Zagreb 10000, Croatia
- Humberto González-Díaz
  - Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa 48940, Spain
  - BIOFISIKA, Basque Center for Biophysics CSIC-UPVEH, Leioa 48940, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Biscay 48011, Spain
- Bakhtiyor Rasulev
  - Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
  - Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
5. Tinkov OV, Osipov VN, Kolotaev AV, Khachatryan DS, Grigorev VY. HT_PREDICT: a machine learning-based computational open-source tool for screening HDAC6 inhibitors. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2024; 35:505-530. [PMID: 39007781 DOI: 10.1080/1062936x.2024.2371155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Accepted: 06/17/2024] [Indexed: 07/16/2024]
Abstract
Histone deacetylase 6 (HDAC6) is a promising drug target for the treatment of human diseases such as cancer, neurodegenerative diseases (in particular, Alzheimer's disease), and multiple sclerosis. Considerable attention is paid to the development of selective non-toxic HDAC6 inhibitors. To this end, we assembled a set of 3854 compounds and proposed adequate regression QSAR models for HDAC6 inhibitors. The models were developed using the PubChem, Klekota-Roth, and 2D atom pair fingerprints and RDKit descriptors, together with the gradient boosting, support vector machine, neural network, and k-nearest neighbours methods. The models are integrated into the HT_PREDICT application, which is freely available at https://htpredict.streamlit.app/. In vitro studies have confirmed the predictive ability of the proposed QSAR models integrated into the HT_PREDICT web application. In addition, the virtual screening performed with the HT_PREDICT web application allowed us to propose two promising inhibitors for further investigation.
Affiliation(s)
- O V Tinkov
  - Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Shevchenko Transnistria State University, Tiraspol, Moldova
- V N Osipov
  - Department of Chemical Synthesis, Blokhin National Medical Research Center of Oncology, Ministry of Health of the Russian Federation, Moscow, Russia
- A V Kolotaev
  - Laboratory of Natural Compounds, National Research Centre "Kurchatov Institute", Moscow, Russia
- D S Khachatryan
  - Laboratory of Natural Compounds, National Research Centre "Kurchatov Institute", Moscow, Russia
- V Y Grigorev
  - Institute of Physiologically Active Compounds, Federal Research Center of Problems of Chemical Physics and Medicinal Chemistry, Russian Academy of Sciences, Chernogolovka, Russia
6. Shkil DO, Muhamedzhanova AA, Petrov PI, Skorb EV, Aliev TA, Steshin IS, Tumanov AV, Kislinskiy AS, Fedorov MV. Expanding Predictive Capacities in Toxicology: Insights from Hackathon-Enhanced Data and Model Aggregation. Molecules 2024; 29:1826. [PMID: 38675645 PMCID: PMC11055041 DOI: 10.3390/molecules29081826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/11/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open
Abstract
In the realm of predictive toxicology for small molecules, the applicability domain of QSAR models is often limited by the coverage of the chemical space in the training set. Consequently, classical models fail to provide reliable predictions for wide classes of molecules. However, the emergence of innovative data collection methods such as intensive hackathons promises to quickly expand the available chemical space for model construction. Combined with algorithmic refinement methods, these tools can address the challenges of toxicity prediction, enhancing both the robustness and applicability of the corresponding models. This study aimed to investigate the roles of gradient boosting and strategic data aggregation in enhancing the predictive ability of models for the toxicity of small organic molecules. We focused on evaluating the impact of incorporating fragment features and expanding the chemical space, facilitated by a comprehensive dataset procured in an open hackathon. We used gradient boosting techniques, accounting for critical features such as the structural fragments or functional groups often associated with manifestations of toxicity.
Affiliation(s)
- Dmitrii O. Shkil
  - Syntelly LLC, Moscow 121205, Russia; (A.A.M.); (I.S.S.); (A.V.T.); (A.S.K.)
  - Moscow Institute of Physics and Technology, Moscow 141700, Russia
- Ekaterina V. Skorb
  - Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia; (E.V.S.); (T.A.A.)
- Timur A. Aliev
  - Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia; (E.V.S.); (T.A.A.)
- Ilya S. Steshin
  - Syntelly LLC, Moscow 121205, Russia; (A.A.M.); (I.S.S.); (A.V.T.); (A.S.K.)
- Maxim V. Fedorov
  - Kharkevich Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow 127994, Russia
7. Mostafa F, Chen M. Computational models for predicting liver toxicity in the deep learning era. FRONTIERS IN TOXICOLOGY 2024; 5:1340860. [PMID: 38312894 PMCID: PMC10834666 DOI: 10.3389/ftox.2023.1340860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 12/22/2023] [Indexed: 02/06/2024] Open
Abstract
Drug-induced liver injury (DILI) is a severe adverse reaction caused by drugs and may result in acute liver failure and even death. Many efforts have centered on mitigating risks associated with potential DILI in humans. Among these, quantitative structure-activity relationship (QSAR) modeling has proven to be a valuable tool for early-stage hepatotoxicity screening. Its advantages include no requirement for physical substances and rapid delivery of results. Deep learning (DL) has advanced rapidly in recent years and has been used for developing QSAR models. This review discusses the use of DL in predicting DILI, focusing on the development of QSAR models employing extensive chemical structure datasets alongside their corresponding DILI outcomes. We undertake a comprehensive evaluation of various DL methods, comparing them with traditional machine learning (ML) approaches, and explore the strengths and limitations of DL techniques regarding their interpretability, scalability, and generalization. Overall, our review underscores the potential of DL methodologies to enhance DILI prediction and provides insights into future avenues for developing predictive models to mitigate DILI risk in humans.
Affiliation(s)
- Fahad Mostafa
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, United States
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Minjun Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
8. Banerjee A, Roy K. Read-across-based intelligent learning: development of a global q-RASAR model for the efficient quantitative predictions of skin sensitization potential of diverse organic chemicals. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2023; 25:1626-1644. [PMID: 37682520 DOI: 10.1039/d3em00322a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Environmental chemicals and contaminants cause a wide array of harmful effects on terrestrial and aquatic life, ranging from skin sensitization to acute oral toxicity. The current study aims to assess the quantitative skin sensitization potential of a large set of industrial and environmental chemicals acting through different mechanisms using the novel quantitative Read-Across Structure-Activity Relationship (q-RASAR) approach. Based on the identified important set of structural and physicochemical features, Read-Across-based hyperparameters were optimized using the training set compounds, followed by the calculation of similarity and error-based RASAR descriptors. Data fusion, further feature selection, and removal of prediction confidence outliers were performed to generate a partial least squares (PLS) q-RASAR model, followed by the application of various Machine Learning (ML) tools to check the quality of predictions. The PLS model was found to be the best among the different models. A simple, user-friendly Java-based software tool was developed based on the PLS model, which efficiently predicts the toxicity value(s) of query compound(s) along with their Applicability Domain (AD) status in terms of leverage values. This model has been developed using structurally diverse compounds and is expected to predict efficiently and quantitatively the skin sensitization potential of environmental chemicals to estimate their occupational and health hazards.
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
9. Viljanen M, Minnema J, Wassenaar PNH, Rorije E, Peijnenburg W. What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2023; 34:765-788. [PMID: 37670728 DOI: 10.1080/1062936x.2023.2254225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are among the tools capable of resolving this apparent contradiction, as they allow toxicity patterns to be generalized across chemicals and species. However, despite the availability of large public toxicity datasets, the data are highly sparse, complicating model development. The aim of this study is to provide insights into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50 values, based on experimental LC50 data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX database. Several well-known ML techniques were evaluated and a new ML model was developed, inspired by recommender systems. This new model involves a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performance of the developed models in two validation settings: 1) predicting unseen chemical-species pairs, and 2) predicting unseen chemicals. The results of this study show that ML models can accurately predict LC50 values in both validation settings. Moreover, we show that the novel factorization machine approach can match well-tuned, complex ML approaches.
Affiliation(s)
- M Viljanen
  - Department of Statistics, Data Science and Modelling, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- J Minnema
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- P N H Wassenaar
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- E Rorije
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- W Peijnenburg
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
  - Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
10. Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023; 15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open
Abstract
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize the predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize and evaluate gradient boosting models for virtual screening and QSAR applications.
Affiliation(s)
- Davide Boldini
  - Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
- Francesca Grisoni
  - Department of Biomedical Engineering, Institute for Complex Molecular Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/E, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Stephan A Sieber
  - Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
11. Bo T, Lin Y, Han J, Hao Z, Liu J. Machine learning-assisted data filtering and QSAR models for prediction of chemical acute toxicity on rat and mouse. JOURNAL OF HAZARDOUS MATERIALS 2023; 452:131344. [PMID: 37027914 DOI: 10.1016/j.jhazmat.2023.131344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/20/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
Machine learning (ML) methods provide a new opportunity to build quantitative structure-activity relationship (QSAR) models for predicting chemicals' toxicity based on large toxicity data sets, but they are limited by insufficient model robustness due to poor data set quality for chemicals with certain structures. To address this issue and improve model robustness, we built a large data set on rat oral acute toxicity for thousands of chemicals, then used ML to filter chemicals favorable for regression models (CFRM). In comparison to chemicals not favorable for regression models (CNRM), CFRM accounted for 67% of chemicals in the original data set, and had a higher structural similarity and a smaller toxicity distribution in 2-4 log10 (mg/kg). The performance of established regression models for CFRM was greatly improved, with root-mean-square errors (RMSE) in the range of 0.45-0.48 log10 (mg/kg). Classification models were built for CNRM using all chemicals in the original data set, and the area under the receiver operating characteristic curve (AUROC) reached 0.75-0.76. The proposed strategy was successfully applied to a mouse oral acute toxicity data set, yielding RMSE and AUROC in the range of 0.36-0.38 log10 (mg/kg) and 0.79, respectively.
Affiliation(s)
- Tao Bo
  - School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China
- Yaohui Lin
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China
  - Key Laboratory for Analytical Science of Food Safety and Biology of MOE, Fujian Provincial Key Lab of Analysis and Detection for Food Safety, College of Chemistry, Fuzhou University, Fuzhou, Fujian 350116, China
- Jinglong Han
  - State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
- Zhineng Hao
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China
- Jingfu Liu
  - School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, China
12. Hu Y, Ren Q, Liu X, Gao L, Xiao L, Yu W. In Silico Prediction of Human Organ Toxicity via Artificial Intelligence Methods. Chem Res Toxicol 2023. [PMID: 37300507 DOI: 10.1021/acs.chemrestox.2c00411] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
Unpredicted human organ level toxicity remains one of the major reasons for drug clinical failure. There is a critical need for cost-efficient strategies in the early stages of drug development for human toxicity assessment. At present, artificial intelligence methods are popularly regarded as a promising solution in chemical toxicology. Thus, we provided comprehensive in silico prediction models for eight significant human organ level toxicity end points using machine learning, deep learning, and transfer learning algorithms. In this work, our results showed that the graph-based deep learning approach was generally better than the conventional machine learning models, and good performances were observed for most of the human organ level toxicity end points in this study. In addition, we found that the transfer learning algorithm could improve model performance for skin sensitization end point using source domain of in vivo acute toxicity data and in vitro data of the Tox21 project. It can be concluded that our models can provide useful guidance for the rapid identification of the compounds with human organ level toxicity for drug discovery.
Affiliation(s)
- Yuxuan Hu
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing 210009, China
- Qiuhan Ren
  - School of Science, China Pharmaceutical University, Nanjing 211198, China
- Xintong Liu
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing 210009, China
- Liming Gao
  - School of Science, China Pharmaceutical University, Nanjing 211198, China
- Lecheng Xiao
  - School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Wenying Yu
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing 210009, China
13
Tran TTV, Surya Wibowo A, Tayara H, Chong KT. Artificial Intelligence in Drug Toxicity Prediction: Recent Advances, Challenges, and Future Perspectives. J Chem Inf Model 2023; 63:2628-2643. [PMID: 37125780 DOI: 10.1021/acs.jcim.3c00200] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 05/02/2023]
Abstract
Toxicity prediction is a critical step in the drug discovery process that helps identify and prioritize compounds with the greatest potential for safe and effective use in humans, while also reducing the risk of costly late-stage failures. It is estimated that over 30% of drug candidates are discarded owing to toxicity. Recently, artificial intelligence (AI) has been used to improve drug toxicity prediction as it provides more accurate and efficient methods for identifying the potentially toxic effects of new compounds before they are tested in human clinical trials, thus saving time and money. In this review, we present an overview of recent advances in AI-based drug toxicity prediction, including the use of various machine learning algorithms and deep learning architectures, of six major toxicity properties and Tox21 assay end points. Additionally, we provide a list of public data sources and useful toxicity prediction tools for the research community and highlight the challenges that must be addressed to enhance model performance. Finally, we discuss future perspectives for AI-based drug toxicity prediction. This review can aid researchers in understanding toxicity prediction and pave the way for new methods of drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
  - Vietnam National University - Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Agung Surya Wibowo
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Department of Electrical Engineering, Telkom University, Bandung 40257, Indonesia
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
14
Sosnina EA, Sosnin S, Fedorov MV. Improvement of multi-task learning by data enrichment: application for drug discovery. J Comput Aided Mol Des 2023; 37:183-200. [PMID: 36943645 DOI: 10.1007/s10822-023-00500-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/21/2023] [Indexed: 03/23/2023]
Abstract
Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on the feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data included, the more new compound-target interactions are required for prediction improvement. Also, we found out that even using multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
Affiliation(s)
- Ekaterina A Sosnina
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
- Sergey Sosnin
  - Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1190, Vienna, Austria
- Maxim V Fedorov
  - Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
  - Sirius University of Science and Technology, Olympiisky Prospect 1, Sochi, Russia, 354340
15
Gajewicz-Skretna A, Wyrzykowska E, Gromelski M. Quantitative multi-species toxicity modeling: Does a multi-species, machine learning model provide better performance than a single-species model for the evaluation of acute aquatic toxicity by organic pollutants? Sci Total Environ 2023; 861:160590. [PMID: 36473653 DOI: 10.1016/j.scitotenv.2022.160590] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/29/2022] [Revised: 11/25/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]
Abstract
The toxicological profile of any chemical is defined by multiple endpoints and testing procedures, including representative test species from different trophic levels. While computer-aided methods play an increasingly important role in supporting ecotoxicology research and chemical hazard assessment, most of the recently developed machine learning models are directed towards a single, specific endpoint. To overcome this limitation and accelerate the process of identifying potentially hazardous environmental pollutants, we introduce an effective approach for quantitative, multi-species modeling. The proposed approach is based on canonical correlation analysis, which finds one or more pairs of uncorrelated, linear combinations of the original variables that best define the overall variability within and between multiple biological responses and predictor variables. Its effectiveness was confirmed by a machine learning model for estimating the acute toxicity of diverse organic pollutants in aquatic species from three trophic levels: algae (Pseudokirchneriella subcapitata), daphnia (Daphnia magna), and fish (Oryzias latipes). The multi-species model achieved a favorable predictive performance that was in line with that of predictive models derived for the aquatic organisms individually. The chemical bioavailability and reactivity parameters (n-octanol/water partition coefficient, chemical potential, and molecular size and volume) were important for accurately predicting acute ecotoxicity to the three aquatic organisms. To facilitate the use of this approach, an open-source, Python-based script named qMTM (quantitative Multi-species Toxicity Modeling) has been provided.
Affiliation(s)
- Agnieszka Gajewicz-Skretna
  - Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Ewelina Wyrzykowska
  - Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Maciej Gromelski
  - Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
16
Wu L, Yan B, Han J, Li R, Xiao J, He S, Bo X. TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res 2022; 51:D1432-D1445. [PMID: 36400569 PMCID: PMC9825425 DOI: 10.1093/nar/gkac1074] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/14/2022] [Revised: 10/10/2022] [Accepted: 10/26/2022] [Indexed: 11/20/2022]
Abstract
The toxic effects of compounds on environment, humans, and other organisms have been a major focus of many research areas, including drug discovery and ecological research. Identifying the potential toxicity in the early stage of compound/drug discovery is critical. The rapid development of computational methods for evaluating various toxicity categories has increased the need for comprehensive and system-level collection of toxicological data, associated attributes, and benchmarks. To contribute toward this goal, we proposed TOXRIC (https://toxric.bioinforai.tech/), a database with comprehensive toxicological data, standardized attribute data, practical benchmarks, informative visualization of molecular representations, and an intuitive function interface. The data stored in TOXRIC contains 113 372 compounds, 13 toxicity categories, 1474 toxicity endpoints covering in vivo/in vitro endpoints and 39 feature types, covering structural, target, transcriptome, metabolic data, and other descriptors. All the curated datasets of endpoints and features can be retrieved, downloaded and directly used as output or input to Machine Learning (ML)-based prediction models. In addition to serving as a data repository, TOXRIC also provides visualization of benchmarks and molecular representations for all endpoint datasets. Based on these results, researchers can better understand and select optimal feature types, molecular representations, and baseline algorithms for each endpoint prediction task. We believe that the rich information on compound toxicology, ML-ready datasets, benchmarks and molecular representation distribution can greatly facilitate toxicological investigations, interpretation of toxicological mechanisms, compound/drug discovery and the development of computational methods.
Affiliation(s)
- Junshan Han
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Ruijiang Li
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Jian Xiao
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
  - Institute for Rational and Safe Medication Practices, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Song He
  - Correspondence may also be addressed to Song He. Tel: +86 01066931450
- Xiaochen Bo
  - To whom correspondence should be addressed. Tel: +86 01066931207
17
Zwickl CM, Graham J, Jolly R, Bassan A, Ahlberg E, Amberg A, Anger LT, Barton-Maclaren T, Beilke L, Bellion P, Brigo A, Cronin MT, Custer L, Devlin A, Burleigh-Flayers H, Fish T, Glover K, Glowienke S, Gromek K, Jones D, Karmaus A, Kemper R, Piparo EL, Madia F, Martin M, Masuda-Herrera M, McAtee B, Mestre J, Milchak L, Moudgal C, Mumtaz M, Muster W, Neilson L, Patlewicz G, Paulino A, Roncaglioni A, Ruiz P, Suarez D, Szabo DT, Valentin JP, Vardakou I, Woolley D, Myatt G. Principles and Procedures for Assessment of Acute Toxicity Incorporating In Silico Methods. Comput Toxicol 2022; 24:100237. [PMID: 36818760 PMCID: PMC9934006 DOI: 10.1016/j.comtox.2022.100237] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Acute toxicity in silico models are being used to support an increasing number of application areas including (1) product research and development, (2) product approval and registration as well as (3) the transport, storage and handling of chemicals. The adoption of such models is being hindered, in part, because of a lack of guidance describing how to perform and document an in silico analysis. To address this issue, a framework for an acute toxicity hazard assessment is proposed. This framework combines results from different sources including in silico methods and in vitro or in vivo experiments. In silico methods that can assist the prediction of in vivo outcomes (i.e., LD50) are analyzed concluding that predictions obtained using in silico approaches are now well-suited for reliably supporting assessment of LD50-based acute toxicity for the purpose of GHS classification. A general overview is provided of the endpoints from in vitro studies commonly evaluated for predicting acute toxicity (e.g., cytotoxicity/cytolethality as well as assays targeting specific mechanisms). The increased understanding of pathways and key triggering mechanisms underlying toxicity and the increased availability of in vitro data allow for a shift away from assessments solely based on endpoints such as LD50, to mechanism-based endpoints that can be accurately assessed in vitro or by using in silico prediction models. This paper also highlights the importance of an expert review of all available information using weight-of-evidence considerations and illustrates, using a series of diverse practical use cases, how in silico approaches support the assessment of acute toxicity.
Affiliation(s)
- Jessica Graham
  - Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA
- Robert Jolly
  - Eli Lilly and Company, Indianapolis, IN 46285, USA
- Arianna Bassan
  - Innovatune srl, Via Giulio Zanon 130/D, 35129 Padova, Italy
- Ernst Ahlberg
  - Universal Prediction AB, Gothenburg, Sweden
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Alexander Amberg
  - Sanofi, R&D Preclinical Safety Frankfurt, Industriepark Hoechst, D-65926 Frankfurt am Main, Germany
- Tara Barton-Maclaren
  - Healthy Environments and Consumer Safety Branch, Health Canada / Government of Canada
- Lisa Beilke
  - Toxicology Solutions, Inc., 10531 4S Commons Dr. #594, San Diego, CA 92127, USA
- Phillip Bellion
  - Boehringer Ingelheim Animal Health, Binger Str. 128, 55216 Ingelheim am Rhein, Germany
- Alessandro Brigo
  - Roche Pharmaceutical Research & Early Development, Roche Innovation Center Basel, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Amy Devlin
  - FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
- Trevor Fish
  - Nelson Laboratories, Salt Lake City, Utah, USA
- David Jones
  - MHRA, 10 South Colonnade, Canary Wharf, London E14 4PU
- Agnes Karmaus
  - Integrated Laboratory Systems, LLC, Morrisville, NC, USA
- Elena Lo Piparo
  - Chemical Food Safety Group, Nestlé Research, Lausanne, Switzerland
- Federica Madia
  - European Commission, Joint Research Centre (JRC), Ispra, Italy
- Jordi Mestre
  - IMIM Institut Hospital Del Mar d’Investigacions Mèdiques and Universitat Pompeu Fabra, Doctor Aiguader 88, Parc de Recerca Biomèdica, 08003 Barcelona, Spain
  - Chemotargets SL, Baldiri Reixac 4, Parc Científic de Barcelona, 08028 Barcelona, Spain
- Moiz Mumtaz
  - Office of the Associate Director for Science, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Wolfgang Muster
  - Roche Pharmaceutical Research & Early Development, Roche Innovation Center Basel, Grenzacherstrasse 124, 4070, Basel, Switzerland
- Grace Patlewicz
  - Centre for Computational Toxicology and Exposure (CCTE), US Environmental Protection Agency, Research Triangle Park, NC, USA
- Alessandra Roncaglioni
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Patricia Ruiz
  - Centers for Disease Control and Prevention (CDC), Atlanta, GA 30341, USA
- Diana Suarez
  - FSTox Consulting LTD, 2 Brooks Road Raunds Wellingborough NN9 6NS
- Jean-Pierre Valentin
  - UCB-Biopharma SRL, Development Science, Avenue de l’industrie, Braine l’Alleud, Wallonia, Belgium
- Ioanna Vardakou
  - British American Tobacco (Investments) Ltd., R&D Centre, Southampton, Hampshire SO15 8TL, UK
- Glenn Myatt
  - Instem, 1393 Dublin Rd, Columbus, OH 43215, USA
18
Hochuli J, Jain S, Melo-Filho C, Sessions ZL, Bobrowski T, Choe J, Zheng J, Eastman R, Talley DC, Rai G, Simeonov A, Tropsha A, Muratov EN, Baljinnyam B, Zakharov AV. Allosteric Binders of ACE2 Are Promising Anti-SARS-CoV-2 Agents. ACS Pharmacol Transl Sci 2022; 5:468-478. [PMID: 35821746 PMCID: PMC9236207 DOI: 10.1021/acsptsci.2c00049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for acute treatment of the disease. We investigate whether compounds that bind the human angiotensin-converting enzyme 2 (ACE2) protein can decrease SARS-CoV-2 replication without impacting ACE2's natural enzymatic function. Initial screening of a diversity library resulted in hit compounds active in an ACE2-binding assay, which showed little inhibition of ACE2 enzymatic activity (116 actives, success rate ∼4%), suggesting they were allosteric binders. Subsequent application of in silico techniques boosted success rates to ∼14% and resulted in 73 novel confirmed ACE2 binders with Kd values as low as 6 nM. A subsequent SARS-CoV-2 assay revealed that five of these compounds inhibit the viral life cycle in human cells. Further effort is required to completely elucidate the antiviral mechanism of these ACE2-binders, but they present a valuable starting point for both the development of acute treatments for COVID-19 and research into the host-directed therapy.
Affiliation(s)
- Joshua E. Hochuli
  - Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Sankalp Jain
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Cleber Melo-Filho
  - Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Zoe L. Sessions
  - Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Tesia Bobrowski
  - Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Jun Choe
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Johnny Zheng
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Richard Eastman
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Daniel C. Talley
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Ganesha Rai
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Anton Simeonov
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Alexander Tropsha
  - Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Eugene N. Muratov
  - Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Bolormaa Baljinnyam
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Alexey V. Zakharov
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
19
Jeong J, Choi J. Artificial Intelligence-Based Toxicity Prediction of Environmental Chemicals: Future Directions for Chemical Management Applications. Environ Sci Technol 2022; 56:7532-7543. [PMID: 35666838 DOI: 10.1021/acs.est.1c07413] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 06/15/2023]
Abstract
Recently, research on the development of artificial intelligence (AI)-based computational toxicology models that predict toxicity without the use of animal testing has emerged because of the rapid development of computer technology. Various computational toxicology techniques that predict toxicity based on the structure of chemical substances are gaining attention, including the quantitative structure-activity relationship. To understand the recent development of these models, we analyzed the databases, molecular descriptors, fingerprints, and algorithms considered in recent studies. Based on a selection of 96 papers published since 2014, we found that AI models have been developed to predict approximately 30 different toxicity end points using more than 20 toxicity databases. For model development, molecular access system and extended-connectivity fingerprints are the most commonly used molecular descriptors. The most used algorithm among the machine learning techniques is the random forest, while the most used algorithm among the deep learning techniques is a deep neural network. The use of AI technology in the development of toxicity prediction models is a new concept that will aid in achieving a scientific accord and meet regulatory applications. The comprehensive overview provided in this study will provide a useful guide for the further development and application of toxicity prediction models.
Affiliation(s)
- Jaeseong Jeong
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
- Jinhee Choi
  - School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, South Korea
20
Hochuli JE, Jain S, Melo-filho C, Sessions ZL, Bobrowski T, Choe J, Zheng J, Eastman R, Talley DC, Rai G, Simeonov A, Tropsha A, Muratov EN, Baljinnyam B, Zakharov AV. Allosteric binders of ACE2 are promising anti-SARS-CoV-2 agents. [PMID: 35313579 PMCID: PMC8936107 DOI: 10.1101/2022.03.15.484484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for an acute treatment for the disease. We investigate whether compounds that bind the human ACE2 protein can interrupt SARS-CoV-2 replication without damaging ACE2’s natural enzymatic function. Initial compounds were screened for binding to ACE2 but little interruption of ACE2 enzymatic activity. This set of compounds was extended by application of quantitative structure-activity analysis, which resulted in 512 virtual hits for further confirmatory screening. A subsequent SARS-CoV-2 replication assay revealed that five of these compounds inhibit SARS-CoV-2 replication in human cells. Further effort is required to completely determine the antiviral mechanism of these compounds, but they serve as a strong starting point for both development of acute treatments for COVID-19 and research into the mechanism of infection.
Abstract Figure. TOC Graphic: Overall study design.
21
Nakarin F, Boonpalit K, Kinchagawat J, Wachiraphan P, Rungrotmongkol T, Nutanong S. Assisting Multitargeted Ligand Affinity Prediction of Receptor Tyrosine Kinases Associated Nonsmall Cell Lung Cancer Treatment with Multitasking Principal Neighborhood Aggregation. Molecules 2022; 27:1226. [PMID: 35209011 PMCID: PMC8878292 DOI: 10.3390/molecules27041226] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/24/2021] [Revised: 01/30/2022] [Accepted: 01/31/2022] [Indexed: 11/16/2022]
Abstract
A multitargeted therapeutic approach with hybrid drugs is a promising strategy to enhance anticancer efficiency and overcome drug resistance in nonsmall cell lung cancer (NSCLC) treatment. Estimating the affinities of small molecules against targets of interest is typically a preliminary step in modern drug discovery in the pharmaceutical industry. In this investigation, we employed machine learning models to provide a computationally affordable means for computer-aided screening to accelerate the discovery of potential drug compounds. In particular, we introduced a quantitative structure–activity relationship (QSAR)-based multitask learning model to facilitate an in silico screening system for multitargeted drug development. Our method combines a recently developed graph-based neural network architecture, principal neighborhood aggregation (PNA), with a descriptor-based deep neural network, supporting synergistic utilization of molecular graph and fingerprint features. The model was trained on more than ten thousand ligands with reported affinities for seven crucial receptor tyrosine kinases in NSCLC, drawn from two public data sources. As a result, our multitask model demonstrated better performance than all other benchmark models and achieved satisfactory predictive ability against applicable QSAR criteria for most tasks within the model’s applicability domain. Since our model could potentially be a screening tool for practical use, we have provided a freely accessible model implementation platform with a tutorial, supporting a first step in the long journey of cancer drug development.
Affiliation(s)
- Fahsai Nakarin
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
  - Correspondence: Tel.: +66-33-014-444
- Kajjana Boonpalit
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Jiramet Kinchagawat
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Patcharapol Wachiraphan
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Thanyada Rungrotmongkol
  - Center of Excellence in Biocatalyst and Sustainable Biotechnology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Sarana Nutanong
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
22
Feinstein J, Sivaraman G, Picel K, Peters B, Vázquez-Mayagoitia Á, Ramanathan A, MacDonell M, Foster I, Yan E. Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity. J Chem Inf Model 2021; 61:5793-5803. [PMID: 34905348 DOI: 10.1021/acs.jcim.1c01204] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.
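The idea of abstaining on uncertain predictions can be sketched in a few lines. This is not the SelectiveNet architecture described in the abstract, which learns its selection behaviour jointly with the predictor; as a simpler stand-in, the sketch below thresholds the spread of an ensemble's predictions. The function name, the cutoff of 0.3, and the prediction values are all hypothetical.

```python
import statistics


def selective_predict(ensemble_predictions, max_std):
    """Return (mean, std) if the ensemble agrees closely, otherwise None (abstain)."""
    mean = statistics.fmean(ensemble_predictions)
    std = statistics.pstdev(ensemble_predictions)  # population standard deviation
    if std > max_std:
        return None  # members disagree too much: abstain
    return mean, std


# Hypothetical predictions from five ensemble members for two compounds.
confident = [2.41, 2.38, 2.45, 2.40, 2.43]
uncertain = [1.2, 2.9, 2.1, 3.4, 0.8]

for name, preds in [("confident", confident), ("uncertain", uncertain)]:
    result = selective_predict(preds, max_std=0.3)
    if result is None:
        print(name, "-> abstain")
    else:
        print(name, "-> mean %.3f, std %.3f" % result)
```

In practice the cutoff would be calibrated on held-out data so that the accepted predictions reach a target error level, trading coverage for confidence.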
Affiliation(s)
- Jeremy Feinstein
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Ganesh Sivaraman
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Kurt Picel
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Brian Peters
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Arvind Ramanathan
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Margaret MacDonell
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Ian Foster
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Eugene Yan
- Environmental Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
|
23
|
Wang Y, Wang B, Jiang J, Guo J, Lai J, Lian XY, Wu J. Multitask CapsNet: An Imbalanced Data Deep Learning Method for Predicting Toxicants. ACS Omega 2021; 6:26545-26555. [PMID: 34661009 PMCID: PMC8515573 DOI: 10.1021/acsomega.1c03842] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/20/2021] [Accepted: 09/14/2021] [Indexed: 05/17/2023]
Abstract
Drug development has a high failure rate, with safety properties constituting a considerable challenge. To reduce risk, in silico tools, including various machine learning methods, have been applied for toxicity prediction. However, these approaches often confront a serious problem: the training data sets are usually biased (imbalanced positive and negative samples), which would result in model training difficulty and unsatisfactory prediction accuracy. Multitask networks obtained significantly better predictive accuracies than single-task methods, and capsule neural networks showed excellent performance in sparse data sets in previous studies. In this study, we developed a new multitask framework based on a capsule neural network (multitask CapsNet) to measure 12 different toxic effects simultaneously. We found that multitask CapsNet excelled in toxicity prediction and outperformed many other computational approaches using the multitask strategy. Only after training on biased data sets did multitask CapsNet achieve significantly improved prediction accuracy on the Tox21 Data Challenge, which gave the largest ratio of highest accuracy (8/12) among compared models. Our model gave a prediction accuracy of 96.6% for the target NR.PPAR.gamma, whose ratio of negative to positive samples was up to 36:1. These results suggested that multitask CapsNet could overcome the bias problems and would provide a novel, accurate, and efficient approach for predicting the toxicities of compounds.
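To make the imbalance mentioned in the abstract concrete, the sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall) on a synthetic label set with a 36:1 negative-to-positive ratio. The labels and the always-negative baseline are invented for illustration and are not the Tox21 data or the authors' evaluation protocol; the point is only that accuracy alone is hard to interpret at such ratios.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Synthetic labels: 36 negatives (0) for every positive (1).
y_true = [0] * 36 + [1]
# Hypothetical baseline that never predicts a toxic compound.
always_negative = [0] * len(y_true)

print(f"accuracy:          {accuracy(y_true, always_negative):.3f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, always_negative):.3f}")
```

On a set with exactly this ratio, the baseline is right on 36 of 37 labels while finding no positives, which is why a class-aware metric is a useful companion to accuracy when comparing models on imbalanced targets.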
Affiliation(s)
- Yiwei Wang
- School of Preclinical Medicine, Southwest Medical University, Luzhou 646000, China
- Binyou Wang
- School of Pharmacy, Southwest Medical University, Luzhou 646000, China
- Jie Jiang
- School of Preclinical Medicine, Southwest Medical University, Luzhou 646000, China
- Jianmin Guo
- School of Preclinical Medicine, Southwest Medical University, Luzhou 646000, China
- Jia Lai
- School of Pharmacy, Southwest Medical University, Luzhou 646000, China
- Xiao-Yuan Lian
- School of Pharmacy, Zhejiang University, Hangzhou 310011, China
- Jianming Wu
- Key Laboratory of Medical Electrophysiology, Ministry of Education of China, Medical Key Laboratory for Drug Discovery and Druggability Evaluation of Sichuan Province, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Luzhou 646000, China
|
24
|
Jain S, Talley DC, Baljinnyam B, Choe J, Hanson Q, Zhu W, Xu M, Chen CZ, Zheng W, Hu X, Shen M, Rai G, Hall MD, Simeonov A, Zakharov AV. Hybrid In Silico Approach Reveals Novel Inhibitors of Multiple SARS-CoV-2 Variants. ACS Pharmacol Transl Sci 2021; 4:1675-1688. [PMID: 34608449 PMCID: PMC8482323 DOI: 10.1021/acsptsci.1c00176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/15/2021] [Indexed: 11/30/2022]
Abstract
The National Center for Advancing Translational Sciences (NCATS) has been actively generating SARS-CoV-2 high-throughput screening data and disseminates it through the OpenData Portal (https://opendata.ncats.nih.gov/covid19/). Here, we provide a hybrid approach that utilizes NCATS screening data from the SARS-CoV-2 cytopathic effect reduction assay to build predictive models, using both machine learning and pharmacophore-based modeling. Optimized models were used to perform two iterative rounds of virtual screening to predict small molecules active against SARS-CoV-2. Experimental testing with live virus provided 100 (∼16% of predicted hits) active compounds (efficacy > 30%, IC50 ≤ 15 μM). Systematic clustering analysis of active compounds revealed three promising chemotypes which have not been previously identified as inhibitors of SARS-CoV-2 infection. Further investigation resulted in the identification of allosteric binders to host receptor angiotensin-converting enzyme 2; these compounds were then shown to inhibit the entry of pseudoparticles bearing spike protein of wild-type SARS-CoV-2, as well as South African B.1.351 and UK B.1.1.7 variants.
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Daniel C. Talley
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Bolormaa Baljinnyam
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Jun Choe
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Quinlin Hanson
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Wei Zhu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Miao Xu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Catherine Z. Chen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Wei Zheng
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Xin Hu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Min Shen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ganesha Rai
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Matthew D. Hall
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexey V. Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
|
25
|
Tripathi MK, Nath A, Singh TP, Ethayathulla AS, Kaur P. Evolving scenario of big data and Artificial Intelligence (AI) in drug discovery. Mol Divers 2021; 25:1439-1460. [PMID: 34159484 PMCID: PMC8219515 DOI: 10.1007/s11030-021-10256-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/31/2021] [Accepted: 06/14/2021] [Indexed: 12/24/2022]
Abstract
The accumulation of massive data in the plethora of Cheminformatics databases has made the role of big data and artificial intelligence (AI) indispensable in drug design. This has necessitated the development of newer algorithms and architectures to mine these databases and fulfil the specific needs of various drug discovery processes such as virtual drug screening, de novo molecule design and discovery in this big data era. The development of deep learning neural networks and their variants with the corresponding increase in chemical data has resulted in a paradigm shift in information mining pertaining to the chemical space. The present review summarizes the role of big data and AI techniques currently being implemented to satisfy the ever-increasing research demands in drug discovery pipelines.
Affiliation(s)
- Manish Kumar Tripathi
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, 110029, India
- Abhigyan Nath
- Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India
- Tej P Singh
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, 110029, India
- A S Ethayathulla
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, 110029, India
- Punit Kaur
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, 110029, India
|
26
|
Medina-Franco JL. Expanding the Chemical Information Science gateway. F1000Res 2021; 10. [PMID: 33953903 PMCID: PMC8063543 DOI: 10.12688/f1000research.52192.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/08/2021] [Indexed: 11/25/2022]
Abstract
As chemical information evolves, impacting many chemistry areas, effective ways to disseminate results by the scientific community are also changing. Thus, publication schemes adapt to meet the needs of researchers across disciplines to share high-quality data, information, and knowledge. Since 2015, the F1000Research Chemical Information Science (CIS) gateway has offered an open and unique model to disseminate science at the interface of chemoinformatics, bioinformatics, and several other informatic-related disciplines. In response to the evolution of chemical information science, the F1000Research CIS gateway has incorporated new members to the advisory board. It is also reinforcing and expanding the gateway areas with a particular focus on machine learning and metabolomics. The range of available article types, availability of data, exposure within complementary multidisciplinary F1000Research gateways, and indexing in major bibliographic databases increases the visibility of all contributions. As part of progressing open science in this field, we look forward to your high-quality contributions to the CIS gateway.
Affiliation(s)
- José L Medina-Franco
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
|