101
Using adverse outcome pathways to contextualise (Q)SAR predictions for reproductive toxicity – A case study with aromatase inhibition. Reprod Toxicol 2022; 108:43-55. [DOI: 10.1016/j.reprotox.2022.01.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/29/2021] [Revised: 01/14/2022] [Accepted: 01/21/2022] [Indexed: 12/22/2022]
102
Abstract
Artificial intelligence (AI) offers new possibilities for hit and lead finding in medicinal chemistry. Several instances of AI have been used for prospective de novo drug design. Among these, chemical language models have been shown to perform well in various experimental scenarios. In this study, we provide a hands-on introduction to chemical language modeling. A technique based on recurrent neural networks is discussed in detail, together with a step-by-step guide to applying this AI method for focused compound library design. The program code is freely available at URL: github.com/ETHmodlab/de_novo_design_RNN.
Affiliation(s)
- Francesca Grisoni
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
  - Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, Netherlands.
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
103
Korn D, Pervitsky V, Bobrowski T, Alves VM, Schmitt C, Bizon C, Baker N, Chirkova R, Cherkasov A, Muratov E, Tropsha A. COVID-19 Knowledge Extractor (COKE): A Curated Repository of Drug-Target Associations Extracted from the CORD-19 Corpus of Scientific Publications on COVID-19. J Chem Inf Model 2021; 61:5734-5741. [PMID: 34783553 PMCID: PMC8610010 DOI: 10.1021/acs.jcim.1c01285] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/21/2021] [Indexed: 12/31/2022]
Abstract
The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19. SciBiteAI ontological tagging of the COVID Open Research Data set (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug-target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of the target protein and drug terms, and confidence scores were calculated for each entity pair. COKE processing of the current CORD-19 database identified about 3000 drug-protein pairs, including 29 unique proteins and 500 investigational, experimental, and approved drugs. Some of these drugs are presently undergoing clinical trials for COVID-19. The COKE repository and web application can serve as a useful resource for drug repurposing against SARS-CoV-2. COKE is freely available at https://coke.mml.unc.edu/, and the code is available at https://github.com/DnlRKorn/CoKE.
Affiliation(s)
- Daniel Korn
  - Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Vera Pervitsky
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Tesia Bobrowski
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Vinicius M. Alves
  - Office of Data Science, National Toxicology Program, NIEHS, Morrisville, North Carolina 27560, United States
- Charles Schmitt
  - Office of Data Science, National Toxicology Program, NIEHS, Morrisville, North Carolina 27560, United States
- Chris Bizon
  - Renaissance Computing Institute, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Nancy Baker
  - ParlezChem, 123 W. Union Street, Hillsborough, North Carolina 27278, United States
- Rada Chirkova
  - Department of Computer Science, North Carolina State University, Raleigh, North Carolina 27606-5550, United States
- Artem Cherkasov
  - Vancouver Prostate Centre, University of British Columbia, Vancouver, BC V6H 3Z6, Canada
- Eugene Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
104
An in silico pipeline for the discovery of multitarget ligands: A case study for epi-polypharmacology based on DNMT1/HDAC2 inhibition. Artificial Intelligence in the Life Sciences 2021; 1. [PMID: 35475037 PMCID: PMC9038114 DOI: 10.1016/j.ailsci.2021.100008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The search for novel therapeutic compounds remains an overwhelming task owing to the time-consuming and expensive nature of the drug development process and low success rates. Traditional methodologies that rely on the one drug-one target paradigm have proven insufficient for the treatment of multifactorial diseases, leading to a shift to multitarget approaches. In this emerging paradigm, molecules with off-target and promiscuous interactions may result in preferred therapies. In this study, we developed a general pipeline combining machine learning algorithms and a deep generator network to train a dual inhibitor classifier capable of identifying putative pharmacophoric traits. As a case study, we focused on dual inhibitors targeting DNA methyltransferase 1 (DNMT1) and histone deacetylase 2 (HDAC2), two enzymes that play a central role in epigenetic regulation. We used this approach to identify dual inhibitors from a novel large natural product database in the public domain. We used docking and atomistic simulations as complementary approaches to establish the ligand-interaction profiles between the best hits and DNMT1/HDAC2. By using the combined ligand- and structure-based approaches, we discovered two promising novel scaffolds that can be used to simultaneously target both DNMT1 and HDAC2. We conclude that the proposed pipeline is flexible and adaptable, has predictive capabilities comparable to those of similar or derivative methods, and is readily applicable to the discovery of small molecules targeting many other therapeutically relevant proteins.
105
Silva AC, Borba JV, Alves VM, Hall SU, Furnham N, Kleinstreuer N, Muratov E, Tropsha A, Andrade CH. Novel computational models offer alternatives to animal testing for assessing eye irritation and corrosion potential of chemicals. Artificial Intelligence in the Life Sciences 2021; 1. [PMID: 35935266 PMCID: PMC9355119 DOI: 10.1016/j.ailsci.2021.100028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method to identify compounds that cause eye irritation or corrosion. Yet, there is growing pressure on the part of regulatory agencies both in the USA and abroad to develop New Approach Methodologies (NAMs) that help reduce the need for animal testing and address the unmet need to modernize safety evaluation of chemical hazards. In furthering the development and applications of computational NAMs in chemical safety assessment, in this study we have collected the largest expertly curated dataset of compounds tested for eye irritation and corrosion, and employed these data to build and validate binary and multi-classification Quantitative Structure-Activity Relationship (QSAR) models that can reliably assess the eye irritation/corrosion potential of novel untested compounds. QSAR models were generated with Random Forest (RF) and Multi-Descriptor Read Across (MuDRA) machine learning (ML) methods, and validated using a 5-fold external cross-validation protocol. These models demonstrated high balanced accuracy (CCR of 0.68–0.88), sensitivity (SE of 0.61–0.84), positive predictive value (PPV of 0.65–0.90), specificity (SP of 0.56–0.91), and negative predictive value (NPV of 0.68–0.85). Overall, MuDRA models outperformed RF models and were applied to predict compounds’ irritation/corrosion potential from the Inactive Ingredient Database, which contains components present in FDA-approved drug products, and from the Cosmetic Ingredient Database, the European Commission source of information on cosmetic substances. All models built and validated in this study are publicly available at the STopTox web portal (https://stoptox.mml.unc.edu/). These models can be employed as reliable tools for identifying potential eye irritant/corrosive compounds.
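The five statistics quoted in this abstract (CCR, SE, SP, PPV, NPV) all derive from a binary confusion matrix. The sketch below shows how they relate, assuming labels coded as 1 (irritant/corrosive) and 0 (non-irritant). The function names and the example labels are hypothetical and are not taken from the study, and the multi-class case mentioned in the abstract is not covered.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def binary_metrics(y_true, y_pred):
    """Compute SE, SP, PPV, NPV and CCR for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    se = _ratio(tp, tp + fn)  # sensitivity
    sp = _ratio(tn, tn + fp)  # specificity
    return {
        "SE": se,
        "SP": sp,
        "PPV": _ratio(tp, tp + fp),  # positive predictive value
        "NPV": _ratio(tn, tn + fn),  # negative predictive value
        "CCR": (se + sp) / 2,        # balanced accuracy
    }


# Made-up labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because CCR averages sensitivity and specificity, it is less affected by class imbalance than plain accuracy, which is one common reason for reporting it on toxicity datasets.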
106
Morency M, Néron S, Iftimie R, Wuest JD. Predicting pKa Values of Quinols and Related Aromatic Compounds with Multiple OH Groups. J Org Chem 2021; 86:14444-14460. [PMID: 34613729 DOI: 10.1021/acs.joc.1c01279] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Quinonoid compounds play central roles as redox-active agents in photosynthesis and respiration and are also promising replacements for inorganic materials currently used in batteries. To design new quinonoid compounds and predict their state of protonation and redox behavior under various conditions, their pKa values must be known. Methods that can predict the pKa values of simple phenols cannot reliably handle complex analogues in which multiple OH groups are present and may form intramolecular hydrogen bonds. We have therefore developed a straightforward method based on a linear relationship between experimental pKa values and calculated differences in energy between quinols and their deprotonated forms. Simple adjustments allow reliable predictions of pKa values when intramolecular hydrogen bonds are present. Our approach has been validated by showing that predicted and experimental values for over 100 quinols and related compounds differ by an average of only 0.3 units. This accuracy makes it possible to select proper pKa values when experimental data vary, predict the acidity of quinols and related compounds before they are made, and determine the sites and orders of deprotonation in complex structures with multiple OH groups.
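The core idea in this abstract, a linear relationship between experimental pKa values and a calculated energy difference, can be illustrated with an ordinary least-squares fit. This is only a sketch under stated assumptions: the calibration numbers are invented to exercise the arithmetic, the energy unit is arbitrary, and the adjustments for intramolecular hydrogen bonds that the authors describe are not modelled.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented calibration set (NOT values from the paper): calculated energy
# differences between acid and conjugate base, in arbitrary units, paired
# with hypothetical "experimental" pKa values.
delta_e = [280.0, 285.0, 290.0, 295.0]
pka_exp = [7.0, 8.0, 9.0, 10.0]

slope, intercept = fit_line(delta_e, pka_exp)


def predict_pka(energy_difference):
    """Predict a pKa from a calculated energy difference via the fitted line."""
    return slope * energy_difference + intercept


print(round(predict_pka(287.5), 2))
```

Once the line is calibrated on compounds with known pKa values, predicting a new compound only requires its calculated energy difference.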
Affiliation(s)
- Mathieu Morency
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
- Sébastien Néron
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
- Radu Iftimie
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
- James D Wuest
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
107
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints. Methods Mol Biol 2021. [PMID: 34731464 DOI: 10.1007/978-1-0716-1787-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars to build useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data of many compounds. We put these in context with other digital drug discovery methods and present some application examples.
108
Dunn TB, Seabra GM, Kim TD, Juárez-Mercado KE, Li C, Medina-Franco JL, Miranda-Quintana RA. Diversity and Chemical Library Networks of Large Data Sets. J Chem Inf Model 2021; 62:2186-2201. [PMID: 34723537 DOI: 10.1021/acs.jcim.1c01013] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/05/2023]
Abstract
The quantification of chemical diversity has many applications in drug discovery, organic chemistry, food, and natural product chemistry, to name a few. As the size of the chemical space is expanding rapidly, it is imperative to develop efficient methods to quantify the diversity of large and ultralarge chemical libraries and visualize their mutual relationships in chemical space. Herein, we show an application of our recently introduced extended similarity indices to measure the fingerprint-based diversity of 19 chemical libraries typically used in drug discovery and natural products research with over 18 million compounds. Based on this concept, we introduce the Chemical Library Networks (CLNs) as a general and efficient framework to represent visually the chemical space of large chemical libraries providing a global perspective of the relation between the libraries. For the 19 compound libraries explored in this work, it was found that the (extended) Tanimoto index offers the best description of extended similarity in combination with RDKit fingerprints. CLNs are general and can be explored with any structure representation and similarity coefficient for large chemical libraries.
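For orientation, the classical pairwise Tanimoto index that the extended indices generalise can be written in a few lines. The sketch below assumes each fingerprint is represented as a Python set of "on" bit positions. It computes only the standard pairwise coefficient and its mean over a small library, not the n-ary extended similarity indices used in the paper. The bit sets are hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Pairwise Tanimoto coefficient: |A ∩ B| / |A ∪ B| for sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # both fingerprints empty; returning 0.0 is a choice made here
    return len(bits_a & bits_b) / union


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto over all unordered pairs (needs at least two entries)."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy "library" of three fingerprints, for illustration only.
library = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 3, 4}]
print(round(tanimoto(library[0], library[1]), 3))
print(round(mean_pairwise_similarity(library), 3))
```

A lower mean similarity indicates a more diverse library. The number of pairs grows quadratically with library size, which makes this brute-force average costly for libraries with millions of compounds.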
Affiliation(s)
- Timothy B Dunn
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Gustavo M Seabra
  - Department of Medicinal Chemistry, University of Florida, Gainesville, Florida 32610, United States
  - Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida, Gainesville, Florida 32610, United States
- Taewon David Kim
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- K Eurídice Juárez-Mercado
  - DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Chenglong Li
  - Department of Medicinal Chemistry, University of Florida, Gainesville, Florida 32610, United States
  - Center for Natural Products, Drug Discovery and Development (CNPD3), University of Florida, Gainesville, Florida 32610, United States
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Ramón Alain Miranda-Quintana
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
109
Nguyen-Vo TH, Trinh QH, Nguyen L, Nguyen-Hoang PU, Nguyen TN, Nguyen DT, Nguyen BP, Le L. iCYP-MFE: Identifying Human Cytochrome P450 Inhibitors Using Multitask Learning and Molecular Fingerprint-Embedded Encoding. J Chem Inf Model 2021; 62:5059-5068. [PMID: 34672553 DOI: 10.1021/acs.jcim.1c00628] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/20/2023]
Abstract
The human cytochrome P450 (CYP) superfamily holds responsibilities for the metabolism of both endogenous and exogenous compounds such as drugs, cellular metabolites, and toxins. The inhibition exerted on the CYP enzymes is closely associated with adverse drug reactions encompassing metabolic failures and induced side effects. In modern drug discovery, identification of potential CYP inhibitors is, therefore, highly essential. Alongside experimental approaches, numerous computational models have been proposed to address this biochemical issue. In this study, we introduce iCYP-MFE, a computational framework for virtual screening on CYP inhibitors toward 1A2, 2C9, 2C19, 2D6, and 3A4 isoforms. iCYP-MFE contains a set of five robust, stable, and effective prediction models developed using multitask learning incorporated with molecular fingerprint-embedded features. The results show that multitask learning can remarkably leverage useful information from related tasks to promote global performance. Comparative analysis indicates that iCYP-MFE achieves three predominant tasks, one equivalent task, and one less effective task compared to state-of-the-art methods. The area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR) were two decisive metrics used for model evaluation. The prediction task for CYP2D6-inhibition achieves the highest AUC-ROC value of 0.93 while the prediction task for CYP1A2-inhibition obtains the highest AUC-PR value of 0.92. The substructural analysis preliminarily explains the nature of the CYP-inhibitory activity of compounds. An online web server for iCYP-MFE with a user-friendly interface was also deployed to support scientific communities in identifying CYP inhibitors.
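The AUC-ROC metric reported here has a simple probabilistic reading: it is the chance that a randomly chosen inhibitor receives a higher score than a randomly chosen non-inhibitor, with ties counted as one half. The sketch below computes it by brute-force pair counting. The labels and scores are made up for illustration and are unrelated to the iCYP-MFE results.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score contribute 0.5.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = inhibitor) and model scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
print(round(auc_roc(labels, scores), 3))
```

The double loop is quadratic in the number of compounds. A rank-based formulation gives the same value more efficiently, but the pair-counting form makes the definition explicit.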
Affiliation(s)
- Thanh-Hoang Nguyen-Vo
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Quang H Trinh
  - Computational Biology Center, International University-VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Loc Nguyen
  - Computational Biology Center, International University-VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Phuong-Uyen Nguyen-Hoang
  - Computational Biology Center, International University-VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Thien-Ngan Nguyen
  - Computational Biology Center, International University-VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Dung T Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Ly Le
  - Computational Biology Center, International University-VNU HCMC, Ho Chi Minh City 700000, Vietnam
  - Vingroup Big Data Institute, Ha Noi 100000, Vietnam
110
Jain S, Talley DC, Baljinnyam B, Choe J, Hanson Q, Zhu W, Xu M, Chen CZ, Zheng W, Hu X, Shen M, Rai G, Hall MD, Simeonov A, Zakharov AV. Hybrid In Silico Approach Reveals Novel Inhibitors of Multiple SARS-CoV-2 Variants. ACS Pharmacol Transl Sci 2021; 4:1675-1688. [PMID: 34608449 PMCID: PMC8482323 DOI: 10.1021/acsptsci.1c00176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/15/2021] [Indexed: 11/30/2022]
Abstract
The National Center for Advancing Translational Sciences (NCATS) has been actively generating SARS-CoV-2 high-throughput screening data and disseminates it through the OpenData Portal (https://opendata.ncats.nih.gov/covid19/). Here, we provide a hybrid approach that utilizes NCATS screening data from the SARS-CoV-2 cytopathic effect reduction assay to build predictive models, using both machine learning and pharmacophore-based modeling. Optimized models were used to perform two iterative rounds of virtual screening to predict small molecules active against SARS-CoV-2. Experimental testing with live virus provided 100 (∼16% of predicted hits) active compounds (efficacy > 30%, IC50 ≤ 15 μM). Systematic clustering analysis of active compounds revealed three promising chemotypes which have not been previously identified as inhibitors of SARS-CoV-2 infection. Further investigation resulted in the identification of allosteric binders to host receptor angiotensin-converting enzyme 2; these compounds were then shown to inhibit the entry of pseudoparticles bearing spike protein of wild-type SARS-CoV-2, as well as South African B.1.351 and UK B.1.1.7 variants.
Affiliation(s)
- Sankalp Jain
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Daniel C. Talley
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Bolormaa Baljinnyam
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Jun Choe
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Quinlin Hanson
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Wei Zhu
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Miao Xu
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Catherine Z. Chen
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Wei Zheng
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Xin Hu
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Min Shen
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ganesha Rai
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Matthew D. Hall
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Anton Simeonov
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexey V. Zakharov
  - National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
111
Masand VH, Zaki MEA, Al-Hussain SA, Ghorbal AB, Akasapu S, Lewaa I, Ghosh A, Jawarkar RD. Identification of concealed structural alerts using QSTR modeling for Pseudokirchneriella subcapitata. Aquatic Toxicology (Amsterdam, Netherlands) 2021; 239:105962. [PMID: 34525418 DOI: 10.1016/j.aquatox.2021.105962] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/12/2021] [Revised: 08/10/2021] [Accepted: 09/01/2021] [Indexed: 06/13/2023]
Abstract
In the present work, QSTR modeling was conducted for microalga Pseudokirchneriella subcapitata using a data set of 271 molecules belonging to different types of chemical classes for the prediction of EC50 for 72 hr based assays. The balanced QSTR model encompasses seven easily interpretable molecular descriptors and possesses statistical robustness with high predictive ability. This Genetic Algorithm Multi-linear regression (GA-MLR) model was subjected to internal validation, Y-randomization test, applicability domain analysis, and external validation as per the recommended OECD guidelines. The newly developed model fulfilled the threshold values for more than 20 recommended validation parameters including R² = 0.72, Q²(LOO) = 0.70, etc. The developed QSTR model was successful in identifying the type of hybridization or specific type of atoms of previously reported and newer structural alerts. Thus, the model could be useful for data gap filling and expanding mechanistic interpretation of toxicity for different chemicals.
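The two validation statistics named in this abstract differ in which predictions enter the residual sum. The coefficient of determination uses the fitted values, whereas the leave-one-out Q² refits the model without each compound and predicts that compound. The sketch below uses a single-descriptor linear model to stay within the standard library (the paper's GA-MLR model has seven descriptors), takes the full-data mean of the response in the denominator of Q² (one common convention, assumed here), and runs on invented numbers.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(y):
    """Sum of squared deviations of y from its mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def r_squared(x, y):
    """Coefficient of determination of the model fitted on all data."""
    slope, intercept = fit_line(x, y)
    rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - rss / total_sum_of_squares(y)


def q_squared_loo(x, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / TSS."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(y)


# Invented descriptor/activity values, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
print(round(r2, 3), round(q2, 3))
```

For least-squares models the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so Q²(LOO) cannot exceed R². A small gap between the two, as in the values quoted in the abstract, is usually read as a sign that the model is not strongly overfitted.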
Affiliation(s)
- Vijay H Masand
  - Department of Chemistry, Vidya Bharati Mahavidyalaya, Amravati, Maharashtra, 444 602, India
- Magdi E A Zaki
  - Department of Chemistry, Faculty of Science, College of Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia.
- Sami A Al-Hussain
  - Department of Chemistry, Faculty of Science, College of Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia.
- Anis Ben Ghorbal
  - Department of Mathematics and Statistics, Faculty of Science, College of Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia.
- Israa Lewaa
  - Assistant Lecturer of Statistics, Faculty of Business Administration, Department of Business Administration, Economics and Political Science, The British University in Egypt, Cairo, Egypt.
- Arabinda Ghosh
  - Microbiology Division, Department of Botany, Gauhati University, Guwahati, Assam, 781014, India
- Rahul D Jawarkar
  - Department of Medicinal Chemistry, Dr. Rajendra Gode Institute of Pharmacy, Amravati, Maharashtra, India
112
Lee KH, Fant AD, Guo J, Guan A, Jung J, Kudaibergenova M, Miranda WE, Ku T, Cao J, Wacker S, Duff HJ, Newman AH, Noskov SY, Shi L. Toward Reducing hERG Affinities for DAT Inhibitors with a Combined Machine Learning and Molecular Modeling Approach. J Chem Inf Model 2021; 61:4266-4279. [PMID: 34420294 DOI: 10.1021/acs.jcim.1c00856] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]
Abstract
Psychostimulant drugs, such as cocaine, inhibit dopamine reuptake via blockading the dopamine transporter (DAT), which is the primary mechanism underpinning their abuse. Atypical DAT inhibitors are dissimilar to cocaine and can block cocaine- or methamphetamine-induced behaviors, supporting their development as part of a treatment regimen for psychostimulant use disorders. When developing these atypical DAT inhibitors as medications, it is necessary to avoid off-target binding that can produce unwanted side effects or toxicities. In particular, the blockade of a potassium channel, human ether-a-go-go (hERG), can lead to potentially lethal ventricular tachycardia. In this study, we established a counter screening platform for DAT and against hERG binding by combining machine learning-based quantitative structure-activity relationship (QSAR) modeling, experimental validation, and molecular modeling and simulations. Our results show that the available data are adequate to establish robust QSAR models, as validated by chemical synthesis and pharmacological evaluation of a validation set of DAT inhibitors. Furthermore, the QSAR models based on subsets of the data according to experimental approaches used have predictive power as well, which opens the door to target specific functional states of a protein. Complementarily, our molecular modeling and simulations identified the structural elements responsible for a pair of DAT inhibitors having opposite binding affinity trends at DAT and hERG, which can be leveraged for rational optimization of lead atypical DAT inhibitors with desired pharmacological properties.
Collapse
Affiliation(s)
- Kuo Hao Lee
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Andrew D Fant
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Jiqing Guo
- Libin Cardiovascular Institute of Alberta, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 4N1, Canada
| | - Andy Guan
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Joslyn Jung
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Mary Kudaibergenova
- Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
| | - Williams E Miranda
- Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
| | - Therese Ku
- Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Jianjing Cao
- Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Soren Wacker
- Libin Cardiovascular Institute of Alberta, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 4N1, Canada.,Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada.,Achlys Inc., 7-126 Li Ka Shing Center for Health and Innovation, Edmonton, Alberta T6G 2E1, Canada
| | - Henry J Duff
- Libin Cardiovascular Institute of Alberta, Cumming School of Medicine, University of Calgary, Calgary, Alberta T2N 4N1, Canada
| | - Amy Hauck Newman
- Medicinal Chemistry Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| | - Sergei Y Noskov
- Centre for Molecular Simulation, Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
| | - Lei Shi
- Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse-Intramural Research Program, National Institutes of Health, Baltimore, Maryland 21224, United States
| |
|
113
|
Danishuddin, Kumar V, Faheem M, Woo Lee K. A decade of machine learning-based predictive models for human pharmacokinetics: Advances and challenges. Drug Discov Today 2021; 27:529-537. [PMID: 34592448 DOI: 10.1016/j.drudis.2021.09.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/31/2021] [Revised: 07/21/2021] [Accepted: 09/22/2021] [Indexed: 11/28/2022]
Abstract
Traditionally, in vitro and in vivo methods are used to estimate human pharmacokinetic (PK) parameters; however, it is impractical to perform these complex and expensive experiments on a large number of compounds. The integration of publicly available chemical or medical Big Data with artificial intelligence (AI)-based approaches has enabled qualitative and quantitative prediction of the human PK of a candidate drug. However, predicting drug response with these approaches is challenging, partly because of the need to adapt algorithms and partly because of limitations related to experimental data. In this report, we provide an overview of machine learning (ML)-based quantitative structure-activity relationship (QSAR) models used in the assessment or prediction of PK values as well as databases available for obtaining such data.
Affiliation(s)
- Danishuddin
- Department of Bio & Medical Big Data (BK4), Division of Life Sciences, Research Institute of Natural Sciences (RINS), Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 52828, Republic of Korea
| | - Vikas Kumar
- Department of Bio & Medical Big Data (BK4), Division of Life Sciences, Research Institute of Natural Sciences (RINS), Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 52828, Republic of Korea
| | - Mohammad Faheem
- Department of Biotechnology, Indian Institute of Technology, Roorkee, Uttarakhand 247667, India
| | - Keun Woo Lee
- Department of Bio & Medical Big Data (BK4), Division of Life Sciences, Research Institute of Natural Sciences (RINS), Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 52828, Republic of Korea.
| |
|
114
|
Web-Based Quantitative Structure-Activity Relationship Resources Facilitate Effective Drug Discovery. Top Curr Chem (Cham) 2021; 379:37. [PMID: 34554348 DOI: 10.1007/s41061-021-00349-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/17/2021] [Indexed: 12/28/2022]
Abstract
Traditional drug discovery effectively contributes to the treatment of many diseases but is limited by high costs and long cycles. Quantitative structure-activity relationship (QSAR) methods were introduced to evaluate the activity of compounds virtually, which saves the significant cost of determining the activities of the compounds experimentally. Over the past two decades, many web tools for QSAR modeling with various features have been developed to facilitate the usage of QSAR methods. These web tools significantly reduce the difficulty of using QSAR and indirectly promote drug discovery. However, there are few comprehensive summaries of these QSAR tools, and researchers may have difficulty determining which tool to use. Hence, we systematically surveyed the mainstream web tools for QSAR modeling. This work may guide researchers in choosing appropriate web tools for developing QSAR models, and may also help develop more bioinformatics tools based on these existing resources. For nonprofessionals, we also hope to make more people aware of QSAR methods and expand their use.
|
115
|
Lombardo T, Duquesnoy M, El-Bouysidy H, Årén F, Gallo-Bueno A, Jørgensen PB, Bhowmik A, Demortière A, Ayerbe E, Alcaide F, Reynaud M, Carrasco J, Grimaud A, Zhang C, Vegge T, Johansson P, Franco AA. Artificial Intelligence Applied to Battery Research: Hype or Reality? Chem Rev 2021; 122:10899-10969. [PMID: 34529918 PMCID: PMC9227745 DOI: 10.1021/acs.chemrev.1c00108] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 02/01/2023]
Abstract
This is a critical review of artificial intelligence/machine learning (AI/ML) methods applied to battery research. It aims at providing a comprehensive, authoritative, and critical, yet easily understandable, review of general interest to the battery community. It addresses the concepts, approaches, tools, outcomes, and challenges of using AI/ML as an accelerator for the design and optimization of the next generation of batteries, a current hot topic. It intends to create both accessibility of these tools to the chemistry and electrochemical energy sciences communities and completeness in terms of the different battery R&D aspects covered.
Affiliation(s)
- Teo Lombardo
- Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
| | - Marc Duquesnoy
- Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
| | - Hassna El-Bouysidy
- Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Physics, Chalmers University of Technology, SE-41296 Göteborg, Sweden
| | - Fabian Årén
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Physics, Chalmers University of Technology, SE-41296 Göteborg, Sweden
| | - Alfonso Gallo-Bueno
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Centre for Cooperative Research on Alternative Energies (CIC energiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park, Albert Einstein 48, 01510 Vitoria-Gasteiz, Spain
| | - Peter Bjørn Jørgensen
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej, Building 301, 2800 Kgs. Lyngby, Denmark
| | - Arghya Bhowmik
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej, Building 301, 2800 Kgs. Lyngby, Denmark
| | - Arnaud Demortière
- Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
| | - Elixabete Ayerbe
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,CIDETEC, Basque Research and Technology Alliance (BRTA), Po. Miramón 196, 20014 Donostia-San Sebastián, Spain
| | - Francisco Alcaide
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,CIDETEC, Basque Research and Technology Alliance (BRTA), Po. Miramón 196, 20014 Donostia-San Sebastián, Spain
| | - Marine Reynaud
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Centre for Cooperative Research on Alternative Energies (CIC energiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park, Albert Einstein 48, 01510 Vitoria-Gasteiz, Spain
| | - Javier Carrasco
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Centre for Cooperative Research on Alternative Energies (CIC energiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park, Albert Einstein 48, 01510 Vitoria-Gasteiz, Spain
| | - Alexis Grimaud
- Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,UMR CNRS 8260 "Chimie du Solide et Energie", Collège de France, 11 Place Marcelin Berthelot, 75231 Paris Cedex 05, France.,Sorbonne Universités - UPMC Univ Paris 06, 4 Place Jussieu, F-75005 Paris, France
| | - Chao Zhang
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Chemistry - Ångström Laboratory, Box 538, 75121 Uppsala, Sweden
| | - Tejs Vegge
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej, Building 301, 2800 Kgs. Lyngby, Denmark
| | - Patrik Johansson
- ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Department of Physics, Chalmers University of Technology, SE-41296 Göteborg, Sweden
| | - Alejandro A Franco
- Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France.,Institut Universitaire de France, 103 Boulevard Saint Michel, 75005 Paris, France
| |
|
116
|
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 188] [Impact Index Per Article: 62.7] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Bingqing Cheng
- Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
| | - Stefan Chmiela
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Michael Gastegger
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| |
|
117
|
Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021; 40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python 3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
Affiliation(s)
- Timur R Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
| | - Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
| | - Valentina A Afonina
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Dinar Batyrshin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Ramil I Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Tagir Akhmetshin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France.,Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
| | | | - Jonas Verhoeven
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Joerg Wegner
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Hugo Ceulemans
- Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
| | - Andrey Gedich
- Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28, building 2, 194044, St Petersburg, Russia
| | - Timur I Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan.,Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
| |
|
118
|
Muratov EN, Amaro R, Andrade CH, Brown N, Ekins S, Fourches D, Isayev O, Kozakov D, Medina-Franco JL, Merz KM, Oprea TI, Poroikov V, Schneider G, Todd MH, Varnek A, Winkler DA, Zakharov AV, Cherkasov A, Tropsha A. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem Soc Rev 2021; 50:9121-9151. [PMID: 34212944 PMCID: PMC8371861 DOI: 10.1039/d0cs01065k] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 01/30/2021] [Indexed: 01/18/2023]
Abstract
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. A massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic, and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, it appears that drug repurposing has not produced rapid and global solutions. However, several known drugs have been used in the clinic to cure COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.
Affiliation(s)
- Eugene N. Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| | - Rommie Amaro
- University of California in San Diego, San Diego, CA, USA
| | | | | | - Sean Ekins
- Collaborations Pharmaceuticals, Raleigh, NC, USA
| | - Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, NC, USA
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
| | | | - Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, MI, USA
| | - Tudor I. Oprea
- Department of Internal Medicine and UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, USA
- Department of Rheumatology and Inflammation Research, Gothenburg University, Sweden
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
| | | | - Gisbert Schneider
- Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology, Zurich, Switzerland
| | | | - Alexandre Varnek
- Department of Chemistry, University of Strasbourg, Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Japan
| | - David A. Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia
- School of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Australia
- School of Pharmacy, University of Nottingham, Nottingham, UK
| | | | - Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| |
|
119
|
Mervin LH, Trapotsi MA, Afzal AM, Barrett IP, Bender A, Engkvist O. Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty. J Cheminform 2021; 13:62. [PMID: 34412708 PMCID: PMC8375213 DOI: 10.1186/s13321-021-00539-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 07/30/2021] [Indexed: 11/24/2022] Open
Abstract
Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4 and 0.6 log units and when ideal probability estimates were between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident.
Finally, the PRF models trained with putative inactives showed decreased performance compared to PRF models without putative inactives; this could be because putative inactives were not assigned an experimental pXC50 value and were therefore considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty and where a substantial part of the training data is located close to the classification threshold.
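To make the role of σ near the classification threshold concrete, the following minimal sketch turns a measured pXC50 and an experimental standard deviation into a probability of being active. It assumes the true value is normally distributed around the measurement; the function name `activity_probability`, the threshold of 6.0 and σ = 0.5 are illustrative assumptions, not values or code taken from the abstract.

```python
import math


def activity_probability(pxc50: float, threshold: float, sigma: float) -> float:
    """Probability that the true activity is at or above the threshold.

    Assumption (not from the abstract): the true value follows a normal
    distribution centred on the measurement with standard deviation sigma.
    """
    if sigma <= 0:
        # No uncertainty: fall back to the usual hard 0/1 label.
        return 1.0 if pxc50 >= threshold else 0.0
    z = (pxc50 - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Compare hard labels with uncertainty-aware ("soft") labels.
for value in (5.0, 5.9, 6.0, 6.1, 7.0):
    hard = int(value >= 6.0)
    soft = activity_probability(value, threshold=6.0, sigma=0.5)
    print(f"pXC50={value:.1f} hard_label={hard} soft_label={soft:.2f}")
```

Under these assumptions, measurements far from the threshold keep labels close to 0 or 1, while measurements within roughly one σ of it receive intermediate probabilities, which is the region where the abstract reports the largest benefit.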
Affiliation(s)
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Maria-Anna Trapotsi
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Avid M Afzal
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ian P Barrett
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.,Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
| |
|
120
|
Nikonenko A, Zankov D, Baskin I, Madzhidov T, Polishchuk P. Multiple Conformer Descriptors for QSAR Modeling. Mol Inform 2021; 40:e2060030. [PMID: 34342944 DOI: 10.1002/minf.202060030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2021] [Accepted: 07/19/2021] [Indexed: 12/11/2022]
Abstract
The most widely used QSAR approaches are mainly based on 2D molecular representations, which ignore the stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models suffer from the necessity to pre-align conformers, while alignment-independent approaches often ignore the stereoconfiguration of compounds. In this study we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding cluster membership of its conformers. In our study we used Random Forest, but this representation can be used in combination with any machine learning method. We compared this approach with conventional 2D and 3D approaches using multiple data sets and investigated the sensitivity of the proposed approach to its tuning parameters: the number of conformers and the number of clusters.
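The bit-string representation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the cluster centroids are assumed to come from a prior clustering of all training-set conformers, the two-dimensional toy vectors stand in for the chirality-aware 3D pharmacophore descriptors, and nearest-centroid assignment with Euclidean distance is a choice made here. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_cluster(descriptor: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest (Euclidean distance) to one conformer."""
    distances = [math.dist(descriptor, centroid) for centroid in centroids]
    return distances.index(min(distances))


def molecule_bit_string(conformers: List[Vector], centroids: List[Vector]) -> List[int]:
    """Bit i is 1 if at least one conformer of the molecule falls in cluster i."""
    bits = [0] * len(centroids)
    for descriptor in conformers:
        bits[nearest_cluster(descriptor, centroids)] = 1
    return bits


# Toy data: three clusters, two molecules with several conformers each.
centroids: List[Vector] = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
molecules: Dict[str, List[Vector]] = {
    "mol_A": [(0.1, -0.2), (4.8, 5.1)],
    "mol_B": [(9.7, 0.3), (10.2, -0.1), (9.9, 0.0)],
}
fingerprints = {
    name: molecule_bit_string(confs, centroids) for name, confs in molecules.items()
}
print(fingerprints)
```

Each molecule ends up with a fixed-length vector regardless of how many conformers it has, so the result can be passed to any standard learner, Random Forest included.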
Affiliation(s)
- Aleksandra Nikonenko
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
| | - Dmitry Zankov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
| | - Igor Baskin
- Department of Materials Science and Engineering, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
| | - Timur Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
| | - Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
| |
|
121
|
Mechanistic and Predictive QSAR Analysis of Diverse Molecules to Capture Salient and Hidden Pharmacophores for Anti-Thrombotic Activity. Int J Mol Sci 2021; 22:ijms22158352. [PMID: 34361118 PMCID: PMC8348508 DOI: 10.3390/ijms22158352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/17/2021] [Revised: 07/24/2021] [Accepted: 07/31/2021] [Indexed: 12/02/2022] Open
Abstract
Thrombosis is a life-threatening disease with a high mortality rate in many countries. Even though anti-thrombotic drugs are available, their serious side effects compel the search for safer drugs. In search of a safer anti-thrombotic drug, Quantitative Structure-Activity Relationship (QSAR) analysis could be useful to identify crucial pharmacophoric features. The present work is based on a larger data set comprising 1121 diverse compounds to develop a QSAR model having a balance of acceptable predictive ability (Predictive QSAR) and mechanistic interpretation (Mechanistic QSAR). The developed six-parameter model fulfils the recommended values for internal and external validation along with Y-randomization parameters such as R²_tr = 0.831, Q²_LMO = 0.828, R²_ex = 0.783. The present analysis reveals that anti-thrombotic activity is correlated with concealed structural traits such as positively charged ring carbon atoms, a specific combination of aromatic nitrogen and sp2-hybridized carbon atoms, etc. Thus, the model captured reported as well as novel pharmacophoric features. The results of the QSAR analysis are further vindicated by reported crystal structures of compounds with factor Xa. The analysis led to the identification of useful novel pharmacophoric features, which could be used for future optimization of lead compounds.
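For readers unfamiliar with the validation statistics quoted above, the sketch below shows one common way such quantities are computed: R² on the fitted training data and a leave-many-out Q² from held-out predictions. It is a simplified illustration, assuming a single-descriptor least-squares line, a fixed group assignment and toy data; the abstract does not state the exact definitions or software the authors used, and all names here are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for one descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    my = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - my) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def q_squared_lmo(x: Sequence[float], y: Sequence[float], n_groups: int = 3) -> float:
    """Leave-many-out Q2: hold out each group once and refit on the rest."""
    held_out_pred: List[float] = [0.0] * len(y)
    for group in range(n_groups):
        train = [i for i in range(len(y)) if i % n_groups != group]
        test = [i for i in range(len(y)) if i % n_groups == group]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            held_out_pred[i] = slope * x[i] + intercept
    return r_squared(y, held_out_pred)


# Toy data (assumed): activity roughly linear in one descriptor.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 19.0]
slope, intercept = fit_line(x, y)
r2_train = r_squared(y, [slope * xi + intercept for xi in x])
q2_lmo = q_squared_lmo(x, y)
print(f"R2_train={r2_train:.3f} Q2_LMO={q2_lmo:.3f}")
```

A Q² close to the training R², as reported in the abstract, suggests that the fit does not depend heavily on any particular subset of compounds.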
|
122
|
Lovrić M, Đuričić T, Tran HTN, Hussain H, Lacić E, Rasmussen MA, Kern R. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals (Basel) 2021; 14:758. [PMID: 34451855 PMCID: PMC8400160 DOI: 10.3390/ph14080758] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023] Open
Abstract
Methods for dimensionality reduction are showing significant contributions to knowledge generation in high-dimensional modeling scenarios throughout many disciplines. By achieving a lower dimensional representation (also called embedding), fewer computing resources are needed in downstream machine learning tasks, thus leading to a faster training time, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques (principal component analysis, PCA; uniform manifold approximation and projection, UMAP; and variational autoencoders, VAEs) for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities, and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
Affiliation(s)
- Mario Lovrić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
| | - Tomislav Đuričić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
| | - Han T. N. Tran
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
| | - Hussain Hussain
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
| | - Emanuel Lacić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
| | - Morten A. Rasmussen
- Copenhagen Studies on Asthma in Childhood, Herlev-Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark
- Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark
| | - Roman Kern
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
| |
|
123
|
The Psychonauts' Benzodiazepines; Quantitative Structure-Activity Relationship (QSAR) Analysis and Docking Prediction of Their Biological Activity. Pharmaceuticals (Basel) 2021; 14:ph14080720. [PMID: 34451817 PMCID: PMC8398354 DOI: 10.3390/ph14080720] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 07/19/2021] [Accepted: 07/20/2021] [Indexed: 12/28/2022] Open
Abstract
Designer benzodiazepines (DBZDs) represent a serious health concern and are increasingly reported in polydrug consumption-related fatalities. When new DBZDs are identified, very limited information is available on their pharmacodynamics. Here, computational models (i.e., quantitative structure-activity relationship/QSAR and Molecular Docking) were used to analyse DBZDs identified online by an automated web crawler (NPSfinder®) and to predict their possible activity/affinity on the gamma-aminobutyric acid A receptors (GABA-ARs). The computational software MOE was used to calculate 2D QSAR models, perform docking studies on crystallised GABA-A receptors (6HUO, 6HUP) and generate pharmacophore queries from the docking conformational results. 101 DBZDs were identified online by NPSfinder®. The validated QSAR model predicted high biological activity values for 41% of these DBZDs. These predictions were supported by the docking studies (good binding affinity) and the pharmacophore modelling confirmed the importance of the presence and location of hydrophobic and polar functions identified by QSAR. This study confirms once again the importance of web-based analysis in the assessment of drug scenarios (DBZDs), and how computational models could be used to acquire fast and reliable information on biological activity for index novel DBZDs, as preliminary data for further investigations.
|
124
|
Tosca EM, Bartolucci R, Magni P. Application of Artificial Neural Networks to Predict the Intrinsic Solubility of Drug-Like Molecules. Pharmaceutics 2021; 13:1101. [PMID: 34371792 PMCID: PMC8309152 DOI: 10.3390/pharmaceutics13071101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2021] [Revised: 07/15/2021] [Accepted: 07/16/2021] [Indexed: 11/25/2022] Open
Abstract
Machine learning (ML) approaches are receiving increasing attention from pharmaceutical companies and regulatory agencies, given their ability to mine knowledge from available data. In drug discovery, for example, they are employed in quantitative structure-property relationship (QSPR) models to predict biological properties from the chemical structure of a drug molecule. In this paper, following the Second Solubility Challenge (SC-2), a QSPR model based on artificial neural networks (ANNs) was built to predict the intrinsic solubility (logS0) of the 100-compound low-variance tight set and the 32-compound high-variance loose set provided by SC-2 as test datasets. First, a training dataset of 270 drug-like molecules with logS0 value experimentally determined was gathered from the literature. Then, a standard three-layer feed-forward neural network was defined by using 10 ChemGPS physico-chemical descriptors as input features. The developed ANN showed adequate predictive performances on both of the SC-2 test datasets. Benefits and limitations of ML approaches have been highlighted and discussed, starting from this case-study. The main findings confirmed that ML approaches are an attractive and promising tool to predict logS0; however, many aspects, such as data quality, molecular descriptor computation and selection, and assessment of applicability domain, are crucial but often neglected, and should be carefully considered to improve predictions based on ML.
Affiliation(s)
| | | | - Paolo Magni
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, I-27100 Pavia, Italy; (E.M.T.); (R.B.)
| |
|
125
|
Tinkov OV, Grigorev VY, Grigoreva LD. Prediction of an Organic Compound’s Biotransformation Time: A Study Using Avermectins. MOSCOW UNIVERSITY CHEMISTRY BULLETIN 2021. [PMCID: PMC8382113 DOI: 10.3103/s0027131421040088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The current spread of the SARS-CoV-2 coronavirus is a challenge for the entire world. Ivermectin is a promising agent, which could be used to combat the SARS-CoV-2 coronavirus. It represents a complex of semisynthetic derivatives of natural avermectins that have been taken advantage of for a long time in medicine and agriculture as antiparasitic drugs. However, the experimental ecotoxicology assessment data for individual avermectins are still scarce. In relation to this, the aim of this study is to develop a mathematical model that would allow reliably predicting the biotransformation ability of natural and semisynthetic avermectins and identifying the structural fragments of avermectin molecules that have the largest impact on this biological activity. The base for the model construction was a structurally heterogeneous set including organic compounds with experimentally determined biotransformation half-life periods (KmHL). Using the OCHEM web platform (https://ochem.eu) with the implemented PyDescriptor plugin for the descriptor calculation and Random Forest and Transformer-CNN algorithms, a satisfactory ($R^{2}_{\text{test}}$ = 0.81) Quantitative Structure-Activity Relationship (QSAR) model was developed. The subsequent calculations have shown that natural avermectins undergo on average faster biotransformation in fish than the semisynthetic ones. In addition, structural fragments that increase and decrease the biotransformation rate are identified.
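For orientation, the test-set coefficient of determination quoted above is commonly written as follows. This is the standard textbook form and is given here as an assumption; the abstract does not say which variant was used (some QSAR studies take the mean from the training set instead of the test set).

```latex
R^{2}_{\text{test}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
```

Here $y_i$ is the observed value for test compound $i$, $\hat{y}_i$ is the model prediction, $\bar{y}$ is the mean of the observed values, and $n$ is the number of test compounds.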
|
126
|
Tinkov OV, Grigorev VY, Grigoreva LD. QSAR analysis of the acute toxicity of avermectins towards Tetrahymena pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2021; 32:541-571. [PMID: 34157880 DOI: 10.1080/1062936x.2021.1932583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
Avermectins have been effectively used in medicine, veterinary medicine, and agriculture as antiparasitic agents for many years. However, there are still no reliable data on the main ecotoxicological characteristics of most individual avermectins. Although many QSAR models have been proposed to describe the acute toxicity of organic compounds towards Tetrahymena pyriformis (T. pyriformis), avermectins are outside the applicability domain of these models. The influence of the molecular structures of various organic compounds on the acute toxicity towards T. pyriformis was studied using the OCHEM web platform (https://ochem.eu). A data set of 1792 toxicants was used to create models. The QSAR (Quantitative Structure-Activity Relationship) models were developed using the molecular descriptors Dragon, ISIDA, CDK, PyDescriptor, alvaDesc, and SIRMS and machine learning methods, such as Least Squares Support Vector Machine and Transformer Convolutional Neural Network. The HYBOT descriptors and Random Forest were used for a comparative QSAR investigation. Since the best predictive ability was demonstrated by the Transformer Convolutional Neural Network model, it was used to predict the toxicity of individual avermectins towards T. pyriformis. During a structural interpretation of the developed QSAR model, we determined the significant molecular transformations that increase and decrease the acute toxicity of organic compounds.
Affiliation(s)
- O V Tinkov
- Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Shevchenko Transnistria State University, Tiraspol, Moldova
- Department of Computer Science, Military Institute of the Ministry of Defense, Tiraspol, Moldova
| | - V Y Grigorev
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russia
| | - L D Grigoreva
- Department of Fundamental Physicochemical Engineering, Moscow State University, Moscow, Russia
| |
|
127
|
Casanova-Alvarez O, Morales-Helguera A, Cabrera-Pérez MÁ, Molina-Ruiz R, Molina C. A Novel Automated Framework for QSAR Modeling of Highly Imbalanced Leishmania High-Throughput Screening Data. J Chem Inf Model 2021; 61:3213-3231. [PMID: 34191520 DOI: 10.1021/acs.jcim.0c01439] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
In silico prediction of antileishmanial activity using quantitative structure-activity relationship (QSAR) models has been developed on limited and small datasets. Nowadays, the availability of large and diverse high-throughput screening data provides an opportunity to the scientific community to model this activity from the chemical structure. In this study, we present the first KNIME automated workflow for modeling a large, diverse, and highly imbalanced dataset of compounds with antileishmanial activity. Because the data is strongly biased toward inactive compounds, a novel strategy was implemented based on the selection of different balanced training sets and a further consensus model using single decision trees as the base model and three criteria for output combinations. The decision tree consensus was adopted after comparing its classification performance to consensuses built upon Gaussian-Naïve-Bayes, Support-Vector-Machine, Random-Forest, Gradient-Boost, and Multi-Layer-Perceptron base models. All these consensuses were rigorously validated using internal and external test validation sets and were compared against each other using Friedman and Bonferroni-Dunn statistics. For the retained decision tree-based consensus model, which covers 100% of the chemical space of the dataset and with the lowest consensus level, the overall accuracy statistics for test and external sets were between 71 and 74% and 71 and 76%, respectively, while for a reduced chemical space (21%) and with an incremental consensus level, the accuracy statistics were substantially improved with values for the test and external sets between 86 and 92% and 88 and 92%, respectively. These results highlight the relevance of the consensus model to prioritize a relatively small set of active compounds with high prediction sensitivity using the Incremental Consensus at high level values or to predict as many compounds as possible, lowering the level of Incremental Consensus.
Finally, the workflow developed eliminates human bias, improves the procedure reproducibility, and allows other researchers to reproduce our design and use it in their own QSAR problems.
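The strategy summarised in this abstract (several balanced training sets, one simple base model per set, and a consensus level that trades coverage for accuracy) can be sketched roughly as below. This is a minimal illustration, not the authors' KNIME workflow: the one-feature threshold rule stands in for a single decision tree, the vote-fraction rule is an assumed combination criterion (the abstract does not specify its three criteria), and all function names and the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-feature threshold rule (predict active when x > threshold)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t


def balanced_subsets(xs, ys, n_models, rng):
    """Yield training sets pairing all actives with an equal-size sample of inactives."""
    actives = [x for x, y in zip(xs, ys) if y == 1]
    inactives = [x for x, y in zip(xs, ys) if y == 0]
    for _ in range(n_models):
        sample = rng.sample(inactives, len(actives))
        yield actives + sample, [1] * len(actives) + [0] * len(sample)


def consensus_predict(thresholds, x, level):
    """Return 1 or 0 when at least `level` of the models agree, else None (abstain).

    `level` is meant to lie in (0.5, 1.0]; higher values mean fewer compounds
    receive a prediction, but with stronger agreement among the base models.
    """
    frac_active = sum(x > t for t in thresholds) / len(thresholds)
    if frac_active >= level:
        return 1
    if 1.0 - frac_active >= level:
        return 0
    return None


# Hypothetical toy data: 40 inactives and 5 actives described by one numeric descriptor
rng = random.Random(0)
xs = [i / 40 for i in range(40)] + [1.1, 1.2, 1.3, 1.4, 1.5]
ys = [0] * 40 + [1] * 5
thresholds = [fit_stump(sx, sy) for sx, sy in balanced_subsets(xs, ys, 7, rng)]
print(consensus_predict(thresholds, 1.3, level=1.0))
```

Raising `level` shrinks the set of compounds that receive a prediction, which mirrors the reduced chemical-space coverage reported for the higher consensus levels.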
Affiliation(s)
- Omar Casanova-Alvarez
- Departamento de Química, Facultad de Química-Farmacia, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Aliuska Morales-Helguera
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Miguel Ángel Cabrera-Pérez
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Christophe Molina
- PIKAÏROS S.A., B03 - 2 Allée de la Clairière, 31650 Saint Orens de Gameville, France
| |
|
128
|
Abstract
This work incorporates machine learning (ML) techniques, such as multivariate regression, the multi-layer perceptron, and random forest to predict the slip length at the nanoscale. Data points are collected both from our simulation data and data from the literature, and comprise Molecular Dynamics simulations of simple monoatomic, polar, and molecular liquids. Training and test points cover a wide range of input parameters which have been found to affect the slip length value, concerning dynamical and geometrical characteristics of the model, along with simulation parameters that constitute the simulation conditions. The aim of this work is to suggest an accurate and efficient procedure capable of reproducing physical properties, such as the slip length, acting parallel to simulation methods. Non-linear models, based on neural networks and decision trees, have been found to achieve better performance compared to linear regression methods. After the model is trained on representative simulation data, it is capable of accurately predicting the slip length values in regions between or in close proximity to the input data range, at the nanoscale. Results also reveal that, as channel dimensions increase, the slip length turns into a size-independent material property, affected mainly by wall roughness and wettability.
|
129
|
Pinacho-Castellanos SA, García-Jacas CR, Gilson MK, Brizuela CA. Alignment-Free Antimicrobial Peptide Predictors: Improving Performance by a Thorough Analysis of the Largest Available Data Set. J Chem Inf Model 2021; 61:3141-3157. [PMID: 34081438 DOI: 10.1021/acs.jcim.1c00251] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 01/13/2023]
Abstract
In the last two decades, a large number of machine-learning-based predictors for the activities of antimicrobial peptides (AMPs) have been proposed. These predictors differ from one another in the learning method and in the training and testing data sets used. Unfortunately, the training data sets present several drawbacks, such as a low representativeness regarding the experimentally validated AMP space, and duplicated peptide sequences between negative and positive data sets. These limitations give a low confidence to most of the approaches to be used in prospective studies. To address these weaknesses, we propose novel modeling and assessing data sets from the largest experimentally validated nonredundant peptide data set reported to date. From these novel data sets, alignment-free quantitative sequence-activity models (AF-QSAMs) based on Random Forest are created to identify general AMPs and their antibacterial, antifungal, antiparasitic, and antiviral functional types. An applicability domain analysis is carried out to determine the reliability of the predictions obtained, which, to the best of our knowledge, is performed for the first time for AMP recognition. A benchmarking is undertaken between the models proposed and several models from the literature that are freely available in 13 programs (ClassAMP, iAMP-2L, ADAM, MLAMP, AMPScanner v2.0, AntiFP, AMPfun, PEPred-suite, AxPEP, CAMPR3, iAMPpred, APIN, and Meta-iAVP). The models proposed are those with the best performance in all of the endpoints modeled, while most of the methods from the literature have weak-to-random predictive agreements. The models proposed are also assessed through Y-scrambling and repeated k-fold cross-validation tests, demonstrating that the outcomes obtained by them are not given by chance. Three chemometric analyses also confirmed the relevance of the peptides descriptors used in the modeling. 
Therefore, it can be concluded that the models built by fixing the drawbacks existing in the literature contribute to identifying antibacterial, antifungal, antiparasitic, and antiviral peptides with high effectivity and reliability. Models are freely available via the AMPDiscover tool at https://biocom-ampdiscover.cicese.mx/.
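The Y-scrambling check mentioned in this abstract can be illustrated with a small sketch: labels are shuffled many times, the model is re-evaluated on each shuffle, and the real score is compared with the scrambled ones. The leave-one-out 1-nearest-neighbour classifier, the single numeric feature, and the function names are assumptions chosen to keep the example self-contained; the paper itself uses Random Forest models on peptide descriptors.

```python
import random


def loo_1nn_accuracy(xs, ys):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on 1-D features."""
    hits = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # nearest other sample by absolute distance
        j = min((k for k in range(len(xs)) if k != i), key=lambda k: abs(xs[k] - x))
        hits += ys[j] == y
    return hits / len(xs)


def y_scrambling(xs, ys, n_rounds=200, seed=0):
    """Compare the real score with scores obtained after shuffling the labels."""
    rng = random.Random(seed)
    real = loo_1nn_accuracy(xs, ys)
    scrambled = []
    for _ in range(n_rounds):
        perm = list(ys)
        rng.shuffle(perm)  # breaks any true feature-label relationship
        scrambled.append(loo_1nn_accuracy(xs, perm))
    # empirical p-value: share of scrambled runs that match or beat the real score
    p_value = (1 + sum(s >= real for s in scrambled)) / (1 + n_rounds)
    return real, sum(scrambled) / n_rounds, p_value


# Hypothetical toy data: two well-separated classes
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
ys = [0] * 5 + [1] * 5
real, mean_scrambled, p_value = y_scrambling(xs, ys)
print(real, mean_scrambled, p_value)
```

A real score well above the scrambled average, together with a small empirical p-value, is the kind of evidence the abstract refers to when it says the outcomes are not given by chance.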
Affiliation(s)
- Sergio A Pinacho-Castellanos
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México; Centro de Investigación y Desarrollo de Tecnología Digital (CITEDI), Instituto Politécnico Nacional (IPN), 22435 Tijuana, Baja California, México
| | - César R García-Jacas
- Cátedras CONACYT-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
| | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, United States
| | - Carlos A Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
| |
|
130
|
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022] Open
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
| | - Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| |
|
131
|
Costa RPO, Lucena LF, Silva LMA, Zocolo GJ, Herrera-Acevedo C, Scotti L, Da-Costa FB, Ionov N, Poroikov V, Muratov EN, Scotti MT. The SistematX Web Portal of Natural Products: An Update. J Chem Inf Model 2021; 61:2516-2522. [PMID: 34014674 DOI: 10.1021/acs.jcim.1c00083] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
Abstract
Natural products and their secondary metabolites are promising starting points for the development of drug prototypes and new drugs, as many current treatments for numerous diseases are directly or indirectly related to such compounds. State-of-the-art, curated, integrated, and frequently updated databases of secondary metabolites are thus highly relevant to drug discovery. The SistematX Web Portal, introduced in 2018, is undergoing development to address this need and documents crucial information about plant secondary metabolites, including the exact location of the species from which the compounds were isolated. SistematX also allows registered users to log in to the data management area and gain access to administrative pages. This study reports recent updates and modifications to the SistematX Web Portal, including a batch download option, the generation and visualization of 1H and 13C nuclear magnetic resonance spectra, and the calculation of physicochemical (drug-like and lead-like) properties and biological activity profiles. The SistematX Web Portal is freely available at http://sistematx.ufpb.br.
Affiliation(s)
- Renan P O Costa
- Laboratory of Cheminformatics, Instituto de Pesquisa em Fármacos e Medicamentos (IPeFarM), Universidade Federal da Paraíba, Campus I, Cidade Universitária, João Pessoa 58051-900, PB, Brazil
| | - Lucas F Lucena
- Laboratory of Cheminformatics, Instituto de Pesquisa em Fármacos e Medicamentos (IPeFarM), Universidade Federal da Paraíba, Campus I, Cidade Universitária, João Pessoa 58051-900, PB, Brazil
| | - Lorena Mara A Silva
- Laboratório Multiusuário de Química de Produtos Naturais, Embrapa Agroindústria Tropical, Rua Doutora Sara Mesquita 2270, Planalto do Pici, Fortaleza 60511110, CE, Brazil
| | - Guilherme Julião Zocolo
- Laboratório Multiusuário de Química de Produtos Naturais, Embrapa Agroindústria Tropical, Rua Doutora Sara Mesquita 2270, Planalto do Pici, Fortaleza 60511110, CE, Brazil
| | - Chonny Herrera-Acevedo
- Laboratory of Cheminformatics, Instituto de Pesquisa em Fármacos e Medicamentos (IPeFarM), Universidade Federal da Paraíba, Campus I, Cidade Universitária, João Pessoa 58051-900, PB, Brazil
| | - Luciana Scotti
- Laboratory of Cheminformatics, Instituto de Pesquisa em Fármacos e Medicamentos (IPeFarM), Universidade Federal da Paraíba, Campus I, Cidade Universitária, João Pessoa 58051-900, PB, Brazil
| | - Fernando Batista Da-Costa
- AsterBioChem Research Team, Laboratory of Pharmacognosy, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Av do café s/n, Ribeirão Preto 14040-903, SP, Brazil
| | - Nikita Ionov
- Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Str. 10, bldg. 8, Moscow 119121, Russia
| | - Vladimir Poroikov
- Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya Str. 10, bldg. 8, Moscow 119121, Russia
| | - Eugene N Muratov
- Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Marcus T Scotti
- Laboratory of Cheminformatics, Instituto de Pesquisa em Fármacos e Medicamentos (IPeFarM), Universidade Federal da Paraíba, Campus I, Cidade Universitária, João Pessoa 58051-900, PB, Brazil
| |
|
132
|
Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021; 28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
Nowadays, computational approaches play an important role in the design of new drug-like compounds and optimization of pharmacotherapeutic treatment of diseases. The emerging growth of viral infections, including those caused by the Human Immunodeficiency Virus (HIV), Ebola virus, recently detected coronavirus, and some others, leads to many newly infected people with a high risk of death or severe complications. A huge amount of chemical, biological, clinical data is at the disposal of the researchers. Therefore, there are many opportunities to find the relationships between the particular features of chemical data and the antiviral activity of biologically active compounds based on machine learning approaches. Biological and clinical data can also be used for building models to predict relationships between viral genotype and drug resistance, which might help determine the clinical outcome of treatment. In the current study, we consider machine-learning approaches in the antiviral research carried out during the past decade. We overview in detail the application of machine-learning methods for the design of new potential antiviral agents and vaccines, drug resistance prediction, and analysis of virus-host interactions. Our review also covers the perspectives of using the machine-learning approaches for antiviral research, including Dengue, Ebola viruses, Influenza A, Human Immunodeficiency Virus, coronaviruses, and some others.
Affiliation(s)
- Olga Tarasova
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
| | - Vladimir Poroikov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
| |
|
133
|
Wang L, Ding J, Shi P, Fu L, Pan L, Tian J, Cao D, Jiang H, Ding X. Ensemble machine learning to evaluate the in vivo acute oral toxicity and in vitro human acetylcholinesterase inhibitory activity of organophosphates. Arch Toxicol 2021; 95:2443-2457. [PMID: 33934188 DOI: 10.1007/s00204-021-03056-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/08/2021] [Accepted: 04/21/2021] [Indexed: 12/13/2022]
Abstract
Organophosphates (OPs) are hazardous chemicals widely used in industry and agriculture. Distribution of their residues in nature causes serious risks to humans, animals, and plants. To reduce hazards from OPs, quantitative structure-activity relationship (QSAR) models for predicting their acute oral toxicity in rats and mice and inhibition constants concerning human acetylcholinesterase were developed according to the bioactivity data of 456 unique OPs. Based on robust, two-dimensional molecular descriptors and quantum chemical descriptors, which accurately reflect OP electronic structures and reactivities, the influences of eight machine-learning algorithms on the prediction performance of the QSAR models were explored, and consensus QSAR models were constructed. Several strict model validation indices and the results of applicability domain evaluations show that the established consensus QSAR models exhibit good robustness, practical prediction abilities, and wide application scopes. Poor correlation was observed between acute oral toxicity at the mammalian level and the inhibition constants at the molecular level, indicating that the acute toxicity of OPs cannot be evaluated only by the experimental data of enzyme inhibitory activity; their toxicokinetic characteristics must also be considered. The constructed QSAR models described herein provide rapid, theoretical assessment of the bioactivity of unstudied or unknown OPs, as well as guidance for making decisions regarding their regulation.
Affiliation(s)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
| | - Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
| | - Peichang Shi
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
| | - Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
| | - Li Pan
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
| | - Jiahao Tian
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China; Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China.
| | - Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China.
| | - Xiaoqin Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China.
| |
|
134
|
Alves VM, Auerbach SS, Kleinstreuer N, Rooney JP, Muratov EN, Rusyn I, Tropsha A, Schmitt C. Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing. Altern Lab Anim 2021; 49:73-82. [PMID: 34233495 PMCID: PMC8609471 DOI: 10.1177/02611929211029635] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/16/2022]
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7-24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
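The duplicate problem described in this abstract can be made concrete with a small sketch. It assumes that CCR is the mean of sensitivity and specificity (the usual balanced-accuracy definition, which the abstract does not spell out) and that each record is a `(structure_key, label)` pair; the record layout, the function names, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ccr(y_true, y_pred):
    """Correct classification rate: mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("CCR needs at least one positive and one negative")
    return 0.5 * (tp / pos + tn / neg)


def deduplicate(records):
    """Collapse records that share a structure key; drop keys with conflicting labels."""
    labels = defaultdict(set)
    for key, label in records:
        labels[key].add(label)
    return sorted((key, next(iter(lab))) for key, lab in labels.items() if len(lab) == 1)


# Hypothetical toy data: (canonical structure key, label), 1 = sensitiser
records = [("CCO", 0), ("CCO", 0), ("c1ccccc1O", 1), ("c1ccccc1O", 1),
           ("CC(=O)Cl", 1), ("CCN", 0), ("CCN", 1)]
curated = deduplicate(records)
print(len(records), len(curated))
```

Deduplicating before the train/test split keeps the same structure from appearing on both sides of the split, which is the mechanism by which uncurated data can inflate a reported CCR.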
Affiliation(s)
- Vinicius M. Alves
- Office of Data Science, Division of the National Toxicology Program (DNTP), National Institute of Environmental Health Sciences (NIEHS), Durham, NC, USA
| | - Scott S. Auerbach
- Toxinformatics Group, Predictive Toxicology Branch, DNTP, NIEHS, Durham, NC, USA
| | - Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Scientific Director's Office, DNTP, NIEHS, Durham, NC, USA
| | - John P. Rooney
- Integrated Laboratory Systems, LLC, Morrisville, NC, USA
| | - Eugene N. Muratov
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, NC, USA
- Department of Pharmaceutical Sciences, Federal University of Paraiba, Joao Pessoa, Paraiba, Brazil
| | - Ivan Rusyn
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, NC, USA
| | - Charles Schmitt
- Office of Data Science, Division of the National Toxicology Program (DNTP), National Institute of Environmental Health Sciences (NIEHS), Durham, NC, USA
| |
|
135
|
Watson O, Cortes-Ciriano I, Watson JA. A semi-supervised learning framework for quantitative structure-activity regression modelling. Bioinformatics 2021; 37:342-350. [PMID: 32777821 PMCID: PMC8058768 DOI: 10.1093/bioinformatics/btaa711] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2019] [Revised: 07/14/2020] [Accepted: 08/03/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Quantitative structure-activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These supervised learning models can then be used to predict the activity of previously unmeasured novel compounds. RESULTS: This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content between finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data and (iii) a method to adjust for screening-dependent selection bias inherent in many training datasets. For example, in the most extreme cases, only compounds which pass an activity-dependent screening threshold are reported. A semi-supervised learning framework combines (ii) and (iii) and can make predictions that take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate the three methods using publicly available structure-activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro Plasmodium falciparum growth. AVAILABILITY AND IMPLEMENTATION: https://github.com/owatson/PenalizedPrediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
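Point (ii) of this abstract, prediction accuracy degrading with distance from the training data, can be sketched generically. The abstract does not name a distance measure, so the example below assumes Tanimoto distance on fingerprints represented as sets of on-bit indices. The function names and the bin width are hypothetical, and this is not the implementation in the linked repository.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def nearest_training_distance(test_fp, training_fps):
    """Distance from one test compound to its closest training compound."""
    return min(tanimoto_distance(test_fp, fp) for fp in training_fps)


def mean_error_by_distance(distances, abs_errors, bin_width=0.25):
    """Average absolute prediction error within equal-width distance bins."""
    last_bin = int(1.0 / bin_width) - 1
    bins = {}
    for d, e in zip(distances, abs_errors):
        idx = min(int(d / bin_width), last_bin)  # a distance of exactly 1.0 goes in the last bin
        bins.setdefault(idx, []).append(e)
    return {idx: sum(v) / len(v) for idx, v in sorted(bins.items())}
```

Plotting the binned errors against the bin index gives a simple picture of how quickly a model loses accuracy as test compounds move away from the training set.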
Affiliation(s)
- Oliver Watson
- Evariste Technologies Ltd, Goring on Thames RG8 9AL, UK
| | - Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
| | - James A Watson
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford OX1 2JD, UK.,Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
| |
|
136
|
Rodríguez-Martínez X, Pascual-San-José E, Campoy-Quiles M. Accelerating organic solar cell material's discovery: high-throughput screening and big data. ENERGY & ENVIRONMENTAL SCIENCE 2021; 14:3301-3322. [PMID: 34211582 PMCID: PMC8209551 DOI: 10.1039/d1ee00559f] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 04/20/2021] [Indexed: 05/27/2023]
Abstract
The discovery of novel high-performing materials such as non-fullerene acceptors and low band gap donor polymers underlines the steady increase of record efficiencies in organic solar cells witnessed during the past years. Nowadays, the resulting catalogue of organic photovoltaic materials is becoming unaffordably vast to be evaluated following classical experimentation methodologies: their requirements in terms of human workforce time and resources are prohibitively high, which slows momentum to the evolution of the organic photovoltaic technology. As a result, high-throughput experimental and computational methodologies are fostered to leverage their inherently high exploratory paces and accelerate novel materials discovery. In this review, we present some of the computational (pre)screening approaches performed prior to experimentation to select the most promising molecular candidates from the available materials libraries or, alternatively, generate molecules beyond human intuition. Then, we outline the main high-throughput experimental screening and characterization approaches with application in organic solar cells, namely those based on lateral parametric gradients (measuring-intensive) and on automated device prototyping (fabrication-intensive). In both cases, experimental datasets are generated at unbeatable paces, which notably enhance big data readiness. Herein, machine-learning algorithms find a rewarding application niche to retrieve quantitative structure-activity relationships and extract molecular design rationale, which are expected to keep the material's discovery pace up in organic photovoltaics.
Affiliation(s)
| | | | - Mariano Campoy-Quiles
- Institut de Ciència de Materials de Barcelona, ICMAB-CSIC, Campus UAB 08193 Bellaterra Spain
| |
|
137
|
Mizera M, Latek D. Ligand-Receptor Interactions and Machine Learning in GCGR and GLP-1R Drug Discovery. Int J Mol Sci 2021; 22:ijms22084060. [PMID: 33920024 PMCID: PMC8071054 DOI: 10.3390/ijms22084060] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2021] [Revised: 03/31/2021] [Accepted: 04/07/2021] [Indexed: 12/03/2022] Open
Abstract
The large amount of data that has been collected so far for G protein-coupled receptors requires machine learning (ML) approaches to fully exploit its potential. Our previous ML model based on gradient boosting used for prediction of drug affinity and selectivity for a receptor subtype was compared with explicit information on ligand-receptor interactions from induced-fit docking. Both methods have proved their usefulness in drug response predictions. Yet, their successful combination still requires allosteric/orthosteric assignment of ligands from datasets. Our ligand datasets included activities of two members of the secretin receptor family: GCGR and GLP-1R. Simultaneous activation of two or three receptors of this family by dual or triple agonists is not a typical kind of information included in compound databases. A precise allosteric/orthosteric ligand assignment requires a continuous update based on new structural and biological data. This data incompleteness remains the main obstacle for current ML methods applied to class B GPCR drug discovery. Even so, for these two class B receptors, our ligand-based ML model demonstrated high accuracy (5-fold cross-validation Q2 > 0.63 and Q2 > 0.67 for GLP-1R and GCGR, respectively). In addition, we performed a ligand annotation using recent cryogenic-electron microscopy (cryo-EM) and X-ray crystallographic data on small-molecule complexes of GCGR and GLP-1R. As a result, we assigned GLP-1R and GCGR actives deposited in ChEMBL to four small-molecule binding sites occupied by positive and negative allosteric modulators and a full agonist. Annotated compounds were added to our recently released repository of GPCR data.
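The cross-validated Q2 values quoted in this abstract can be made concrete with a small sketch. It assumes the common definition Q2 = 1 - PRESS/TSS, where PRESS sums the squared errors of predictions made on held-out folds and TSS uses the mean of all observed values. Some authors use the training-fold mean instead, and the abstract does not say which variant was applied. The fold assignment helper is hypothetical.

```python
def kfold_indices(n_samples, k=5):
    """Assign sample indices to k folds in round-robin order (no shuffling)."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds


def q2(y_observed, y_cv_predicted):
    """Cross-validated Q2 = 1 - PRESS / TSS.

    y_cv_predicted[i] must come from a model that did not see sample i
    during training (i.e. sample i was in the held-out fold).
    """
    mean_obs = sum(y_observed) / len(y_observed)
    press = sum((o - p) ** 2 for o, p in zip(y_observed, y_cv_predicted))
    tss = sum((o - mean_obs) ** 2 for o in y_observed)
    if tss == 0:
        raise ValueError("Q2 is undefined when all observed values are equal")
    return 1.0 - press / tss
```

A Q2 of 1 means the held-out predictions match the observations exactly, while a Q2 of 0 means the model does no better than always predicting the mean.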
|
138
|
Ferreira LT, Borba JVB, Moreira-Filho JT, Rimoldi A, Andrade CH, Costa FTM. QSAR-Based Virtual Screening of Natural Products Database for Identification of Potent Antimalarial Hits. Biomolecules 2021; 11:biom11030459. [PMID: 33808643 PMCID: PMC8003391 DOI: 10.3390/biom11030459] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 03/15/2021] [Accepted: 03/16/2021] [Indexed: 01/15/2023] Open
Abstract
With about 400,000 annual deaths worldwide, malaria remains a public health burden in tropical and subtropical areas, especially in low-income countries. Selection of drug-resistant Plasmodium strains has driven the need to explore novel antimalarial compounds with diverse modes of action. In this context, biodiversity has been widely exploited as a resourceful channel of biologically active compounds, as exemplified by antimalarial drugs such as quinine and artemisinin, derived from natural products. Thus, combining a natural product library and quantitative structure-activity relationship (QSAR)-based virtual screening, we have prioritized genuine and derivative natural compounds with potential antimalarial activity prior to in vitro testing. Experimental validation against cultured chloroquine-sensitive and multi-drug-resistant P. falciparum strains confirmed the potent and selective activity of two sesquiterpene lactones (LDT-597 and LDT-598) identified in silico. Quantitative structure-property relationship (QSPR) models predicted absorption, distribution, metabolism, and excretion (ADME) and physiologically based pharmacokinetic (PBPK) parameters for the most promising compound, showing that it presents good physiologically based pharmacokinetic properties both in rats and humans. Altogether, the in vitro parasite growth inhibition results obtained from in silico screened compounds encourage the use of virtual screening campaigns for identification of promising natural compound-based antimalarial molecules.
Affiliation(s)
- Letícia Tiburcio Ferreira
- Laboratory of Tropical Diseases Prof. Dr. Luiz Jacintho da Silva, Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas-UNICAMP, Campinas, SP 13083-864, Brazil; (L.T.F.); (J.V.B.B.); (A.R.)
| | - Joyce V. B. Borba
- Laboratory of Tropical Diseases Prof. Dr. Luiz Jacintho da Silva, Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas-UNICAMP, Campinas, SP 13083-864, Brazil; (L.T.F.); (J.V.B.B.); (A.R.)
- Laboratory of Molecular Modeling and Drug Design, LabMol, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO 74605-170, Brazil; (J.T.M.-F.); (C.H.A.)
| | - José Teófilo Moreira-Filho
- Laboratory of Molecular Modeling and Drug Design, LabMol, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO 74605-170, Brazil; (J.T.M.-F.); (C.H.A.)
| | - Aline Rimoldi
- Laboratory of Tropical Diseases Prof. Dr. Luiz Jacintho da Silva, Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas-UNICAMP, Campinas, SP 13083-864, Brazil; (L.T.F.); (J.V.B.B.); (A.R.)
| | - Carolina Horta Andrade
- Laboratory of Molecular Modeling and Drug Design, LabMol, Faculty of Pharmacy, Federal University of Goiás, Goiânia, GO 74605-170, Brazil; (J.T.M.-F.); (C.H.A.)
| | - Fabio Trindade Maranhão Costa
- Laboratory of Tropical Diseases Prof. Dr. Luiz Jacintho da Silva, Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas-UNICAMP, Campinas, SP 13083-864, Brazil; (L.T.F.); (J.V.B.B.); (A.R.)
- Correspondence: ; Tel.: +55-19-3521-6288
| |
|
139
|
Lima MNN, Borba JVB, Cassiano GC, Mottin M, Mendonça SS, Silva AC, Tomaz KCP, Calit J, Bargieri DY, Costa FTM, Andrade CH. Artificial Intelligence Applied to the Rapid Identification of New Antimalarial Candidates with Dual-Stage Activity. ChemMedChem 2021; 16:1093-1103. [PMID: 33247522 DOI: 10.1002/cmdc.202000685] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2002] [Revised: 11/16/2020] [Indexed: 01/06/2023]
Abstract
Increasing reports of multidrug-resistant malaria parasites urge the discovery of new effective drugs with different chemical scaffolds. Protein kinases play a key role in many cellular processes such as signal transduction and cell division, making them interesting targets in many diseases. Protein kinase 7 (PK7) is an orphan kinase from the Plasmodium genus, essential for the sporogonic cycle of these parasites. Here, we applied a robust and integrative artificial intelligence-assisted virtual-screening (VS) approach using shape-based and machine learning models to identify new potential PK7 inhibitors with in vitro antiplasmodial activity. Eight virtual hits were experimentally evaluated, and compound LabMol-167 inhibited ookinete conversion of Plasmodium berghei and blood stages of Plasmodium falciparum at nanomolar concentrations with low cytotoxicity in mammalian cells. As PK7 does not have an essential role in the Plasmodium blood stage and our virtual screening strategy aimed for both PK7 and blood-stage inhibition, we conducted an in silico target fishing approach and propose that this compound might also inhibit P. falciparum PK5, acting as a possible dual-target inhibitor. Finally, docking studies of LabMol-167 with P. falciparum PK7 and PK5 proteins highlighted key interactions for further hit-to-lead optimization.
Affiliation(s)
- Marilia N N Lima
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Joyce V B Borba
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil.,Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| | - Gustavo C Cassiano
- Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil.,Global Health and Tropical Medicine (GHTM), Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
| | - Melina Mottin
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Sabrina S Mendonça
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Arthur C Silva
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil
| | - Kaira C P Tomaz
- Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| | - Juliana Calit
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, 05508-000, São Paulo, SP, Brazil
| | - Daniel Y Bargieri
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, 05508-000, São Paulo, SP, Brazil
| | - Fabio T M Costa
- Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| | - Carolina H Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Rua 240, qd. 87, Goiânia, GO, 74605-170, Brazil.,Laboratory of Tropical Diseases - Prof. Dr. Luiz Jacintho da Silva, Department of Genetics Evolution, Microbiology and Immunology, Institute of Biology, 13083-970, Campinas, SP, Brazil
| |
|
140
|
Lovrić M, Malev O, Klobučar G, Kern R, Liu JJ, Lučić B. Predictive Capability of QSAR Models Based on the CompTox Zebrafish Embryo Assays: An Imbalanced Classification Problem. Molecules 2021; 26:1617. [PMID: 33803931 PMCID: PMC7998177 DOI: 10.3390/molecules26061617] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Revised: 03/03/2021] [Accepted: 03/11/2021] [Indexed: 02/06/2023] Open
Abstract
The CompTox Chemistry Dashboard (ToxCast) contains one of the largest public databases on Zebrafish (Danio rerio) developmental toxicity. The data consists of 19 toxicological endpoints on 1018 unique compounds measured in relatively low concentration ranges. The endpoints are related to developmental effects occurring in dechorionated zebrafish embryos for 120 hours post fertilization and monitored via gross malformations and mortality. We report the predictive capability of 209 quantitative structure-activity relationship (QSAR) models developed by machine learning methods using penalization techniques and diverse model quality metrics to cope with the imbalanced endpoints. All these QSAR models were generated to test how the imbalanced classification (toxic or non-toxic) endpoints could be predicted regardless of which of three algorithms is used: logistic regression, multi-layer perceptron, or random forests. Additionally, QSAR toxicity models are developed starting from sets of classical molecular descriptors, structural fingerprints and their combinations. Only 8 out of 209 models passed the 0.20 Matthews correlation coefficient (MCC) value defined a priori as a threshold for acceptable model quality on the test sets. The best models were obtained for endpoints mortality (MORT), ActivityScore and JAW (deformation). The low predictability of the QSAR model developed from the zebrafish embryotoxicity data in the database is mainly due to a higher sensitivity of 19 measurements of endpoints carried out on dechorionated embryos at low concentrations.
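The 0.20 threshold in this abstract refers to the Matthews correlation coefficient, which can be computed directly from a confusion matrix. The sketch below uses the standard formula. The threshold helper and the example counts are illustrative assumptions and are not data from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def passes_threshold(mcc_value, threshold=0.20):
    """True if a model's MCC exceeds the a priori acceptance threshold."""
    return mcc_value > threshold


# Illustrative imbalanced case: 5 toxic and 95 non-toxic compounds,
# with a model that labels everything non-toxic.
accuracy = (0 + 95) / 100          # looks good at 0.95
trivial_mcc = mcc(tp=0, tn=95, fp=0, fn=5)   # but MCC is 0.0
```

This is why MCC is a reasonable acceptance metric for imbalanced endpoints: a model that ignores the minority class can score high accuracy yet still fail the threshold.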
Affiliation(s)
- Mario Lovrić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria; (M.L.); (R.K.)
- Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia;
| | - Olga Malev
- Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia;
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia;
| | - Göran Klobučar
- Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia;
| | - Roman Kern
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria; (M.L.); (R.K.)
- Institute of Interactive Systems and Data Science, TU Graz, Inffeldgasse 16c, 8010 Graz, Austria
| | - Jay J. Liu
- Department of Chemical Engineering, Pukyong National University, Busan 608-739, Korea
| | - Bono Lučić
- Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia;
| |
|
141
|
Sifain AE, Rice BM, Yalkowsky SH, Barnes BC. Machine learning transition temperatures from 2D structure. J Mol Graph Model 2021; 105:107848. [PMID: 33667863 DOI: 10.1016/j.jmgm.2021.107848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2020] [Revised: 01/11/2021] [Accepted: 01/19/2021] [Indexed: 10/22/2022]
Abstract
A priori knowledge of physicochemical properties such as melting and boiling could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy; two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically-inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprised of over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
Affiliation(s)
- Andrew E Sifain
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
| | - Betsy M Rice
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
| | - Samuel H Yalkowsky
- Department of Pharmaceutics, College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
| | - Brian C Barnes
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA.
| |
|
142
|
Guha R, Willighagen E, Zdrazil B, Jeliazkova N. What is the role of cheminformatics in a pandemic? J Cheminform 2021; 13:16. [PMID: 33653411 PMCID: PMC7922726 DOI: 10.1186/s13321-021-00491-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 01/22/2021] [Indexed: 11/10/2022] Open
Affiliation(s)
- Rajarshi Guha
- Vertex Pharmaceuticals, 50 Northern Ave, Boston, MA, 02210, USA.
| | - Egon Willighagen
- Maastricht University, Universiteitssingel 50, 6229 ER, Maastricht, Netherlands
| | - Barbara Zdrazil
- University of Vienna, Althanstraße 14, 1090, Vienna, Austria
| | | |
|
143
|
Abstract
Simulations of fluid flows at the nanoscale produce massive amounts of data, and machine learning (ML) techniques have been developed in recent years to leverage them, presenting unique results. This work employs ML tools to provide insight into properties obtained from molecular dynamics (MD) simulations, covering missing data points and predicting states not previously located by the simulation. Taking the fluid flow of a simple Lennard-Jones liquid in nanoscale slits as a basis, ML regression-based algorithms are exploited to provide an alternative for the calculation of transport properties of fluids, e.g., the diffusion coefficient, shear viscosity and thermal conductivity, and the average velocity across the nanochannels. Through appropriate training and testing, ML-predicted values can be extracted for various input variables, such as the geometrical characteristics of the slits, the interaction parameters between particles and the flow driving force. The proposed technique could act in parallel to simulation as a means of enriching the database of material properties, assisting in coupling between scales, and accelerating data-based scientific computations.
|
144
|
Bobrowski T, Chen L, Eastman RT, Itkin Z, Shinn P, Chen CZ, Guo H, Zheng W, Michael S, Simeonov A, Hall MD, Zakharov AV, Muratov EN. Synergistic and Antagonistic Drug Combinations against SARS-CoV-2. Mol Ther 2021; 29:873-885. [PMID: 33333292 PMCID: PMC7834738 DOI: 10.1016/j.ymthe.2020.12.016] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2020] [Revised: 11/15/2020] [Accepted: 12/09/2020] [Indexed: 01/15/2023] Open
Abstract
Antiviral drug development for coronavirus disease 2019 (COVID-19) is occurring at an unprecedented pace, yet there are still limited therapeutic options for treating this disease. We hypothesized that combining drugs with independent mechanisms of action could result in synergy against SARS-CoV-2, thus generating better antiviral efficacy. Using in silico approaches, we prioritized 73 combinations of 32 drugs with potential activity against SARS-CoV-2 and then tested them in vitro. Sixteen synergistic and eight antagonistic combinations were identified; among 16 synergistic cases, combinations of the US Food and Drug Administration (FDA)-approved drug nitazoxanide with remdesivir, amodiaquine, or umifenovir were most notable, all exhibiting significant synergy against SARS-CoV-2 in a cell model. However, the combination of remdesivir and lysosomotropic drugs, such as hydroxychloroquine, demonstrated strong antagonism. Overall, these results highlight the utility of drug repurposing and preclinical testing of drug combinations for discovering potential therapies to treat COVID-19.
Affiliation(s)
- Tesia Bobrowski
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Lu Chen
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Richard T Eastman
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Zina Itkin
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Paul Shinn
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Catherine Z Chen
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Hui Guo
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Wei Zheng
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Sam Michael
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Matthew D Hall
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD 20850, USA.
| | - Eugene N Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA.
| |
|
145
|
Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021; 61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
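The idea of a consensus over several multitask models can be sketched in its simplest form. The abstract does not state how the consensus was formed, so the example below assumes an unweighted arithmetic mean of per-compound predictions. The function name and the example values are hypothetical.

```python
def consensus(predictions_by_model):
    """Average per-compound predictions across models (unweighted mean).

    predictions_by_model: one list per model, all lists in the same
    compound order and of the same length.
    """
    if not predictions_by_model:
        raise ValueError("at least one model is required")
    n_compounds = len(predictions_by_model[0])
    if any(len(p) != n_compounds for p in predictions_by_model):
        raise ValueError("all models must predict the same compounds")
    n_models = len(predictions_by_model)
    return [sum(values) / n_models for values in zip(*predictions_by_model)]


# Illustrative values only: three models, two compounds
averaged = consensus([[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])
```

Averaging tends to damp the errors of individual models when those errors are not strongly correlated, which is one plausible reason a consensus can help on small tasks.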
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vinicius M Alves
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Nicole Kleinstreuer
- Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States.,National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Marc C Nicklaus
- Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| |
Collapse
|
146
|
In Silico Studies of Lamiaceae Diterpenes with Bioinsecticide Potential against Aphis gossypii and Drosophila melanogaster. Molecules 2021; 26:766. [PMID: 33540716 PMCID: PMC7867283 DOI: 10.3390/molecules26030766] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/16/2020] [Revised: 01/26/2021] [Accepted: 01/26/2021] [Indexed: 12/19/2022]
Abstract
Background: The growing demand for agricultural products has led to the misuse and overuse of insecticides, resulting in the use of higher concentrations and the need for ever more toxic products. Ecologically, bioinsecticides are considered better and safer than synthetic insecticides; they must be toxic to the target organism, yet have low or no toxicity to non-target organisms. Many plant extracts have had their high insecticidal potential confirmed under laboratory conditions, and in the search for plant compounds with bioinsecticidal activity, the Lamiaceae family has yielded satisfactory results. Objective: The aim of our study was to develop computer-assisted predictions for compounds with known insecticidal activity against Aphis gossypii and Drosophila melanogaster. Results and conclusion: Structure analysis revealed ent-kaurane, kaurene, and clerodane diterpenes as the most active, showing excellent results. We also found that the interactions formed by these compounds were more stable than, or similar in stability to, those of the commercialized insecticides tested. Overall, we concluded that bistenuifolin L (1836) and bistenuifolin K (1931) are potentially active against A. gossypii enzymes, and that salvisplendin C (1086) and salvixalapadiene (1195) are potentially active against D. melanogaster. These four diterpenes present a high probability of activity and low toxicity against the species studied.
|
147
|
Espinoza GZ, Angelo RM, Oliveira PR, Honorio KM. Evaluating Deep Learning models for predicting ALK-5 inhibition. PLoS One 2021; 16:e0246126. [PMID: 33508008 PMCID: PMC7842961 DOI: 10.1371/journal.pone.0246126] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/14/2020] [Accepted: 01/14/2021] [Indexed: 11/18/2022]
Abstract
Computational methods have been widely used in drug design. Recent developments in machine learning techniques and the ever-growing chemical and biological databases are fertile ground for discoveries in this area. In this study, we evaluated the performance of deep learning models in comparison to random forest and support vector regression for predicting the biological activity (pIC50) of ALK-5 inhibitors as candidates to treat cancer. The generalization power of the models was assessed by internal and external validation procedures. A deep neural network model obtained the best performance in this comparative study, achieving a coefficient of determination of 0.658 on the external validation set, with a mean square error of 0.373 and a mean absolute error of 0.450. Additionally, the relevance of the chemical descriptors for the prediction of biological activity was estimated using permutation importance. We conclude that the deep neural network model is suitable for the problem and can be employed to predict the biological activity of new ALK-5 inhibitors.
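The validation metrics named in the abstract (coefficient of determination, mean square error, mean absolute error) and the permutation importance procedure can be sketched in plain Python as below. This is an illustrative sketch only: the data, the toy model, and the function names are hypothetical and are not taken from the study, and the importance score is defined here as the mean increase in MSE after shuffling one descriptor column, which is one common convention and an assumption about what was used.

```python
import random


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when a single descriptor column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical observed vs. predicted pIC50 values.
example_true = [1.0, 2.0, 3.0]
example_pred = [1.0, 2.0, 4.0]
example_mse = mse(example_true, example_pred)
example_mae = mae(example_true, example_pred)
example_r2 = r2(example_true, example_pred)

# Toy model that depends only on descriptor 0, so descriptor 1 should score zero.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]


def toy_model(row):
    return 2.0 * row[0]


importances = permutation_importance(toy_model, X, y)
print(example_r2, example_mse, example_mae, importances)
```

A descriptor the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.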
Affiliation(s)
- Gabriel Z. Espinoza
  - School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
- Rafaela M. Angelo
  - School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
- Patricia R. Oliveira
  - School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
  - * E-mail: (PRO); (KMH)
- Kathia M. Honorio
  - School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Sao Paulo, Brazil
  - Federal University of ABC, Santo Andre, Sao Paulo, Brazil
  - * E-mail: (PRO); (KMH)
|
148
|
Wang LL, Ding JJ, Pan L, Fu L, Tian JH, Cao DS, Jiang H, Ding XQ. Quantitative structure-toxicity relationship model for acute toxicity of organophosphates via multiple administration routes in rats and mice. J Hazard Mater 2021; 401:123724. [PMID: 33113726 DOI: 10.1016/j.jhazmat.2020.123724] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/03/2020] [Revised: 07/29/2020] [Accepted: 08/13/2020] [Indexed: 06/11/2023]
Abstract
Organophosphates (OPs) are highly toxic compounds, with widespread application in agricultural and chemical industries, whose introduction into the environment poses serious hazards to humans and ecological systems. To assess and ultimately mitigate these hazards, this study predicted the acute toxicity of OPs according to their chemical structure and administration route. The acute toxicity data of 161 OPs in two species via six different administration routes were manually collected and used to develop a series of quantitative structure-toxicity relationship (QSTR) models with robust and practical predictive abilities. The random forest algorithm was used to develop the models, employing both quantum chemical and two-dimensional descriptors according to OECD guidelines. Correlation results and feature similarities indicated that whereas acute toxicity data from rats and mice via the same administration route were combinable for modeling, data from different routes were not. Six QSTR models for each route in a single species and two QSTR models for a single route in the two species were constructed, achieving practical predictive performance. Despite significant variances in their datasets, the prediction models could predict the acute toxicity of novel or unknown OPs, realize rapid assessment, and provide guidance for regulatory decisions to reduce the hazards of OPs.
Affiliation(s)
- Liang-Liang Wang
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Li Pan
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, PR China
- Jia-Hao Tian
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, PR China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, PR China
- Hui Jiang
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
- Xiao-Qin Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing, 102205, PR China
|
149
|
Korshunova M, Ginsburg B, Tropsha A, Isayev O. OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design. J Chem Inf Model 2021; 61:7-13. [PMID: 33393291 DOI: 10.1021/acs.jcim.0c00971] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/17/2022]
Abstract
Deep learning models have demonstrated outstanding results in many data-rich areas of research, such as computer vision and natural language processing. Currently, there is a rise of deep learning in computational chemistry and materials informatics, where deep learning could be effectively applied in modeling the relationship between chemical structures and their properties. With the immense growth of chemical and materials data, deep learning models can begin to outperform conventional machine learning techniques such as random forest, support vector machines, and nearest neighbor. Herein, we introduce OpenChem, a PyTorch-based deep learning toolkit for computational chemistry and drug design. OpenChem offers easy and fast model development, modular software design, and several data preprocessing modules. It is freely available via the GitHub repository.
Affiliation(s)
- Maria Korshunova
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
- Boris Ginsburg
  - NVIDIA Corporation, Santa Clara, California 95050, United States
- Alexander Tropsha
  - UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Olexandr Isayev
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
|
150
|
Smit IA, Afzal AM, Allen CHG, Svensson F, Hanser T, Bender A. Systematic Analysis of Protein Targets Associated with Adverse Events of Drugs from Clinical Trials and Postmarketing Reports. Chem Res Toxicol 2020; 34:365-384. [PMID: 33351593 DOI: 10.1021/acs.chemrestox.0c00294] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/26/2022]
Abstract
Adverse drug reactions (ADRs) are undesired effects of medicines that can harm patients and are a significant source of attrition in drug development. ADRs are anticipated by routinely screening drugs against secondary pharmacology protein panels. However, there is still a lack of quantitative information on the links between these off-target proteins and the reporting of ADRs in humans. Here, we present a systematic analysis of associations between measured and predicted in vitro bioactivities of drugs and adverse events (AEs) in humans from two sources of data: the Side Effect Resource, derived from clinical trials, and the Food and Drug Administration Adverse Event Reporting System, derived from postmarketing surveillance. The ratio of a drug's therapeutic unbound plasma concentration over the drug's in vitro potency against a given protein was used to select proteins most likely to be relevant to in vivo effects. In examining individual target bioactivities as predictors of AEs, we found a trade-off between the positive predictive value and the fraction of drugs with AEs that can be detected. However, considering sets of multiple targets for the same AE can help identify a greater fraction of AE-associated drugs. Of the 45 targets with statistically significant associations to AEs, 30 are included on existing safety target panels. The remaining 15 targets include 9 carbonic anhydrases, of which CA5B is significantly associated with cholestatic jaundice. We include the full quantitative data on associations between measured and predicted in vitro bioactivities and AEs in humans in this work, which can be used to make a more informed selection of safety profiling targets.
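The two quantities discussed in the abstract, the ratio of unbound plasma concentration to in vitro potency and the trade-off between positive predictive value and the fraction of AE-associated drugs detected, can be sketched as follows. This is a rough illustration under stated assumptions: the drug names, concentrations, potencies, and the cut-off of 1.0 are hypothetical and are not values from the study.

```python
def exposure_ratio(unbound_plasma_conc, in_vitro_potency):
    """Therapeutic unbound plasma concentration divided by in vitro potency.

    Both values must use the same concentration unit (for example, micromolar).
    """
    return unbound_plasma_conc / in_vitro_potency


def ppv_and_fraction_detected(active_drugs, ae_drugs):
    """Return (PPV, fraction detected) for one target and one adverse event.

    PPV: share of target-active drugs that are associated with the AE.
    Fraction detected: share of AE-associated drugs that are target-active.
    """
    hits = active_drugs & ae_drugs
    ppv = len(hits) / len(active_drugs) if active_drugs else 0.0
    detected = len(hits) / len(ae_drugs) if ae_drugs else 0.0
    return ppv, detected


# Hypothetical drugs: (unbound plasma concentration, potency against one target).
drugs = {
    "drug_a": (1.0, 0.5),
    "drug_b": (0.1, 10.0),
    "drug_c": (2.0, 1.0),
    "drug_d": (0.5, 0.25),
}
ae_drugs = {"drug_a", "drug_b", "drug_c"}  # hypothetical AE-associated drugs
threshold = 1.0  # assumed cut-off, chosen only for this example

# A drug counts as active at the target when its ratio reaches the cut-off.
active = {
    name
    for name, (conc, potency) in drugs.items()
    if exposure_ratio(conc, potency) >= threshold
}
ppv, detected = ppv_and_fraction_detected(active, ae_drugs)
print(sorted(active), ppv, detected)
```

Raising the cut-off tends to shrink the active set, which can raise the PPV while lowering the fraction of AE-associated drugs that are detected.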
Affiliation(s)
- Ines A Smit
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Avid M Afzal
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Chad H G Allen
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Fredrik Svensson
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Thierry Hanser
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom
- Andreas Bender
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|