1
Liu J, Khan MKH, Guo W, Dong F, Ge W, Zhang C, Gong P, Patterson TA, Hong H. Machine learning and deep learning approaches for enhanced prediction of hERG blockade: a comprehensive QSAR modeling study. Expert Opin Drug Metab Toxicol 2024:1-20. [PMID: 38968091 DOI: 10.1080/17425255.2024.2377593] [Received: 02/27/2024] [Accepted: 06/26/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND: Cardiotoxicity is a major cause of drug withdrawal. The hERG channel, which regulates ion flow, is pivotal for heart and nervous system function, and its blockade is a concern in drug development. Predicting hERG blockade is therefore essential for identifying cardiac safety issues. Various QSAR models exist, but their performance varies. Ongoing improvements show promise and call for continued efforts to enhance the accuracy of hERG blockade prediction using emerging deep learning algorithms. STUDY DESIGN AND METHOD: Using a large training dataset, six individual QSAR models were developed. Additionally, three ensemble models were constructed. All models were evaluated using 10-fold cross-validations and two external datasets. RESULTS: The 10-fold cross-validations resulted in Matthews correlation coefficient (MCC) values from 0.682 to 0.730, surpassing the best-reported model on the same dataset (0.689). External validations yielded MCC values from 0.520 to 0.715 for the first dataset, exceeding those of previously reported models (0-0.599). For the second dataset, MCC values fell between 0.025 and 0.215, aligning with those of reported models (0.112-0.220). CONCLUSIONS: The developed models can assist the pharmaceutical industry and regulatory agencies in predicting hERG blockade activity, thereby enhancing safety assessments and reducing the risk of adverse cardiac events associated with new drug candidates.
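Note: the abstract above reports model quality as Matthews correlation coefficient (MCC) values. For readers less familiar with the metric, the following is a minimal sketch of how MCC is computed from a binary confusion matrix. It is not the authors' code; the function name `mcc` and the example counts are illustrative assumptions.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: MCC is reported as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, chosen only to show the range of the metric.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)     # every prediction correct
inverted = mcc(tp=0, tn=0, fp=50, fn=50)    # every prediction wrong
chance = mcc(tp=25, tn=25, fp=25, fn=25)    # no better than random
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when blockers and non-blockers are imbalanced, which is one plausible reason it is the headline metric in studies of this kind.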
Affiliation(s)
- Jie Liu
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Wenjing Guo
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Fan Dong
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Weigong Ge
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
- Ping Gong
  - Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA
- Tucker A Patterson
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Huixiao Hong
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA

2
Gong C, Feng Y, Zhu J, Liu G, Tang Y, Li W. Evaluation of machine learning models for cytochrome P450 3A4, 2D6, and 2C9 inhibition. J Appl Toxicol 2024; 44:1050-1066. [PMID: 38544296 DOI: 10.1002/jat.4601] [Received: 01/12/2024] [Revised: 02/26/2024] [Accepted: 03/05/2024] [Indexed: 07/21/2024]
Abstract
Cytochrome P450 (CYP) enzymes are involved in the metabolism of approximately 75% of marketed drugs. Inhibition of the major drug-metabolizing P450s could alter drug metabolism and lead to undesirable drug-drug interactions. Therefore, it is of great significance to explore the inhibition of P450s in drug discovery. Currently, machine learning, including deep learning algorithms, is widely used to construct in silico models for the prediction of P450 inhibition. These models exhibit varying predictive performance depending on the machine learning algorithm and molecular representation used, which makes it difficult to select appropriate models for practical use. In this study, we systematically evaluated conventional machine learning and deep learning models for three major P450 enzymes, CYP3A4, CYP2D6, and CYP2C9, from several perspectives, such as algorithms, molecular representation, and data partitioning strategies. Our results showed that the XGBoost and CatBoost algorithms coupled with the combined fingerprint/physicochemical descriptor features exhibited the best performance, with an Area Under Curve (AUC) of 0.92, while the deep learning models were generally inferior to the conventional machine learning models (average AUC reached 0.89) on the same test sets. We also found that data volume and sampling strategy had a minor effect on model performance. We anticipate that these results will be helpful for the selection of molecular representations and machine learning/deep learning algorithms in P450 model construction and in the future development of P450 inhibition models.
Affiliation(s)
- Changda Gong
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yanjun Feng
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Jieyu Zhu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China

3
Zhang R, Yuan R, Tian B. PointGAT: A Quantum Chemical Property Prediction Model Integrating Graph Attention and 3D Geometry. J Chem Theory Comput 2024; 20:4115-4128. [PMID: 38727259 DOI: 10.1021/acs.jctc.3c01420] [Indexed: 05/29/2024]
Abstract
Predicting quantum chemical properties is a fundamental challenge for computational chemistry. While the development of graph neural networks has advanced molecular representation learning and property prediction, their performance could be further enhanced by incorporating three-dimensional (3D) structural geometry into two-dimensional (2D) molecular graph representation. In this study, we introduce the PointGAT model for quantum molecular property prediction, which integrates 3D molecular coordinates with graph-attention modeling. Comparison with other current models in molecular prediction tasks showed that PointGAT could provide higher predictive accuracy in various benchmark data sets from MoleculeNet, including ESOL, FreeSolv, Lipop, HIV, and 6 out of 12 tasks of the QM9 data set. To further examine PointGAT prediction of quantum mechanical (QM) energies, we constructed a C10 data set comprising 11,841 charged and chiral carbocation intermediates with QM energies calculated at the DM21/6-31G*//B3LYP/6-31G* levels. Notably, PointGAT achieved an R² value of 0.950 and an MAE of 1.616 kcal/mol, outperforming even the best-performing graph neural network model with a reduction of 0.216 kcal/mol in MAE and an improvement of 0.050 in R². Additional ablation studies indicated that incorporating molecular geometry into the model resulted in markedly higher predictive accuracy, reducing the MAE value from 1.802 to 1.616 kcal/mol. Moreover, visualization of PointGAT atomic attention weights suggested its predictions were interpretable. Findings in this study support the application of PointGAT as a powerful and versatile tool for quantum chemical property prediction that can facilitate high-accuracy modeling for fundamental exploration of chemical space as well as drug design and molecular engineering.
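Note: the abstract above compares regression models by the coefficient of determination and the mean absolute error (MAE). The sketch below shows the standard definitions of both metrics in plain Python. It is not taken from the PointGAT paper; the function names and the toy energies are assumptions made for illustration only.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted energies (arbitrary units).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mae_value = mean_absolute_error(reference, predicted)
r2_value = r_squared(reference, predicted)
```

MAE is expressed in the units of the target (kcal/mol in the abstract), whereas the coefficient of determination is unitless, so the two are usually reported together.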
Affiliation(s)
- Rong Zhang
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Rongqing Yuan
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
- Boxue Tian
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China

4
Mostafa F, Howle V, Chen M. Machine Learning to Predict Drug-Induced Liver Injury and Its Validation on Failed Drug Candidates in Development. Toxics 2024; 12:385. [PMID: 38922065 PMCID: PMC11207878 DOI: 10.3390/toxics12060385] [Received: 04/22/2024] [Revised: 05/15/2024] [Accepted: 05/21/2024] [Indexed: 06/27/2024]
Abstract
Drug-induced liver injury (DILI) poses a significant challenge for the pharmaceutical industry and regulatory bodies. Despite extensive toxicological research aimed at mitigating DILI risk, the effectiveness of these techniques in predicting DILI in humans remains limited. Consequently, researchers have explored novel approaches and procedures to enhance the accuracy of DILI risk prediction for drug candidates under development. In this study, we leveraged a large human dataset to develop machine learning models for assessing DILI risk. The performance of these prediction models was rigorously evaluated using a 10-fold cross-validation approach and an external test set. Notably, the random forest (RF) and multilayer perceptron (MLP) models emerged as the most effective in predicting DILI. During cross-validation, RF achieved an average prediction accuracy of 0.631, while MLP achieved the highest Matthews Correlation Coefficient (MCC) of 0.245. To validate the models externally, we applied them to a set of drug candidates that had failed in clinical development due to hepatotoxicity. Both RF and MLP accurately predicted the toxic drug candidates in this external validation. Our findings suggest that in silico machine learning approaches hold promise for identifying DILI liabilities associated with drug candidates during development.
Affiliation(s)
- Fahad Mostafa
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA; (F.M.); (V.H.)
  - Division of Bioinformatics and Biostatistics, the US FDA’s National Center for Toxicological Research, Jefferson, AR 72029, USA
- Victoria Howle
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA; (F.M.); (V.H.)
- Minjun Chen
  - Division of Bioinformatics and Biostatistics, the US FDA’s National Center for Toxicological Research, Jefferson, AR 72029, USA

5
Antoniou M, Papavasileiou KD, Melagraki G, Dondero F, Lynch I, Afantitis A. Development of a Robust Read-Across Model for the Prediction of Biological Potency of Novel Peroxisome Proliferator-Activated Receptor Delta Agonists. Int J Mol Sci 2024; 25:5216. [PMID: 38791255 PMCID: PMC11121726 DOI: 10.3390/ijms25105216] [Received: 04/01/2024] [Revised: 05/02/2024] [Accepted: 05/03/2024] [Indexed: 05/26/2024]
Abstract
A robust predictive model was developed using 136 novel peroxisome proliferator-activated receptor delta (PPARδ) agonists, a distinct subtype of lipid-activated transcription factors of the nuclear receptor superfamily that regulate target genes by binding to characteristic sequences of DNA bases. The model employs various structural descriptors and docking calculations and provides predictions of the biological activity of PPARδ agonists, following the criteria of the Organization for Economic Co-operation and Development (OECD) for the development and validation of quantitative structure-activity relationship (QSAR) models. Specifically focused on small molecules, the model facilitates the identification of highly potent and selective PPARδ agonists and offers a read-across concept by providing the chemical neighbours of the compound under study. The model development process was conducted on Isalos Analytics Software (v. 0.1.17) which provides an intuitive environment for machine-learning applications. The final model was released as a user-friendly web tool and can be accessed through the Enalos Cloud platform's graphical user interface (GUI).
Affiliation(s)
- Maria Antoniou
  - Department of Chemoinformatics, NovaMechanics Ltd., Nicosia 1046, Cyprus; (M.A.); (K.D.P.)
  - Department of ChemoInformatics, NovaMechanics MIKE, 18545 Piraeus, Greece
  - Entelos Institute, Larnaca 6059, Cyprus; (F.D.); (I.L.)
- Konstantinos D. Papavasileiou
  - Department of Chemoinformatics, NovaMechanics Ltd., Nicosia 1046, Cyprus; (M.A.); (K.D.P.)
  - Department of ChemoInformatics, NovaMechanics MIKE, 18545 Piraeus, Greece
  - Entelos Institute, Larnaca 6059, Cyprus; (F.D.); (I.L.)
- Georgia Melagraki
  - Division of Physical Sciences & Applications, Hellenic Military Academy, 16672 Vari, Greece
- Francesco Dondero
  - Entelos Institute, Larnaca 6059, Cyprus; (F.D.); (I.L.)
  - Department of Science and Technological Innovation, Università del Piemonte Orientale, 15121 Alessandria, Italy
- Iseult Lynch
  - Entelos Institute, Larnaca 6059, Cyprus; (F.D.); (I.L.)
  - School of Geography, Earth and Environmental Sciences, University of Birmingham Edgbaston, Birmingham B15 2TT, UK
- Antreas Afantitis
  - Department of Chemoinformatics, NovaMechanics Ltd., Nicosia 1046, Cyprus; (M.A.); (K.D.P.)
  - Department of ChemoInformatics, NovaMechanics MIKE, 18545 Piraeus, Greece
  - Entelos Institute, Larnaca 6059, Cyprus; (F.D.); (I.L.)

6
Connor S, Li T, Qu Y, Roberts RA, Tong W. Generation of a drug-induced renal injury list to facilitate the development of new approach methodologies for nephrotoxicity. Drug Discov Today 2024; 29:103938. [PMID: 38432353 DOI: 10.1016/j.drudis.2024.103938] [Received: 01/05/2024] [Revised: 02/16/2024] [Accepted: 02/27/2024] [Indexed: 03/05/2024]
Abstract
Drug-induced renal injury (DIRI) causes >1.5 million adverse events annually in the USA alone. Although standard biomarkers exist for DIRI, they lack the sensitivity or specificity to detect nephrotoxicity before the significant loss of renal function. In this study, we describe the creation of DIRIL - a list of drugs associated with DIRI and nephrotoxicity - from two literature datasets with DIRI annotation, confirmed using FDA drug labeling. DIRIL comprises 317 orally administered drugs covering all 14 anatomical, therapeutic and chemical (ATC) classification categories. Of the 317 drugs, 171 were DIRI-positive and 146 were DIRI-negative. DIRIL will be a relevant and invaluable resource for discovery of new approach methods (NAMs) to predict the occurrence and possible severity of DIRI earlier in drug development.
Affiliation(s)
- Skylar Connor
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA
- Ting Li
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA
- Yanyan Qu
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA
- Ruth A Roberts
  - ApconiX, Alderley Park, Alderley Edge SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
- Weida Tong
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA

7
Sharma A, Selvam S, Balaji PD, Madhavan T. ANN multi-layer perceptron for prediction of blood-brain barrier permeable compounds for central nervous system therapeutics. J Biomol Struct Dyn 2024:1-6. [PMID: 38497749 DOI: 10.1080/07391102.2024.2326671] [Received: 04/18/2023] [Accepted: 02/28/2024] [Indexed: 03/19/2024]
Abstract
Endothelial cells produce a semipermeable barrier known as the blood-brain barrier (BBB) to keep undesired chemicals out of the central nervous system (CNS). However, this barrier also restricts the exploration of potential new medications due to insufficient exposure. To address this challenge, machine learning (ML) algorithms can be useful to predict the BBB permeability of chemical compounds. Support vector machines, continuous neural networks, and deep learning approaches have been used to identify compounds that can penetrate the BBB. However, predicting BBB permeability based solely on chemical structure can be difficult. In the current research, we developed an ML model using a large dataset to predict BBB permeability, which could be used for early-stage drug screening of potential CNS medications. Our artificial neural network (ANN) algorithm exhibited an accuracy of 0.94, specificity of 0.83, sensitivity of 0.97, AUC of 0.96, and MCC of 0.83. These metrics suggest that our model predicts BBB permeability with high accuracy and therefore has the potential to advance drug discovery efforts in the CNS. This study's outcomes demonstrate the potential for ML models to predict BBB permeability accurately, aiding in the identification of new CNS therapeutic options. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Aditi Sharma
  - Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
- Subathra Selvam
  - Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
- Priya Dharshini Balaji
  - Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India
- Thirumurthy Madhavan
  - Computational Biology Lab, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science & Technology, Kattankulathur, Tamil Nadu, India

8
Lee KH, Won SJ, Oyinloye P, Shi L. Unlocking the Potential of High-Quality Dopamine Transporter Pharmacological Data: Advancing Robust Machine Learning-Based QSAR Modeling. bioRxiv 2024:2024.03.06.583803. [PMID: 38558976 PMCID: PMC10979915 DOI: 10.1101/2024.03.06.583803] [Indexed: 04/04/2024]
Abstract
The dopamine transporter (DAT) plays a critical role in the central nervous system and has been implicated in numerous psychiatric disorders. Ligand-based approaches, especially quantitative structure-activity relationship (QSAR) modeling, are instrumental in deciphering the structure-activity relationship (SAR) of DAT ligands. By gathering and analyzing data from literature and databases, we systematically assemble a diverse range of ligands binding to DAT, aiming to discern the general features of DAT ligands and uncover the chemical space for potential novel DAT ligand scaffolds. The aggregation of DAT pharmacological activity data, particularly from databases like ChEMBL, provides a foundation for constructing robust QSAR models. The compilation and meticulous filtering of these data, which establish high-quality training datasets with specific divisions of pharmacological assays and data types, combined with the application of QSAR modeling, prove to be a promising strategy for navigating the pertinent chemical space. Through a systematic comparison of DAT QSAR models using training datasets from various ChEMBL releases, we underscore the positive impact of enhanced data set quality and increased data set size on the predictive power of DAT QSAR models.
Affiliation(s)
- Kuo Hao Lee
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Sung Joon Won
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Precious Oyinloye
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Lei Shi
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA

9
Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS Discov 2024; 29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both the bias and the variance of individual models and to calculate the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.
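Note: the abstract above attributes the winning score to a consensus of many diverse models. The sketch below illustrates one simple form of consensus, averaging per-class probabilities across models and taking the most probable class. It is an assumption-laden illustration, not the OCHEM implementation: the abstract does not say how the 28 models were combined, and the function names, class labels and probabilities here are hypothetical.

```python
from typing import List

CLASSES = ["low", "medium", "high"]  # solubility classes named in the abstract


def average_probabilities(model_outputs: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities over models.

    model_outputs[m][i][c] is the probability that model m assigns
    to class c for compound i.
    """
    n_models = len(model_outputs)
    n_items = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    return [
        [sum(model[i][c] for model in model_outputs) / n_models for c in range(n_classes)]
        for i in range(n_items)
    ]


def to_labels(probabilities: List[List[float]]) -> List[str]:
    """Pick the class with the highest averaged probability for each compound."""
    return [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in probabilities]


# Three hypothetical models scoring two hypothetical compounds.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
model_b = [[0.2, 0.5, 0.3], [0.2, 0.5, 0.3]]  # disagrees with the other two
model_c = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]

averaged = average_probabilities([model_a, model_b, model_c])
labels = to_labels(averaged)
```

In this toy case the outlying model is outvoted on both compounds, which is the intuition behind the variance reduction the abstract describes.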
Affiliation(s)
- Andrea Hunklinger
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Peter Hartog
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Martin Šícho
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC Leiden, the Netherlands; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Guillaume Godin
  - dsm-firmenich SA, Rue de la Bergère 7, CH-1242 Satigny, Switzerland
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, DE-85716 Unterschleißheim, Germany

10
Das S, Merz KM. Molecular Gas-Phase Conformational Ensembles. J Chem Inf Model 2024; 64:749-760. [PMID: 38206321 DOI: 10.1021/acs.jcim.3c01309] [Indexed: 01/12/2024]
Abstract
Accurately determining the global minima of a molecular structure is important in diverse scientific fields, including drug design, materials science, and chemical synthesis. Conformational search engines serve as valuable tools for exploring the extensive conformational space of molecules and for identifying energetically favorable conformations. In this study, we present a comparison of Auto3D, CREST, Balloon, and ETKDG (from RDKit), which are freely available conformational search engines, to evaluate their effectiveness in locating global minima. These engines employ distinct methodologies, including machine learning (ML) potential-based, semiempirical, and force field-based approaches. To validate these methods, we propose the use of collisional cross-section (CCS) values obtained from ion mobility-mass spectrometry studies. We hypothesize that experimental gas-phase CCS values can provide experimental evidence that we likely have the global minimum for a given molecule. To facilitate this effort, we used our gas-phase conformation library (GPCL) which currently consists of the full ensembles of 20 small molecules and can be used by the community to validate any conformational search engine. Further members of the GPCL can be readily created for any molecule of interest using our standard workflow used to compute CCS values, expanding the ability of the GPCL in validation exercises. These innovative validation techniques enhance our understanding of the conformational landscape and provide valuable insights into the performance of conformational generation engines. Our findings shed light on the strengths and limitations of each search engine, enabling informed decisions for their utilization in various scientific fields, where accurate molecular structure determination is crucial for understanding biological activity and designing targeted interventions. By facilitating the identification of reliable conformations, this study significantly contributes to enhancing the efficiency and accuracy of molecular structure determination, with particular focus on metabolite structure elucidation. The findings of this research also provide valuable insights for developing effective workflows for predicting the structures of unknown compounds with high precision.
Affiliation(s)
- Susanta Das
  - Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
  - Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States

11
Li Z, Huang R, Xia M, Patterson TA, Hong H. Fingerprinting Interactions between Proteins and Ligands for Facilitating Machine Learning in Drug Discovery. Biomolecules 2024; 14:72. [PMID: 38254672 PMCID: PMC10813698 DOI: 10.3390/biom14010072] [Received: 11/02/2023] [Revised: 12/26/2023] [Accepted: 12/28/2023] [Indexed: 01/24/2024]
Abstract
Molecular recognition is fundamental in biology, underpinning intricate processes through specific protein-ligand interactions. This understanding is pivotal in drug discovery, yet traditional experimental methods face limitations in exploring the vast chemical space. Computational approaches, notably quantitative structure-activity/property relationship analysis, have gained prominence. Molecular fingerprints encode molecular structures and serve as property profiles, which are essential in drug discovery. While two-dimensional (2D) fingerprints are commonly used, three-dimensional (3D) structural interaction fingerprints offer enhanced structural features specific to target proteins. Machine learning models trained on interaction fingerprints enable precise binding prediction. Recent focus has shifted to structure-based predictive modeling, with machine-learning scoring functions excelling due to feature engineering guided by key interactions. Notably, 3D interaction fingerprints are gaining ground due to their robustness. Various structural interaction fingerprints have been developed and used in drug discovery, each with unique capabilities. This review recapitulates the developed structural interaction fingerprints and provides two case studies to illustrate the power of interaction fingerprint-driven machine learning. The first elucidates structure-activity relationships in β2 adrenoceptor ligands, demonstrating the ability to differentiate agonists and antagonists. The second employs a retrosynthesis-based pre-trained molecular representation to predict protein-ligand dissociation rates, offering insights into binding kinetics. Despite remarkable progress, challenges persist in interpreting complex machine learning models built on 3D fingerprints, emphasizing the need for strategies to make predictions interpretable. Binding site plasticity and induced fit effects pose additional complexities. Interaction fingerprints are promising but require continued research to harness their full potential.
Affiliation(s)
- Zoe Li
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA; (Z.L.); (T.A.P.)
- Ruili Huang
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA; (R.H.); (M.X.)
- Menghang Xia
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD 20892, USA; (R.H.); (M.X.)
- Tucker A. Patterson
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA; (Z.L.); (T.A.P.)
- Huixiao Hong
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA; (Z.L.); (T.A.P.)

12
Hodyna D, Kovalishyn V, Kachaeva M, Shulha Y, Klipkov A, Shaitanova E, Kobzar O, Shablykin O, Metelytsia L. In Silico, in Vitro and in Vivo Study of Substituted Imidazolidinone Sulfonamides as Antibacterial Agents. Chem Biodivers 2023; 20:e202301267. [PMID: 37943002 DOI: 10.1002/cbdv.202301267] [Received: 08/21/2023] [Revised: 11/06/2023] [Accepted: 11/08/2023] [Indexed: 11/10/2023]
Abstract
New substituted imidazolidinone sulfonamides have been developed using a rational drug design strategy. Predictive QSAR models for the search of new antibacterials were created using the OCHEM platform. Regression models were applied to verify a virtual chemical library of new imidazolidinone derivatives designed to have antibacterial activity. A number of substituted imidazolidinone sulfonamides were identified by QSAR prediction as effective antibacterial agents, synthesized, characterized by spectral and elemental analysis, and tested in vitro. Six of the studied compounds showed the highest in vitro antibacterial activity against Gram-negative E. coli and Gram-positive S. aureus multidrug-resistant strains. The in vivo acute toxicity of these imidazolidinone sulfonamides based on the LC50 value ranged from 16.01 to 44.35 mg/L (slightly toxic compounds class). The results of molecular docking suggest that the antibacterial mechanism of the compounds can be associated with the inhibition of post-translational modification processes of bacterial peptides and proteins.
Affiliation(s)
- Diana Hodyna
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Vasyl Kovalishyn
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Maryna Kachaeva
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Yurii Shulha
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Anton Klipkov
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Elena Shaitanova
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Oleksandr Kobzar
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Oleh Shablykin
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
- Larysa Metelytsia
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Sciences of Ukraine, 02094, Academician Kukhar Str, 1, Kyiv, Ukraine
13. McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Affiliation(s)
- Miles McGibbon
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Steven Shave
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Jie Dong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
- Yumiao Gao
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Douglas R Houston
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Jiancong Xie
  - Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
- Yuedong Yang
  - Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Vincent Blay
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
14. Zhu W, Wang Y, Niu Y, Zhang L, Liu Z. Current Trends and Challenges in Drug-Likeness Prediction: Are They Generalizable and Interpretable? Health Data Science 2023; 3:0098. [PMID: 38487200 PMCID: PMC10880170 DOI: 10.34133/hds.0098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 10/20/2023] [Indexed: 03/17/2024]
Abstract
Importance: Drug-likeness of a compound is an overall assessment of its potential to succeed in clinical trials, and is essential for economizing research expenditures by filtering compounds with unfavorable properties and poor development potential. To this end, a robust drug-likeness prediction method is indispensable. Various approaches, including discriminative rules, statistical models, and machine learning models, have been developed to predict drug-likeness based on physicochemical properties and structural features. Notably, recent advancements in novel deep learning techniques have significantly advanced drug-likeness prediction, especially in classification performance. Highlights: In this review, we addressed the evolving landscape of drug-likeness prediction, with emphasis on methods employing novel deep learning techniques, and highlighted the current challenges in drug-likeness prediction, specifically regarding the aspects of generalization and interpretability. Moreover, we explored potential remedies and outlined promising avenues for future research. Conclusion: Despite the hurdles of generalization and interpretability, novel deep learning techniques have great potential in drug-likeness prediction and are worthy of further research efforts.
Affiliation(s)
- Wenyu Zhu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yanxing Wang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yan Niu
  - Department of Medicinal Chemistry, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Liangren Zhang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Zhenming Liu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
15. Guo W, Liu J, Dong F, Song M, Li Z, Khan MKH, Patterson TA, Hong H. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med (Maywood) 2023; 248:1952-1973. [PMID: 38057999 PMCID: PMC10798180 DOI: 10.1177/15353702231209421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/08/2023] Open
Abstract
The ever-increasing number of chemicals has raised public concerns due to their adverse effects on human health and the environment. To protect public health and the environment, it is critical to assess the toxicity of these chemicals. Traditional in vitro and in vivo toxicity assays are complicated, costly, and time-consuming and may face ethical issues. These constraints raise the need for alternative methods for assessing the toxicity of chemicals. Recently, due to the advancement of machine learning algorithms and the increase in computational power, many toxicity prediction models have been developed using various machine learning and deep learning algorithms such as support vector machine, random forest, k-nearest neighbors, ensemble learning, and deep neural network. This review summarizes the machine learning- and deep learning-based toxicity prediction models developed in recent years. Support vector machine and random forest are the most popular machine learning algorithms, and hepatotoxicity, cardiotoxicity, and carcinogenicity are the most frequently modeled toxicity endpoints in predictive toxicology. It is known that datasets impact model performance. The quality of the datasets used to develop machine learning and deep learning toxicity prediction models is vital to the performance of those models. Different toxicity assignments for the same chemicals have been observed among datasets for the same type of toxicity, indicating that benchmark datasets are needed for developing reliable toxicity prediction models using machine learning and deep learning algorithms. This review provides insights into current machine learning models in predictive toxicology, which are expected to promote the development and application of toxicity prediction models in the future.
Affiliation(s)
- Wenjing Guo
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Jie Liu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Fan Dong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Meng Song
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Zoe Li
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Tucker A Patterson
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
16. Liu J, Xu L, Guo W, Li Z, Khan MKH, Ge W, Patterson TA, Hong H. Developing a SARS-CoV-2 main protease binding prediction random forest model for drug repurposing for COVID-19 treatment. Exp Biol Med (Maywood) 2023; 248:1927-1936. [PMID: 37997891 PMCID: PMC10798185 DOI: 10.1177/15353702231209413] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/24/2023] [Accepted: 09/26/2023] [Indexed: 11/25/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) global pandemic resulted in millions of people becoming infected with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and close to seven million deaths worldwide. It is essential to further explore and design effective COVID-19 treatment drugs that target the main protease of SARS-CoV-2, a major target for COVID-19 drugs. In this study, machine learning was applied for predicting the SARS-CoV-2 main protease binding of Food and Drug Administration (FDA)-approved drugs to assist in the identification of potential repurposing candidates for COVID-19 treatment. Ligands bound to the SARS-CoV-2 main protease in the Protein Data Bank and compounds experimentally tested in SARS-CoV-2 main protease binding assays in the literature were curated. These chemicals were divided into training (516 chemicals) and testing (360 chemicals) data sets. To identify SARS-CoV-2 main protease binders as potential candidates for repurposing to treat COVID-19, 1188 FDA-approved drugs from the Liver Toxicity Knowledge Base were obtained. A random forest algorithm was used for constructing predictive models based on molecular descriptors calculated using Mold2 software. Model performance was evaluated using 100 iterations of fivefold cross-validations which resulted in 78.8% balanced accuracy. The random forest model that was constructed from the whole training dataset was used to predict SARS-CoV-2 main protease binding on the testing set and the FDA-approved drugs. Model applicability domain and prediction confidence on drugs predicted as the main protease binders discovered 10 FDA-approved drugs as potential candidates for repurposing to treat COVID-19. Our results demonstrate that machine learning is an efficient method for drug repurposing and, thus, may accelerate drug development targeting SARS-CoV-2.
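The evaluation protocol named in this abstract (repeated fivefold cross-validation scored by balanced accuracy) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the random forest and Mold2 descriptors are not reproduced, `fit_threshold` and `predict_threshold` are hypothetical stand-ins for the real classifier, the toy data are invented, and pooling the predictions within each repeat before scoring is an assumption, since the abstract does not say how fold results were aggregated.

```python
import random
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def kfold_splits(n_samples, k, rng):
    """Shuffle the sample indices once and yield k (train, test) index pairs."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m in range(k) if m != i for j in folds[m]]
        yield train, folds[i]


def repeated_cv(x, y, fit, predict, k=5, repeats=100, seed=0):
    """Return one pooled balanced accuracy per repeat of k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        pred = [None] * len(y)
        for train, test in kfold_splits(len(y), k, rng):
            model = fit([x[i] for i in train], [y[i] for i in train])
            for i in test:
                pred[i] = predict(model, x[i])
        scores.append(balanced_accuracy(y, pred))
    return scores


# Hypothetical stand-in for the random forest: a cut-off halfway between class means.
def fit_threshold(xs, ys):
    pos = [v for v, t in zip(xs, ys) if t == 1]
    neg = [v for v, t in zip(xs, ys) if t == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, value):
    return 1 if value >= threshold else 0


# Invented toy data: one descriptor, binders are the samples with value >= 10.
x_toy = list(range(20))
y_toy = [1 if v >= 10 else 0 for v in x_toy]
scores = repeated_cv(x_toy, y_toy, fit_threshold, predict_threshold)
print(f"mean balanced accuracy over {len(scores)} repeats: {mean(scores):.3f}")
```

Averaging over many reshuffled splits, as the abstract describes, reduces the dependence of the reported score on any single partition of the training set.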
Affiliation(s)
- Wenjing Guo
  - National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA
- Zoe Li
  - National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA
- Weigong Ge
  - National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA
- Tucker A Patterson
  - National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA
17. Li T, Liu Z, Thakkar S, Roberts R, Tong W. DeepAmes: A deep learning-powered Ames test predictive model with potential for regulatory application. Regul Toxicol Pharmacol 2023; 144:105486. [PMID: 37633327 DOI: 10.1016/j.yrtph.2023.105486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 07/14/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
The Ames assay is required by regulatory agencies worldwide to assess the mutagenic potential risk of consumer products. As well as this in vitro assay, in silico approaches have been widely used to predict Ames test results as outlined in the International Council for Harmonization (ICH) guidelines. Building on this in silico approach, here we describe DeepAmes, a high-performance and robust model developed with a novel deep learning (DL) approach for potential utility in regulatory science. DeepAmes was developed with a large and consistent Ames dataset (>10,000 compounds) and was compared with five other standard Machine Learning (ML) methods. Using a test set of 1,543 compounds, DeepAmes was the best performer in predicting the outcome of the Ames assay. In addition, DeepAmes yielded the best and most stable performance up to when compounds were >30% outside of the applicability domain (AD). Regarding the potential for regulatory application, a revised version of DeepAmes achieved a much-improved sensitivity of 0.87, up from 0.47. In conclusion, DeepAmes provides a DL-powered Ames test predictive model for predicting the results of Ames tests; with its defined AD and clear context of use, DeepAmes has potential for utility in regulatory application.
Affiliation(s)
- Ting Li
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Zhichao Liu
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Shraddha Thakkar
  - Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Ruth Roberts
  - ApconiX Ltd, Alderley Park, Alderley Edge, SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Weida Tong
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
18. Zhao Z, Bourne PE. Rigid Scaffolds Are Promising for Designing Macrocyclic Kinase Inhibitors. ACS Pharmacol Transl Sci 2023; 6:1182-1191. [PMID: 37588756 PMCID: PMC10425998 DOI: 10.1021/acsptsci.3c00078] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/12/2023] [Indexed: 08/18/2023]
Abstract
Macrocyclic kinase inhibitors (MKIs) are gaining attention due to their favorable selectivity and potential to overcome drug resistance, yet they remain challenging to design because of their novel structures. To facilitate the design and discovery of MKIs, we investigate MKI rational design starting from initial acyclic compounds by performing microsecond-scale atomistic simulations for multiple MKIs, constructing an MKI database, and analyzing MKIs using hierarchical cluster analysis. Our studies demonstrate that the binding modes of MKIs are like those of their corresponding acyclic counterparts against the same kinase targets. Importantly, within the respective binding sites, the MKI scaffolds retain the same conformations as their corresponding acyclic counterparts, demonstrating the rigidity of scaffolds before and after molecular cyclization. The MKI database includes 641 nanomole-level MKIs from 56 human kinases elucidating the features of rigid scaffolds and the core structures of MKIs. Collectively these results and resources can facilitate MKI development.
Affiliation(s)
- Zheng Zhao
  - School of Data Science and Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22904, United States
- Philip E. Bourne
  - School of Data Science and Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22904, United States
19. Mrug G, Hodyna D, Metelytsia L, Kovalishyn V, Trokhimenko O, Bondarenko S, Kondratyuk K, Kozitskiy A, Frasinyuk M. Structure-Activity Relationship Prediction-Based Synthesis and Cytotoxicity Evaluation against the HEp-2 Laryngeal Carcinoma Cell of Isoflavone-Cytisine Mannich Bases. Chem Biodivers 2023; 20:e202300560. [PMID: 37477067 DOI: 10.1002/cbdv.202300560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/15/2023] [Accepted: 07/20/2023] [Indexed: 07/22/2023]
Abstract
QSAR analysis of previously synthesized and nature-inspired virtual isoflavone-cytisine hybrids against the HEp-2 laryngeal carcinoma cell lines was performed using the OCHEM web platform. The validation of the models using an external test set proved that the models can be used to predict the activity of newly designed compounds such as 8-cytisinylmethyl derivatives of 5,7- and 6,7-dihydroxyisoflavones. The synthetic procedure for selective aminomethylation of 5,7-dihydroxyisoflavones with cytisine was developed. In vitro testing identified compound 7 f with cisplatin-level cytotoxicity against HEp-2 cell lines and compound 10, which was twice as active as cisplatin after 72 h of incubation.
Affiliation(s)
- Galyna Mrug
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Diana Hodyna
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Larysa Metelytsia
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Vasyl Kovalishyn
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Olena Trokhimenko
  - Shupyk National Healthcare University of Ukraine, Kyiv, 04112, Ukraine
- Svitlana Bondarenko
  - Department of Food Chemistry, National University of Food Technologies, Kyiv, 01601, Ukraine
- Kostyantyn Kondratyuk
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Mykhaylo Frasinyuk
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
  - Enamine Ltd., Kyiv, 02094, Ukraine
20. Kovalishyn V, Severin O, Kachaeva M, Semenyuta I, Keith K, Harden E, Hartline C, James S, Metelytsia L, Brovarets V. Design and experimental validation of the oxazole and thiazole derivatives as potential antivirals against of human cytomegalovirus. SAR and QSAR in Environmental Research 2023; 34:523-541. [PMID: 37424376 PMCID: PMC10529337 DOI: 10.1080/1062936x.2023.2232992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 06/29/2023] [Indexed: 07/11/2023]
Abstract
QSAR studies of a set of previously synthesized azole derivatives tested against human cytomegalovirus (HCMV) were performed using the OCHEM web platform. The predictive ability of the classification models has a balanced accuracy (BA) of 73-79%. The validation of the models using an external test set proved that the models can be used to predict the activity of newly designed compounds with a reasonable accuracy within the applicability domain (BA = 76-83%). The models were applied to screen a virtual chemical library with expected activity of compounds against HCMV. The five most promising new compounds were identified, synthesized and their antiviral activities against HCMV were evaluated in vitro. Two of them showed some activity against the HCMV strain AD169. According to the results of docking analysis, the most promising biotarget associated with HCMV is DNA polymerase. The docking of the most active compounds 1 and 5 in the DNA polymerase active site shows calculated binding energies of -8.6 and -7.8 kcal/mol, respectively. The ligand's complexation was stabilized by the formation of hydrogen bonds and hydrophobic interactions with amino acids Lys60, Leu43, Ile49, Pro77, Asp134, Ile135, Val136, Thr62 and Arg137.
Affiliation(s)
- V. Kovalishyn
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str., Kyiv, Ukraine
- O. Severin
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str., Kyiv, Ukraine
- M. Kachaeva
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str., Kyiv, Ukraine
- I. Semenyuta
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str., Kyiv, Ukraine
- K.A. Keith
  - Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, USA
- E.A. Harden
  - Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, USA
- C.B. Hartline
  - Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, USA
- S.H. James
  - Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, USA
- L. Metelytsia
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str., Kyiv, Ukraine
- V. Brovarets
  - V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str., Kyiv, Ukraine
21. Cesaro A, Bagheri M, Torres MDT, Wan F, de la Fuente-Nunez C. Deep learning tools to accelerate antibiotic discovery. Expert Opin Drug Discov 2023; 18:1245-1257. [PMID: 37794737 PMCID: PMC10790350 DOI: 10.1080/17460441.2023.2250721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/18/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION As machine learning (ML) and artificial intelligence (AI) expand to many segments of our society, they are increasingly being used for drug discovery. Recent deep learning models offer an efficient way to explore high-dimensional data and design compounds with desired properties, including those with antibacterial activity. AREAS COVERED This review covers key frameworks in antibiotic discovery, highlighting physicochemical features and addressing dataset limitations. The deep learning approaches described here include discriminative models such as convolutional neural networks, recurrent neural networks, graph neural networks, and generative models like neural language models, variational autoencoders, generative adversarial networks, normalizing flow, and diffusion models. As the integration of these approaches in drug discovery continues to evolve, this review aims to provide insights into promising prospects and challenges that lie ahead in harnessing such technologies for the development of antibiotics. EXPERT OPINION Accurate antimicrobial prediction using deep learning faces challenges such as imbalanced data, limited datasets, experimental validation, target strains, and structure. The integration of deep generative models with bioinformatics, molecular dynamics, and data augmentation holds the potential to overcome these challenges, enhance model performance, and ultimately accelerate antimicrobial discovery.
Affiliation(s)
- Angela Cesaro
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Mojtaba Bagheri
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marcelo D. T. Torres
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Fangping Wan
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
22. Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 01/13/2024] Open
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, as similarly used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for the derivation of novel protein descriptors that are more informative.
Affiliation(s)
- J. Emonts
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
- J.F. Buyel
  - University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
  - Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
|
23
|
Long W, Li S, He Y, Lin J, Li M, Wen Z. Unraveling Structural Alerts in Marketed Drugs for Improving Adverse Outcome Pathway Framework of Drug-Induced QT Prolongation. Int J Mol Sci 2023; 24:ijms24076771. [PMID: 37047744 PMCID: PMC10095420 DOI: 10.3390/ijms24076771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 03/21/2023] [Accepted: 03/30/2023] [Indexed: 04/08/2023] [Open Access]
Abstract
In pharmaceutical treatment, many non-cardiac drugs carry the risk of prolonging the QT interval, which can lead to fatal cardiac complications such as torsades de pointes (TdP). Although the unexpected blockade of ion channels has been widely considered to be one of the main reasons for affecting the repolarization phase of the cardiac action potential and leading to QT interval prolongation, the lack of knowledge regarding chemical structures in drugs that may induce the prolongation of the QT interval remains a barrier to further understanding the underlying mechanism and developing an effective prediction strategy. In this study, we thoroughly investigated the differences in chemical structures between QT-prolonging drugs and drugs with no drug-induced QT prolongation (DIQT) concerns, based on the Drug-Induced QT Prolongation Atlas (DIQTA) dataset. Three categories of structural alerts (SAs), namely amines, ethers, and aromatic compounds, appeared in large quantities in QT-prolonging drugs, but rarely in drugs with no DIQT concerns, indicating a close association between SAs and the risk of DIQT. Moreover, using the molecular descriptors associated with these three categories of SAs as features, the structure–activity relationship (SAR) model for predicting the high risk of inducing QT interval prolongation of marketed drugs achieved recall rates of 72.5% and 80.0% for the DIQTA dataset and the FDA Adverse Event Reporting System (FAERS) dataset, respectively. Our findings may promote a better understanding of the mechanism of DIQT and facilitate research on cardiac adverse drug reactions in drug development.
Affiliation(s)
- Wulin Long
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Shihai Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yujie He
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Jinzhu Lin
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Zhining Wen
  - College of Chemistry, Sichuan University, Chengdu 610064, China
  - Medical Big Data Center, Sichuan University, Chengdu 610064, China
|
24
|
Mensa S, Sahin E, Tacchino F, Kl Barkoutsos P, Tavernelli I. Quantum machine learning framework for virtual screening in drug discovery: a prospective quantum advantage. Machine Learning: Science and Technology 2023. [DOI: 10.1088/2632-2153/acb900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023] [Open Access]
Abstract
Machine learning for ligand-based virtual screening (LB-VS) is an important in silico tool for discovering new drugs in a faster and cost-effective manner, especially for emerging diseases such as COVID-19. In this paper, we propose a general-purpose framework combining a classical Support Vector Classifier algorithm with quantum kernel estimation for LB-VS on real-world databases, and we argue in favor of its prospective quantum advantage. Indeed, we heuristically prove that our quantum integrated workflow can, at least in some relevant instances, provide a tangible advantage compared to state-of-the-art classical algorithms operating on the same datasets, showing strong dependence on the target and the feature selection method. Finally, we test our algorithm on IBM Quantum processors using ADRB2 and COVID-19 datasets, showing that hardware simulations provide results in line with the predicted performances and can surpass classical equivalents.
|
25
|
Choi IH, Oh IS. Weighted edit distance optimized using genetic algorithm for SMILES-based compound similarity. Pattern Anal Appl 2023. [DOI: 10.1007/s10044-023-01141-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2023]
|
26
|
Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 2023; 15:3. [PMID: 36609528 PMCID: PMC9824924 DOI: 10.1186/s13321-022-00672-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/17/2022] [Indexed: 01/07/2023] [Open Access]
Abstract
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus is comprised of around 60 million data points. This dataset contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how data can be filtered in a variety of ways and also perform some examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
Affiliation(s)
- O. J. M. Béquignon
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
- B. J. Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
- W. Jespers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
- A. P. IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
- B. van der Water
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
- G. J. P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands (GRID: grid.5132.5; ISNI: 0000 0001 2312 1970)
|
27
|
Lim S, Kim Y, Gu J, Lee S, Shin W, Kim S. Supervised chemical graph mining improves drug-induced liver injury prediction. iScience 2022; 26:105677. [PMID: 36654861 PMCID: PMC9840932 DOI: 10.1016/j.isci.2022.105677] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/08/2022] [Revised: 11/11/2022] [Accepted: 11/23/2022] [Indexed: 12/27/2022] [Open Access]
Abstract
Drug-induced liver injury (DILI) is the main cause of drug failure in clinical trials. The characterization of toxic compounds in terms of chemical structure is important because compounds can be metabolized to toxic substances in the liver. Traditional machine learning approaches have had limited success in predicting DILI, and emerging deep graph neural network (GNN) models are not yet powerful enough to predict DILI. In this study, we developed a completely different approach, supervised subgraph mining (SSM), a strategy to mine explicit subgraph features by iteratively updating individual graph transitions to maximize DILI fidelity. Our method outperformed previous methods, including state-of-the-art GNN tools, in classifying DILI on two different datasets: DILIst and TDC-benchmark. We also combined the subgraph features by using SMARTS-based frequent structural pattern matching and associated them with drugs' ATC codes.
Affiliation(s)
- Sangsoo Lim
  - Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Youngkuk Kim
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Jeonghyeon Gu
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Sunho Lee
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Seoul 08826, South Korea
- Wonseok Shin
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Seoul 08826, South Korea
  - Corresponding author
|
28
|
Muzychka LV, Verves EV, Yaremchuk IO, Zinchenko AM, Shishkina SV, Semenyuta IV, Hodyna DM, Metelytsia LO, Kovalishyn V, Smolii OB. Synthesis, QSAR modeling, and molecular docking of novel fused 7-deazaxanthine derivatives as adenosine A2A receptor antagonists. Chem Biol Drug Des 2022; 100:1025-1032. [PMID: 34651417 DOI: 10.1111/cbdd.13975] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 01/25/2023]
Abstract
Predictive QSAR models for the search for new adenosine A2A receptor antagonists were developed using the OCHEM platform. The regression models showed a coefficient of determination q2 = 0.65-0.71 in cross-validation and on an independent test set. The inhibition activities of novel fused 7-deazaxanthine compounds were predicted by the developed QSAR models. A preparative method for the synthesis of pyrimido[5',4':4,5]pyrrolo[1,2-a][1,4]diazepine derivatives was developed, and 11 new adenosine A2A receptor antagonists were obtained. Preliminary investigations into the toxicity of fused 7-deazaxanthine compounds toward the invertebrate cladoceran D. magna, a model organism commonly used to assess toxicity, were also described.
Affiliation(s)
- Liubov V Muzychka
  - Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Evgenii V Verves
  - Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
  - Enamine Ltd, Kyiv, Ukraine
- Iryna O Yaremchuk
  - Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Anna M Zinchenko
  - Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Svitlana V Shishkina
  - Department of X-ray Diffraction Studies and Quantum Chemistry, STC "Institute for Single Crystals", NAS of Ukraine, Kharkiv, Ukraine
- Ivan V Semenyuta
  - Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Diana M Hodyna
  - Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Larysa O Metelytsia
  - Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Vasyl Kovalishyn
  - Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Oleg B Smolii
  - Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
|
29
|
Connor S, Li T, Roberts R, Thakkar S, Liu Z, Tong W. Adaptability of AI for safety evaluation in regulatory science: A case study of drug-induced liver injury. Front Artif Intell 2022; 5:1034631. [DOI: 10.3389/frai.2022.1034631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
Artificial intelligence (AI) has played a crucial role in advancing biomedical sciences but has yet to have the impact it merits in regulatory science. As the field advances, in silico and in vitro approaches have been evaluated as alternatives to animal studies, in a drive to identify and mitigate safety concerns earlier in the drug development process. Although many AI tools are available, their acceptance in regulatory decision-making for drug efficacy and safety evaluation is still a challenge. It is a common perception that an AI model improves with more data, but does reality reflect this perception in drug safety assessments? Importantly, a model aiming at regulatory application needs to take a broad range of model characteristics into consideration. Among them is adaptability, defined as the adaptive behavior of a model as it is retrained on unseen data. This is an important model characteristic which should be considered in regulatory applications. In this study, we set up a comprehensive study to assess adaptability in AI by mimicking the real-world scenario of the annual addition of new drugs to the market, using a model we previously developed known as DeepDILI for predicting drug-induced liver injury (DILI) with a novel Deep Learning method. We found that the target test set plays a major role in assessing the adaptive behavior of our model. Our findings also indicated that adding more drugs to the training set does not significantly affect the predictive performance of our adaptive model. We concluded that the proposed adaptability assessment framework has utility in the evaluation of the performance of a model over time.
|
30
|
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets at low computational cost. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
Affiliation(s)
- Jingbo Yang
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Yiyang Cai
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Kairui Zhao
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Hongbo Xie
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Xiujie Chen
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
|
31
|
Automated QSPR modeling and data curation of physicochemical properties using KNIME platform: Prediction of partition coefficients. J Indian Chem Soc 2022. [DOI: 10.1016/j.jics.2022.100672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
32
|
Liu J, Guo W, Dong F, Aungst J, Fitzpatrick S, Patterson TA, Hong H. Machine learning models for rat multigeneration reproductive toxicity prediction. Front Pharmacol 2022; 13:1018226. [PMID: 36238576 PMCID: PMC9552001 DOI: 10.3389/fphar.2022.1018226] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/12/2022] [Accepted: 09/09/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Reproductive toxicity is one of the prominent endpoints in the risk assessment of environmental and industrial chemicals. Due to the complexity of the reproductive system, traditional reproductive toxicity testing in animals, especially guideline multigeneration reproductive toxicity studies, takes a long time and is expensive. Therefore, machine learning, as a promising alternative approach, should be considered when evaluating the reproductive toxicity of chemicals. We curated rat multigeneration reproductive toxicity testing data of 275 chemicals from ToxRefDB (Toxicity Reference Database) and developed predictive models using seven machine learning algorithms (decision tree, decision forest, random forest, k-nearest neighbors, support vector machine, linear discriminant analysis, and logistic regression). A consensus model was built based on the seven individual models. An external validation set was curated from the COSMOS database and the literature. The performances of individual and consensus models were evaluated using 500 iterations of 5-fold cross-validations and the external validation data set. The balanced accuracy of the models ranged from 58% to 65% in the 5-fold cross-validations and 45%–61% in the external validations. Prediction confidence analysis was conducted to provide additional information for more appropriate applications of the developed models. The impact of our findings is in increasing confidence in machine learning models. We demonstrate the importance of using consensus models for harnessing the benefits of multiple machine learning models (i.e., using redundant systems to check validity of outcomes). While we continue to build upon the models to better characterize weak toxicants, there is current utility in saving resources by being able to screen out strong reproductive toxicants before investing in in vivo testing.
The modeling approach (machine learning models) is offered for assessing the rat multigeneration reproductive toxicity of chemicals. Our results suggest that machine learning may be a promising alternative approach to evaluate the potential reproductive toxicity of chemicals.
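The consensus idea and the balanced-accuracy metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple majority vote over binary predictions; the abstract does not state how the authors actually combined their seven models, and the function names `majority_vote` and `balanced_accuracy` and all sample data are hypothetical.

```python
def majority_vote(predictions):
    """Combine binary labels from several models into one consensus label per sample.

    `predictions` holds one list of 0/1 labels per model. Majority voting is an
    assumption made for illustration, not the published combination rule.
    """
    n_models = len(predictions)
    # A sample is called positive when more than half of the models say so.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = toxic, 0 = non-toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels for four chemicals and three toy models.
y_true = [1, 1, 0, 0]
model_outputs = [
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]
consensus = majority_vote(model_outputs)
```

With an odd number of models, as in the seven-model setup described above, a binary majority vote cannot tie. Balanced accuracy is useful when the toxic and non-toxic classes are of unequal size, because each class contributes equally to the score.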
Affiliation(s)
- Jie Liu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Wenjing Guo
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Fan Dong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Jason Aungst
  - Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, MD, United States
- Suzanne Fitzpatrick
  - Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, MD, United States
- Tucker A. Patterson
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
  - Correspondence: Huixiao Hong
|
33
|
Vigil-Vásquez C, Schüller A. De Novo Prediction of Drug Targets and Candidates by Chemical Similarity-Guided Network-Based Inference. Int J Mol Sci 2022; 23:ijms23179666. [PMID: 36077062 PMCID: PMC9455815 DOI: 10.3390/ijms23179666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 08/12/2022] [Accepted: 08/21/2022] [Indexed: 12/01/2022] [Open Access]
Abstract
Identifying drug–target interactions is a crucial step in discovering novel drugs and for drug repositioning. Network-based methods have shown great potential thanks to the straightforward integration of information from different sources and the possibility of extracting novel information from the graph topology. However, despite recent advances, there is still an urgent need for efficient and robust prediction methods. Here, we present SimSpread, a novel method that combines network-based inference with chemical similarity. This method employs a tripartite drug–drug–target network constructed from protein–ligand interaction annotations and drug–drug chemical similarity on which a resource-spreading algorithm predicts potential biological targets for both known or failed drugs and novel compounds. We describe small molecules as vectors of similarity indices to other compounds, thereby providing a flexible means to explore diverse molecular representations. We show that our proposed method achieves high prediction performance through multiple cross-validation and time-split validation procedures over a series of datasets. In addition, we demonstrate that our method performed a balanced exploration of both chemical ligand space (scaffold hopping) and biological target space (target hopping). Our results suggest robust and balanced performance, and our method may be useful for predicting drug targets, virtual screening, and drug repositioning.
Affiliation(s)
- Carlos Vigil-Vásquez
  - Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
- Andreas Schüller
  - Department of Molecular Genetics and Microbiology, School of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
  - Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
|
34
|
Oh J, Ceong HT, Na D, Park C. A machine learning model for classifying G-protein-coupled receptors as agonists or antagonists. BMC Bioinformatics 2022; 23:346. [PMID: 35982407 PMCID: PMC9389651 DOI: 10.1186/s12859-022-04877-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/01/2022] [Accepted: 08/03/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: G-protein coupled receptors (GPCRs) sense and transmit extracellular signals into the intracellular machinery by regulating G proteins. GPCR malfunctions are associated with a variety of signaling-related diseases, including cancer and diabetes; at least a third of the marketed drugs target GPCRs. Thus, characterization of their signaling and regulatory mechanisms is crucial for the development of effective drugs. Results: In this study, we developed a machine learning model to identify GPCR agonists and antagonists. We designed two-step prediction models: the first model identified the ligands binding to GPCRs and the second model classified the ligands as agonists or antagonists. Using 990 selected subset features from 5270 molecular descriptors calculated from 4590 ligands deposited in two drug databases, our model classified non-ligands, agonists, and antagonists of GPCRs, and achieved an area under the ROC curve (AUC) of 0.795, sensitivity of 0.716, specificity of 0.744, and accuracy of 0.733. In addition, we verified that 70% (44 out of 63) of FDA-approved GPCR-targeting drugs were correctly classified into their respective groups. Conclusions: Studies of ligand–GPCR interaction recognition are important for the characterization of drug action mechanisms. Our GPCR–ligand interaction prediction model can be employed in the pharmaceutical sciences for the efficient virtual screening of putative GPCR-binding agonists and antagonists.
Affiliation(s)
- Jooseong Oh
  - School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Hyi-Thaek Ceong
  - Department of Multimedia, Chonnam National University, Yeosu, 59626, Republic of Korea
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, 06974, Republic of Korea
- Chungoo Park
  - School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
|
35
|
Morita K, Mizuno T, Kusuhara H. Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning. J Chem Inf Model 2022; 62:3982-3992. [PMID: 35971760 DOI: 10.1021/acs.jcim.2c00765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023]
Abstract
Adverse events are a serious issue in drug development, and many prediction methods using machine learning have been developed. The random split cross-validation is the de facto standard for model building and evaluation in machine learning, but care should be taken in adverse event prediction because this approach does not strictly match the real-world situation. The time split, which uses the time axis, is considered suitable for real-world prediction. However, the differences in model performance obtained using the time and random splits are not clear due to the lack of comparable studies. To understand the differences, we compared the model performance between the time and random splits using nine types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split showed higher area under the curve values than did the time split for six of eight targets. The chemical spaces of the training and test datasets of the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences derived from the splitting. The area under the curve differences were smaller for the protein interaction than for the other datasets. Subsequent detailed analyses suggested the danger of confounding in the use of knowledge-based information in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction and suggest that appropriate use of the splitting strategies and interpretation of results are necessary for the real-world prediction of adverse events. We provide the analysis code and datasets used in the present study at https://github.com/mizuno-group/AE_prediction.
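The difference between the two splitting strategies compared in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code (which they provide at the repository linked above): the record layout with a `year` field, the compound names, and the helper names `random_split` and `time_split` are all hypothetical.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a fixed fraction, ignoring any time axis."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, cutoff_year):
    """Train on records dated before the cutoff and test on the later ones."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


# Hypothetical compounds tagged with the year they entered the dataset.
records = [
    {"name": "compound_a", "year": 2001},
    {"name": "compound_b", "year": 2005},
    {"name": "compound_c", "year": 2010},
    {"name": "compound_d", "year": 2015},
    {"name": "compound_e", "year": 2020},
]

rand_train, rand_test = random_split(records, test_fraction=0.2, seed=0)
time_train, time_test = time_split(records, cutoff_year=2015)
```

In the time split every test record is at least as recent as every training record, which mimics predicting for compounds that appear after the model was built. The random split can place older compounds in the test set and newer ones in the training set.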
Affiliation(s)
- Katsuhisa Morita
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Tadahaya Mizuno
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Hiroyuki Kusuhara
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
|
36
|
An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning. Molecules 2022; 27:molecules27103112. [PMID: 35630587 PMCID: PMC9147181 DOI: 10.3390/molecules27103112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2022] [Revised: 05/05/2022] [Accepted: 05/10/2022] [Indexed: 11/19/2022] [Open Access]
Abstract
In the process of drug discovery, drug-induced liver injury (DILI) is still an active research field and is one of the most common and important issues in toxicity evaluation research. It directly leads to high drug attrition. At present, there are a variety of computer algorithms based on molecular representations to predict DILI. It is found that a single molecular representation method is insufficient to complete the task of toxicity prediction, and multiple molecular fingerprint fusion methods have been used as model input. In order to solve the problem of high-dimensional and unbalanced DILI prediction data, this paper integrates existing datasets and designs a new algorithm framework, Rotation-Ensemble-GA (R-E-GA). The main idea is to find a feature subset with better predictive performance after rotating the fusion vector of high-dimensional molecular representation in the feature space. Then, an Adaboost-type ensemble learning method is integrated into R-E-GA to improve the prediction accuracy. The experimental results show that the performance of R-E-GA is better than that of other state-of-the-art algorithms, including ensemble learning-based and graph neural network-based methods. Through five-fold cross-validation, the R-E-GA obtains an ACC of 0.77, an F1 score of 0.769, and an AUC of 0.842.
|
37
|
Baskin I, Epshtein A, Ein-Eli Y. Benchmarking machine learning methods for modeling physical properties of ionic liquids. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.118616] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/21/2023]
|
38
|
Soliman ME, Adewumi AT, Akawa OB, Subair TI, Okunlola FO, Akinsuku OE, Khan S. Simulation Models for Prediction of Bioavailability of Medicinal Drugs-the Interface Between Experiment and Computation. AAPS PharmSciTech 2022; 23:86. [PMID: 35292867 DOI: 10.1208/s12249-022-02229-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Accepted: 02/03/2022] [Indexed: 12/17/2022] [Open Access]
Abstract
Problems with oral drug bioavailability (BA) have persisted over the years, impairing drug efficacy and indirectly contributing to human morbidity and mortality. Some conventional lab-based methods improve drug absorption and thereby enhance BA, and recent experimental techniques are promising. Nevertheless, some have inherent drawbacks in improving the efficacy of poorly soluble and poorly permeable drugs. Drug BA and strategies to overcome these challenges are briefly highlighted. This review surveys the different computational models for studying and predicting drug bioavailability. Several computational approaches provide mechanistic insights into oral drug delivery through simulation of descriptors such as solubility, permeability, transport protein-ligand interactions, and molecular structures. In silico techniques have long been known but are only now being applied to drug bioavailability problems. Many publications have reported novel applications of computational models towards improving drug BA, including predicting gastrointestinal tract (GIT) drug absorption properties and passive intestinal membrane permeability, thus saving time and resources. Also, classical molecular simulation models for solvation free energies of solubility-related processes such as solubilization, dissolution, supersaturation, and precipitation have been used in virtual screening studies. A few of the tools are GastroPlus™, which supports biowaivers for drugs, mainly BCS class III, and predicts the absorption and pharmacokinetics of drug compounds; the SimCyp® simulator for mechanistic modelling and simulation of drug formulation processes; pharmacodynamic analysis based on non-linear mixed-effects modelling; and mathematical models predicting absorption potential/maximum absorption dose.
This review describes how in silico and experimental approaches are combined to enhance drug bioavailability, and offers insights that inform a critical opinion on the applications and reliability of the various in silico models as a growing tool for drug development and discovery, thus accelerating drug development processes.
|
39
|
Kang L, Duan Y, Chen C, Li S, Li M, Chen L, Wen Z. Structure-Activity Relationship (SAR) Model for Predicting Teratogenic Risk of Antiseizure Medications in Pregnancy by Using Support Vector Machine. Front Pharmacol 2022; 13:747935. [PMID: 35281912 PMCID: PMC8914116 DOI: 10.3389/fphar.2022.747935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 01/26/2022] [Indexed: 12/03/2022] [Open Access]
Abstract
Teratogenicity is one of the main concerns in the medication of pregnant women. Prescription of antiseizure medications (ASMs) to women with epilepsy during pregnancy may cause teratogenic effects on the fetus. Although large-scale epilepsy pregnancy registries have played an important role in evaluating the teratogenic risk of ASMs, for most ASMs, especially the newly approved ones, the potential teratogenic risk cannot be effectively assessed due to the lack of evidence. In this study, the analyses are performed on any medication, with a focus on ASMs. We curated a list of drugs with potential teratogenicity based on US Food and Drug Administration (FDA)-approved drug labeling, and established a support vector machine (SVM) model for detecting drugs with high teratogenic risk. The model was validated using post-marketing surveillance data from the US FDA Spontaneous Adverse Events Reporting System (FAERS) and applied to the prediction of the potential teratogenic risk of ASMs. Our results showed that the proposed model outperformed state-of-the-art approaches, including logistic regression (LR), random forest (RF) and extreme gradient boosting (XGBoost), in detecting drugs with high teratogenic risk (MCC and recall rate were 0.312 and 0.851, respectively). Among 196 drugs with teratogenic potential reported by FAERS, 136 (69.4%) were correctly predicted. Of the eight commonly used ASMs, four were predicted to be high teratogenic risk drugs: topiramate, phenobarbital, valproate and phenytoin (predicted probabilities of teratogenic risk were 0.69, 0.60, 0.59, and 0.56, respectively), which is consistent with the statements in FDA-approved drug labeling and the high reported prevalence of teratogenicity in epilepsy pregnancy registries.
In addition, the structural alerts in ASMs that are related to genotoxic carcinogenicity and mutagenicity, idiosyncratic adverse reactions, potential electrophilic agents and endocrine disruption were identified and discussed. Our findings can be a good complement to teratogenic risk assessment in drug development and can facilitate the choice of pharmacological therapies during pregnancy.
Affiliation(s)
- Liyuan Kang
- College of Chemistry, Sichuan University, Chengdu, China
| | - Yifei Duan
- Department of Neurology, West China Hospital, Sichuan University, Chengdu, China
| | - Cheng Chen
- College of Chemistry, Sichuan University, Chengdu, China
| | - Shihai Li
- College of Chemistry, Sichuan University, Chengdu, China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, China
| | - Lei Chen
- Department of Neurology, West China Hospital, Sichuan University, Chengdu, China
- *Correspondence: Lei Chen; Zhining Wen
| | - Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, China
- Medical Big Data Center, Sichuan University, Chengdu, China
- *Correspondence: Lei Chen; Zhining Wen
| |
|
40
|
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, Tetko IV. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins. Int J Mol Sci 2022. [DOI: 10.3390/ijms23031201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2023] [Open Access]
Abstract
The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The models available in the literature for the absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the literature set and compared the performance of global and local models built with different machine learning methods. Interestingly, extending the public database yielded models with lower accuracy than the models that we built using porphyrins only. The latter model achieved an acceptable RMSE = 2.61 for prediction of the absorption band of 335 porphyrins synthesized in our laboratory, but had low accuracy (RMSE = 0.52) for the extinction coefficient. Developing models using only compounds from our laboratory significantly decreased the errors for these compounds (RMSE = 0.5 and 0.042 for absorption band and extinction coefficient, respectively), but limited their applicability to these homologous series. When developing models, one should keep their potential use clearly in mind and select the strategy that provides the most accurate predictions for the target application. The models and data are publicly available.
|
41
|
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, Tetko IV. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins. Int J Mol Sci 2022; 23:ijms23031201. [PMID: 35163123 PMCID: PMC8835262 DOI: 10.3390/ijms23031201] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/30/2021] [Accepted: 01/19/2022] [Indexed: 02/05/2023] [Open Access]
Affiliation(s)
- Aleksey I. Rusanov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia; (A.I.R.); (O.A.D.); (N.Z.M.)
| | - Olga A. Dmitrieva
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia; (A.I.R.); (O.A.D.); (N.Z.M.)
| | - Nugzar Zh. Mamardashvili
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia; (A.I.R.); (O.A.D.); (N.Z.M.)
| | - Igor V. Tetko
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, 153045 Ivanovo, Russia; (A.I.R.); (O.A.D.); (N.Z.M.)
- Helmholtz Munich, Institute of Structural Biology, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), D-85764 Neuherberg, Germany
- BIGCHEM GmbH, D-85716 Unterschleißheim, Germany
- Correspondence: Tel.: +49-89-3187-3575
| |
|
42
|
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, Tetko IV. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins. Int J Mol Sci 2022. [DOI: 10.3390/ijms23031201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023] [Open Access]
|
43
|
Huang S, Ding Y. Identification of Anticancer and Anti-inflammatory Drugs from Drug-target Interaction Descriptors by Machine Learning. Lett Drug Des Discov 2022. [DOI: 10.2174/1570180819666220114114752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Drug repositioning is an important subject in drug-disease research. In the past, most studies simply used drug descriptors as the feature vector to classify drugs or targets, or used qualitative data about drug-target or drug-disease to predict drug-target interactions. These data provide limited information for drug repositioning.
Objective:
Considering both drugs and targets, and constructing quantitative drug-target interaction descriptors to characterize drugs, is of great significance to the study of drug repositioning.
Methods:
Taking anticancer and anti-inflammatory drugs as research objects, the interaction sites between drugs and targets were determined by molecular docking. Sixty-seven drug-target interaction descriptors were calculated to describe the drug-target interactions, and 22 important descriptors were screened for drug classification by SVM, LightGBM and MLP.
Results:
The accuracy of SVM, LightGBM and MLP reached 93.29%, 92.68% and 94.51%, their Matthews correlation coefficients reached 0.852, 0.840 and 0.882, and their areas under the ROC curve reached 0.977, 0.969 and 0.968, respectively.
Conclusion:
Using drug-target interaction descriptors to build machine learning models can obtain better results for drug classification. Number of atom pairs, force field, hydrophobic interactions and bSASA are the four types of key features for the classification of anticancer and anti-inflammatory drugs.
Affiliation(s)
- Songtao Huang
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
- Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
| | - Yanrui Ding
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
- Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
| |
|
44
|
AI-powered drug repurposing for developing COVID-19 treatments. Reference Module in Biomedical Sciences 2022. [PMCID: PMC8865759 DOI: 10.1016/b978-0-12-824010-6.00005-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Emerging infectious diseases are an ever-present threat to public health, and COVID-19 is the most recent example. There is an urgent need to develop a robust framework to combat the disease with safe and effective therapeutic options. Compared to de novo drug discovery, drug repurposing may offer a lower-cost and faster drug discovery paradigm to explore potential treatment options among existing drugs. This chapter elucidates the advantages of artificial intelligence (AI) in enhancing the drug repurposing process from a data science perspective, using COVID-19 as an example. First, we elaborate on how AI-powered drug repurposing benefits from the accumulated data and knowledge of COVID-19 natural history and pathogenesis. Second, we summarize the pros and cons of AI-powered drug repurposing strategies to facilitate fit-for-purpose selection. Finally, we outline challenges of AI-powered drug repurposing from a regulatory perspective and suggest some potential solutions. AI-powered drug repurposing is promising for emerging treatments for COVID-19 infection. Accumulated biological data profiles facilitate AI-based drug repurposing efforts for the development of COVID-19 therapies. The 'fit-for-purpose' selection of AI-powered drug repurposing strategies is key to uncovering hidden information among drugs, targets, and diseases. Efforts from different stakeholders boost the adoption of AI-powered drug repurposing in the regulatory setting.
|
45
|
Liu J, Guo W, Sakkiah S, Ji Z, Yavas G, Zou W, Chen M, Tong W, Patterson TA, Hong H. Machine Learning Models for Predicting Liver Toxicity. Methods Mol Biol 2022; 2425:393-415. [PMID: 35188640 DOI: 10.1007/978-1-0716-1960-5_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/14/2023]
Abstract
Liver toxicity is a major adverse drug reaction that accounts for drug failure in clinical trials and withdrawal from the market. Therefore, predicting potential liver toxicity at an early stage in drug discovery is crucial to reduce costs and the potential for drug failure. However, current in vivo animal toxicity testing is very expensive and time consuming. As an alternative approach, various machine learning models have been developed to predict potential liver toxicity in humans. This chapter reviews current advances in the development and application of machine learning models for prediction of potential liver toxicity in humans and discusses possible improvements to liver toxicity prediction.
Affiliation(s)
- Jie Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Wenjing Guo
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Sugunadevi Sakkiah
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Zuowei Ji
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Gokhan Yavas
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Wen Zou
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Minjun Chen
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Tucker A Patterson
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
| | - Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA.
| |
|
46
|
Duchowicz PR, Fioressi SE, Bacelo DE. QSAR predictions on antichagas fenarimols. Results in Chemistry 2022. [DOI: 10.1016/j.rechem.2021.100256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] [Open Access]
|
47
|
Saavedra LM, Duchowicz PR. Predicting zebrafish (Danio rerio) embryo developmental toxicity through a non-conformational QSAR approach. The Science of the Total Environment 2021; 796:148820. [PMID: 34328907 DOI: 10.1016/j.scitotenv.2021.148820] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/08/2021] [Revised: 06/11/2021] [Accepted: 06/29/2021] [Indexed: 06/13/2023]
Abstract
For many years, the frequent use of synthetic chemicals in the manufacture of veterinary drugs and pest control products has had negative effects on human health and on non-target organisms, creating the need for a practical and suitable methodology for early risk identification across several thousand commercial compounds. The zebrafish (Danio rerio) embryo has emerged as a sustainable animal model for measuring developmental toxicity, an endpoint that is included in the regulatory procedures to approve chemicals, avoiding conventional and costly toxicity assays based on animal testing. In this context, Quantitative Structure-Activity Relationships (QSAR) theory is applied to develop a predictive model based on a well-defined zebrafish embryo developmental toxicity database reported for the ToxCast™ Phase I chemical library of the U.S. Environmental Protection Agency (U.S. EPA). By means of four freely available software packages, a set of 28,038 non-conformational descriptors that encode the largest number of permanent structural features is readily calculated. The Replacement Method (RM) variable subset selection technique provided the best regression models. Thereby, a linear QSAR model with proper statistical quality ($R^2_{\mathrm{train}} = 0.64$, $\mathrm{RMSE}_{\mathrm{train}} = 0.49$) is established in agreement with the Organization for Economic Co-operation and Development principles, meeting each internal (leave-one-out, leave-15%-out, VIF and Y-randomization) and external ($R^2_{\mathrm{test}}$, $R^2_m$, $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$ and CCC) validation criterion. The present QSAR approach provides a useful computational tool to estimate the zebrafish developmental toxicity of new, untested or hypothetical compounds, and it helps address the general lack of QSAR models in the literature for predicting this endpoint.
Affiliation(s)
- Laura M Saavedra
- Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, Diag. 113 y 64, C.C. 16, Sucursal 4, 1900 La Plata, Argentina.
| | - Pablo R Duchowicz
- Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, Diag. 113 y 64, C.C. 16, Sucursal 4, 1900 La Plata, Argentina.
| |
|
48
|
Li T, Tong W, Roberts R, Liu Z, Thakkar S. DeepCarc: Deep Learning-Powered Carcinogenicity Prediction Using Model-Level Representation. Front Artif Intell 2021; 4:757780. [PMID: 34870186 PMCID: PMC8636933 DOI: 10.3389/frai.2021.757780] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/12/2021] [Accepted: 10/27/2021] [Indexed: 12/16/2022] [Open Access]
Abstract
Carcinogenicity testing plays an essential role in identifying carcinogens in environmental chemistry and drug development. However, evaluating carcinogenic potency with conventional 2-year rodent studies is a time-consuming and labor-intensive process. Thus, there is an urgent need for alternative approaches that provide reliable and robust assessments of carcinogenicity. In this study, we proposed a DeepCarc model to predict carcinogenicity for small molecules using deep learning-based model-level representations. The DeepCarc model was developed using a data set of 692 compounds and evaluated on a test set containing 171 compounds in the National Center for Toxicological Research liver cancer database (NCTRlcdb). The proposed DeepCarc model yielded a Matthews correlation coefficient (MCC) of 0.432 for the test set, outperforming four advanced deep learning (DL) powered quantitative structure-activity relationship (QSAR) models with an average improvement rate of 37%. Furthermore, the DeepCarc model was also employed to screen the carcinogenicity potential of compounds from both DrugBank and Tox21. Altogether, the proposed DeepCarc model could serve as an early detection tool (https://github.com/TingLi2016/DeepCarc) for carcinogenicity assessment.
Affiliation(s)
- Ting Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States; University of Arkansas at Little Rock and University of Arkansas for Medical Sciences Joint Bioinformatics Program, Little Rock, AR, United States
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
| | - Ruth Roberts
- ApconiX Ltd., Alderley Edge, United Kingdom; Department of Biosciences, University of Birmingham, Birmingham, United Kingdom
| | - Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- *Correspondence: Zhichao Liu; Shraddha Thakkar
| | - Shraddha Thakkar
- Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- *Correspondence: Zhichao Liu; Shraddha Thakkar
| |
|
49
|
Pantaleão SQ, Fernandes PO, Gonçalves JE, Maltarollo VG, Honorio KM. Recent Advances in the Prediction of Pharmacokinetics Properties in Drug Design Studies: A Review. ChemMedChem 2021; 17:e202100542. [PMID: 34655454 DOI: 10.1002/cmdc.202100542] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/09/2021] [Revised: 10/07/2021] [Indexed: 12/11/2022]
Abstract
This review presents the main aspects related to pharmacokinetic properties, which are essential for the efficacy and safety of drugs. This topic is very important because the analysis of pharmacokinetic aspects in the initial design stages of drug candidates can increase the chances of success for the entire process. In this scenario, experimental and in silico techniques have been widely used. Due to the difficulties encountered with the use of some experimental tests to determine pharmacokinetic properties, several in silico tools have been developed and have shown promising results. Therefore, in this review, we address the main free tools/servers that have been used in this area, as well as some cases of application. Finally, we present some studies that employ a multidisciplinary approach with synergy between in silico, in vitro, and in vivo techniques to assess ADME properties of bioactive substances, achieving successful results in drug discovery and design.
Affiliation(s)
- Simone Q Pantaleão
- Centro de Ciências Naturais e Humanas, Institution Universidade Federal do ABC, 09210-580, Santo André, SP, Brazil
| | - Philipe O Fernandes
- Departamento de Produtos Farmacêuticos, Universidade Federal de Minas Gerais, 31270-901, Pampulha, MG, Brazil
| | - José Eduardo Gonçalves
- Departamento de Produtos Farmacêuticos, Universidade Federal de Minas Gerais, 31270-901, Pampulha, MG, Brazil
| | - Vinícius G Maltarollo
- Departamento de Produtos Farmacêuticos, Universidade Federal de Minas Gerais, 31270-901, Pampulha, MG, Brazil
| | - Kathia Maria Honorio
- Centro de Ciências Naturais e Humanas, Institution Universidade Federal do ABC, 09210-580, Santo André, SP, Brazil; Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, 03828-000, São Paulo, SP, Brazil
| |
|
50
|
Mao J, Akhtar J, Zhang X, Sun L, Guan S, Li X, Chen G, Liu J, Jeon HN, Kim MS, No KT, Wang G. Comprehensive strategies of machine-learning-based quantitative structure-activity relationship models. iScience 2021; 24:103052. [PMID: 34553136 PMCID: PMC8441174 DOI: 10.1016/j.isci.2021.103052] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 11/21/2022] [Open Access]
Abstract
Early quantitative structure-activity relationship (QSAR) technologies have unsatisfactory versatility and accuracy in fields such as drug discovery because they are based on traditional machine learning and interpretive expert features. The development of Big Data and deep learning technologies significantly improves the processing of unstructured data and unleashes the great potential of QSAR. Here we discuss the integration of wet experiments (which provide experimental data and reliable verification), molecular dynamics simulation (which provides mechanistic interpretation at the atomic/molecular levels), and machine learning (including deep learning) techniques to improve QSAR models. We first review the history of traditional QSAR and point out its problems. We then propose a better QSAR model characterized by a new iterative framework to integrate machine learning with disparate data input. Finally, we discuss the application of QSAR and machine learning to many practical research fields, including drug development and clinical trials.
Affiliation(s)
- Jiashun Mao
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
| | - Javed Akhtar
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
| | - Xiao Zhang
- Shanghai Rural Commercial Bank Co., Ltd, Shanghai 200002, China
| | - Liang Sun
- Department of Physics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
| | - Shenghui Guan
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
| | - Xinyu Li
- School of Life and Health Sciences and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Guangming Chen
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
| | - Jiaxin Liu
- Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Hyeon-Nae Jeon
- Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Min Sung Kim
- Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Kyoung Tai No
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
| | - Guanyu Wang
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
| |
|