1
Srisuwananukorn A, Krull JE, Ma Q, Zhang P, Pearson AT, Hoffman R. Applications of artificial intelligence to myeloproliferative neoplasms: a narrative review. Expert Rev Hematol 2024; 17:669-677. [PMID: 39114884 DOI: 10.1080/17474086.2024.2389997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Accepted: 08/05/2024] [Indexed: 09/21/2024]
Abstract
INTRODUCTION: Artificial intelligence (AI) is a rapidly growing field of computational research with the potential to extract nuanced biomarkers for the prediction of outcomes of interest. AI implementations for the prediction of clinical outcomes in myeloproliferative neoplasms (MPNs) are currently under investigation. AREAS COVERED: In this narrative review, we discuss AI investigations for the improvement of MPN clinical care utilizing either clinically available data or experimental laboratory findings. Abstracts and manuscripts published between 2000 and 2023 were identified by querying PubMed and the American Society of Hematology conference. Overall, multidisciplinary researchers have developed AI methods in MPNs that attempt to improve diagnostic accuracy, risk prediction, therapy selection, or pre-clinical investigations to identify candidate molecules as novel therapeutic agents. EXPERT OPINION: It is our expert opinion that AI methods in MPN care and hematology will continue to grow with increasing clinical utility. We believe that AI models will assist healthcare workers as clinical decision support tools if appropriately developed with AI-specific regulatory guidelines. Though the findings reported in this review are early investigations of AI in MPNs, the collective work developed by the research community provides a promising framework for improving decision-making in the future of MPN clinical care.
Affiliation(s)
- Andrew Srisuwananukorn
  - Division of Hematology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
- Jordan E Krull
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
- Ping Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
  - Department of Computer Science and Engineering, College of Engineering, The Ohio State University, Columbus, OH, USA
  - Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
- Alexander T Pearson
  - Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA
- Ronald Hoffman
  - Division of Hematology and Medical Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Matúška J, Bucinsky L, Gall M, Pitoňák M, Štekláč M. SchNetPack Hyperparameter Optimization for a More Reliable Top Docking Scores Prediction. J Phys Chem B 2024; 128:4943-4951. [PMID: 38733335 DOI: 10.1021/acs.jpcb.4c00296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2024]
Abstract
Options to improve the extrapolation power of the neural network designed using the SchNetPack package with respect to top docking scores prediction are presented. It is shown that hyperparameter tuning of the atomistic model representation (in the schnetpack.representation) improves the prediction of the top scoring compounds, which have characteristically a low incidence in randomized data sets for training of machine learning models. The prediction robustness is evaluated according to the mean square error (MSE) and the entropy of the average loss landscape decrease. Admittedly, the improvement of the top scoring compounds' prediction accuracy comes with the penalty of worsening the overall prediction power. It is revealed that the most impactful hyperparameter is the cutoff (5 Å is reported as the optimal choice). Other parameters (e.g., number of radial basis functions, number of interaction layers of the neural network, feature vector size or its batch size) are found to not affect the prediction robustness of the top scoring compounds in any comparable way relative to the cutoff. The MSE of the best docking score prediction (below -13 kcal/mol) improves from ca. 3.5 to 0.9 kcal/mol, while the prediction of less potent compounds (-13 to -11 kcal/mol) shows a lesser improvement, i.e., a decrease of MSE from 1.6 to 1.3 kcal/mol. Additionally, oversampling and undersampling of the training set with respect to the top scoring compounds' abundance is presented. The results indicate that the cutoff choice performs better than over- or undersampling of the training set, with undersampling performing better than oversampling.
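The abstract reports MSE separately for score ranges (below -13 kcal/mol and -13 to -11 kcal/mol) and compares against undersampling of the training set. A minimal sketch of both ideas follows, using only the Python standard library. The function names, the bin labels, and the `keep_fraction` parameter are hypothetical and are not taken from the paper or from SchNetPack.

```python
import random
from statistics import fmean


def mse_by_bin(y_true, y_pred, top=-13.0, mid=-11.0):
    """Mean squared error per docking-score bin.

    Bins follow the thresholds quoted in the abstract:
    'top' for scores below -13, 'mid' for -13 up to (not including) -11,
    and 'rest' for everything else. Empty bins are omitted.
    """
    errors = {"top": [], "mid": [], "rest": []}
    for t, p in zip(y_true, y_pred):
        if t < top:
            key = "top"
        elif t < mid:
            key = "mid"
        else:
            key = "rest"
        errors[key].append((t - p) ** 2)
    return {k: fmean(v) for k, v in errors.items() if v}


def undersample_rest(scores, top=-13.0, keep_fraction=0.5, seed=0):
    """Return sorted indices that keep every top-scoring sample and a
    random fraction of the remaining ones (hypothetical scheme)."""
    rng = random.Random(seed)
    top_idx = [i for i, s in enumerate(scores) if s < top]
    rest_idx = [i for i, s in enumerate(scores) if s >= top]
    kept = rng.sample(rest_idx, k=int(len(rest_idx) * keep_fraction))
    return sorted(top_idx + kept)
```

Reporting the error per bin, rather than one global number, is what makes a trade-off such as the one the abstract describes (better top-score accuracy, worse overall accuracy) visible.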
Affiliation(s)
- Ján Matúška
  - Institute of Physical Chemistry and Chemical Physics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
- Lukas Bucinsky
  - Institute of Physical Chemistry and Chemical Physics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
- Marián Gall
  - Institute of Information Engineering, Automation and Mathematics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
  - National SuperComputing Center, Dúbravská cesta č. 9, SK-84104 Bratislava, Slovak Republic
- Michal Pitoňák
  - National SuperComputing Center, Dúbravská cesta č. 9, SK-84104 Bratislava, Slovak Republic
  - Department of Physical and Theoretical Chemistry, Faculty of Natural Sciences, Comenius University in Bratislava, Mlynská dolina Ilkovičova 6, SK-84215 Bratislava, Slovak Republic
- Marek Štekláč
  - Institute of Physical Chemistry and Chemical Physics, Faculty of Chemical and Food Technology, Slovak University of Technology in Bratislava, Radlinského 9, SK-81237 Bratislava, Slovak Republic
  - Computing Centre, Centre of Operations of the Slovak Academy of Sciences, Dúbravská cesta č. 9, SK-84535 Bratislava, Slovak Republic
3
Castelo-Soccio L, Kim H, Gadina M, Schwartzberg PL, Laurence A, O'Shea JJ. Protein kinases: drug targets for immunological disorders. Nat Rev Immunol 2023; 23:787-806. [PMID: 37188939 PMCID: PMC10184645 DOI: 10.1038/s41577-023-00877-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 03/24/2023] [Indexed: 05/17/2023]
Abstract
Protein kinases play a major role in cellular activation processes, including signal transduction by diverse immunoreceptors. Given their roles in cell growth and death and in the production of inflammatory mediators, targeting kinases has proven to be an effective treatment strategy, initially as anticancer therapies, but shortly thereafter in immune-mediated diseases. Herein, we provide an overview of the status of small molecule inhibitors specifically generated to target protein kinases relevant to immune cell function, with an emphasis on those approved for the treatment of immune-mediated diseases. The development of inhibitors of Janus kinases that target cytokine receptor signalling has been a particularly active area, with Janus kinase inhibitors being approved for the treatment of multiple autoimmune and allergic diseases as well as COVID-19. In addition, TEC family kinase inhibitors (including Bruton's tyrosine kinase inhibitors) targeting antigen receptor signalling have been approved for haematological malignancies and graft versus host disease. This experience provides multiple important lessons regarding the importance (or not) of selectivity and the limits to which genetic information informs efficacy and safety. Many new agents are being generated, along with new approaches for targeting kinases.
Affiliation(s)
- Leslie Castelo-Soccio
  - Dermatology Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Hanna Kim
  - Juvenile Myositis Pathogenesis and Therapeutics Unit, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Massimo Gadina
  - Translational Immunology Section, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Pamela L Schwartzberg
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Arian Laurence
  - Department of Immunology, Royal Free London Hospitals NHS Foundation Trust, London, UK
  - University College London Hospitals NHS Foundation Trust, London, UK
- John J O'Shea
  - Molecular Immunology and Inflammation Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
4
Liu L, Na R, Yang L, Liu J, Tan Y, Zhao X, Huang X, Chen X. A Workflow Combining Machine Learning with Molecular Simulations Uncovers Potential Dual-Target Inhibitors against BTK and JAK3. Molecules 2023; 28:7140. [PMID: 37894618 PMCID: PMC10608827 DOI: 10.3390/molecules28207140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/08/2023] [Accepted: 10/15/2023] [Indexed: 10/29/2023] Open
Abstract
The drug development process suffers from low success rates and requires expensive and time-consuming procedures. The traditional one drug-one target paradigm is often inadequate to treat multifactorial diseases. Multitarget drugs may potentially address problems such as adverse reactions to drugs. With the aim of discovering a potential multitarget inhibitor for B-cell lymphoma treatment, herein, we developed a general pipeline combining machine learning, the interpretable model SHapley Additive exPlanation (SHAP), and molecular dynamics simulations to predict active compounds and fragments. Bruton's tyrosine kinase (BTK) and Janus kinase 3 (JAK3) are popular synergistic targets for B-cell lymphoma. We used this pipeline to identify potential dual inhibitors from a natural product database and screened three candidate inhibitors with acceptable drug absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. Ultimately, the compound CNP0266747 with specialized binding conformations that exhibited potential binding free energy against BTK and JAK3 was selected as the optimum choice. Furthermore, we also identified key residues and fingerprint features of this dual-target inhibitor of BTK and JAK3.
Affiliation(s)
- Lu Liu
  - Institute of Theoretical Chemistry, Jilin University, Changchun 130061, China
- Risong Na
  - Collaborative Innovation Center of Henan Grain Crops, National Key Laboratory of Wheat and Maize Crop Science, College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China
- Lianjuan Yang
  - Department of Medical Mycology, Shanghai Skin Disease Hospital, Tongji University School of Medicine, Shanghai 200443, China
- Jixiang Liu
  - Institute of Theoretical Chemistry, Jilin University, Changchun 130061, China
- Yingjia Tan
  - Institute of Theoretical Chemistry, Jilin University, Changchun 130061, China
- Xi Zhao
  - Institute of Theoretical Chemistry, Jilin University, Changchun 130061, China
- Xuri Huang
  - Institute of Theoretical Chemistry, Jilin University, Changchun 130061, China
- Xuecheng Chen
  - Department of Nanomaterials Physicochemistry, Faculty of Chemical Technology and Engineering, West Pomeranian University of Technology, Szczecin Piastów Ave. 42, 71-065 Szczecin, Poland
5
Park H, Hong S, Lee M, Kang S, Brahma R, Cho KH, Shin JM. AiKPro: deep learning model for kinome-wide bioactivity profiling using structure-based sequence alignments and molecular 3D conformer ensemble descriptors. Sci Rep 2023; 13:10268. [PMID: 37355672 PMCID: PMC10290719 DOI: 10.1038/s41598-023-37456-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 06/22/2023] [Indexed: 06/26/2023] Open
Abstract
The discovery of selective and potent kinase inhibitors is crucial for the treatment of various diseases, but the process is challenging due to the high structural similarity among kinases. Efficient kinome-wide bioactivity profiling is essential for understanding kinase function and identifying selective inhibitors. In this study, we propose AiKPro, a deep learning model that combines structure-validated multiple sequence alignments and molecular 3D conformer ensemble descriptors to predict kinase-ligand binding affinities. Our deep learning model uses an attention-based mechanism to capture complex patterns in the interactions between the kinase and the ligand. To assess the performance of AiKPro, we evaluated the impact of descriptors, the predictability for untrained kinases and compounds, and kinase activity profiling based on odds ratios. AiKPro shows good Pearson's correlation coefficients of 0.88 and 0.87 for the test set and for the untrained sets of compounds, respectively, which also indicates the robustness of the model. AiKPro shows good kinase-activity profiles across the kinome, potentially facilitating the discovery of novel interactions and selective inhibitors. Our approach holds potential implications for the discovery of novel, selective kinase inhibitors and for guiding rational drug design.
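The headline metric in this abstract is Pearson's correlation coefficient between predicted and measured affinities. As a reminder of what that number measures, here is a minimal standard-library sketch. The function name `pearson_r` is illustrative, and nothing here reproduces the AiKPro evaluation code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and centered squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Because the coefficient is invariant to linear rescaling, a high value shows that predictions rank and track the measurements well; it does not by itself show that the absolute predicted values are accurate.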
Affiliation(s)
- Hyejin Park
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Sujeong Hong
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Myeonghun Lee
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Sungil Kang
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
- Rahul Brahma
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Jae-Min Shin
  - AZothBio Inc., Rm. DA724 Hyundai Knowledge Industry Center, Hanam-si, Gyeonggi-do, Republic of Korea
6
Chen H. Enterprise marketing strategy using big data mining technology combined with XGBoost model in the new economic era. PLoS One 2023; 18:e0285506. [PMID: 37276212 DOI: 10.1371/journal.pone.0285506] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/02/2023] [Accepted: 04/24/2023] [Indexed: 06/07/2023] Open
Abstract
Technological development in the new economic era has brought challenges to enterprises. Enterprises need to use massive amounts of consumption information effectively to provide customers with high-quality customized services. Big data technology has strong mining ability. The relevant theories of computer data mining technology are summarized to optimize the marketing strategy of enterprises, and the application of data mining in precision marketing services is analyzed. Extreme Gradient Boosting (XGBoost) has shown strong advantages among machine learning algorithms. To help enterprises analyze customer data quickly and accurately, the feature feedback from XGBoost is used to infer the main factors that affect whether customers activate their cards, and these factors are then analyzed. The results of the analysis indicate the direction of effective marketing for potential customers who have yet to activate. Finally, the performance of XGBoost is compared with that of three other methods, and the seven features with the greatest effect on the prediction results are tested for differences. The results show that: (1) the accuracy and recall rate of the proposed model are higher than those of the other algorithms, and its performance is the best; (2) the significance p values of the features included in the test are all less than 0.001, indicating a very significant difference in these features between customers who activate and those who do not. The contributions of this paper are mainly reflected in two aspects. 1. Four precision marketing strategies based on big data mining are designed to provide scientific support for enterprise decision-making. 2. The improvement of the connection rate and stickiness between enterprises and customers has played a huge driving role in overall customer marketing.
7
Yang M, Sun H, Liu X, Xue X, Deng Y, Wang X. CMGN: a conditional molecular generation net to design target-specific molecules with desired properties. Brief Bioinform 2023:7165252. [PMID: 37193672 DOI: 10.1093/bib/bbad185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 04/06/2023] [Accepted: 04/23/2023] [Indexed: 05/18/2023] Open
Abstract
The rational design of chemical entities with desired properties for a specific target is a long-standing challenge in drug design. Generative neural networks have emerged as a powerful approach to sample novel molecules with specific properties, termed as inverse drug design. However, generating molecules with biological activity against certain targets and predefined drug properties still remains challenging. Here, we propose a conditional molecular generation net (CMGN), the backbone of which is a bidirectional and autoregressive transformer. CMGN applies large-scale pretraining for molecular understanding and navigates the chemical space for specified targets by fine-tuning with corresponding datasets. Additionally, fragments and properties were trained to recover molecules to learn the structure-properties relationships. Our model crisscrosses the chemical space for specific targets and properties that control fragment-growth processes. Case studies demonstrated the advantages and utility of our model in fragment-to-lead processes and multi-objective lead optimization. The results presented in this paper illustrate that CMGN has the potential to accelerate the drug discovery process.
Affiliation(s)
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Department of Medicinal Chemistry, Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Department of Medicinal Chemistry, Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
8
Wu J, Xiao Y, Lin M, Cai H, Zhao D, Li Y, Luo H, Tang C, Wang L. DeepCancerMap: A versatile deep learning platform for target- and cell-based anticancer drug discovery. Eur J Med Chem 2023; 255:115401. [PMID: 37116265 DOI: 10.1016/j.ejmech.2023.115401] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/07/2022] [Revised: 03/29/2023] [Accepted: 04/18/2023] [Indexed: 04/30/2023]
Abstract
Discovering new anticancer drugs has attracted wide attention and remains an open challenge. Target-based and phenotype-based experimental screening are the two mainstream anticancer drug discovery methods, but both are time-consuming, labor-intensive, and costly. In this study, we collected 485,900 compounds with 3,919,974 bioactivity records against 426 anticancer targets and 346 cancer cell lines from the academic literature, as well as 60 tumor cell lines from the NCI-60 panel. A total of 832 classification models (426 target-based and 406 cell-based predictive models) were then constructed to predict the inhibitory activity of compounds against targets and tumor cell lines using the FP-GNN deep learning method. Compared with classical machine learning and deep learning methods, the FP-GNN models achieve considerable overall predictive performance, with the highest AUC values of 0.91, 0.88, and 0.91 for the test sets of targets, academia-sourced cancer cell lines, and NCI-60 cancer cell lines, respectively. A user-friendly webserver called DeepCancerMap and its local version were developed based on these high-quality models, enabling users to perform anticancer drug discovery-related tasks including large-scale virtual screening, profiling prediction of anticancer agents, target fishing, and drug repositioning. We anticipate that this platform will accelerate the discovery of anticancer drugs in the field. DeepCancerMap is freely available at https://deepcancermap.idruglab.cn.
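The classification models here are compared by AUC. One way to read that number: it is the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. The sketch below computes it directly from that pairwise definition. It is an illustration only, with a hypothetical function name, and is not the evaluation code behind DeepCancerMap.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = active), scores: predicted scores.
    Ties between a positive and a negative count as half a win.
    Runs in O(n_pos * n_neg), which is fine for small illustrations.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four active/inactive pairs are ordered correctly, so the AUC is 0.75.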
Affiliation(s)
- Jingxing Wu
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yi Xiao
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Mujie Lin
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Hanxuan Cai
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Duancheng Zhao
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yirui Li
  - School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Hailin Luo
  - School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Chuanqi Tang
  - School of Design, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
9
Bu Y, Gao R, Zhang B, Zhang L, Sun D. CoGT: Ensemble Machine Learning Method and Its Application on JAK Inhibitor Discovery. ACS Omega 2023; 8:13232-13242. [PMID: 37065046 PMCID: PMC10099439 DOI: 10.1021/acsomega.3c00160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 03/16/2023] [Indexed: 06/19/2023]
Abstract
The discovery of new drug candidates to inhibit an intended target is a complex and resource-consuming process. A machine learning (ML) method for predicting drug-target interactions (DTI) is a potential solution to improve the efficiency. However, traditional ML approaches have limitations in accuracy. In this study, we developed a novel ensemble model CoGT for DTI prediction using multilayer perceptron (MLP), which integrated graph-based models to extract non-Euclidean molecular structures and large pretrained models, specifically chemBERTa, to process simplified molecular input line entry systems (SMILES). The performance of CoGT was evaluated using compounds inhibiting four Janus kinases (JAKs). Results showed that the large pretrained model, chemBERTa, was better than other conventional ML models in predicting DTI across multiple evaluation metrics, while the graph neural network (GNN) was effective for prediction on imbalanced data sets. To take full advantage of the strengths of these different models, we developed an ensemble model, CoGT, which outperformed other individual ML models in predicting compounds' inhibition on different isoforms of JAKs. Our data suggest that the ensemble model CoGT has the potential to accelerate the process of drug discovery.
Affiliation(s)
- Yingzi Bu
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ruoxi Gao
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Bohan Zhang
  - School of Information, University of Michigan, Ann Arbor, Michigan 48109, United States
- Luchen Zhang
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States
- Duxin Sun
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States
10
Belenahalli Shekarappa S, Kandagalla S, Lee J. Development of machine learning models based on molecular fingerprints for selection of small molecule inhibitors against JAK2 protein. J Comput Chem 2023; 44:1493-1504. [PMID: 36929511 DOI: 10.1002/jcc.27103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 02/18/2023] [Accepted: 02/24/2023] [Indexed: 03/18/2023]
Abstract
Janus kinase 2 (JAK2) is emerging as a potential therapeutic target for many inflammatory diseases such as myeloproliferative disorders (MPD), cancer and rheumatoid arthritis (RA). In this study, we have collected experimental data of JAK2 protein containing 6021 unique inhibitors. We then characterized them based on Morgan (ECFP6) fingerprints followed by clustering into training and test set based on their molecular scaffolds. These data were used to build the classification models with various supervised machine learning (ML) algorithms that could prioritize novel inhibitors for future drug development against JAK2 protein. The best model built by Random Forest (RF) and Morgan fingerprints achieved the G-mean value of 0.84 on the external test set. As an application of our classification model, virtual screening was performed against Drugbank molecules in order to identify the potential inhibitors based on the confidence score by RF model. Nine potential molecules were identified, which were further subject to molecular docking studies to evaluate the virtual screening results of the best RF model. This proposed method can prove useful for developing novel target-specific JAK2 inhibitors.
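The best model above is reported by its G-mean, the geometric mean of sensitivity and specificity, which is commonly used when active and inactive classes are imbalanced. A minimal sketch under that standard definition follows. The function name and the 0/1 label encoding are assumptions, and the abstract does not state how the authors computed it.

```python
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the active class
    specificity = tn / (tn + fp)  # recall on the inactive class
    return sqrt(sensitivity * specificity)
```

Unlike plain accuracy, this score drops to zero if a classifier ignores either class entirely, which is why it suits inhibitor data sets where one class dominates.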
Affiliation(s)
- Sharath Belenahalli Shekarappa
  - School of Systems Biomedical Science and Department of Bioinformatics and Life Science, Soongsil University, Seoul, South Korea
- Shivananda Kandagalla
  - Laboratory of Computational Modeling of Drugs, Higher Medical & Biological School, South Ural State University, Chelyabinsk, Russia
- Julian Lee
  - School of Systems Biomedical Science and Department of Bioinformatics and Life Science, Soongsil University, Seoul, South Korea
11
Developing a Naïve Bayesian Classification Model with PI3Kγ structural features for virtual screening against PI3Kγ: Combining molecular docking and pharmacophore based on multiple PI3Kγ conformations. Eur J Med Chem 2022; 244:114824. [DOI: 10.1016/j.ejmech.2022.114824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 09/28/2022] [Accepted: 10/01/2022] [Indexed: 11/21/2022]
12
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:11262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022] Open
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. 
By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
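Two quantities in this abstract lend themselves to a short illustration: the Tanimoto coefficient used as the similarity threshold (> 0.6) and interpolation between two molecules in a latent space. The sketch below treats a fingerprint as a set of on-bit indices and a latent vector as a plain list of floats. Both representations, the function names, and the convention of returning 0.0 for two empty fingerprints are assumptions made for illustration. Decoding latent points back into molecules requires the trained autoencoder and is not shown.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets
    of on-bit indices: |A & B| / |A | B|."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def interpolate(z_start, z_end, steps):
    """Evenly spaced points on the straight line from z_start to z_end,
    endpoints included. Requires steps >= 2."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * u + t * v for u, v in zip(z_start, z_end)])
    return path
```

In a workflow like the one described, each interpolated point would be decoded to a candidate molecule, and a similarity filter such as `tanimoto(candidate_bits, reference_bits) > 0.6` would then be applied to the decoded structures.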
|
13
|
Kwapien K, Nittinger E, He J, Margreitter C, Voronov A, Tyrchan C. Implications of Additivity and Nonadditivity for Machine Learning and Deep Learning Models in Drug Design. ACS OMEGA 2022; 7:26573-26581. [PMID: 35936431 PMCID: PMC9352238 DOI: 10.1021/acsomega.2c02738] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/03/2022] [Accepted: 07/08/2022] [Indexed: 05/20/2023]
Abstract
Matched molecular pairs (MMPs) are now a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, until now it has not been shown rigorously that MMPs, that is, pairs of molecules differing in only one substituent, can be predicted with higher accuracy and precision than any other chemical compound pair. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design, ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognizing nonadditivity events and the challenging complexity of predicting properties in the case of scaffold hopping. Although deep learning is well suited to modeling nonlinear events, these methods do not seem to be an exception to this observation. They generally perform better than classical machine learning methods, but the challenge remains open for the field.
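To make the notion of additivity concrete, here is a minimal sketch of one common way to quantify nonadditivity, using a cycle of four compounds that share a core and differ at two substituent positions. The abstract does not state the formula used in the study, so this definition, the function name `nonadditivity`, and all property values are assumptions for illustration.

```python
def nonadditivity(p_parent, p_change_1, p_change_2, p_both):
    """Nonadditivity of a double-transformation cycle.

    p_parent   : property of the compound with neither change
    p_change_1 : property with only the first substituent changed
    p_change_2 : property with only the second substituent changed
    p_both     : property with both substituents changed

    Returns the effect of the first change in the presence of the second
    minus its effect in the absence of the second. Zero means the two
    changes are perfectly additive.
    """
    return (p_both - p_change_2) - (p_change_1 - p_parent)


# Hypothetical log-scale property values.
additive_cycle = nonadditivity(6.0, 6.5, 7.0, 7.5)     # each change adds the same amount in both contexts
nonadditive_cycle = nonadditivity(6.0, 6.5, 7.0, 8.5)  # the combined change gains one extra log unit
```

A model that learns only per-substituent contributions can reproduce the first cycle but not the second, which is one way to read the abstract's conclusion that additive data are the easiest to predict.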
Affiliation(s)
- Karolina Kwapien
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Alexey Voronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Christian Tyrchan
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
|
14
|
Artificial intelligence in virtual screening: models versus experiments. Drug Discov Today 2022; 27:1913-1923. [PMID: 35597513 DOI: 10.1016/j.drudis.2022.05.013] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/30/2021] [Revised: 05/08/2022] [Accepted: 05/12/2022] [Indexed: 12/22/2022]
Abstract
A typical drug discovery project involves identifying active compounds with significant binding potential for selected disease-specific targets. Experimental high-throughput screening (HTS) is a traditional approach to drug discovery, but is expensive and time-consuming when dealing with huge chemical libraries with billions of compounds. The search space can be narrowed down with the use of reliable computational screening approaches. In this review, we focus on various machine-learning (ML) and deep-learning (DL)-based scoring functions developed for solving classification and ranking problems in drug discovery. We highlight studies in which ML and DL models were successfully deployed to identify lead compounds for which the experimental validations are available from bioassay studies.
|
15
|
López-López E, Fernández-de Gortari E, Medina-Franco JL. Yes SIR! On the structure-inactivity relationships in drug discovery. Drug Discov Today 2022; 27:2353-2362. [PMID: 35561964 DOI: 10.1016/j.drudis.2022.05.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/02/2022] [Revised: 04/09/2022] [Accepted: 05/05/2022] [Indexed: 12/12/2022]
Abstract
In analogy with structure-activity relationships (SARs), which are at the core of medicinal chemistry, studying structure-inactivity relationships (SIRs) is essential to understanding and predicting biological activity. Current computational methods should predict or distinguish 'activity' and 'inactivity' with the same confidence because both concepts are complementary. However, the lack of inactivity data, in particular in the public domain, limits the development of predictive models and their broad application. In this review, we encourage the scientific community to disclose and analyze high-confidence activity data considering both the labeled 'active' and 'inactive' compounds.
Affiliation(s)
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico; Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico.
- Eli Fernández-de Gortari
  - Department of Nanosafety, International Iberian Nanotechnology Laboratory, Braga 4715-330, Portugal
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
|
16
|
Venkatraman V, Colligan TH, Lesica GT, Olson DR, Gaiser J, Copeland CJ, Wheeler TJ, Roy A. Drugsniffer: An Open Source Workflow for Virtually Screening Billions of Molecules for Binding Affinity to Protein Targets. Front Pharmacol 2022; 13:874746. [PMID: 35559261 PMCID: PMC9086895 DOI: 10.3389/fphar.2022.874746] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/12/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022] Open
Abstract
The SARS-CoV2 pandemic has highlighted the importance of efficient and effective methods for identification of therapeutic drugs, and in particular has laid bare the need for methods that allow exploration of the full diversity of synthesizable small molecules. While classical high-throughput screening methods may consider up to millions of molecules, virtual screening methods hold the promise of enabling appraisal of billions of candidate molecules, thus expanding the search space while concurrently reducing costs and speeding discovery. Here, we describe a new screening pipeline, called drugsniffer, that is capable of rapidly exploring drug candidates from a library of billions of molecules, and is designed to support distributed computation on cluster and cloud resources. As an example of performance, our pipeline required ∼40,000 total compute hours to screen for potential drugs targeting three SARS-CoV2 proteins among a library of ∼3.7 billion candidate molecules.
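The performance figure quoted above can be turned into a rough average throughput. The sketch below uses only the three approximate numbers from the abstract; the variable names are illustrative. It assumes the compute hours are spread evenly over the whole library, which a multi-stage pipeline would not do, so the result is a back-of-the-envelope average rather than the cost of any single stage.

```python
compute_hours = 40_000   # "~40,000 total compute hours" (from the abstract)
library_size = 3.7e9     # "~3.7 billion candidate molecules" (from the abstract)
n_targets = 3            # three SARS-CoV2 proteins (from the abstract)

compute_seconds = compute_hours * 3600

# Average compute time spent per library molecule, over all three targets.
seconds_per_molecule = compute_seconds / library_size

# Average compute time per molecule-target pair.
seconds_per_pair = seconds_per_molecule / n_targets

print(f"{seconds_per_molecule:.4f} s per library molecule")
print(f"{seconds_per_pair:.4f} s per molecule-target pair")
```

Under these assumptions the average works out to roughly 0.04 compute-seconds per library molecule.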
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
- Thomas H. Colligan
  - Department of Computer Science, University of Montana, Missoula, MT, United States
- George T. Lesica
  - Department of Computer Science, University of Montana, Missoula, MT, United States
- Daniel R. Olson
  - Department of Computer Science, University of Montana, Missoula, MT, United States
- Jeremiah Gaiser
  - Department of Computer Science, University of Montana, Missoula, MT, United States
- Conner J. Copeland
  - Department of Computer Science, University of Montana, Missoula, MT, United States
- Travis J. Wheeler
  - Department of Computer Science, University of Montana, Missoula, MT, United States
- Amitava Roy
  - Department of Computer Science, University of Montana, Missoula, MT, United States
  - Rocky Mountain Laboratories, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT, United States
|
17
|
Huang YW, Hsu YC, Chuang YH, Chen YT, Lin XY, Fan YW, Pathak N, Yang JM. Discovery of moiety preference by Shapley value in protein kinase family using random forest models. BMC Bioinformatics 2022; 23:130. [PMID: 35428180 PMCID: PMC9011936 DOI: 10.1186/s12859-022-04663-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 04/04/2022] [Indexed: 11/30/2022] Open
Abstract
Background: Human protein kinases play important roles in cancers, are highly co-regulated by kinase families rather than a single kinase, and complementarily regulate signaling pathways. Even though there are > 100,000 protein kinase inhibitors, only 67 kinase drugs are currently approved by the Food and Drug Administration (FDA). Results: In this study, we used “merged moiety-based interpretable features (MMIFs),” which merge four moiety-based compound features (Checkmol fingerprint, PubChem fingerprint, rings in drugs, and in-house moieties), as the input features for building random forest (RF) models. Using > 200,000 bioactivity test data, we trained the models to classify compounds as kinase family inhibitors or non-inhibitors. The results showed that our RF models achieved good accuracy (> 0.8) for the 10 kinase families. In addition, we found kinase common and specific moieties across families using the Shapley Additive exPlanations (SHAP) approach. We also verified our results using protein kinase complex structures containing important interactions of the hinges, DFGs, or P-loops in the ATP pocket of active sites. Conclusions: In summary, we not only constructed highly accurate models for predicting inhibitors of kinase families but also discovered common and specific inhibitor moieties between different kinase families, providing new opportunities for designing protein kinase inhibitors.
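The SHAP approach mentioned above attributes a model's output to its input features using Shapley values. As a minimal sketch of the underlying idea, the code below computes exact Shapley values by brute-force enumeration for a tiny hypothetical scoring function over three moieties. This is a swapped-in illustration: it is not the SHAP library, which uses faster model-specific algorithms, and `toy_score` is not a model from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_score(present):
    """Hypothetical score given the set of moieties present in a compound."""
    score = 0.0
    if 0 in present:
        score += 0.4  # moiety 0 contributes on its own
    if 1 in present and 2 in present:
        score += 0.2  # moieties 1 and 2 only matter together
    return score


phi = shapley_values(toy_score, 3)
```

Moiety 0 receives its full standalone contribution, while moieties 1 and 2 split their joint contribution equally, and the three values sum to the score of the compound containing all moieties.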
|
18
|
Wang Y, Gu Y, Lou C, Gong Y, Wu Z, Li W, Tang Y, Liu G. A multitask GNN-based interpretable model for discovery of selective JAK inhibitors. J Cheminform 2022; 14:16. [PMID: 35292114 PMCID: PMC8922399 DOI: 10.1186/s13321-022-00593-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Accepted: 02/26/2022] [Indexed: 11/10/2022] Open
Abstract
The Janus kinase (JAK) family plays a pivotal role in most cytokine-mediated inflammatory and autoimmune responses via JAK/STAT signaling, and administration of JAK inhibitors is a promising therapeutic strategy for several diseases including COVID-19. However, screening and designing selective JAK inhibitors is a daunting task due to the extremely high homology among the four JAK isoforms. In this study, we aimed to simultaneously predict pIC50 values of compounds for all JAK subtypes by constructing an interpretable GNN multitask regression model. The final model performance was positive, with R² values of 0.96, 0.79 and 0.78 on the training, validation and test sets, respectively. Meanwhile, we calculated and visualized atom weights, followed by rank sum tests and local mean comparisons, to obtain key atoms and substructures that could be fine-tuned to design selective JAK inhibitors. Several successful case studies demonstrated that our approach is feasible and that our model could learn the interactions between proteins and small molecules well, which could provide practitioners with a novel way to discover and design JAK inhibitors with selectivity.
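Two quantities in this abstract have simple closed forms: pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, and R² is commonly computed as the coefficient of determination. The sketch below shows both. The abstract does not say which R² definition the authors used, so that choice is an assumption, and the example values are hypothetical.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical compound with an IC50 of 100 nM.
example_pic50 = pic50_from_nanomolar(100.0)

# Hypothetical measured and predicted pIC50 values.
measured = [6.0, 7.0, 8.0]
predicted = [6.5, 7.0, 7.5]
example_r2 = r_squared(measured, predicted)
```

A multitask model such as the one described would produce one predicted pIC50 per JAK subtype, and a score of this kind could be computed per subtype or pooled.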
Affiliation(s)
- Yimeng Wang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Chaofeng Lou
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yuning Gong
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Zengrui Wu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
|
19
|
Discovery of Kinase and Carbonic Anhydrase Dual Inhibitors by Machine Learning Classification and Experiments. Pharmaceuticals (Basel) 2022; 15:ph15020236. [PMID: 35215348 PMCID: PMC8875555 DOI: 10.3390/ph15020236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Revised: 02/11/2022] [Accepted: 02/12/2022] [Indexed: 02/04/2023] Open
Abstract
A multi-target small molecule modulator is advantageous for treating complicated diseases such as cancers. However, strategies for discovering multi-target modulators, and their applications, have rarely been reported. This study presents dual inhibitors of kinases and carbonic anhydrase (CA) that were predicted by machine learning (ML) classifiers and validated by biochemical and biophysical experiments. ML classifiers trained on the molecular fingerprints of CA I and CA II inhibitors predicted candidates among protein-specific bioactive molecules that are approved or in clinical trials. For experimental tests, three sulfonamide-containing kinase inhibitors, 5932, 5946, and 6046, were chosen. Enzyme assays with CA I, CA II, CA IX, and CA XII allowed quantitative comparison of the molecules’ inhibitory activities. While 6046 inhibited weakly, 5932 and 5946 exhibited potent inhibition, with inhibitory constants of 100 nM to 1 μM. The ML screening was then extended to search for CA inhibitors among all known kinase inhibitors. It identified XMU-MP-1 as another potent CA inhibitor, with an inhibitory constant of approximately 30 nM for CA I, CA II, and CA IX. Differential scanning fluorimetry confirmed the direct interaction between CAs and small molecules. Cheminformatics studies, including docking simulation, suggest that each molecule possesses two separate functional moieties: one for interaction with kinases and the other with CAs.
|
20
|
Bucinsky L, Bortňák D, Gall M, Matúška J, Milata V, Pitoňák M, Štekláč M, Végh D, Zajaček D. Machine learning prediction of 3CLpro SARS-CoV-2 docking scores. Comput Biol Chem 2022; 98:107656. [PMID: 35288359 PMCID: PMC8881816 DOI: 10.1016/j.compbiolchem.2022.107656] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/22/2021] [Revised: 02/23/2022] [Accepted: 02/24/2022] [Indexed: 12/14/2022]
Abstract
Molecular docking results of two training sets containing 866 and 8,696 compounds were used to train three different machine learning (ML) approaches. Neural networks built with the Keras and TensorFlow libraries and the gradient boosted decision trees approach of XGBoost were used with DScribe’s Smooth Overlap of Atomic Positions molecular descriptors. In addition, neural networks using the SchNetPack library and descriptors were used. The ML performance was tested on three different sets, including compounds for future organic synthesis. The final evaluation of the ML predicted docking scores was based on the ZINC in vivo set, from which 1,200 compounds were randomly selected with respect to their size. The results showed a consistent ML prediction capability of docking scores, and even though compounds with more than 60 atoms were found to be slightly overestimated, they remain valid for a subsequent evaluation of their drug repurposing suitability.
|
21
|
Vijayan RSK, Kihlberg J, Cross JB, Poongavanam V. Enhancing preclinical drug discovery with artificial intelligence. Drug Discov Today 2021; 27:967-984. [PMID: 34838731 DOI: 10.1016/j.drudis.2021.11.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/14/2021] [Revised: 10/15/2021] [Accepted: 11/19/2021] [Indexed: 12/14/2022]
Abstract
Artificial intelligence (AI) is becoming an integral part of drug discovery. It has the potential to deliver across the drug discovery and development value chain, starting from target identification and reaching through clinical development. In this review, we provide an overview of current AI technologies and a glimpse of how AI is reimagining preclinical drug discovery by highlighting examples where AI has made a real impact. Considering the excitement and hyperbole surrounding AI in drug discovery, we aim to present a realistic view by discussing both opportunities and challenges in adopting AI in drug discovery.
Affiliation(s)
- R S K Vijayan
  - Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA
- Jan Kihlberg
  - Department of Chemistry-BMC, Uppsala University, Uppsala, Sweden
- Jason B Cross
  - Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA.
|
22
|
Machine Learning Models for the Classification of CK2 Natural Products Inhibitors with Molecular Fingerprint Descriptors. Processes (Basel) 2021. [DOI: 10.3390/pr9112074] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022] Open
Abstract
Casein kinase 2 (CK2) is considered an important target for anti-cancer drugs. Given the structural diversity and broad spectrum of pharmaceutical activities of natural products, numerous studies have been performed to establish them as valuable sources of drugs. However, few studies have used machine learning methods to identify the structural factors responsible for their inhibitory activity against CK2. In this study, classification studies were conducted on 115 natural products as CK2 inhibitors. Seven machine learning methods along with six molecular fingerprints were employed to develop qualitative classification models. The performance of all models was evaluated by cross-validation and on a test set. Taking predictive accuracy (CA), the area under the receiver operating characteristic curve (AUC), and the Matthews correlation coefficient (MCC) as three performance indicators, the optimal models with high reliability and predictive ability were obtained, including the Extended Fingerprint-Logistic Regression model (CA = 0.859, AUC = 0.826, MCC = 0.520) for the training set and the PubChem fingerprint combined with the artificial neural network model (CA = 0.826, AUC = 0.933, MCC = 0.628) for the test set. Meanwhile, the privileged substructures responsible for their inhibitory activity against CK2 were also identified through a combination of frequency analysis and information gain. The results are expected to provide useful information for the further utilization of natural products and the discovery of novel CK2 inhibitors.
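The three performance indicators named in this abstract can be computed from a confusion matrix (CA, MCC) and from ranked scores (AUC). The following sketch uses their standard definitions; the counts and scores are hypothetical and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical, imbalanced confusion matrix: 42 actives and 8 inactives.
example_ca = accuracy(tp=40, tn=5, fp=3, fn=2)
example_mcc = mcc(tp=40, tn=5, fp=3, fn=2)

# Hypothetical classifier scores for actives and inactives.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

With imbalanced classes, accuracy can be high while MCC stays moderate, which is one reason for reporting several indicators side by side as the abstract does.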
|
23
|
Jabeen A, de March CA, Matsunami H, Ranganathan S. Machine Learning Assisted Approach for Finding Novel High Activity Agonists of Human Ectopic Olfactory Receptors. Int J Mol Sci 2021; 22:ijms222111546. [PMID: 34768977 PMCID: PMC8583936 DOI: 10.3390/ijms222111546] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/30/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/29/2022] Open
Abstract
Olfactory receptors (ORs) constitute the largest superfamily of G protein-coupled receptors (GPCRs). ORs are involved in sensing odorants as well as in other ectopic roles in non-nasal tissues. Matching the enormous repertoire of olfactory stimuli to their counterpart ORs through machine learning (ML) will enable understanding of the olfactory system, receptor characterization, and exploitation of their therapeutic potential. In the current study, we have selected two broadly tuned ectopic human OR proteins, OR1A1 and OR2W1, for expanding their known chemical space by using molecular descriptors. We present a scheme for selecting the optimal features required to train an ML-based model, based on which we selected the random forest (RF) as the best performer. High activity agonist prediction involved screening five databases comprising ~23 M compounds, using the trained RF classifier. To evaluate the effectiveness of the machine learning based virtual screening and check receptor binding site compatibility, we used docking of the top target ligands to carefully develop receptor model structures. Finally, experimental validation of selected compounds with significant docking scores through in vitro assays revealed two high activity novel agonists for OR1A1 and one for OR2W1.
Affiliation(s)
- Amara Jabeen
  - Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia
- Claire A. de March
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Hiroaki Matsunami
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Neurobiology, Duke Institute for Brain Sciences, Duke University, Durham, NC 27710, USA
  - Correspondence: (H.M.); (S.R.)
- Shoba Ranganathan
  - Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia
  - Correspondence: (H.M.); (S.R.)
|
24
|
Towards Data‐Driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202106880] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
|
25
|
Xu LC, Zhang SQ, Li X, Tang MJ, Xie PP, Hong X. Towards Data-driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021; 60:22804-22811. [PMID: 34370892 DOI: 10.1002/anie.202106880] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/23/2021] [Revised: 07/14/2021] [Indexed: 11/09/2022]
Abstract
Asymmetric hydrogenation of olefins is one of the most powerful asymmetric transformations in molecular synthesis. Although several privileged catalyst scaffolds are available, catalyst development for asymmetric hydrogenation is still a time- and resource-consuming process due to the lack of a predictive catalyst design strategy. Targeting the data-driven design of asymmetric catalysis, we herein report the development of a standardized database that contains detailed information on over 12,000 asymmetric hydrogenations of olefins from the literature. This database provides a valuable platform for machine learning applications in asymmetric catalysis. Based on this database, we developed a hierarchical learning approach to achieve a predictive machine learning model using only dozens of enantioselectivity data with the target olefin, which offers a useful solution for the few-shot learning problem and will facilitate reaction optimization with new olefin substrates in catalysis screening.
Affiliation(s)
- Li-Cheng Xu
  - Zhejiang University, Department of Chemistry, China
- Xin Li
  - Zhejiang University, Department of Chemistry, China
- Pei-Pei Xie
  - Zhejiang University, Department of Chemistry, China
- Xin Hong
  - Zhejiang University, Department of Chemistry, 38 Zheda Road, 310028, Hangzhou, China
|
26
|
Recent advances in drug repurposing using machine learning. Curr Opin Chem Biol 2021; 65:74-84. [PMID: 34274565 DOI: 10.1016/j.cbpa.2021.06.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/11/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/11/2022]
Abstract
Drug repurposing aims to find new uses for already existing and approved drugs. We now provide a brief overview of recent developments in drug repurposing using machine learning alongside other computational approaches for comparison. We also highlight several applications for cancer using kinase inhibitors, Alzheimer's disease as well as COVID-19.
|
27
|
Menke J, Koch O. Using Domain-Specific Fingerprints Generated Through Neural Networks to Enhance Ligand-Based Virtual Screening. J Chem Inf Model 2021; 61:664-675. [PMID: 33497572 DOI: 10.1021/acs.jcim.0c01208] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
Similarity-based virtual screening is a fundamental tool in the early drug discovery process and relies heavily on molecular fingerprints. We propose a novel strategy of generating domain-specific fingerprints by training neural networks on target-specific bioactivity datasets and using the activation as a new molecular representation. The neural network is expected to combine information of already known bioactive compounds with unique information of the molecular structure and by doing so enrich the fingerprint. We evaluate this strategy on a large kinase-specific bioactivity dataset. A comparison of five neural network architectures and their fingerprints to the well-established extended-connectivity fingerprint (ECFP) and an autoencoder shows that our neural fingerprint produces better results in the similarity search. Most importantly, the neural fingerprint performs well even when specific targets are not included during training. Surprisingly, while Graph Neural Networks (GNNs) are thought to offer an advantageous alternative, the best performing neural fingerprints were based on traditional fully connected layers using the ECFP4 as the input. The neural fingerprint is freely available at: https://github.com/kochgroup/kinase_nnfp.
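A similarity search over learned fingerprints, as described above, amounts to ranking library compounds by how close their vectors are to a query vector. The sketch below illustrates that ranking step only. The abstract does not state which similarity measure was used on the neural fingerprints, so cosine similarity is an assumption here, and the compound names and vectors are hypothetical stand-ins for network activations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, library):
    """Return library compound names sorted from most to least similar to the query."""
    return sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)


# Hypothetical activation vectors standing in for neural fingerprints.
query_fp = [1.0, 0.0, 1.0]
library_fps = {
    "cpd_a": [1.0, 0.0, 1.0],
    "cpd_b": [0.0, 1.0, 0.0],
    "cpd_c": [1.0, 1.0, 1.0],
}

ranking = rank_by_similarity(query_fp, library_fps)
```

The same ranking function would work for a binary fingerprint such as ECFP4 if `cosine` were swapped for a Tanimoto coefficient.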
Affiliation(s)
- Janosch Menke
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, Münster 48149, Germany
- Oliver Koch
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, Münster 48149, Germany
  - Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, Münster 48149, Germany
|
28
|
Xing G, Liang L, Deng C, Hua Y, Chen X, Yang Y, Liu H, Lu T, Chen Y, Zhang Y. Activity Prediction of Small Molecule Inhibitors for Antirheumatoid Arthritis Targets Based on Artificial Intelligence. ACS COMBINATORIAL SCIENCE 2020; 22:873-886. [PMID: 33146518 DOI: 10.1021/acscombsci.0c00169] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Rheumatoid arthritis (RA) is a chronic autoimmune disease that the industry compares to an "immortal cancer". Currently, SYK, BTK, and JAK are the three major protein tyrosine kinase targets for this disease. According to existing research, marketed and investigational drugs for RA are mostly based on a single target, which limits their efficacy. Therefore, designing multitarget or dual-target inhibitors provides new insights for the treatment of RA, given the specific association between SYK, BTK, and JAK from two signal transduction pathways. In this study, machine learning (XGBoost, SVM) and deep learning (DNN) models were combined for the first time to build a powerful integrated model for SYK, BTK, and JAK. The predictive power of the integrated model proved superior to that of a single classifier. To accurately assess the generalization ability of the integrated model, comprehensive similarity analysis was performed on the training and test sets, and the prediction accuracy of the integrated model was analyzed under different similarity thresholds. External validation was conducted using single-target and dual-target inhibitors, respectively. Results showed that our model not only obtained a high recall rate (97%) in single-target prediction, but also achieved a favorable yield (54.4%) in dual-target prediction. Furthermore, by clustering dual-target inhibitors, the prediction performance of the model in various classes was assessed, evaluating the applicability domain of the model in dual-target drug screening. In summary, the proposed integrated model is promising for screening dual-target inhibitors of SYK/JAK or BTK/JAK as RA drugs, which is beneficial for the clinical treatment of rheumatoid arthritis.
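One simple way to combine several classifiers into an integrated model is soft voting, in which the predicted probabilities are averaged before thresholding. The abstract does not say how the XGBoost, SVM, and DNN outputs were combined, so the averaging rule below is an assumption, and the probabilities and labels are hypothetical.

```python
def soft_vote(probabilities_per_model, threshold=0.5):
    """Average per-compound probabilities across models and threshold the mean."""
    n_models = len(probabilities_per_model)
    averaged = [sum(ps) / n_models for ps in zip(*probabilities_per_model)]
    return [p >= threshold for p in averaged]


def recall(y_true, y_pred):
    """Fraction of true actives that were predicted active."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Hypothetical predicted probabilities of activity for four compounds.
xgb_probs = [0.9, 0.4, 0.2, 0.7]
svm_probs = [0.8, 0.6, 0.1, 0.4]
dnn_probs = [0.7, 0.7, 0.3, 0.3]

ensemble_pred = soft_vote([xgb_probs, svm_probs, dnn_probs])

# Hypothetical ground truth: compounds 1, 2, and 4 are active.
labels = [True, True, False, True]
ensemble_recall = recall(labels, ensemble_pred)
```

In this toy case the second compound is recovered only because two of the three models outvote the third, which is the kind of gain an integrated model can offer over a single classifier.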
Collapse
Affiliation(s)
- Guomeng Xing
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Li Liang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Chenglong Deng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yi Hua
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xingye Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yan Yang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
|
29
|
Antiplasmodial activity of sulfonylhydrazones: in vitro and in silico approaches. Future Med Chem 2020; 13:233-250. [PMID: 33295837 DOI: 10.4155/fmc-2020-0229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Malaria is still a life-threatening public health issue, and the upsurge of resistant strains requires continuous generation of active molecules. In this work, 35 sulfonylhydrazone derivatives were synthesized and evaluated against chloroquine-sensitive (3D7) and chloroquine-resistant (W2) strains of Plasmodium falciparum. The most promising compound, 5b, had an IC50 of 0.22 μM against W2 and was less cytotoxic and 26-fold more selective than chloroquine. The structure-activity relationship model, statistical analysis, and molecular modeling studies suggested that antiplasmodial activity was related to hydrogen bond acceptor count, molecular weight, the octanol/water partition coefficient, and the displacement of frontier orbitals to the heteroaromatic ring beside the imine bond. This study demonstrates that the synthesized molecules, built on a simple scaffold, allow the hit-to-lead process for new antimalarials to commence.
|
30
|
Venkatraman V. Evaluation of Molecular Fingerprints for Determining Dye Aggregation on Semiconductor Surfaces. Mol Inform 2020; 41:e2000062. [PMID: 32476288 DOI: 10.1002/minf.202000062] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/03/2020] [Accepted: 05/31/2020] [Indexed: 01/19/2023]
Abstract
Dye aggregation plays an important role in determining the photovoltaic performance of dye-sensitized solar cells. Compared with the spectra observed in solution, it is difficult to ascertain a priori whether a dye is likely to show hypsochromic (H) or bathochromic (J) aggregation until after adsorption onto the semiconductor electrode. Herein, we show that molecular fingerprint-based methods provide a fast and efficient way to discriminate between H- and J-aggregating dyes. The efficacy of the fingerprint-based classification models is demonstrated with a diverse set of over 3000 organic dyes dissolved in different solvents. Requiring only the structure of the dye and the polarity of the solvent used, the machine learning model achieves a classification accuracy close to 80%, comparable with models based on a combination of fragment counts and topological indices. For interested researchers, we have bundled the prediction tools as an R package.
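The abstract does not specify the classifier or the fingerprint type. As a rough illustration of what fingerprint-based discrimination between H- and J-aggregating dyes can look like, the sketch below assumes fingerprints represented as sets of "on" bit indices, Tanimoto similarity, and a k-nearest-neighbour vote. All of these choices, the function names, and the toy data are hypothetical; solvent polarity, which the published model also uses, is left out for brevity.

```python
from collections import Counter


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_aggregation(query_bits, training, k=3):
    """Majority label ("H" or "J") among the k most similar training dyes.

    training: list of (fingerprint_bits, label) pairs.
    k-NN is an illustrative stand-in, not the model from the paper.
    """
    ranked = sorted(
        training,
        key=lambda item: tanimoto(query_bits, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy fingerprints; real ones would come from a cheminformatics toolkit.
training_set = [
    ({1, 2, 3, 4}, "H"),
    ({1, 2, 3, 5}, "H"),
    ({7, 8, 9}, "J"),
    ({7, 8, 10}, "J"),
    ({2, 7, 11}, "J"),
]

prediction = predict_aggregation({1, 2, 3, 6}, training_set)
print(prediction)
```

The query shares three of five bits with each "H" dye and at most one bit with any "J" dye, so the three nearest neighbours vote two to one for "H".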
|