1
Curcio A, Rocca R, Alcaro S, Artese A. The Histone Deacetylase Family: Structural Features and Application of Combined Computational Methods. Pharmaceuticals (Basel) 2024; 17:620. [PMID: 38794190 PMCID: PMC11124352 DOI: 10.3390/ph17050620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024] Open
Abstract
Histone deacetylases (HDACs) are crucial in gene transcription, removing acetyl groups from histones. They also influence the deacetylation of non-histone proteins, contributing to the regulation of various biological processes. Thus, HDACs play pivotal roles in various diseases, including cancer, neurodegenerative disorders, and inflammatory conditions, highlighting their potential as therapeutic targets. This paper reviews the structure and function of the four classes of human HDACs. While four HDAC inhibitors are currently available for treating hematological malignancies, numerous others are undergoing clinical trials. However, their non-selective toxicity necessitates ongoing research into safer and more efficient class-selective or isoform-selective inhibitors. Computational methods have aided the discovery of HDAC inhibitors with the desired potency and/or selectivity. These methods include ligand-based approaches, such as scaffold hopping, pharmacophore modeling, three-dimensional quantitative structure-activity relationships, and structure-based virtual screening (molecular docking). Moreover, recent developments in the field of molecular dynamics simulations, combined with Poisson-Boltzmann/molecular mechanics generalized Born surface area techniques, have improved the prediction of ligand binding affinity. In this review, we delve into the ways in which these methods have contributed to designing and identifying HDAC inhibitors.
Affiliation(s)
- Antonio Curcio
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy; (A.C.); (S.A.); (A.A.)
- Roberta Rocca
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy; (A.C.); (S.A.); (A.A.)
- Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Stefano Alcaro
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy; (A.C.); (S.A.); (A.A.)
- Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Anna Artese
- Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy; (A.C.); (S.A.); (A.A.)
- Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
2
Wu J, Chen Y, Wu J, Zhao D, Huang J, Lin M, Wang L. Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors. J Cheminform 2024; 16:13. [PMID: 38291477 PMCID: PMC10829268 DOI: 10.1186/s13321-023-00799-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 12/22/2023] [Indexed: 02/01/2024] Open
Abstract
Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profile of compounds, but there is still controversy about the advantages and disadvantages of ML and DL for such tasks. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors comprising 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal that (1) descriptor-based ML models generally slightly outperform fingerprint-based ML models in terms of predictive performance, and RF, as an ensemble learning approach, displays the best overall predictive performance; (2) single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models, whereas the corresponding multi-task models generally improve the average accuracy of kinase profile prediction (for example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807); and (3) fusion models based on voting and stacking methods can further improve performance on the kinase profiling prediction task; specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC value of 0.825 on the test sets. These findings provide useful information for guiding the choice of ML and DL methods for kinase profiling prediction tasks. Finally, an online platform called KIPP ( https://kipp.idruglab.cn ) and Python software were developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning, and target fishing.
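As a rough illustration of the voting-based fusion mentioned in the abstract above, the sketch below averages per-compound activity probabilities from several models (soft voting). The abstract does not state which voting scheme the authors used, so soft voting, the model names, the probabilities, and the 0.5 threshold are all assumptions made for illustration.

```python
from statistics import mean


def soft_vote(prob_by_model):
    """Average per-compound probabilities across models (soft voting)."""
    lengths = {len(p) for p in prob_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same compounds")
    # zip(*...) groups the scores column-wise, one tuple per compound.
    return [mean(scores) for scores in zip(*prob_by_model.values())]


# Hypothetical predicted probabilities for three compounds from three models.
probs = {
    "rf_atompairs": [0.90, 0.20, 0.60],
    "rf_fp2": [0.80, 0.40, 0.30],
    "rf_rdkit_descriptors": [0.70, 0.30, 0.70],
}

fused = soft_vote(probs)
labels = [int(p >= 0.5) for p in fused]  # 0.5 is an assumed decision threshold
print(fused, labels)
```

Stacking differs in that a second-level model is trained on the base models' outputs instead of averaging them.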
Affiliation(s)
- Jiangxia Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yihao Chen
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jingxing Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jindi Huang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- MuJie Lin
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
3
Zhu W, Wang Y, Niu Y, Zhang L, Liu Z. Current Trends and Challenges in Drug-Likeness Prediction: Are They Generalizable and Interpretable? HEALTH DATA SCIENCE 2023; 3:0098. [PMID: 38487200 PMCID: PMC10880170 DOI: 10.34133/hds.0098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 10/20/2023] [Indexed: 03/17/2024]
Abstract
Importance: Drug-likeness of a compound is an overall assessment of its potential to succeed in clinical trials, and is essential for economizing research expenditures by filtering out compounds with unfavorable properties and poor development potential. To this end, a robust drug-likeness prediction method is indispensable. Various approaches, including discriminative rules, statistical models, and machine learning models, have been developed to predict drug-likeness based on physicochemical properties and structural features. Notably, recent deep learning techniques have significantly advanced drug-likeness prediction, especially in classification performance. Highlights: In this review, we addressed the evolving landscape of drug-likeness prediction, with emphasis on methods employing novel deep learning techniques, and highlighted the current challenges in drug-likeness prediction, specifically regarding generalization and interpretability. Moreover, we explored potential remedies and outlined promising avenues for future research. Conclusion: Despite the hurdles of generalization and interpretability, novel deep learning techniques have great potential in drug-likeness prediction and are worthy of further research efforts.
Affiliation(s)
- Wenyu Zhu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yanxing Wang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yan Niu
- Department of Medicinal Chemistry, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
4
Gil-Pichardo A, Sánchez-Ruiz A, Colmenarejo G. Analysis of metabolites in human gut: illuminating the design of gut-targeted drugs. J Cheminform 2023; 15:96. [PMID: 37833792 PMCID: PMC10571276 DOI: 10.1186/s13321-023-00768-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 10/06/2023] [Indexed: 10/15/2023] Open
Abstract
Gut-targeted drugs provide a new drug modality, alongside oral systemic molecules, that could tap into the growing knowledge of gut metabolites of bacterial or host origin and their involvement in biological processes and health through their interaction with gut targets (likewise bacterial or host). Understanding the properties of gut metabolites can provide guidance for the design of gut-targeted drugs. In the present work we analyze a large set of gut metabolites, both those shared with serum and those present only in the gut, and compare them with oral systemic drugs. We find patterns specific to these two subsets of metabolites that could be used to design drugs targeting the gut. In addition, we develop and openly share a Super Learner model to predict gut permanence, in order to aid in the design of molecules with appropriate profiles to remain in the gut, resulting in molecules with putatively reduced secondary effects and better pharmacokinetics.
Affiliation(s)
- Alberto Gil-Pichardo
- Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, 28049, Madrid, Spain
- Andrés Sánchez-Ruiz
- Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, 28049, Madrid, Spain
- Gonzalo Colmenarejo
- Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, 28049, Madrid, Spain.
5
Riedl M, Mukherjee S, Gauthier M. Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma. Mol Pharm 2023; 20:4984-4993. [PMID: 37656906 DOI: 10.1021/acs.molpharmaceut.3c00129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Chemical-specific parameters are either measured in vitro or estimated using quantitative structure-activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a set of descriptors or fingerprints, subset selection, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers, which allowed us to circumvent the need for calculation of these chemical descriptors. In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings based upon the relationships between the chemical tokens in the SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability to reuse the pre-trained model for multiple different fine-tuning tasks, reducing the computational burden of developing multiple models for different endpoints. We used our framework to develop a predictive model for fraction unbound in human plasma (fu,p). This approach is flexible, requires minimal domain expertise, and can be generalized to other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity.
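To make the masked SMILES token task described in the abstract above more concrete, here is a minimal sketch of how SMILES strings might be tokenized and masked before being fed to a BERT-style model. The regular expression, the `[MASK]` placeholder, the 15% default mask rate, and the function names are illustrative assumptions; the abstract does not specify the authors' tokenizer or masking settings.

```python
import random
import re

# Simplified SMILES tokenizer (an assumption, not the authors' tokenizer):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring-closure digits, and bond/branch symbols.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#+\-\\/.:@*~$]")


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK]; return inputs and targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = masked[i]  # the token the model would have to recover
        masked[i] = "[MASK]"
    return masked, targets


tokens = tokenize("c1ccccc1Br")  # bromobenzene
masked, targets = mask_tokens(tokens, mask_rate=0.3, seed=1)
print(masked, targets)
```

During pre-training the model would be scored on how well it recovers the entries in `targets` from the surrounding unmasked tokens.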
6
Fang C, Wang Y, Grater R, Kapadnis S, Black C, Trapa P, Sciabola S. Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective. J Chem Inf Model 2023. [PMID: 37216672 DOI: 10.1021/acs.jcim.3c00160] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 05/24/2023]
Abstract
Absorption, distribution, metabolism, and excretion (ADME), which collectively define the concentration profile of a drug at the site of action, are of critical importance to the success of a drug candidate. Recent advances in machine learning algorithms and the availability of larger proprietary as well as public ADME data sets have generated renewed interest within the academic and pharmaceutical science communities in predicting pharmacokinetic and physicochemical endpoints in early drug discovery. In this study, we collected 120 internal prospective data sets over 20 months across six ADME in vitro endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. A variety of machine learning algorithms in combination with different molecular representations were evaluated. Our results suggest that gradient boosting decision tree and deep learning models consistently outperformed random forest over time. We also observed better performance when models were retrained on a fixed schedule; more frequent retraining generally resulted in increased accuracy, while hyperparameter tuning improved the prospective predictions only marginally.
Affiliation(s)
- Cheng Fang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Ye Wang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Richard Grater
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Cheryl Black
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Patrick Trapa
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Simone Sciabola
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
7
Win ZM, Cheong AMY, Hopkins WS. Using Machine Learning To Predict Partition Coefficient (Log P) and Distribution Coefficient (Log D) with Molecular Descriptors and Liquid Chromatography Retention Time. J Chem Inf Model 2023; 63:1906-1913. [PMID: 36926888 DOI: 10.1021/acs.jcim.2c01373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
During preclinical evaluations of drug candidates, several physicochemical (p-chem) properties are measured and employed as metrics to estimate drug efficacy in vivo. Two such p-chem properties are the octanol-water partition coefficient, Log P, and distribution coefficient, Log D, which are useful in estimating the distribution of drugs within the body. Log P and Log D are traditionally measured using the shake-flask method and high-performance liquid chromatography. However, it is challenging to measure these properties for species that are very hydrophobic (or hydrophilic) owing to the very low equilibrium concentrations partitioned into octanol (or aqueous) phases. Moreover, the shake-flask method is relatively time-consuming and can require multistep dilutions as the range of analyte concentrations can differ by several orders of magnitude. Here, we circumvent these limitations by using machine learning (ML) to correlate Log P and Log D with liquid chromatography (LC) retention time (RT). Predictive models based on four ML algorithms, which used molecular descriptors and LC RTs as features, were extensively tested and compared. The inclusion of RT as an additional descriptor improves model performance (MAE = 0.366 and R² = 0.89), and Shapley additive explanations analysis indicates that RT has the highest impact on model accuracy.
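For readers unfamiliar with the two regression metrics quoted in the abstract above, the following sketch computes the mean absolute error (MAE) and the coefficient of determination from their standard definitions. The measured and predicted Log P values are made up for illustration and have no connection to the study's data.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |measured - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted Log P values.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]

print(mae(measured, predicted), r2(measured, predicted))
```

MAE is in the same units as the target (here, Log P units), while the coefficient of determination is unitless and compares the model against always predicting the mean.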
Affiliation(s)
- Zaw-Myo Win
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong; School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong; Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
- Allen M Y Cheong
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong; School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
- W Scott Hopkins
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong; Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada; Waterloo Institute for Nanotechnology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada; WaterMine Innovation, Inc., Waterloo, Ontario N0B 2T0, Canada
8
Ai D, Cai H, Wei J, Zhao D, Chen Y, Wang L. DEEPCYPs: A deep learning platform for enhanced cytochrome P450 activity prediction. Front Pharmacol 2023; 14:1099093. [PMID: 37101544 PMCID: PMC10123292 DOI: 10.3389/fphar.2023.1099093] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/15/2022] [Accepted: 03/31/2023] [Indexed: 04/28/2023] Open
Abstract
Cytochrome P450 (CYP) is a superfamily of heme-containing oxidizing enzymes involved in the metabolism of a wide range of medicines, xenobiotics, and endogenous compounds. Five of the CYPs (1A2, 2C9, 2C19, 2D6, and 3A4) are responsible for metabolizing the vast majority of approved drugs. Adverse drug-drug interactions, many of which are mediated by CYPs, are one of the important causes of the premature termination of drug development and of drug withdrawal from the market. In this work, we report in silico classification models that predict the inhibitory activity of molecules against these five CYP isoforms using our recently developed FP-GNN deep learning method. The evaluation results showed that, to the best of our knowledge, the multi-task FP-GNN model achieved the best predictive performance, with the highest average AUC (0.905), F1 (0.779), BA (0.819), and MCC (0.647) values for the test sets, even compared to advanced machine learning, deep learning, and existing models. Y-scrambling testing confirmed that the results of the multi-task FP-GNN model were not attributable to chance correlation. Furthermore, the interpretability of the multi-task FP-GNN model enables the discovery of critical structural fragments associated with CYP inhibition. Finally, an online webserver called DEEPCYPs and its local software version were created based on the optimal multi-task FP-GNN model to detect whether compounds bear potential inhibitory activity against CYPs. This promotes the prediction of drug-drug interactions in clinical practice and could be used to rule out inappropriate compounds in the early stages of drug discovery and/or to identify new CYP inhibitors.
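The abstract above reports F1, balanced accuracy (BA), and Matthews correlation coefficient (MCC). As a reminder of how these relate to a binary confusion matrix, here is a small sketch using the standard textbook definitions. The counts are hypothetical, and the function assumes both classes are present and both are predicted at least once (otherwise a division by zero occurs).

```python
from math import sqrt


def classification_metrics(tp, fp, fn, tn):
    """Compute F1, balanced accuracy, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive (inhibitor) class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    ba = (sensitivity + specificity) / 2
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"F1": f1, "BA": ba, "MCC": mcc}


# Hypothetical counts: 40 true positives, 10 false positives,
# 10 false negatives, 40 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Unlike F1, both BA and MCC take true negatives into account, which is one reason they are often reported alongside it for imbalanced inhibitor data sets.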
9
Bhat V, Sornberger P, Pokuri BSS, Duke R, Ganapathysubramanian B, Risko C. Electronic, redox, and optical property prediction of organic π-conjugated molecules through a hierarchy of machine learning approaches. Chem Sci 2022; 14:203-213. [PMID: 36605753 PMCID: PMC9769113 DOI: 10.1039/d2sc04676h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022] Open
Abstract
Accelerating the development of π-conjugated molecules for applications such as energy generation and storage, catalysis, sensing, pharmaceuticals, and (semi)conducting technologies requires rapid and accurate evaluation of the electronic, redox, or optical properties. While high-throughput computational screening has proven to be a tremendous aid in this regard, machine learning (ML) and other data-driven methods can further enable orders of magnitude reduction in time while at the same time providing dramatic increases in the chemical space that is explored. However, the lack of benchmark datasets containing the electronic, redox, and optical properties that characterize the diverse, known chemical space of organic π-conjugated molecules limits ML model development. Here, we present a curated dataset containing 25k molecules with density functional theory (DFT) and time-dependent DFT (TDDFT) evaluated properties that include frontier molecular orbitals, ionization energies, relaxation energies, and low-lying optical excitation energies. Using the dataset, we train a hierarchy of ML models, ranging from classical models such as ridge regression to sophisticated graph neural networks, with molecular SMILES representation as input. We observe that graph neural networks augmented with contextual information allow for significantly better predictions across a wide array of properties. Our best-performing models also provide an uncertainty quantification for the predictions. To democratize access to the data and trained models, an interactive web platform has been developed and deployed.
Affiliation(s)
- Vinayak Bhat
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
- Parker Sornberger
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
- Balaji Sesha Sarath Pokuri
- Department of Mechanical Engineering and Translational AI Center, Iowa State University Ames Iowa 50010 USA
- Rebekah Duke
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
- Chad Risko
- Department of Chemistry and Center for Applied Energy Research, University of Kentucky Lexington Kentucky 40506 USA
10
Using Artificial Intelligence for Drug Discovery: A Bibliometric Study and Future Research Agenda. Pharmaceuticals (Basel) 2022; 15:ph15121492. [PMID: 36558943 PMCID: PMC9785219 DOI: 10.3390/ph15121492] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Revised: 11/23/2022] [Accepted: 11/27/2022] [Indexed: 12/03/2022] Open
Abstract
Drug discovery is usually a rule-based process that is carefully carried out by pharmacists. However, a new trend is emerging in research and practice where artificial intelligence is being used for drug discovery to increase efficiency or to develop new drugs for previously untreatable diseases. Nevertheless, so far, no study takes a holistic view of AI-based drug discovery research. Given the importance and potential of AI for drug discovery, this lack of research is surprising. This study aimed to close this research gap by conducting a bibliometric analysis to identify all relevant studies and to analyze interrelationships among algorithms, institutions, countries, and funding sponsors. For this purpose, a sample of 3884 articles was examined bibliometrically, including studies from 1991 to 2022. We utilized various qualitative and quantitative methods, such as performance analysis, science mapping, and thematic analysis. Based on these findings, we furthermore developed a research agenda that aims to serve as a foundation for future researchers.
11
Zhang H, Huang J, Chen R, Cai H, Chen Y, He S, Xu J, Zhang J, Wang L. Ligand- and structure-based identification of novel CDK9 inhibitors for the potential treatment of leukemia. Bioorg Med Chem 2022; 72:116994. [PMID: 36087428 DOI: 10.1016/j.bmc.2022.116994] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/05/2022] [Revised: 08/21/2022] [Accepted: 08/29/2022] [Indexed: 11/02/2022]
Abstract
Cyclin-dependent kinase 9 (CDK9) plays a vital role in controlling cell transcription and has been an attractive target for cancer treatment. Herein, ten predictive models derived from 1330 unique molecules against CDK9 were constructed based on molecular fingerprints and graphs using two conventional machine learning and four deep learning methods. The evaluation results showed that the FP-GNN deep learning architecture performed best for CDK9 inhibitor prediction, with the highest BA and F1 values of 0.681 and 0.912 on the test set. We then performed virtual screening to identify new CDK9 inhibitors by combining the optimal predictive model with molecular docking. Bioassays identified five compounds with broad anticancer activity against various cancer cell lines. For example, C9 exhibited antiproliferative activity against HeLa, MOLM-13, and MDA-MB-231 cells with IC50 values of 2.53, 3.92, and 11.65 μM, respectively. Kinase inhibition assay results demonstrated that these compounds displayed submicromolar (214 to 504 nM) inhibitory activities against CDK9. Further cellular mechanism evaluation revealed that C9 suppressed the activity of CDK9 and interfered with the expression of Mcl-1 and cleaved PARP in MOLM-13 cells, resulting in the induction of cellular apoptosis. In addition, C9 displayed good stability in rat liver microsomes, artificial gastrointestinal fluid, and plasma. An online platform (called DEEPCDK9Pred) was developed based on the FP-GNN models to predict or design new CDK9 inhibitors. Collectively, our findings demonstrate that the FP-GNN algorithm can achieve accurate prediction of CDK9 inhibitors, and that C9, discovered here as a new potential CDK9 inhibitor, deserves further structural modification for the treatment of leukemia.
Affiliation(s)
- Huimin Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jindi Huang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Rui Chen
- State Key Laboratory of Functions and Applications of Medicinal Plants & College of Pharmacy, Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, Guizhou Medical University, Guiyang 550004, China
- Hanxuan Cai
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Shuyun He
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jianrong Xu
- Department of Pharmacology and Chemical Biology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China; Academy of Integrative Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Jiquan Zhang
- State Key Laboratory of Functions and Applications of Medicinal Plants & College of Pharmacy, Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, Guizhou Medical University, Guiyang 550004, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China.
12
Kong Y, Zhao X, Liu R, Yang Z, Yin H, Zhao B, Wang J, Qin B, Yan A. Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation. J Cheminform 2022; 14:52. [PMID: 35927691 PMCID: PMC9351086 DOI: 10.1186/s13321-022-00634-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 07/16/2022] [Indexed: 11/10/2022] Open
Abstract
Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
Affiliation(s)
- Yue Kong
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.,Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Xiaoman Zhao
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Ruizi Liu
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Zhenwu Yang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China
| | - Hongyan Yin
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.,Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bowen Zhao
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Jinling Wang
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Bingjie Qin
- Hyper-Dimension Insight Pharmaceuticals Ltd. Room 511, Block A, No. 2C, DongSanHuan North Road, Beijing, People's Republic of China
| | - Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P. O. Box 53, Beijing, 100029, People's Republic of China.
| |
|
13
|
DRDB: A Machine Learning Platform to Predict Chemical-Protein Interactions towards Diabetic Retinopathy. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2022; 2022:1718353. [PMID: 35910835 PMCID: PMC9329024 DOI: 10.1155/2022/1718353] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/10/2022] [Revised: 06/17/2022] [Accepted: 06/22/2022] [Indexed: 11/17/2022]
Abstract
Diabetic retinopathy (DR), a microangiopathy caused by diabetes, affects approximately 93 million people worldwide. However, the drugs used to treat DR have limited efficacy and a variety of side effects. This is possibly because the complicated pathogenesis of DR is associated with multiple proteins. In this work, we attempted to identify potential drugs against DR-associated proteins and predict potential targets for drugs using in silico prediction of chemical-protein interactions (CPI) based on the multitarget quantitative structure-activity relationship (mt-QSAR) method. We developed 128 binary classifiers to predict the CPI for 15 DR targets using random forest (RF), k-nearest neighbours (KNN), support vector machine (SVM), and neural network (NN) algorithms with MACCS and extended connectivity (ECFP6) fingerprints and protein descriptors. To facilitate the discovery of novel drugs and target identification using the 128 binary classifiers, a free web server (DRDB) was developed. Compound Danshen Dripping Pills (CDDP), composed of Salvia miltiorrhiza, Panax notoginseng, and borneol, is commonly used in the treatment of cardiovascular diseases. To explore the applicability of DRDB, the potential CPIs of CDDP in the treatment of DR were investigated based on DRDB. In vitro experimental validation demonstrated that cryptotanshinone and protocatechuic acid, two key components of CDDP, are capable of targeting ICAM-1, which is one of the key targets of DR. We hope that this work can facilitate the development of more effective clinical strategies for the treatment of DR.
|
14
|
Priya S, Tripathi G, Singh DB, Jain P, Kumar A. Machine learning approaches and their applications in drug discovery and design. Chem Biol Drug Des 2022; 100:136-153. [PMID: 35426249 DOI: 10.1111/cbdd.14057] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/30/2022] [Revised: 03/30/2022] [Accepted: 04/10/2022] [Indexed: 01/04/2023]
Abstract
This review is focused on several machine learning approaches used in chemoinformatics. Machine learning approaches provide tools and algorithms to improve drug discovery. Many physicochemical properties of drugs like toxicity, absorption, drug-drug interaction, carcinogenesis, and distribution have been effectively modeled by QSAR techniques. Machine learning is a subset of artificial intelligence, and this technique has shown tremendous potential in the field of drug discovery. Techniques discussed in this review are capable of modeling non-linear datasets, as well as big data of increasing depth and complexity. Various machine learning-based approaches are being used for drug target prediction, modeling the structure of drug target, binding site prediction, ligand-based similarity searching, de novo designing of ligands with desired properties, developing scoring functions for molecular docking, building QSAR model for biological activity prediction, and prediction of pharmacokinetic and pharmacodynamic properties of ligands. In recent years, these predictive tools and models have achieved good accuracy. By the use of more related input data, relevant parameters, and appropriate algorithms, the accuracy of these predictions can be further improved.
Affiliation(s)
- Sonal Priya
- Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
| | - Garima Tripathi
- Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
| | - Dev Bukhsh Singh
- Department of Biotechnology, Siddharth University, Siddharth Nagar, India
| | - Priyanka Jain
- National Institute of Plant Genome Research, New Delhi, India
| | - Abhijeet Kumar
- Department of Chemistry, Mahatma Gandhi Central University, Motihari, India
| |
|
15
|
Brenner AR, Laoveeravat P, Carey PJ, Joiner D, Mardini SH, Jovani M. Artificial intelligence using advanced imaging techniques and cholangiocarcinoma: Recent advances and future direction. Artif Intell Gastroenterol 2022; 3:88-95. [DOI: 10.35712/aig.v3.i3.88] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 04/16/2022] [Accepted: 05/08/2022] [Indexed: 02/06/2023] Open
Abstract
While cholangiocarcinoma represents only about 3% of all gastrointestinal tumors, it has a dismal survival rate, usually because it is diagnosed at a late stage. The utilization of Artificial Intelligence (AI) in medicine in general, and in gastroenterology has made gigantic steps. However, the application of AI for biliary disease, in particular for cholangiocarcinoma, has been sub-optimal. The use of AI in combination with clinical data, cross-sectional imaging (computed tomography, magnetic resonance imaging) and endoscopy (endoscopic ultrasound and cholangioscopy) has the potential to significantly improve early diagnosis and the choice of optimal therapeutic options, leading to a transformation in the prognosis of this feared disease. In this review we summarize the current knowledge on the use of AI for the diagnosis and management of cholangiocarcinoma and point to future directions in the field.
Affiliation(s)
- Aaron R Brenner
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY 40536, United States
| | - Passisd Laoveeravat
- Division of Digestive Diseases and Nutrition, University of Kentucky College of Medicine, Lexington, KY 40536, United States
| | - Patrick J Carey
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY 40536, United States
| | - Danielle Joiner
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY 40536, United States
| | - Samuel H Mardini
- Division of Digestive Diseases and Nutrition, University of Kentucky College of Medicine, Lexington, KY 40536, United States
| | - Manol Jovani
- Digestive Diseases and Nutrition, University of Kentucky Albert B. Chandler Hospital, Lexington, KY 40536, United States
| |
|
16
|
Wu Z, Jiang D, Wang J, Zhang X, Du H, Pan L, Hsieh CY, Cao D, Hou T. Knowledge-based BERT: a method to extract molecular features such as computational chemists. Brief Bioinform 2022; 23:6570013. [PMID: 35438145 DOI: 10.1093/bib/bbac131] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 01/10/2022] [Revised: 03/16/2022] [Accepted: 03/18/2022] [Indexed: 11/12/2022] Open
Abstract
Molecular property prediction models based on machine learning algorithms have become important tools to triage unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property predictions, SMILES-based methods can directly extract molecular features from SMILES without human expert knowledge, but they require more powerful algorithms for feature extraction and a larger amount of data for training, which makes SMILES-based methods less popular. Here, we show the great potential of pre-training in promoting the predictions of important pharmaceutical properties. By utilizing three pre-training tasks based on atom feature prediction, molecular feature prediction and contrastive learning, a new pre-training method K-BERT, which can extract chemical information from SMILES like chemists, was developed. The calculation results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES not limited to canonical SMILES. Moreover, the general fingerprints K-BERT-FP generated by K-BERT exhibit comparative predictive power to MACCS on 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot capture. Our results illustrate the great potential of K-BERT in the practical applications of molecular property predictions in drug discovery.
Affiliation(s)
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Lurong Pan
- Global Health Drug Discovery Institute, Beijing 100192, P. R. China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| |
|
17
|
Sun X, Zhu J, Chen B, You H, Xu H. A feature transferring workflow between data-poor compounds in various tasks. PLoS One 2022; 17:e0266088. [PMID: 35353844 PMCID: PMC8967016 DOI: 10.1371/journal.pone.0266088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Accepted: 03/14/2022] [Indexed: 12/03/2022] Open
Abstract
Compound screening by in silico approaches has advantages in identifying high-activity leading compounds and can predict the safety of the drug. A key challenge is that the number of observations of drug activity and toxicity accumulation varies by target in different datasets, some of which are more understudied than others. Owing to an overall insufficiency and imbalance of drug data, it is hard to accurately predict drug activity and toxicity of multiple tasks by the existing models. To solve this problem, this paper proposed a two-stage transfer learning workflow to develop a novel prediction model, which can accurately predict drug activity and toxicity of the targets with insufficient observations. We built a balanced dataset based on the Tox21 dataset and developed a drug activity and toxicity prediction model based on Siamese networks and graph convolution to produce multitasking output. We also took advantage of transfer learning from data-rich targets to data-poor targets. We showed greater accuracy in predicting the activity and toxicity of compounds to targets with rich data and poor data. In Tox21, a relatively rich dataset, the prediction model accuracy for classification tasks was 0.877 AUROC. In the other five unbalanced datasets, we also found that transfer learning strategies brought the accuracy of models to a higher level in understudied targets. Our models can overcome the imbalance in target data and predict the compound activity and toxicity of understudied targets to help prioritize upcoming biological experiments.
Affiliation(s)
- Xiaofei Sun
- Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingyuan Zhu
- School of science, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Chen
- University of Chinese Academy of Sciences, Beijing, China
- IRIAI, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Hengzhi You
- School of science, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Huiqing Xu
- Guangdong Energy Group Science and Technology Research Institute Co., Ltd., Guangzhou, Guangdong, China
| |
|
18
|
Dey V, Machiraju R, Ning X. Improving Compound Activity Classification via Deep Transfer and Representation Learning. ACS OMEGA 2022; 7:9465-9483. [PMID: 35350358 PMCID: PMC8945064 DOI: 10.1021/acsomega.1c06805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 02/23/2022] [Indexed: 06/14/2023]
Abstract
Recent advances in molecular machine learning, especially deep neural networks such as graph neural networks (GNNs), for predicting structure-activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks is limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to the popular parameter-based transfer learning such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other compared to their inactive compounds. Overall, TAc achieves the best performance with an average ROC-AUC of 0.801; it significantly improves the ROC-AUC of 83% of target tasks with an average task-wise performance improvement of 7.102%, compared to the best baseline dmpna. Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods. In summary, TAc-fc is also found to be a strong model with competitive or even better performance than TAc on a notable number of target tasks.
Affiliation(s)
- Vishal Dey
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
| | - Raghu Machiraju
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
| | - Xia Ning
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
| |
|
19
|
Nikolaienko T, Gurbych O, Druchok M. Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network. J Comput Chem 2022; 43:728-739. [PMID: 35201629 DOI: 10.1002/jcc.26831] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 02/09/2022] [Indexed: 12/12/2022]
Abstract
Drug discovery pipelines typically involve high-throughput screening of large amounts of compounds in a search of potential drugs candidates. As a chemical space of small organic molecules is huge, a "navigation" over it urges for fast and lightweight computational methods, thus promoting machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks, then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets-PDBbind and data imported from RCSB Protein Data Bank. In order to explore the generalization capabilities of the model we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment - six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with a k-fold cross-validation are engaged. Finally, we discuss the model performance in terms of a set of metrics for different split strategies and fold arrangement. Our code is available at https://github.com/SoftServeInc/affinity-by-GNN.
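The partitioning strategies named in this abstract (random, time- or property-arranged, and cluster-based, each combined with k-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code; the function names, the contiguous-block fold layout, and the greedy cluster assignment are all assumptions made for illustration only.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def _folds_from_order(order: List[int], k: int) -> Iterator[Fold]:
    """Cut an ordered index list into k contiguous test blocks."""
    n = len(order)
    for i in range(k):
        start, stop = i * n // k, (i + 1) * n // k
        yield order[:start] + order[stop:], order[start:stop]


def random_kfold(n: int, k: int, seed: int = 0) -> Iterator[Fold]:
    """Random split: shuffle the indices, then cut into k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return _folds_from_order(order, k)


def sorted_kfold(keys: Sequence[float], k: int) -> Iterator[Fold]:
    """Time- or property-arranged split: sort samples by a key (e.g. a
    release year or a molecular property) so that every test fold covers
    one contiguous range of that key."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return _folds_from_order(order, k)


def grouped_kfold(groups: Sequence[Hashable], k: int) -> Iterator[Fold]:
    """Cluster-based split: all members of a protein or ligand cluster
    land in the same fold, so no cluster is shared by train and test."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    buckets: List[List[int]] = [[] for _ in range(k)]
    # Greedy balancing (an assumption): place the largest clusters first,
    # each into whichever bucket is currently smallest.
    for group in sorted(members, key=lambda g: -len(members[g])):
        min(buckets, key=len).extend(members[group])
    for i in range(k):
        train = [j for b in buckets[:i] + buckets[i + 1:] for j in b]
        yield train, buckets[i]
```

Comparing a model's metrics across these three generators on the same data gives a rough sense of how much a random split overstates performance on unseen clusters or later time periods.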
Affiliation(s)
- Tymofii Nikolaienko
- SoftServe, Inc., Lviv, Ukraine.,Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Oleksandr Gurbych
- Blackthorn AI Ltd., London, UK.,Department of Artificial Intelligence Systems, Lviv Polytechnic National University, Lviv, Ukraine
| | - Maksym Druchok
- SoftServe, Inc., Lviv, Ukraine.,Institute for Condensed Matter Physics, NAS of Ukraine, Lviv, Ukraine
| |
Collapse
|
20
|
He S, Zhao D, Ling Y, Cai H, Cai Y, Zhang J, Wang L. Machine Learning Enables Accurate and Rapid Prediction of Active Molecules Against Breast Cancer Cells. Front Pharmacol 2022; 12:796534. [PMID: 34975493 PMCID: PMC8719637 DOI: 10.3389/fphar.2021.796534] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/17/2021] [Accepted: 12/02/2021] [Indexed: 12/22/2022] Open
Abstract
Breast cancer (BC) has surpassed lung cancer as the most frequently occurring cancer, and it is the leading cause of cancer-related death in women. Therefore, there is an urgent need to discover or design new drug candidates for BC treatment. In this study, we first collected a series of structurally diverse datasets consisting of 33,757 active and 21,152 inactive compounds for 13 breast cancer cell lines and one normal breast cell line commonly used in in vitro antiproliferative assays. Predictive models were then developed using five conventional machine learning algorithms, including naïve Bayesian, support vector machine, k-Nearest Neighbors, random forest, and extreme gradient boosting, as well as five deep learning algorithms, including deep neural networks, graph convolutional networks, graph attention network, message passing neural networks, and Attentive FP. A total of 476 single models and 112 fusion models were constructed based on three types of molecular representations including molecular descriptors, fingerprints, and graphs. The evaluation results demonstrate that the best model for each BC cell subtype can achieve high predictive accuracy for the test sets with AUC values of 0.689–0.993. Moreover, important structural fragments related to BC cell inhibition were identified and interpreted. To facilitate the use of the model, an online webserver called ChemBC (http://chembc.idruglab.cn/) and its local version software (https://github.com/idruglab/ChemBC) were developed to predict whether compounds have potential inhibitory activity against BC cells.
Affiliation(s)
- Shuyun He
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China.,Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China.,Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Yanle Ling
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China.,Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Hanxuan Cai
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China.,Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| | - Yike Cai
- Center for Certification and Evaluation, Guangdong Drug Administration, Guangzhou, China
| | - Jiquan Zhang
- State Key Laboratory of Functions and Applications of Medicinal Plants, College of Pharmacy, Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, Guizhou Medical University, Guiyang, China
| | - Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China.,Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
| |
|
21
|
Abstract
This review surveys the literature on drug discovery through ML tools and techniques that are applied in every phase of drug development to accelerate the research process and reduce the risk and expenditure in clinical trials. Machine learning techniques improve decision-making on pharmaceutical data across various applications, such as QSAR analysis, hit discovery, and de novo drug design, to obtain accurate outcomes. Target validation, prognostic biomarkers, and digital pathology are considered as problem statements in this review. A main challenge for ML is the inadequate interpretability of its outcomes, which may restrict its applications in drug discovery. In clinical trials, absolute and methodological data must be generated to tackle many puzzles in validating ML techniques, improving decision-making, promoting awareness of ML approaches, and reducing the risk of failure in drug discovery.
Affiliation(s)
- Suresh Dara
- Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Medak, 502313 Telangana India
| | - Swetha Dhamercherla
- Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Medak, 502313 Telangana India
| | - Surender Singh Jadav
- Centre for Molecular Cancer Research (CMCR) and Vishnu Institute of Pharmaceutical Education and Research (VIPER), Narsapur, Medak, 502313 Telangana India
| | - CH Madhu Babu
- Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Medak, 502313 Telangana India
| | - Mohamed Jawed Ahsan
- Department of Pharmaceutical Chemistry, Maharishi Arvind College of Pharmacy, Jaipur, 302023 Rajasthan India
| |
|
22
|
Wu Z, Jiang D, Hsieh CY, Chen G, Liao B, Cao D, Hou T. Hyperbolic relational graph convolution networks plus: a simple but highly efficient QSAR-modeling method. Brief Bioinform 2021; 22:6235968. [PMID: 33866354 DOI: 10.1093/bib/bbab112] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 01/12/2021] [Revised: 03/11/2021] [Accepted: 03/12/2021] [Indexed: 01/04/2023] Open
Abstract
Accurate predictions of druggability and bioactivities of compounds are desirable to reduce the high cost and time of drug discovery. After more than five decades of continuing developments, quantitative structure-activity relationship (QSAR) methods have been established as indispensable tools that facilitate fast, reliable and affordable assessments of physicochemical and biological properties of compounds in drug-discovery programs. Currently, there are mainly two types of QSAR methods, descriptor-based methods and graph-based methods. The former is developed based on predefined molecular descriptors, whereas the latter is developed based on simple atomic and bond information. In this study, we presented a simple but highly efficient modeling method by combining molecular graphs and molecular descriptors as the input of a modified graph neural network, called hyperbolic relational graph convolution network plus (HRGCN+). The evaluation results show that HRGCN+ achieves state-of-the-art performance on 11 drug-discovery-related datasets. We also explored the impact of the addition of traditional molecular descriptors on the predictions of graph-based methods, and found that the addition of molecular descriptors can indeed boost the predictive power of graph-based methods. The results also highlight the strong anti-noise capability of our method. In addition, our method provides a way to interpret models at both the atom and descriptor levels, which can help medicinal chemists extract hidden information from complex datasets. We also offer an HRGCN+'s online prediction service at https://quantum.tencent.com/hrgcn/.
Affiliation(s)
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, under the supervision of Prof. Tingjun Hou
- Dejun Jiang
  - College of Pharmaceutical Sciences, Zhejiang University, under the supervision of Prof. Tingjun Hou
- Guangyong Chen
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
- Ben Liao
  - demonstrated history of working in industry and academia. Skilled in machine learning, mathematics, natural language processing, computer vision and graph neural networks. Strong education professional with a PhD from Université de Paris in France
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University

23
Protti ÍF, Rodrigues DR, Fonseca SK, Alves RJ, de Oliveira RB, Maltarollo VG. Do Drug-likeness Rules Apply to Oral Prodrugs? ChemMedChem 2021; 16:1446-1456. [PMID: 33471444 DOI: 10.1002/cmdc.202000805] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/15/2020] [Revised: 01/12/2021] [Indexed: 12/21/2022]
Abstract
This paper describes a comparative analysis of the physicochemical and structural properties of prodrugs and their corresponding drugs with regard to drug-likeness rules. The dataset used in this work was obtained from the DrugBank. Sixty-five pairs of prodrugs/drugs were retrieved and divided into the following categories: carrier-linked to increase hydrophilic character, carrier-linked to increase absorption, and bioprecursors. We compared the physicochemical properties related to drug-likeness between prodrugs and drugs. Our results show that prodrugs do not always follow Lipinski's Rule of 5, especially as we observed 15 prodrugs with more than 10 hydrogen bond acceptors and 18 with a molecular weight greater than 500 Da. This fact highlights the importance of extending Lipinski's rules to encompass other parameters as both strategies (filtering of drug-like chemical libraries and prodrug design) aim to improve the bioavailability of compounds. Therefore, critical reasoning is fundamental to determine whether a structure has drug-like properties or could be considered a potential orally active compound in the drug-design pipeline.
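The Rule of 5 thresholds that the abstract refers to (molecular weight above 500 Da, more than 10 hydrogen bond acceptors, and so on) can be written as a simple filter. The following is a minimal sketch, not the authors' analysis code: the `MolProps` container, the function names, and the example property values are hypothetical, and the convention of tolerating at most one violation is a common reading of Lipinski's rule that is assumed here rather than taken from the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MolProps:
    """Precomputed properties of one compound (hypothetical container)."""
    mol_weight: float        # molecular weight in Da
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int       # number of hydrogen bond donors
    h_bond_acceptors: int    # number of hydrogen bond acceptors


def ro5_violations(p: MolProps) -> List[str]:
    """Return the Rule of 5 criteria that the compound violates."""
    violations = []
    if p.mol_weight > 500:
        violations.append("MW > 500 Da")
    if p.log_p > 5:
        violations.append("logP > 5")
    if p.h_bond_donors > 5:
        violations.append("HBD > 5")
    if p.h_bond_acceptors > 10:
        violations.append("HBA > 10")
    return violations


def passes_ro5(p: MolProps, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return len(ro5_violations(p)) <= max_violations


# Hypothetical property values, for illustration only.
example = MolProps(mol_weight=650.0, log_p=3.2, h_bond_donors=2, h_bond_acceptors=12)
print(ro5_violations(example))
print(passes_ro5(example))
```

A prodrug with properties like the example would be flagged on both molecular weight and hydrogen bond acceptors, which is the kind of case the abstract reports.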
Affiliation(s)
- Ícaro F Protti
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627 Pampulha, Belo Horizonte, MG, BR 31270-901, Brazil
- Daniel R Rodrigues
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627 Pampulha, Belo Horizonte, MG, BR 31270-901, Brazil
- Sofia K Fonseca
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627 Pampulha, Belo Horizonte, MG, BR 31270-901, Brazil
- Ricardo J Alves
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627 Pampulha, Belo Horizonte, MG, BR 31270-901, Brazil
- Renata B de Oliveira
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627 Pampulha, Belo Horizonte, MG, BR 31270-901, Brazil
- Vinícius G Maltarollo
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627 Pampulha, Belo Horizonte, MG, BR 31270-901, Brazil

24
Jiang D, Wu Z, Hsieh CY, Chen G, Liao B, Wang Z, Shen C, Cao D, Wu J, Hou T. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminform 2021; 13:12. [PMID: 33597034 PMCID: PMC7888189 DOI: 10.1186/s13321-020-00479-8] [Citation(s) in RCA: 186] [Impact Index Per Article: 62.0] [Received: 09/21/2020] [Accepted: 11/26/2020] [Indexed: 12/31/2022] Open
Abstract
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
Affiliation(s)
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
- Guangyong Chen
  - Shenzhen Institutes of Advanced Technology, Shenzhen, 518055, Guangdong, China
- Ben Liao
  - Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, China
- Jian Wu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China

25
Berrhail F, Belhadef H. Genetic Algorithm-based Feature Selection Approach for Enhancing the Effectiveness of Similarity Searching in Ligand-based Virtual Screening. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191119123935] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023]
Abstract
Background:
In recent years, similarity searching has gained wide popularity as a method for performing Ligand-Based Virtual Screening (LBVS). This screening technique compares the features of the target compound with those of each compound in a database of compounds. It is well known that no individual similarity measure provides the best performance every time for an active compound structure across all types of activity classes. Several techniques and strategies have been proposed in the literature to improve the overall effectiveness of ligand-based virtual screening approaches.
Objective:
In this work, our main objective is to propose a feature selection approach based on a genetic algorithm (FSGASS) to improve similarity searching in ligand-based virtual screening.
Methods:
Our contribution allows us to identify the most important and relevant characteristics of chemical compounds and to minimize their number in the compound representations. This reduces the feature space, eliminates redundancy, shortens training execution time, and increases the performance of the screening process.
Results:
The obtained results demonstrate superior performance compared with those obtained with the Tanimoto coefficient, which is considered the most widely used coefficient for quantifying similarity in the domain of LBVS.
Conclusion:
Our results show that significant improvements can be obtained by using molecular similarity search methods based on feature selection.
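For reference, the Tanimoto coefficient used as the baseline above is the size of the intersection of two feature sets divided by the size of their union. The sketch below illustrates only that baseline similarity search, not the FSGASS method; it assumes fingerprints are represented as sets of on-bit indices, and the function names and the toy fingerprints are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: Set[int],
                       database: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score every database compound against the query, most similar first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy fingerprints, for illustration only.
query_fp = {1, 2, 3, 4}
db = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {3, 4, 5, 6},
    "cpd_C": {7, 8},
}
for name, score in rank_by_similarity(query_fp, db):
    print(name, round(score, 3))
```

A feature selection step such as the one the abstract proposes would, under this representation, amount to restricting every set to a chosen subset of bit indices before scoring.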
Affiliation(s)
- Fouaz Berrhail
  - NTIC Faculty, University of Constantine 2 Abdelhamid Mehri, Constantine, Algeria
- Hacene Belhadef
  - NTIC Faculty, University of Constantine 2 Abdelhamid Mehri, Constantine, Algeria

26
Hussain W, Rasool N, Khan YD. Insights into Machine Learning-based Approaches for Virtual Screening in Drug Discovery: Existing Strategies and Streamlining Through FP-CADD. Curr Drug Discov Technol 2020; 18:463-472. [PMID: 32767944 DOI: 10.2174/1570163817666200806165934] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/03/2020] [Revised: 07/01/2020] [Accepted: 07/03/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Machine learning is an active area of research in computer science, driven by the availability of big data collections of all sorts, which has prompted interest in the development of novel tools for data mining. Machine learning methods have wide applications in computer-aided drug discovery. Many machine learning approaches are used in drug design, where they aid biological modelling in drug discovery. There are two main categories, Ligand-Based Virtual Screening (LBVS) and Structure-Based Virtual Screening (SBVS); however, machine learning approaches fall mostly into the LBVS category.
OBJECTIVES: This study surveys the major machine learning approaches used in LBVS. Moreover, we introduce a protocol named FP-CADD (the four protocols of computer-aided drug discovery), which describes a four-step rule of thumb for drug discovery. Various important aspects of FP-CADD, along with a SWOT analysis, are also discussed in this article.
CONCLUSION: In this study, we observed that among LBVS algorithms, Support Vector Machines (SVM) and Random Forest (RF) are the most widely used owing to their high accuracy and efficiency. These virtual screening approaches have the potential to revolutionize the field of drug design. We also believe that the process flow presented in this study, named FP-CADD, can streamline the whole process of computer-aided drug discovery. By adopting this rule, studies related to drug discovery can be made homogeneous, and the protocol can also be considered as an evaluation criterion in the peer-review process of research articles.
Affiliation(s)
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan

27
Tonelli Nogueira MDO, Almeida JSFDD, Franca TCC, Figueroa-Villar JD. Synthesis and docking studies of three new diaminochromenes as potential leads for anticancer drugs. J Biomol Struct Dyn 2020; 39:5005-5013. [PMID: 32597332 DOI: 10.1080/07391102.2020.1784284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
In this work, the new diaminochromenes 2,5-dimono-8-methoxychromeno[4,3,2-de][1,6]naphthyridine-4-carbonitrile (4), 8-ethoxy-2-imino-3,4-dihydro-2H-chromene-3-carbonitrile-4-malononitrile (5), and 2,5-diamino-8-ethoxychromene[4,3,2-de][1,6]naphthyridine-4-carbonitrile (6) were synthesized and fully characterized by 600 MHz NMR using 1H, 13C, APT, gHSQC, gHMBC, ROESY-1D and gated decoupling 13C experiments. Further docking studies suggested that these compounds are capable of intercalating with the Drew-Dickerson dodecamer DNA and may therefore be candidates to work as effective compounds to decrease cancer radiotherapy. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Tanos Celmar Costa Franca
  - Laboratory of Molecular Modeling Applied to the Chemical and Biological Defense (LMCBD), Military Institute of Engineering, Rio de Janeiro, RJ, Brazil
  - Department of Chemistry, Faculty of Science, University of Hradec Kralove, Hradec Kralove, Czech Republic
- José Daniel Figueroa-Villar
  - Medicinal Chemistry Group, Department of Chemistry, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil

28
Korkmaz S. Deep Learning-Based Imbalanced Data Classification for Drug Discovery. J Chem Inf Model 2020; 60:4180-4190. [DOI: 10.1021/acs.jcim.9b01162] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 12/19/2022]
Affiliation(s)
- Selçuk Korkmaz
  - Trakya University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Edirne, Turkey

29
Hassanzadeh P, Atyabi F, Dinarvand R. The significance of artificial intelligence in drug delivery system design. Adv Drug Deliv Rev 2019; 151-152:169-190. [PMID: 31071378 DOI: 10.1016/j.addr.2019.05.001] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Received: 01/14/2019] [Revised: 04/14/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
Over the last decade, interest has grown in applying artificial intelligence (AI) technology to analyzing and interpreting biological or genetic information, accelerating drug discovery, identifying selective small-molecule modulators or rare molecules, and predicting their behavior. Applying automated workflows and databases for rapid analysis of huge amounts of data, and artificial neural networks (ANNs) for developing novel hypotheses and treatment strategies, predicting disease progression, and evaluating the pharmacological profiles of drug candidates, may significantly improve treatment outcomes. Target fishing (TF), the rapid prediction or identification of biological targets, might be of great help for linking targets to novel compounds. AI and TF methods in association with human expertise may indeed revolutionize current theranostic strategies; meanwhile, validation approaches are necessary to overcome the potential challenges and ensure higher accuracy. In this review, the significance of AI and TF in the development of drugs and delivery systems and the potential challenges are highlighted.
Affiliation(s)
- Parichehr Hassanzadeh
  - Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran
- Fatemeh Atyabi
  - Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran
- Rassoul Dinarvand
  - Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran

30
Batool M, Ahmad B, Choi S. A Structure-Based Drug Discovery Paradigm. Int J Mol Sci 2019; 20:2783. [PMID: 31174387 PMCID: PMC6601033 DOI: 10.3390/ijms20112783] [Citation(s) in RCA: 277] [Impact Index Per Article: 55.4] [Received: 05/10/2019] [Revised: 05/31/2019] [Accepted: 06/04/2019] [Indexed: 12/14/2022] Open
Abstract
Structure-based drug design is becoming an essential tool for faster and more cost-efficient lead discovery relative to the traditional method. Genomic, proteomic, and structural studies have provided hundreds of new targets and opportunities for future drug discovery. This situation poses a major problem: the necessity to handle the “big data” generated by combinatorial chemistry. Artificial intelligence (AI) and deep learning play a pivotal role in the analysis and systemization of larger data sets by statistical machine learning methods. Advanced AI-based sophisticated machine learning tools have a significant impact on the drug discovery process including medicinal chemistry. In this review, we focus on the currently available methods and algorithms for structure-based drug design including virtual screening and de novo drug design, with a special emphasis on AI- and deep-learning-based methods used for drug discovery.
Affiliation(s)
- Maria Batool
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea
- Bilal Ahmad
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea
- Sangdun Choi
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea

31
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 2019; 93:103159. [PMID: 30926470 DOI: 10.1016/j.jbi.2019.103159] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 11/13/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/22/2022]
Abstract
Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods of interaction prediction help limit the search space for these experiments. These computational methods can be divided into ligand based approaches, docking approaches and chemogenomic approaches. In this review, we aim to describe the various feature based chemogenomic methods for drug target interaction prediction. It provides a comprehensive overview of the various techniques, datasets, tools and metrics. The feature based methods have been categorized, explained and compared. A novel framework for drug target interaction prediction has also been proposed that aims to improve the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature based methods of drug target interaction.
Affiliation(s)
- Kanica Sachdev
  - Computer Science and Engineering Department, SMVDU, J&K, India

32
Guan L, Yang H, Cai Y, Sun L, Di P, Li W, Liu G, Tang Y. ADMET-score - a comprehensive scoring function for evaluation of chemical drug-likeness. MedChemComm 2019; 10:148-157. [PMID: 30774861 PMCID: PMC6350845 DOI: 10.1039/c8md00472b] [Citation(s) in RCA: 227] [Impact Index Per Article: 45.4] [Received: 09/20/2018] [Accepted: 11/29/2018] [Indexed: 01/04/2023]
Abstract
Chemical absorption, distribution, metabolism, excretion, and toxicity (ADMET) play key roles in drug discovery and development. A high-quality drug candidate should not only have sufficient efficacy against the therapeutic target, but also show appropriate ADMET properties at a therapeutic dose. Many in silico models have hence been developed for prediction of chemical ADMET properties. However, it is still not easy to evaluate the drug-likeness of compounds in terms of so many ADMET properties. In this study, we proposed a scoring function named the ADMET-score to evaluate the drug-likeness of a compound. The scoring function was defined on the basis of 18 ADMET properties predicted via our web server admetSAR. The weight of each property in the ADMET-score was determined by three parameters: the accuracy rate of the model, the importance of the endpoint in the process of pharmacokinetics, and the usefulness index. The FDA-approved drugs from DrugBank, the small molecules from ChEMBL and the old drugs withdrawn from the market due to safety concerns were used to evaluate the performance of the ADMET-score. The indices of the arithmetic mean and p-value showed that the ADMET-score among the three data sets differed significantly. Furthermore, we learned that there was no obvious linear correlation between the ADMET-score and QED (quantitative estimate of drug-likeness). These results suggested that the ADMET-score would be a comprehensive index to evaluate chemical drug-likeness, and might be helpful for users to select appropriate drug candidates for further development.
Affiliation(s)
- Longfei Guan
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yingchun Cai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Lixia Sun
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Peiwen Di
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China

33
Taylor DL, Gough A, Schurdak ME, Vernetti L, Chennubhotla CS, Lefever D, Pei F, Faeder JR, Lezon TR, Stern AM, Bahar I. Harnessing Human Microphysiology Systems as Key Experimental Models for Quantitative Systems Pharmacology. Handb Exp Pharmacol 2019; 260:327-367. [PMID: 31201557 PMCID: PMC6911651 DOI: 10.1007/164_2019_239] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023]
Abstract
Two technologies that have emerged in the last decade offer a new paradigm for modern pharmacology, as well as drug discovery and development. Quantitative systems pharmacology (QSP) is a complementary approach to traditional, target-centric pharmacology and drug discovery and is based on an iterative application of computational and systems biology methods with multiscale experimental methods, both of which include models of ADME-Tox and disease. QSP has emerged as a new approach due to the low efficiency of success in developing therapeutics based on the existing target-centric paradigm. Likewise, human microphysiology systems (MPS) are experimental models complementary to existing animal models and are based on the use of human primary cells, adult stem cells, and/or induced pluripotent stem cells (iPSCs) to mimic human tissues and organ functions/structures involved in disease and ADME-Tox. Human MPS experimental models have been developed to address the relatively low concordance of human disease and ADME-Tox with engineered, experimental animal models of disease. The integration of the QSP paradigm with the use of human MPS has the potential to enhance the process of drug discovery and development.
Affiliation(s)
- D Lansing Taylor
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Albert Gough
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Mark E Schurdak
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Lawrence Vernetti
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Chakra S Chennubhotla
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Daniel Lefever
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
- Fen Pei
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- James R Faeder
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Timothy R Lezon
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Andrew M Stern
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Ivet Bahar
  - University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA

34
Ferreira Neto DC, Alencar Lima J, Sobreiro Francisco Diz de Almeida J, Costa França TC, Jorge do Nascimento C, Figueroa Villar JD. New semicarbazones as gorge-spanning ligands of acetylcholinesterase and potential new drugs against Alzheimer's disease: Synthesis, molecular modeling, NMR, and biological evaluation. J Biomol Struct Dyn 2017; 36:4099-4113. [PMID: 29198175 DOI: 10.1080/07391102.2017.1407676] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
Two new compounds, (E)-2-(5,7-dibromo-3,3-dimethyl-3,4-dihydroacridin-1(2H)-ylidene)hydrazinecarbothioamide (3) and (E)-2-(5,7-dibromo-3,3-dimethyl-3,4-dihydroacridin-1(2H)-ylidene)hydrazinecarboxamide (4), were synthesized and evaluated for their anticholinesterase activities. In vitro tests performed by NMR and Ellman's tests pointed to a mixed kinetic mechanism for the inhibition of acetylcholinesterase (AChE). This result was corroborated through further docking and molecular dynamics studies, suggesting that the new compounds can work as gorge-spanning ligands by interacting with two different binding sites inside AChE. Also, in silico toxicity evaluation suggested that these new compounds can be less toxic than tacrine.
Affiliation(s)
- Denise Cristian Ferreira Neto
  - Medicinal Chemistry Group, Military Institute of Engineering, Praia Vermelha, Rio de Janeiro 22290-270, Brazil
  - Department of Chemistry, Federal University of Roraima, Boa Vista, Roraima 69310-000, Brazil
- Josélia Alencar Lima
  - Laboratory of Molecular Modeling Applied to Chemical and Biological Defense (LMCBD), Military Institute of Engineering, Praia Vermelha, Rio de Janeiro 22290-270, Brazil
- Joyce Sobreiro Francisco Diz de Almeida
  - Laboratory of Molecular Modeling Applied to Chemical and Biological Defense (LMCBD), Military Institute of Engineering, Praia Vermelha, Rio de Janeiro 22290-270, Brazil
- Tanos Celmar Costa França
  - Laboratory of Molecular Modeling Applied to Chemical and Biological Defense (LMCBD), Military Institute of Engineering, Praia Vermelha, Rio de Janeiro 22290-270, Brazil
  - Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Hradec Kralove, Czech Republic
- Claudia Jorge do Nascimento
  - Institute of Biosciences, Federal University of the State of Rio de Janeiro, Urca, Rio de Janeiro 22290-240, Brazil
- José Daniel Figueroa Villar
  - Medicinal Chemistry Group, Military Institute of Engineering, Praia Vermelha, Rio de Janeiro 22290-270, Brazil

35
Varela JN, Lammoglia Cobo MF, Pawar SV, Yadav VG. Cheminformatic Analysis of Antimalarial Chemical Space Illuminates Therapeutic Mechanisms and Offers Strategies for Therapy Development. J Chem Inf Model 2017; 57:2119-2131. [PMID: 28810125 DOI: 10.1021/acs.jcim.7b00072] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/23/2022]
Abstract
The clear and present danger of malaria, which has been amplified in recent years by climate change, and the progressive thinning of our drug arsenal over the past two decades raise uncomfortable questions about the current state and future of antimalarial drug development. Besides suffering from many of the same technical challenges that affect drug development in other disease areas, the quest for new antimalarial therapies is also hindered by the complex, dynamic life cycle of the malaria parasite, P. falciparum, in its mosquito and human hosts, and its role thereof in the elicitation of drug resistance. New strategies are needed in order to ensure economical and expeditious development of new, more efficacious treatments. In the present study, we employ open-source cheminformatics tools to analyze the chemical space traversed by approved antimalarial drugs and promising candidates at various stages of development to uncover insights that could shape future endeavors in the field. Our scaffold-centric analysis reveals that the antimalarial chemical space is disjointed and segregated into a few dominant structural groups. In fact, the structures of antimalarial drugs and drug candidates are distributed according to Pareto's principle. This structural convergence can potentially be exploited for future drug discovery by incorporating it into bioinformatics workflows that are typically employed for solving problems in structural biology. Significantly, we demonstrate how molecular scaffold hunting can be applied to unearth putative mechanisms of action of drugs whose activities remain a mystery, and how scaffold-centric analysis of drug space can also provide a recipe for combination therapies that minimize the likelihood of emergence of drug resistance, as well as identify areas on which to focus efforts. Finally, we also observe that over half of the molecules in the antimalarial space bear no resemblance to other molecules in the collection, which suggests that the pharmacobiology of antimalarial drugs has not been entirely surveyed.
Affiliation(s)
- Julia Nogueira Varela
- Department of Chemical & Biological Engineering, The University of British Columbia, Vancouver, BC, Canada, V6T 1Z3
- María Fernanda Lammoglia Cobo
- Department of Chemical & Biological Engineering, The University of British Columbia, Vancouver, BC, Canada, V6T 1Z3; Life Sciences Department, Monterrey Institute of Technology and Higher Education, Mexico City Campus, Mexico City, Mexico, 14380
- Sandip V Pawar
- Department of Chemical & Biological Engineering, The University of British Columbia, Vancouver, BC, Canada, V6T 1Z3
- Vikramaditya G Yadav
- Department of Chemical & Biological Engineering, The University of British Columbia, Vancouver, BC, Canada, V6T 1Z3; Neglected Global Diseases Initiative, The University of British Columbia, Vancouver, BC, Canada, V6T 1Z3
36
Yan Y, Wang W, Sun Z, Zhang JZH, Ji C. Protein-Ligand Empirical Interaction Components for Virtual Screening. J Chem Inf Model 2017; 57:1793-1806. [PMID: 28678484 DOI: 10.1021/acs.jcim.7b00017] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 01/04/2023]
Abstract
A major shortcoming of empirical scoring functions is that they often fail to predict binding affinity properly. Removing false positives from docking results is one of the most challenging tasks in structure-based virtual screening. Postdocking filters, making use of all kinds of experimental structure and activity information, may help in solving the issue. We describe a new method based on detailed protein-ligand interaction decomposition and machine learning. Protein-ligand empirical interaction components (PLEIC) are used as descriptors for support vector machine learning to develop a classification model (PLEIC-SVM) to discriminate false positives from true positives. Experimentally derived activity information is used for model training. An extensive benchmark study on 36 diverse data sets from the DUD-E database has been performed to evaluate the performance of the new method. The results show that the new method performs much better than standard empirical scoring functions in structure-based virtual screening. The trained PLEIC-SVM model is able to capture important interaction patterns between ligand and protein residues for one specific target, which is helpful in discarding false positives in postdocking filtering.
Affiliation(s)
- Yuna Yan
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; State Key Laboratory of Precision Spectroscopy, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Weijun Wang
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; State Key Laboratory of Precision Spectroscopy, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Zhaoxi Sun
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; State Key Laboratory of Precision Spectroscopy, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- John Z H Zhang
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; State Key Laboratory of Precision Spectroscopy, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Changge Ji
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China; State Key Laboratory of Precision Spectroscopy, East China Normal University, Shanghai 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
37
Onay A, Onay M, Abul O. Classification of nervous system withdrawn and approved drugs with ToxPrint features via machine learning strategies. Computer Methods and Programs in Biomedicine 2017; 142:9-19. [PMID: 28325450 DOI: 10.1016/j.cmpb.2017.02.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/24/2016] [Revised: 01/20/2017] [Accepted: 02/08/2017] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVES: Early-phase virtual screening of candidate drug molecules by data mining and machine learning plays a key role in the pharmaceutical industry in preventing adverse drug effects. Computational classification methods can distinguish approved drugs from withdrawn ones. We focused on 6 data sets including a maximum of 110 approved and 110 withdrawn drugs, for all diseases and for nervous system diseases, to distinguish approved drugs from withdrawn ones. METHODS: In this study, we used support vector machines (SVMs) and ensemble methods (EMs) such as boosted and bagged trees to classify drugs into approved and withdrawn categories. We also used the CORINA Symphony program to identify ToxPrint chemotypes, a set of over 700 predefined chemotypes, for the risk and safety assessment of candidate drug molecules. In addition, we studied nervous system withdrawn drugs to determine the key fragments with the ParMol package, which includes the gSpan algorithm. RESULTS: According to our results, the total number of chemotypes and bond CN_amine_aliphatic_generic were the more significant descriptors. The developed Medium Gaussian SVM model reached 78% prediction accuracy on the test set for the drug data set covering all diseases. Bagged tree and linear SVM models showed 89% accuracy for psycholeptic and psychoanaleptic drugs. A set of discriminative fragments in the nervous system withdrawn drug (NSWD) data sets was obtained. The fragments responsible for drugs being removed from the market were benzene, toluene, N,N-dimethylethylamine, crotylamine, 5-methyl-2,4-heptadiene, octatriene and the carbonyl group. CONCLUSION: This paper covers the development of computational classification methods to distinguish approved drugs from withdrawn ones. In addition, the results indicate that identifying discriminative fragments, together with interpreting the structures of the NSWDs, is significant for designing new approved nervous system drugs.
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, TOBB University of Economics & Technology, 06560, Ankara, Turkey
- Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Yuzuncu Yil University, 65080, Van, Turkey.
- Osman Abul
- Department of Computer Engineering, TOBB University of Economics & Technology, 06560, Ankara, Turkey
38
Tong L, Guo L, Lv X, Li Y. Modification of polychlorinated phenols and evaluation of their toxicity, biodegradation and bioconcentration using three-dimensional quantitative structure–activity relationship models. J Mol Graph Model 2017; 71:1-12. [DOI: 10.1016/j.jmgm.2016.10.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/01/2016] [Revised: 09/19/2016] [Accepted: 10/14/2016] [Indexed: 01/04/2023]
39
Ferreira Neto DC, de Souza Ferreira M, da Conceição Petronilho E, Alencar Lima J, Oliveira Francisco de Azeredo S, de Oliveira Carneiro Brum J, Jorge do Nascimento C, Figueroa Villar JD. A new guanylhydrazone derivative as a potential acetylcholinesterase inhibitor for Alzheimer's disease: synthesis, molecular docking, biological evaluation and kinetic studies by nuclear magnetic resonance. RSC Adv 2017. [DOI: 10.1039/c7ra04180b] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/14/2022] Open
Abstract
Molecular docking, in silico studies and NMR show that the new guanylhydrazone is a promising compound for the treatment of Alzheimer's disease.
40
Cui Y, Chen Q, Li Y, Tang L. A new model of flavonoids affinity towards P-glycoprotein: genetic algorithm-support vector machine with features selected by a modified particle swarm optimization algorithm. Arch Pharm Res 2016; 40:214-230. [DOI: 10.1007/s12272-016-0876-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/16/2016] [Accepted: 12/16/2016] [Indexed: 01/04/2023]
41
Zhang M, Xia Z, Yan A. Computer modeling in predicting the bioactivity of human 5-lipoxygenase inhibitors. Mol Divers 2016; 21:235-246. [PMID: 27904990 DOI: 10.1007/s11030-016-9709-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/17/2016] [Accepted: 11/14/2016] [Indexed: 01/04/2023]
Abstract
5-Lipoxygenase (5-LOX) is a key enzyme in the inflammatory path. Inhibitors of 5-LOX are useful for the treatment of diseases like arthritis, cancer, and asthma. We have collected a dataset including 220 human 5-LOX inhibitors for classification. A self-organizing map (SOM), a support vector machine (SVM), and a multilayer perceptron (MLP) algorithm were used to build models with selected descriptors for classifying 5-LOX inhibitors into active and weakly active ones. MACCS fingerprints were used in this model building process. The accuracy (Q) and Matthews correlation coefficient (MCC) of the best SOM model (Model 1A) were 86.49% and 0.73 on the test set, respectively. The Q and MCC of the best SVM model (Model 2A) were 82.67% and 0.64 on the test set, respectively. The Q and MCC of the best MLP model (Model 3B) were 84.00% and 0.67 on the test set, respectively. In addition, 180 inhibitors with bioactivities measured by fluorescence method were further used for a quantitative prediction. Multiple linear regression (MLR) and SVM algorithms were used to build models to predict the [Formula: see text] values. The correlation coefficients (R) of the MLR model (Model Q1) and the SVM model (Model Q2) were 0.72 and 0.74 on the test set, respectively.
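The accuracy (Q) and Matthews correlation coefficient (MCC) quoted in this abstract are both derived from a binary confusion matrix. As a minimal sketch, assuming the standard definitions of the two metrics, they can be computed as below; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (Q, MCC) for a binary confusion matrix.

    Q is the fraction of correct predictions; MCC ranges from -1 to 1
    and stays informative when the two classes are imbalanced.
    """
    total = tp + tn + fp + fn
    q = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, mcc


# Hypothetical test-set counts for an active / weakly active classifier.
q, mcc = accuracy_and_mcc(tp=40, tn=35, fp=5, fn=10)
print(f"Q = {q:.2%}, MCC = {mcc:.2f}")
```

Reporting MCC alongside Q guards against a model that scores well on accuracy simply by favouring the larger class.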
Affiliation(s)
- Mengdi Zhang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Zhonghua Xia
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China; State Key Laboratory of Natural and Biomimetic Drugs, Peking University, Beijing, People's Republic of China.
42
Chiddarwar RK, Rohrer SG, Wolf A, Tresch S, Wollenhaupt S, Bender A. In silico target prediction for elucidating the mode of action of herbicides including prospective validation. J Mol Graph Model 2016; 71:70-79. [PMID: 27846423 DOI: 10.1016/j.jmgm.2016.10.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/27/2016] [Accepted: 10/25/2016] [Indexed: 01/04/2023]
Abstract
The rapid emergence of pesticide resistance has given rise to a demand for herbicides with new modes of action (MoA). In the agrochemical sector, with the availability of experimental high throughput screening (HTS) data, it is now possible to utilize in silico target prediction methods in the early discovery phase to suggest the MoA of a compound via data mining of bioactivity data. While having been established in the pharmaceutical context, in the agrochemical area this approach poses rather different challenges, as we have found in this work, partially due to different chemistry, but even more so due to different (usually smaller) amounts of data, and different ways of conducting HTS. With the aim to apply computational methods for facilitating herbicide target identification, 48,000 bioactivity data points against 16 herbicide targets were processed to train Laplacian-modified Naïve Bayesian (NB) classification models. The herbicide target prediction model ("HerbiMod") is an ensemble of 16 binary classification models which are evaluated by internal, external and prospective validation sets. In addition to the experimental inactives, 10,000 random agrochemical inactives were included in the training process, which was shown to improve the overall balanced accuracy of our models up to 40%. For all the models, a balanced accuracy of ≥80% was achieved in five-fold cross validation. Ranking target predictions was addressed by means of z-scores, which improved predictivity over using raw scores alone. An external testset of 247 compounds from ChEMBL and a prospective testset of 394 compounds from BASF SE tested against five well-studied herbicide targets (ACC, ALS, HPPD, PDS and PROTOX) were used for further validation.
Only 4% of the compounds in the external testset lay within the applicability domain, and extrapolation (and correct prediction) was hence impossible, which on the one hand was surprising, and on the other hand illustrated the utility of using applicability domains in the first place. However, a balanced accuracy better than 60% was achieved on the prospective testset, where all the compounds fell within the applicability domain, which underlines the possibility of using target prediction in the area of agrochemicals as well.
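Balanced accuracy, the metric reported throughout this abstract, averages sensitivity and specificity so that a large pool of inactives cannot dominate the score. The sketch below assumes that standard definition; the counts are hypothetical, chosen only to mimic a training set with many more inactives than actives, and do not come from the paper.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on actives) and specificity (recall on inactives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 100 actives against 10,000 inactives.
tp, fn = 60, 40        # 60 of the 100 actives are recovered
tn, fp = 9900, 100     # 9,900 of the 10,000 inactives are rejected

plain = (tp + tn) / (tp + tn + fp + fn)
balanced = balanced_accuracy(tp, tn, fp, fn)
print(f"plain accuracy = {plain:.3f}, balanced accuracy = {balanced:.3f}")
```

With these numbers the plain accuracy looks excellent even though 40% of the actives are missed, which is the situation balanced accuracy is meant to expose.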
Affiliation(s)
- Rucha K Chiddarwar
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Sebastian G Rohrer
- Global Research Crop Protection, BASF SE, Speyerer Strasse 2, 67177 Limburgerhof, Germany
- Antje Wolf
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
- Stefan Tresch
- Global Research Crop Protection, BASF SE, Speyerer Strasse 2, 67177 Limburgerhof, Germany
- Sabrina Wollenhaupt
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
- Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom.
43
Takeda S, Kaneko H, Funatsu K. Chemical-Space-Based de Novo Design Method To Generate Drug-Like Molecules. J Chem Inf Model 2016; 56:1885-1893. [PMID: 27632418 DOI: 10.1021/acs.jcim.6b00038] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 01/04/2023]
Abstract
To discover drug compounds in chemical space containing an enormous number of compounds, a structure generator is required to produce virtual drug-like chemical structures. The de novo design algorithm for exploring chemical space (DAECS) visualizes the activity distribution on a two-dimensional plane corresponding to chemical space and generates structures in a target area on a plane selected by the user. In this study, we modify the DAECS to enable the user to select a target area to consider properties other than activity and improve the diversity of the generated structures by visualizing the drug-likeness distribution and the activity distribution, generating structures by substructure-based structural changes, including addition, deletion, and substitution of substructures, as well as the slight structural changes used in the DAECS. Through case studies using ligand data for the human adrenergic alpha2A receptor and the human histamine H1 receptor, the modified DAECS can generate high diversity drug-like structures, and the usefulness of the modification of the DAECS is verified.
Affiliation(s)
- Shunichi Takeda
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Hiromasa Kaneko
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Kimito Funatsu
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
44
Bushkov NA, Veselov MS, Chuprov-Netochin RN, Marusich EI, Majouga AG, Volynchuk PB, Shumilina DV, Leonov SV, Ivanenkov YA. Computational insight into the chemical space of plant growth regulators. Phytochemistry 2016; 122:254-264. [PMID: 26723884 DOI: 10.1016/j.phytochem.2015.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2015] [Revised: 12/02/2015] [Accepted: 12/11/2015] [Indexed: 06/05/2023]
Abstract
Enormous technological progress has resulted in an explosive growth in the amount of biological and chemical data that is typically multivariate and tangled in structure. Therefore, several computational approaches have mainly focused on dimensionality reduction and convenient representation of high-dimensional datasets to elucidate the relationships between the observed activity (or effect) and calculated parameters commonly expressed in terms of molecular descriptors. We have collected the experimental data available in patents and scientific publications as well as specific databases for various agrochemicals. The resulting dataset was then thoroughly analyzed using a Kohonen-based self-organizing technique. The overall aim of the presented study is to investigate whether the developed in silico model can be applied to predict the agrochemical activity of small molecule compounds and, at the same time, to offer further insights into the distinctive features of different agrochemical categories. The preliminary external validation with several plant growth regulators demonstrated a relatively high prediction power (67%) of the constructed model. This study is, in fact, the first example of large-scale modeling in the field of agrochemistry.
Affiliation(s)
- Nikolay A Bushkov
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation.
- Mark S Veselov
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation; Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation; National University of Science and Technology MISiS, 2 Leninskiy Prospect, Moscow 119049, Russian Federation
- Roman N Chuprov-Netochin
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation
- Elena I Marusich
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation
- Alexander G Majouga
- Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation; National University of Science and Technology MISiS, 2 Leninskiy Prospect, Moscow 119049, Russian Federation
- Polina B Volynchuk
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation
- Daria V Shumilina
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation
- Sergey V Leonov
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation
- Yan A Ivanenkov
- Moscow Institute of Physics and Technology, 9 Institutskiy Lane, Dolgoprudny, Moscow Region 141700, Russian Federation; Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation; National University of Science and Technology MISiS, 2 Leninskiy Prospect, Moscow 119049, Russian Federation; ChemDiv, 6605 Nancy Ridge Drive, San Diego, CA 92121, USA
45
Tian S, Wang J, Li Y, Li D, Xu L, Hou T. The application of in silico drug-likeness predictions in pharmaceutical research. Adv Drug Deliv Rev 2015; 86:2-10. [PMID: 25666163 DOI: 10.1016/j.addr.2015.01.009] [Citation(s) in RCA: 258] [Impact Index Per Article: 28.7] [Received: 08/14/2014] [Revised: 01/14/2015] [Accepted: 01/29/2015] [Indexed: 02/08/2023]
Abstract
The concept of drug-likeness, established from analyses of the physicochemical properties and/or structural features of existing small organic drugs and/or drug candidates, has been widely used to filter out compounds with undesirable properties, especially poor ADMET (absorption, distribution, metabolism, excretion, and toxicity) profiles. Here, we summarize various approaches for drug-likeness evaluations, including simple rules/filters based on molecular properties/structures and quantitative prediction models based on sophisticated machine learning methods, and provide a comprehensive review of recent advances in this field. Moreover, the strengths and weaknesses of these approaches are briefly outlined. Finally, the drug-likeness analyses of natural products and traditional Chinese medicines (TCM) are discussed.
Affiliation(s)
- Sheng Tian
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; College of Pharmaceutical Sciences, Soochow University, Suzhou, Jiangsu 215123, China
- Junmei Wang
- Green Center for Systems Biology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390, United States
- Youyong Li
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lei Xu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; College of Pharmaceutical Sciences, Soochow University, Suzhou, Jiangsu 215123, China.
46
Korkmaz S, Zararsiz G, Goksuluk D. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development. PLoS One 2015; 10:e0124600. [PMID: 25928885 PMCID: PMC4415797 DOI: 10.1371/journal.pone.0124600] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/29/2015] [Accepted: 03/03/2015] [Indexed: 12/18/2022] Open
Abstract
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; the ten best-performing algorithms were then selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create a heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
Affiliation(s)
- Selcuk Korkmaz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
- Gokmen Zararsiz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
47
Wang WJ, Huang Q, Zou J, Li LL, Yang SY. TS-Chemscore, a Target-Specific Scoring Function, Significantly Improves the Performance of Scoring in Virtual Screening. Chem Biol Drug Des 2014; 86:1-8. [PMID: 25358259 DOI: 10.1111/cbdd.12470] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 05/25/2014] [Revised: 10/03/2014] [Accepted: 10/17/2014] [Indexed: 02/05/2023]
Affiliation(s)
- Wen-Jing Wang
- State Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy; West China Hospital; West China Medical School; Sichuan University; Chengdu Sichuan 610041 China
- Qi Huang
- State Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy; West China Hospital; West China Medical School; Sichuan University; Chengdu Sichuan 610041 China
- Jun Zou
- State Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy; West China Hospital; West China Medical School; Sichuan University; Chengdu Sichuan 610041 China
- Lin-Li Li
- West China School of Pharmacy; Sichuan University; Chengdu Sichuan 610041 China
- Sheng-Yong Yang
- State Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy; West China Hospital; West China Medical School; Sichuan University; Chengdu Sichuan 610041 China
48
Lavecchia A. Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2014; 20:318-31. [PMID: 25448759 DOI: 10.1016/j.drudis.2014.10.012] [Citation(s) in RCA: 359] [Impact Index Per Article: 35.9] [Received: 07/18/2014] [Revised: 09/27/2014] [Accepted: 10/24/2014] [Indexed: 12/19/2022]
Abstract
During the past decade, virtual screening (VS) has evolved from traditional similarity searching, which utilizes single reference compounds, into an advanced application domain for data mining and machine-learning approaches, which require large and representative training-set compounds to learn robust decision rules. The explosive growth in the amount of public domain-available chemical and biological data has generated huge effort to design, analyze, and apply novel learning methodologies. Here, I focus on machine-learning techniques within the context of ligand-based VS (LBVS). In addition, I analyze several relevant VS studies from recent publications, providing a detailed view of the current state-of-the-art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances.
Affiliation(s)
- Antonio Lavecchia
- Department of Pharmacy, Drug Discovery Laboratory, University of Napoli 'Federico II', via D. Montesano 49, I-80131 Napoli, Italy.
49
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. Computer Methods and Programs in Biomedicine 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally show better performance than a single SVM, while Subset Selection outperforms the other feature selection methods. We have tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
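The pre-processing step described here, a Pearson correlation filter that drops redundant features, can be illustrated with a short sketch. The abstract does not give the cut-off or the elimination order, so the 0.9 threshold, the greedy first-come ordering, the function names, and the descriptor values below are all assumptions made for illustration only.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / (var_x * var_y) ** 0.5


def correlation_filter(columns, threshold=0.9):
    """Greedily keep features whose |r| with every already-kept feature is below threshold.

    The threshold value is an assumed example, not the one used in the study.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: heavy_atoms is almost collinear with mol_weight.
features = {
    "mol_weight": [180.0, 250.0, 310.0, 420.0, 500.0],
    "heavy_atoms": [13.0, 18.0, 22.0, 30.0, 36.0],
    "logp": [1.2, 3.4, 0.5, 2.8, 1.9],
}
selected = correlation_filter(features)
print(selected)
```

The reduced feature set would then be passed to the classifier; which of two correlated features survives depends on column order in this greedy variant.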
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey.
- Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
50
Mao R, Raj Kumar PK, Guo C, Zhang Y, Liang C. Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine. PLoS One 2014; 9:e104049. [PMID: 25110928 PMCID: PMC4128822 DOI: 10.1371/journal.pone.0104049] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 02/24/2014] [Accepted: 07/06/2014] [Indexed: 01/04/2023] Open
Abstract
One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on their expression regulation, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. By comparing coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach was more accurate in classifying RIs from CSIs than four other approaches. The optimal penalty parameter C and the RBF kernel parameter in SVM were set based on a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-Measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only were the basic sequence features and positional distribution characteristics of RIs obtained, but putative regulatory motifs in intron splicing were also predicted based on our feature extraction approach. Clearly, our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.
Affiliation(s)
- Rui Mao
- College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Department of Biology, Miami University, Oxford, Ohio, United States of America
- Cheng Guo
- Department of Biology, Miami University, Oxford, Ohio, United States of America
- Yang Zhang
- College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- * E-mail: (YZ); (CL)
- Chun Liang
- Department of Biology, Miami University, Oxford, Ohio, United States of America
- Department of Computer Sciences and Software Engineering, Miami University, Oxford, Ohio, United States of America
- * E-mail: (YZ); (CL)