1
Asim A, Kiani YS, Saeed MT, Jabeen I. Decoding the Role of Epigenetics in Breast Cancer Using Formal Modeling and Machine-Learning Methods. Front Mol Biosci 2022; 9:882738. [PMID: 35898303 PMCID: PMC9309526 DOI: 10.3389/fmolb.2022.882738] [Received: 02/24/2022] [Accepted: 05/25/2022]
Abstract
Breast carcinogenesis is known to be instigated by genetic and epigenetic modifications that affect multiple cellular signaling cascades, which makes its prevention and treatment a challenging endeavor. Epigenetic modification, particularly DNA methylation-mediated silencing of key tumor suppressor genes (TSGs), is a hallmark of cancer progression. One such TSG, RUNX3 (Runt-related transcription factor 3), has emerged as a new focus in breast cancer research. It is known to be suppressed by local promoter hypermethylation mediated by DNA methyltransferase 1 (DNMT1). However, the precise mechanism by which epigenetic silencing of RUNX3 signaling leads to cancer invasion and metastasis remains inadequately characterized. In this study, a biological regulatory network (BRN) was designed to model the dynamics of the DNMT1–RUNX3 network, augmented by other regulators such as p21, c-myc, and p53. For this purpose, René Thomas qualitative modeling was applied to compute the unknown parameters, and the resulting trajectories revealed important behaviors of the DNMT1–RUNX3 network (i.e., a recovery cycle, homeostasis, and a bifurcation state). The system was observed to progress toward cancer metastasis under persistent activation of the oncogene c-myc accompanied by consistent downregulation of the TSG RUNX3. Conversely, homeostasis was achieved in the absence of c-myc and with RUNX3 activated. Furthermore, DNMT1 was identified as a potential epigenetic drug target, and machine-learning (ML) techniques were applied to classify active and inactive DNMT1 modulators. The best-performing ML model classified active and least-active DNMT1 inhibitors with 97% accuracy.
Collectively, this study reveals the underlying epigenetic events responsible for RUNX3-implicated breast cancer metastasis and provides a classification of DNMT1 modulators that could inform epigenetics-based tumor therapy.
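The René Thomas formalism mentioned above builds a state graph in which each variable moves one level at a time toward a target value, called its logical parameter, and that target depends on which regulators are currently acting. The sketch below is a minimal illustration of that idea under loudly stated assumptions. The two-node network (A activates B, B inhibits A), its Boolean levels and its parameter values are all hypothetical. They are not the DNMT1–RUNX3 network or the parameters computed in the study.

```python
from itertools import product

# Hypothetical toy network (NOT the DNMT1-RUNX3 model): A activates B, B inhibits A.
# Each variable has two levels (0 = inactive, 1 = active).


def target_a(state):
    """Logical parameter of A. Its only resource is the absence of inhibitor B.
    Assumed values: K_A({not B}) = 1, K_A({}) = 0."""
    _, b = state
    return 1 if b == 0 else 0


def target_b(state):
    """Logical parameter of B. Its only resource is the presence of activator A.
    Assumed values: K_B({A}) = 1, K_B({}) = 0."""
    a, _ = state
    return 1 if a == 1 else 0


TARGETS = (target_a, target_b)


def successors(state):
    """Asynchronous update: each variable that differs from its target level
    moves one step toward it, and each such move is a separate successor."""
    result = []
    for i, target in enumerate(TARGETS):
        k = target(state)
        if state[i] != k:
            nxt = list(state)
            nxt[i] += 1 if k > state[i] else -1
            result.append(tuple(nxt))
    return result


def state_graph():
    """Map every state of the two-variable Boolean network to its successors."""
    return {s: successors(s) for s in product((0, 1), repeat=2)}


if __name__ == "__main__":
    for state, nxt in state_graph().items():
        print(state, "->", nxt)
```

Starting from (0, 0), the successors run through (1, 0), (1, 1) and (0, 1) and then return to (0, 0). That loop is the kind of cyclic trajectory a negative feedback circuit produces in this formalism. The published model has more variables and multi-level thresholds, so it is far larger.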
2
Diéguez-Santana K, Casañola-Martin GM, Torres R, Rasulev B, Green JR, González-Díaz H. Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds. Mol Pharm 2022; 19:2151-2163. [PMID: 35671399 PMCID: PMC9986951 DOI: 10.1021/acs.molpharmaceut.2c00029]
Abstract
Antibacterial drugs (ADs) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria have increased interest in understanding metabolic network (MN) mutations and the interaction of ADs with MNs. In this study, we applied the IFPTML algorithm (Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML)) to a large dataset from the ChEMBL database containing >155,000 AD assays against >40 MNs of multiple bacterial species. We built a linear discriminant analysis (LDA) model and 17 ML models centered on atom-based linear indices to predict antibacterial compounds. The IFPTML-LDA model gave the following results for the training subset: specificity (Sp) = 76% (out of 70,000 cases), sensitivity (Sn) = 70%, and accuracy (Acc) = 73%. For the validation subsets, the same model gave Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the nonlinear IFPTML models, k-nearest neighbors (KNN) gave the best training-set results, with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and area under the receiver operating characteristic curve (AUROC) = 0.998. In the validation series, random forest performed best: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The linear and nonlinear IFPTML models of ADs vs MNs have good statistical parameters. They could help identify new metabolic mutations involved in antibiotic resistance and reduce the time and cost of antibacterial drug research.
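The Sn, Sp and Acc values above follow the standard confusion-matrix definitions. The sketch below computes them under the assumption that labels are binary, with 1 = active and 0 = inactive. The labels in the usage lines are invented for illustration and do not come from the ChEMBL dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = active, 0 = inactive."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def sn_sp_acc(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sn, sp, acc


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(sn_sp_acc(y_true, y_pred))
```

For these toy labels there are 3 true positives, 1 false negative, 5 true negatives and 1 false positive. That gives Sn = 0.75, Sp ≈ 0.833 and Acc = 0.8, and it shows how accuracy can sit between sensitivity and specificity.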
Affiliation(s)
- Karel Diéguez-Santana
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; Universidad Regional Amazónica IKIAM, Tena, Napo 150150, Ecuador
- Gerardo M Casañola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States; Department of Systems and Computer Engineering, Carleton University, K1S5B6 Ottawa, Ontario, Canada
- Roldan Torres
- Universidad Regional Amazónica IKIAM, Tena, Napo 150150, Ecuador
- Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- James R Green
- Department of Systems and Computer Engineering, Carleton University, K1S5B6 Ottawa, Ontario, Canada
- Humbert González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain; BIOFISIKA, Basque Center for Biophysics CSIC-UPVEH, 48940 Leioa, Spain; IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain
3
Ding W, Nan Y, Wu J, Han C, Xin X, Li S, Liu H, Zhang L. Combining multi-dimensional molecular fingerprints to predict the hERG cardiotoxicity of compounds. Comput Biol Med 2022; 144:105390. [DOI: 10.1016/j.compbiomed.2022.105390] [Received: 02/07/2022] [Revised: 03/06/2022] [Accepted: 03/07/2022]
4
Periwal V, Bassler S, Andrejev S, Gabrielli N, Patil KR, Typas A, Patil KR. Bioactivity assessment of natural compounds using machine learning models trained on target similarity between drugs. PLoS Comput Biol 2022; 18:e1010029. [PMID: 35468126 PMCID: PMC9071136 DOI: 10.1371/journal.pcbi.1010029] [Received: 09/06/2021] [Revised: 05/05/2022] [Accepted: 03/17/2022]
Abstract
Natural compounds constitute a rich resource of potential small-molecule therapeutics. Experimental access to this resource is limited by its vast diversity and by difficulties in systematic purification. Computational assessment of structural similarity with known therapeutic molecules offers a scalable alternative. Here, we assessed functional similarity between natural compounds and approved drugs by combining multiple chemical similarity metrics and physicochemical properties in a machine-learning approach. We computed pairwise similarities between 1410 drugs to train classification models, using the drugs' shared protein targets as class labels. The best-performing models were random forests, which gave an average area under the ROC curve of 0.9, a Matthews correlation coefficient of 0.35, and an F1 score of 0.33, suggesting that they captured the structure-activity relationship well. The models were then used to predict protein targets of circa 11k natural compounds by comparing them with the drugs. This revealed the therapeutic potential of several natural compounds, including some with support from previously published sources and others hitherto unexplored. We experimentally validated the activity of one predicted pair, viz., Cox-1 inhibition by 5-methoxysalicylic acid, a molecule commonly found in tea, herbs and spices. In contrast, another natural compound, 4-isopropylbenzoic acid, had the highest similarity score under the most heavily weighted similarity metric but was not picked by our models, and it did not inhibit Cox-1. Our results demonstrate the utility of a machine-learning approach that combines multiple chemical features for uncovering the protein-binding potential of natural compounds.
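One way to combine several chemical similarity metrics, as described above, is to turn each drug/compound pair into a feature vector of similarity scores. A classifier can then be trained on those vectors. The sketch below assumes that fingerprints are Python sets of "on" bit indices. The choice of Tanimoto, Dice and cosine coefficients and the fingerprints themselves are illustrative assumptions, not the exact metrics or fingerprints the authors used.

```python
import math


def tanimoto(a, b):
    """|A & B| / |A | B| for two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """|A & B| / sqrt(|A| * |B|) for binary fingerprints."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def pair_features(fp1, fp2):
    """Feature vector for one compound pair. A classifier (e.g. a random forest)
    could be trained on such vectors with 'shares a target' as the label."""
    return [tanimoto(fp1, fp2), dice(fp1, fp2), cosine(fp1, fp2)]


# Hypothetical fingerprints (indices of set bits).
drug = {1, 4, 7, 9}
natural_compound = {1, 4, 8, 9, 12}
print(pair_features(drug, natural_compound))
```

The two toy fingerprints share 3 bits out of a union of 6. That gives Tanimoto = 0.5, Dice ≈ 0.667 and cosine ≈ 0.671, which shows that the metrics rank the same overlap differently. Giving a model several of them lets it learn which metric to weight.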
Affiliation(s)
- Vinita Periwal
- European Molecular Biology Laboratory, Heidelberg, Germany
- Medical Research Council Toxicology Unit, University of Cambridge, Cambridge, United Kingdom
- Stefan Bassler
- European Molecular Biology Laboratory, Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Kaustubh Raosaheb Patil
- Institute of Neuroscience and Medicine (INM-7), Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Kiran Raosaheb Patil
- European Molecular Biology Laboratory, Heidelberg, Germany
- Medical Research Council Toxicology Unit, University of Cambridge, Cambridge, United Kingdom
5
Wang Y, Jeon H. 3D cell cultures toward quantitative high-throughput drug screening. Trends Pharmacol Sci 2022; 43:569-581. [DOI: 10.1016/j.tips.2022.03.014] [Received: 12/14/2021] [Revised: 03/29/2022] [Accepted: 03/30/2022]
6
Piroozmand F, Mohammadipanah F, Sajedi H. Spectrum of deep learning algorithms in drug discovery. Chem Biol Drug Des 2021; 96:886-901. [PMID: 33058458 DOI: 10.1111/cbdd.13674] [Received: 10/31/2019] [Revised: 02/11/2020] [Accepted: 02/19/2020]
Abstract
Deep learning (DL) algorithms are a subset of machine learning algorithms that aim to model complex mappings between a set of elements and their classes. In parallel with advances in revealing the molecular bases of diseases, DL has notably been applied to data/library management, reaction optimization, differentiating uncertainties, molecule construction, deriving metrics from qualitative results, and prediction of structures or interactions. Challenges across the pipeline can be addressed with artificial intelligence algorithms that aid the generation and interpretation of data. This covers everything from source identification to lead discovery, medicinal chemistry of the drug candidate, drug delivery, and modification. As high-throughput methods advance, both discovery and design approaches demand automation, large-scale data management, and data fusion. DL can accelerate the exploration of drug mechanisms, the discovery of novel indications for existing drugs (drug repositioning), drug development, and preclinical and clinical studies. This review highlights the impact of DL on the drug discovery and design workflow and on its complementary tools. It also presents the types of DL algorithms used for this purpose, their pros and cons, and the dominant directions of future research.
Affiliation(s)
- Firoozeh Piroozmand
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Fatemeh Mohammadipanah
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Hedieh Sajedi
- Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
7
Diallo BN, Glenister M, Musyoka TM, Lobb K, Tastan Bishop Ö. SANCDB: an update on South African natural compounds and their readily available analogs. J Cheminform 2021; 13:37. [PMID: 33952332 PMCID: PMC8097257 DOI: 10.1186/s13321-021-00514-2] [Received: 11/21/2020] [Accepted: 04/23/2021]
Abstract
BACKGROUND The South African Natural Compounds Database (SANCDB; https://sancdb.rubi.ru.ac.za/ ) is the sole, fully referenced database of natural chemical compounds from South African biodiversity. It is freely available, and since its inception in 2015 it has become an important resource for several studies. Its content has been used as training data for machine learning models, incorporated into larger databases, and utilized in drug discovery studies for hit identification. DESCRIPTION Here, we report the updated version of SANCDB. The new version includes 412 additional compounds reported since 2015, giving a total of 1012 compounds in the database. Natural products (NPs) are an important source of unique scaffolds, but they have a major drawback: their complex structures often result in low synthetic feasibility in the laboratory. With this in mind, SANCDB now provides direct links to commercially available analogs from two major chemical databases, namely Mcule and MolPort. To our knowledge, this feature is not available in other NP databases. Additionally, the database and website interface were updated for easier access to information, and the compounds are now downloadable in many different chemical formats. CONCLUSIONS The drug discovery process relies heavily on NPs due to their unique chemical organization, and this has inspired the establishment of numerous NP chemical databases. With the emergence of newer chemoinformatic technologies, existing chemical databases require constant updates to facilitate information accessibility and integration by users. Besides increasing the NP compound content, the updated SANCDB allows users to access individual compounds (if available) or their analogs from commercial databases seamlessly.
Affiliation(s)
- Bakary N'tji Diallo
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda/Grahamstown, 6140, South Africa
- Michael Glenister
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda/Grahamstown, 6140, South Africa
- Thommas M Musyoka
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda/Grahamstown, 6140, South Africa
- Kevin Lobb
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda/Grahamstown, 6140, South Africa; Department of Chemistry, Rhodes University, Makhanda/Grahamstown, 6140, South Africa
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda/Grahamstown, 6140, South Africa
8
García-Sosa AT. Androgen Receptor Binding Category Prediction with Deep Neural Networks and Structure-, Ligand-, and Statistically Based Features. Molecules 2021; 26:1285. [PMID: 33652992 PMCID: PMC7956632 DOI: 10.3390/molecules26051285] [Received: 01/15/2021] [Revised: 02/23/2021] [Accepted: 02/24/2021]
Abstract
Substances that can modify the androgen receptor (AR) pathway in humans and animals are entering the environment and food chain. They have a proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases of experimental data on the effects of chemicals in humans, chimpanzees, and rats were used to build machine-learning classifiers and regressors and to evaluate them on independent sets. Different featurizations, algorithms, and protein structures led to different results. Deep neural networks (DNNs) built on user-defined, physicochemically relevant features developed for this work outperformed graph convolutional, random forest, and large featurizations. These user-provided structure-, ligand-, and statistically based features with specific DNNs gave the best results by AUC (0.87), MCC (0.47), and other metrics. They were also the most interpretable, with descriptors/features that carry chemical meaning. In addition, the same features performed better in the DNN method than in a multivariate logistic model. The present work reached validation MCC = 0.468 and training MCC = 0.868, compared with evaluation-set MCC = 0.2036 and training-set MCC = 0.5364 for multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, and thereby the assessment and design of compounds. Source code and data are available on GitHub.
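The MCC values quoted above are computed from all four confusion-matrix cells. This makes MCC more informative than accuracy on unbalanced sets like the one used for the logistic model. The sketch below implements the standard formula. The zero-denominator convention is a common choice rather than something stated in the source, and the counts in the usage line are invented.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient:
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an unbalanced test set (20 actives, 180 inactives).
print(round(mcc(tp=12, tn=170, fp=10, fn=8), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). For the made-up counts above, accuracy is 182/200 = 0.91 while MCC is only about 0.52. That gap is why MCC is reported alongside AUC for unbalanced binding data.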
9
Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1513]
10
Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments. BMC Med Genomics 2020; 13:111. [PMID: 32948183 PMCID: PMC7499993 DOI: 10.1186/s12920-020-00759-0] [Received: 07/14/2020] [Accepted: 07/27/2020]
Abstract
Background Machine learning (ML) methods still have limited applicability in personalized oncology because few clinically annotated molecular profiles are available. This prevents sufficient training of ML classifiers that could improve molecular diagnostics. Methods We reviewed published datasets of high-throughput gene expression profiles from cancer patients with known responses to chemotherapy treatments. We searched the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories. Results We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes: 26 datasets, ranging from 41 to 508 cases per dataset. All identified datasets were checked for ML applicability and robustness with leave-one-out cross-validation. Twenty-three datasets were found suitable for ML because they had balanced numbers of treatment responders and non-responders. Conclusions We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Of these, seven datasets included RNA sequencing data (for 645 cases), and the others contained microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.
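Leave-one-out cross-validation, used above to check each dataset's ML applicability, holds out each case once, trains on the remaining cases and scores the prediction for the held-out one. The sketch below is minimal and rests on two assumptions: a simple nearest-centroid classifier and toy two-feature "expression profiles". Neither the classifier nor the data comes from the study.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose training centroid is closest."""
    classes = sorted(set(train_y))
    cents = {
        c: centroid([row for row, lab in zip(train_X, train_y) if lab == c])
        for c in classes
    }
    return min(classes, key=lambda c: sq_dist(cents[c], x))


def loocv_accuracy(X, y):
    """Hold out each sample once, train on the remaining n-1, and score."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy profiles: label 1 = responder, 0 = non-responder (hypothetical).
X = [[2.0, 1.0], [2.2, 0.8], [1.9, 1.2], [0.5, 2.5], [0.4, 2.8], [0.6, 2.2]]
y = [1, 1, 1, 0, 0, 0]
print(loocv_accuracy(X, y))
```

LOOCV uses every case for testing exactly once. That makes it attractive for the small datasets described above (as few as 41 cases), although it requires n separate training runs.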
11
Dinić J, Efferth T, García-Sosa AT, Grahovac J, Padrón JM, Pajeva I, Rizzolio F, Saponara S, Spengler G, Tsakovska I. Repurposing old drugs to fight multidrug resistant cancers. Drug Resist Updat 2020; 52:100713. [PMID: 32615525 DOI: 10.1016/j.drup.2020.100713] [Received: 04/14/2020] [Revised: 06/04/2020] [Accepted: 06/06/2020]
Abstract
Overcoming multidrug resistance is a major challenge for cancer treatment. In the search for new chemotherapeutics to treat malignant diseases, drug repurposing has gained tremendous interest in recent years. Repositioning candidates have often passed through several stages of clinical drug development and may even be marketed, which attracts the attention and interest of pharmaceutical companies as well as regulatory agencies. Typically, drug repositioning has been serendipitous, exploiting undesired side effects of small-molecule drugs for new disease indications. As bioinformatics gains popularity as an integral component of drug discovery, more rational approaches are needed. Herein, we show practical examples of in silico approaches for fast and cost-effective repurposing of small-molecule drugs against multidrug-resistant cancers. These include pharmacophore modelling as well as pharmacophore- and docking-based virtual screening. We provide a timely and comprehensive overview of compounds with considerable potential to be repositioned as cancer therapeutics. These drugs come from diverse chemotherapeutic classes. We emphasize the scope and limitations of anthelmintics, antibiotics, antifungals, antivirals, antimalarials, antihypertensives, psychopharmaceuticals and antidiabetics that have shown extensive immunomodulatory, antiproliferative, pro-apoptotic, and antimetastatic potential. These drugs, used either alone or in combination with existing anticancer chemotherapeutics, are strong candidates to prevent or overcome drug resistance. We particularly focus on outcomes and future perspectives of drug repositioning for the treatment of multidrug-resistant tumors. We also discuss current possibilities and limitations of preclinical and clinical investigations.
Affiliation(s)
- Jelena Dinić
- Department of Neurobiology, Institute for Biological Research "Siniša Stanković" - National Institute of Republic of Serbia, University of Belgrade, Bulevar Despota Stefana 142, 11060 Belgrade, Serbia
- Thomas Efferth
- Department of Pharmaceutical Biology, Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg University, Staudinger Weg 5, 55128 Mainz, Germany
- Jelena Grahovac
- Department of Experimental Oncology, Institute for Oncology and Radiology of Serbia, Pasterova 14, 11000 Belgrade, Serbia
- José M Padrón
- BioLab, Instituto Universitario de Bio-Orgánica Antonio González (IUBO AG), Universidad de La Laguna, Avda. Astrofísico Francisco Sánchez 2, E-38071 La Laguna, Spain
- Ilza Pajeva
- Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Bl. 105, 1113 Sofia, Bulgaria
- Flavio Rizzolio
- Department of Molecular Sciences and Nanosystems, Ca' Foscari University of Venice, 301724 Venezia-Mestre, Italy; Pathology Unit, Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, 33081 Aviano, Italy
- Simona Saponara
- Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy
- Gabriella Spengler
- Department of Medical Microbiology and Immunobiology, Faculty of Medicine, University of Szeged, H-6720 Szeged, Dóm tér 10, Hungary
- Ivanka Tsakovska
- Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Bl. 105, 1113 Sofia, Bulgaria
12
Ahamed TKS, Muraleedharan K. A cheminformatic study on chemical space characterization and diversity analysis of 5-LOX inhibitors. J Mol Graph Model 2020; 100:107699. [PMID: 32799052 DOI: 10.1016/j.jmgm.2020.107699] [Received: 04/18/2020] [Revised: 06/19/2020] [Accepted: 07/10/2020]
Abstract
Blocking 5-lipoxygenase (5-LOX)-catalyzed leukotriene biosynthesis has been recognized for the past few decades as a promising therapeutic strategy for acute inflammatory, allergic, and respiratory diseases. Because of the toxicity of the FDA-approved 5-LOX inhibitor zileuton, the scientific community has sought novel 5-LOX inhibitors. As a result, a substantial amount of relevant structure-activity information on 5-LOX inhibitors has been released and stored in public databases. This study aimed at a comprehensive cheminformatic characterization of the diversity and complexity of two chemical spaces: 5-LOX inhibitors and inhibitors of FLAP, the 5-LOX-activating protein. Both were compared with the approved-drug space and a virtual LOX library. Visual representation of the property space indicates that some compounds in the 5-LOX inhibitor space broaden the traditional medicinal space. The structural diversity of the databases was computed using complementary approaches, including physicochemical property (PCP) descriptors, molecular fingerprints, and molecular scaffolds. With the apparent exception of approved drugs, the 5-LOX dataset shows more diversity than the FLAP and LOX virtual library sets. This study identified the underlying patterns in the chemical and pharmacological property space that were decisive for the discovery and development of 5-LOX inhibitors.
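A common fingerprint-based way to compare the structural diversity of compound sets is the mean pairwise Tanimoto distance (1 minus similarity) within each set. Higher values indicate a more diverse set. This kind of comparison fits the one reported above for the 5-LOX, FLAP and virtual-library sets. The sketch below uses made-up fingerprints stored as sets of on-bits, and it does not reproduce the fingerprint types or diversity measures actually used in the study.

```python
from itertools import combinations


def tanimoto(a, b):
    """|A & B| / |A | B| for two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_distance(fps):
    """Average Tanimoto distance (1 - similarity) over all unordered pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(1 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints: a tight congeneric series vs. a more varied set.
congeneric = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}]
varied = [{1, 2, 3}, {4, 5, 6}, {1, 7, 8}]
print(mean_pairwise_distance(congeneric), mean_pairwise_distance(varied))
```

The congeneric toy set scores about 0.27 and the varied set about 0.93. With real data, the fingerprint type strongly affects absolute values, so sets should be compared using the same fingerprint. This is one reason such studies pair fingerprints with PCP descriptors and scaffold analysis.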
Affiliation(s)
- K Muraleedharan
- Department of Chemistry, University of Calicut, Malappuram, 673635, India
13
Abstract
Aim: The explosion of data-based technology has accelerated pattern mining. However, data quality and bias clearly affect all machine learning and modeling. Results & methodology: A technique is presented that uses the distribution of first significant digits of medicinal chemistry features (logP, logS, and pKa, both experimental and predicted). The distribution is used to assess whether these features follow Benford's law, as many natural phenomena do. Conclusion: Data quality depends on dataset size, diversity, and magnitudes. Profiling based on drugs alone may rely on sets that are too small or narrow. Using larger sets of experimentally determined or predicted values recovers the distribution seen in other natural phenomena. This technique may be used to improve profiling, machine learning, large-dataset assessment and other data-based methods, leading to better (automated) data generation and compound design.
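Benford's law predicts that the first significant digit d of many naturally occurring quantities appears with probability P(d) = log10(1 + 1/d). So about 30.1% of values start with 1 and about 4.6% start with 9. The sketch below shows the comparison described above. Absolute values are taken so that negative logP or logS values still contribute a leading digit, and zeros are skipped. That sign and zero handling is an assumption, not necessarily the authors' procedure. The conformity score is the mean absolute deviation (MAD), a measure commonly used in Benford analyses, and the source does not say which measure the authors used. The input values are made up.

```python
import math
from collections import Counter


def first_digit(x):
    """First significant digit of a nonzero number, ignoring sign.
    Uses scientific notation, e.g. 0.034 -> '3.400000e-02' -> 3."""
    if x == 0:
        return None
    return int(f"{abs(x):e}"[0])


def benford_expected():
    """Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9 (zeros skipped)."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def mean_absolute_deviation(values):
    """MAD between observed and Benford-expected digit frequencies."""
    obs = first_digit_frequencies(values)
    exp = benford_expected()
    return sum(abs(obs[d] - exp[d]) for d in range(1, 10)) / 9


# Hypothetical logP values for illustration.
logp = [1.2, -0.8, 3.4, 2.1, 0.45, 1.9, -1.3, 5.6, 1.05, 2.7]
print(first_digit_frequencies(logp))
print(round(mean_absolute_deviation(logp), 4))
```

Ten values are far too few for a meaningful test. That matches the abstract's point that small or narrow sets, such as drug-only profiles, may not recover the Benford distribution.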
14
Duan S, Cao H, Liu H, Miao L, Wang J, Zhou X, Wang W, Hu P, Qu L, Wu Y. Development of a machine learning-based multimode diagnosis system for lung cancer. Aging (Albany NY) 2020; 12:9840-9854. [PMID: 32445550 PMCID: PMC7288961 DOI: 10.18632/aging.103249] [Received: 02/10/2020] [Accepted: 04/20/2020]
Abstract
As an emerging technology, artificial intelligence has been applied to identify various physical disorders. Here, we developed a three-layer diagnosis system for lung cancer involving three machine learning approaches: a C5.0 decision tree, an artificial neural network (ANN) and a support vector machine (SVM). The area under the curve (AUC) was used to evaluate their discriminative power. In the first layer, the AUCs of C5.0, ANN and SVM were 0.676, 0.736 and 0.640, respectively, so ANN outperformed C5.0 and SVM. In the second layer, the AUCs were 0.804, 0.889 and 0.825, with ANN similar to SVM but superior to C5.0. Much higher AUCs of 0.908, 0.910 and 0.849 were obtained in the third layer, where C5.0 showed the highest sensitivity (94.12%). These data support a three-layer diagnosis system for lung cancer. First, ANN serves as a broad-spectrum screening subsystem based on 14 epidemiological data items and clinical symptoms, identifying high-risk groups. Then, with 5 additional tumor biomarkers, ANN serves as an auxiliary diagnosis subsystem to identify suspected lung cancer patients. Finally, C5.0 confirms lung cancer patients based on 22 CT nodule-based radiomic features.
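The AUC values used above to compare C5.0, ANN and SVM have a simple probabilistic reading. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below shows that pairwise (Mann-Whitney) computation on invented scores. It is not the authors' evaluation code.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties contribute 0.5. O(n_pos * n_neg), fine for small sets."""
    pos = [s for s, lab in zip(scores, y_true) if lab == 1]
    neg = [s for s, lab in zip(scores, y_true) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (1 = lung cancer, 0 = control).
y = [1, 1, 1, 0, 0, 0, 0]
s = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
print(roc_auc(y, s))
```

Here 10.5 of the 12 positive/negative pairs are ordered correctly, so the AUC is 0.875. The AUC depends only on the ranking of the scores, which is why it can compare models such as C5.0, ANN and SVM whose raw outputs are on different scales.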
Affiliation(s)
- Shuyin Duan
- College of Public Health, Zhengzhou University, Zhengzhou 450001, China
- Huimin Cao
- College of Public Health, Zhengzhou University, Zhengzhou 450001, China
- Hong Liu
- The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China
- Lijun Miao
- The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China
- Jing Wang
- The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China
- Xiaolei Zhou
- Henan Provincial Chest Hospital, Zhengzhou 450001, China
- Wei Wang
- College of Public Health, Zhengzhou University, Zhengzhou 450001, China
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3N4, Canada
- Lingbo Qu
- College of Public Health, Zhengzhou University, Zhengzhou 450001, China; Henan Joint International Research Laboratory of Green Construction of Functional Molecules and Their Bioanalytical Applications, Zhengzhou 450001, China
- Yongjun Wu
- College of Public Health, Zhengzhou University, Zhengzhou 450001, China; The Key Laboratory of Nanomedicine and Health Inspection of Zhengzhou, Zhengzhou 450001, China
15
Tkachev V, Sorokin M, Borisov C, Garazha A, Buzdin A, Borisov N. Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology. Int J Mol Sci 2020; 21:713. [PMID: 31979006 PMCID: PMC7037338 DOI: 10.3390/ijms21030713] [Received: 12/23/2019] [Revised: 01/16/2020] [Accepted: 01/17/2020]
Abstract
(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs because there are few case histories with clinical outcomes supplemented by high-throughput molecular data. This causes overtraining and high vulnerability in most ML methods. Recently, we proposed a hybrid global-local approach to ML, termed floating window projective separator (FloWPS), that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41–235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy treatments. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP). The area under the receiver operating characteristic curve (ROC AUC) of the treatment response classifiers increased from the 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods on the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between different ML methods, which indicates its robustness to overtraining. For all datasets tested, FloWPS data trimming performed best with the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
Affiliation(s)
- Victor Tkachev
- OmicsWayCorp, Walnut, CA 91788, USA
- Maxim Sorokin
- OmicsWayCorp, Walnut, CA 91788, USA
- Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Constantin Borisov
- National Research University Higher School of Economics, 101000 Moscow, Russia
- Andrew Garazha
- OmicsWayCorp, Walnut, CA 91788, USA
- Anton Buzdin
- OmicsWayCorp, Walnut, CA 91788, USA
- Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Moscow Oblast, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
- Nicolas Borisov
- OmicsWayCorp, Walnut, CA 91788, USA
- Institute for Personalized Medicine, I.M. Sechenov First Moscow State Medical University, 119991 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Moscow Oblast, Russia
- Correspondence: Tel.: +7-903-218-7261
16
Kausar S, Falcao AO. A visual approach for analysis and inference of molecular activity spaces. J Cheminform 2019; 11:63. [PMID: 33430986 PMCID: PMC6805449 DOI: 10.1186/s13321-019-0386-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/01/2018] [Accepted: 10/05/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND Molecular space visualization can help to explore the diversity of large heterogeneous chemical data, which ultimately may increase the understanding of structure-activity relationships (SAR) in drug discovery projects. Visual SAR analysis can therefore be useful for library design, chemical classification for biological evaluation, and virtual screening to select compounds for synthesis or in vitro testing. As such, computational approaches for molecular space visualization have become an important topic in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). The similarity matrix is projected into 2D using different dimensionality reduction algorithms (Principal Coordinates Analysis (PCooA), Kruskal multidimensional scaling, Sammon mapping and t-SNE). A kernel density function is then applied to this projection to compute the probability of activity for each coordinate in the new projected space. RESULTS The methodology was tested on four different quantitative structure-activity relationship (QSAR) binary classification data sets, and PSMAs were computed for each. The generated maps showed internal consistency, with active molecules grouped together for all data sets and all dimensionality reduction algorithms. To validate the quality of the generated maps, the 2D coordinates of test molecules were computed in the new reference space using a data transformation matrix. In total, sixteen PSMAs were built, and their performance was assessed using the Area Under the Curve (AUC) and the Matthews Correlation Coefficient (MCC). For the best projection for each data set, AUC testing results ranged from 0.87 to 0.98 and MCC scores ranged from 0.33 to 0.77, suggesting that this methodology can validly capture the complexities of the molecular activity space.
All four mapping functions provided generally good results, yet the overall performance of PCooA and t-SNE was slightly better than that of Sammon mapping and Kruskal multidimensional scaling. CONCLUSIONS Our results show that, by using an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces, it is possible to produce a visual PSMA whose consistency can be validated by using the map as a classification model. The produced maps can be used as prediction tools, since any molecule can easily be projected into the new reference space as long as its similarities to the molecules used to compute the initial similarity matrix can be computed.
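A probabilistic activity surface of this kind can be sketched as the ratio of two kernel density estimates on the 2D map: the density of active molecules divided by the density of all molecules at a given coordinate. The code below is a minimal sketch under stated assumptions: the 2D coordinates have already been computed (for example by one of the projections named above), the kernel is an isotropic Gaussian with a hypothetical `bandwidth` parameter, and the paper's actual kernel and bandwidth choices are not given in the abstract. It also includes the Matthews correlation coefficient used to score the maps once they are thresholded into a classifier.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def activity_probability(query: Point, coords: Sequence[Point],
                         active: Sequence[bool], bandwidth: float = 1.0) -> float:
    """Estimated P(active) at a 2D map coordinate: Gaussian-kernel density of
    active molecules divided by that of all molecules (hypothetical bandwidth)."""
    num = den = 0.0
    for (x, y), is_active in zip(coords, active):
        d2 = (query[0] - x) ** 2 + (query[1] - y) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        den += w
        if is_active:
            num += w
    # If every kernel weight underflows (query far from all molecules), report 0.
    return num / den if den > 0.0 else 0.0


def mcc(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Matthews correlation coefficient for binary labels; 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical map: actives cluster near (0, 0), inactives near (5, 5).
    coords = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
    active = [True, True, True, False, False, False]
    # Threshold each molecule's map probability at 0.5 (in-sample, so optimistic;
    # the study instead projected separate test molecules onto the map).
    preds = [activity_probability(c, coords, active) >= 0.5 for c in coords]
    print(mcc(active, preds))
```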
Affiliation(s)
- Samina Kausar
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Andre O. Falcao
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
17
Borisov N, Buzdin A. New Paradigm of Machine Learning (ML) in Personalized Oncology: Data Trimming for Squeezing More Biomarkers From Clinical Datasets. Front Oncol 2019; 9:658. [PMID: 31380288 PMCID: PMC6650540 DOI: 10.3389/fonc.2019.00658] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/22/2019] [Accepted: 07/05/2019] [Indexed: 11/13/2022]
Affiliation(s)
- Nicolas Borisov
- Department of Personalized Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Anton Buzdin
- Department of Personalized Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Department of Genomics and Postgenomic Technologies, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
18
Wang YW, Shen ZZ, Jiang Y. Comparison of autoregressive integrated moving average model and generalised regression neural network model for prediction of haemorrhagic fever with renal syndrome in China: a time-series study. BMJ Open 2019; 9:e025773. [PMID: 31209084 PMCID: PMC6589045 DOI: 10.1136/bmjopen-2018-025773] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/01/2018] [Revised: 03/13/2019] [Accepted: 05/15/2019] [Indexed: 12/31/2022]
Abstract
OBJECTIVES Haemorrhagic fever with renal syndrome (HFRS) is a serious threat to public health in China, which accounts for almost 90% of cases reported globally. Infectious disease prediction may help in disease prevention despite some uncontrollable influencing factors. This study compared a hybrid model with two single models in forecasting the monthly incidence of HFRS in China. DESIGN Time-series study. SETTING The People's Republic of China. METHODS An autoregressive integrated moving average (ARIMA) model, a generalised regression neural network (GRNN) model and a hybrid ARIMA-GRNN model were constructed using R V.3.4.3 software. The monthly reported incidence of HFRS from January 2011 to May 2018 was used to evaluate the models' performance. Root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to evaluate the models' effectiveness. Spatial stratified heterogeneity of the time series was tested by month, and another GRNN model was built with a new series. RESULTS The monthly incidence of HFRS in the past several years showed a slight downtrend and obvious seasonal variation. A total of four plausible ARIMA models were built, and the ARIMA(2,1,1)(2,1,1)₁₂ model was selected as the optimal model for fitting HFRS incidence. The smoothing factors of the basic GRNN model and the hybrid model were 0.027 and 0.043, respectively. The single ARIMA model performed best in the fitting part (MAPE=9.1154, MAE=89.0302, RMSE=138.8356), while the hybrid model performed best in prediction (MAPE=17.8335, MAE=152.3013, RMSE=196.4682). The GRNN model was revised by building a model with the new series, and the forecasting performance of the revised model (MAPE=17.6095, MAE=163.8000, RMSE=169.4751) was better than that of the original GRNN model (MAPE=19.2029, MAE=177.0356, RMSE=202.1684). CONCLUSIONS The hybrid ARIMA-GRNN model was better than the single ARIMA and basic GRNN models in forecasting the monthly incidence of HFRS in China.
It could be considered a decision-making tool in HFRS prevention and control.
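In its standard form, a GRNN is a Gaussian-kernel weighted average of training targets (equivalent to Nadaraya-Watson kernel regression), with the smoothing factor acting as the kernel width. The sketch below shows a one-dimensional GRNN and the three error metrics reported in the abstract. It is an illustration only: the input encoding (for example, lagged incidence values or ARIMA outputs) and the way the hybrid ARIMA-GRNN model combines its parts are not specified here, so the single numeric input `x` is an assumption, and ARIMA fitting itself is left out. The forecast values in the usage block are hypothetical.

```python
import math
from typing import Sequence


def grnn_predict(x_train: Sequence[float], y_train: Sequence[float],
                 x: float, sigma: float) -> float:
    """One-dimensional GRNN: Gaussian-kernel weighted mean of training targets,
    with `sigma` as the smoothing factor (kernel width)."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger sigma")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (undefined if any actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical monthly case counts and forecasts.
    actual = [100.0, 200.0, 300.0]
    forecast = [110.0, 190.0, 330.0]
    print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

A small sigma makes the GRNN follow individual training points closely, while a large sigma pulls every prediction toward the overall mean of the targets; this is the trade-off the tuned smoothing factors control.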
Affiliation(s)
- Ya-wen Wang
- School of Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhong-zhou Shen
- School of Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yu Jiang
- School of Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
19
Tkachev V, Sorokin M, Mescheryakov A, Simonov A, Garazha A, Buzdin A, Muchnik I, Borisov N. FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier. Front Genet 2019; 9:717. [PMID: 30697229 PMCID: PMC6341065 DOI: 10.3389/fgene.2018.00717] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/01/2018] [Accepted: 12/21/2018] [Indexed: 01/31/2023]
Abstract
Here, we propose a heuristic technique of data trimming for SVM, termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. This procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. Its application prevents SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked with the outcomes are split, as usual, into training and validation datasets. The unique property of FloWPS is that irrelevant features of the validation dataset that do not have a significant number of neighboring hits in the training dataset are removed from further analyses. Next, similarly to the k nearest neighbors (kNN) method, for each point of the validation dataset, FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets for 992 cancer patients either responding or not responding to different types of chemotherapy. We experimentally confirmed by leave-one-out cross-validation that FloWPS significantly increases the quality of a classifier built on classical SVM in most applications, particularly for polynomial kernels.
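The two trimming steps described above (dropping features with too few neighboring training hits, then restricting training to the nearest points) can be sketched as follows. This is a simplified, hypothetical reading of the description, not the published FloWPS implementation: the parameter names `window`, `min_hits` and `k`, the per-feature absolute-distance window, and Euclidean distance over the kept features are all assumptions, and a simple majority vote stands in for the SVM that FloWPS trains on each floating window.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def trim_features(train: Sequence[Row], point: Row,
                  window: float, min_hits: int) -> List[int]:
    """Keep feature j only if at least `min_hits` training samples lie within
    +/- `window` of point[j] along that feature axis (hypothetical rule)."""
    kept = []
    for j, value in enumerate(point):
        hits = sum(1 for row in train if abs(row[j] - value) <= window)
        if hits >= min_hits:
            kept.append(j)
    return kept


def floating_window(train: Sequence[Row], point: Row,
                    features: Sequence[int], k: int) -> List[int]:
    """Indices of the k training samples closest to `point`, using Euclidean
    distance over the kept features only."""
    def dist(row: Row) -> float:
        return math.sqrt(sum((row[j] - point[j]) ** 2 for j in features))
    return sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]


def predict(train: Sequence[Row], labels: Sequence[int], point: Row,
            window: float = 1.0, min_hits: int = 2, k: int = 3) -> int:
    """Trim features, build the floating window, then classify (1 = responder)."""
    features = trim_features(train, point, window, min_hits)
    if not features:
        raise ValueError("every feature was trimmed; relax window or min_hits")
    idx = floating_window(train, point, features, k)
    # Stand-in for the SVM that FloWPS fits on the window: majority vote
    # over the window's labels (ties resolve to 0).
    responders = sum(labels[i] for i in idx)
    return 1 if 2 * responders > len(idx) else 0


if __name__ == "__main__":
    # Hypothetical two-feature profiles for six patients with known outcomes.
    train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
    labels = [1, 1, 1, 0, 0, 0]
    query = [0.1, 50.0]  # feature 1 has no nearby training values
    print(trim_features(train, query, 1.0, 2))  # [0]
    print(predict(train, labels, query))        # 1
```

In a setup closer to FloWPS, the last step would fit an SVM (for example, with a polynomial kernel) on the rows returned by `floating_window`, restricted to the kept features, instead of voting.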
Affiliation(s)
- Victor Tkachev
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Maxim Sorokin
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Alexander Simonov
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Andrew Garazha
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Anton Buzdin
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Ilya Muchnik
- Hill Center, Rutgers University, Piscataway, NJ, United States
- Nicolas Borisov
- Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
20
Kaspi O, Yosipof A, Senderowitz H. Visualization of Solar Cell Library Space by Dimensionality Reduction Methods. J Chem Inf Model 2018; 58:2428-2439. [PMID: 30485100 DOI: 10.1021/acs.jcim.8b00552] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Abstract
Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, similar to their usefulness in analyzing the space of, e.g., drug-like molecules, such techniques could provide useful insights into materials space, including an intuitive grasp of the overall distribution of samples, the identification of interesting trends such as the formation of materials clusters and the presence of activity cliffs and outliers, and rational navigation through this space in the search for new materials. Here we present the first application of four dimensionality reduction techniques, namely, principal component analysis (PCA), kernel PCA, Isomap, and diffusion map, to visualize and analyze a part of the materials space populated by solar cells made of metal oxides. Solar cells in general and metal-oxide-based solar cells in particular hold the promise of contributing to the world's search for clean and affordable energy resources. With the exception of PCA, these methods have seldom been used to visualize chemistry space and almost never to visualize materials space. For this purpose, we integrated five metal-oxide-based solar cell libraries into a uniform database and subjected it to dimensionality reduction by all four methods, comparing their performances using various criteria such as maintaining the local environment of samples and the clustering structure in the low-dimensional space. We also looked at the number of outliers produced by each method and analyzed common outliers.
We found that PCA performs best in terms of the ability to correctly maintain the local environment of samples, whereas Isomap does the best job of assigning class membership on the basis of the identities of nearest neighbors (i.e., it is the best classifier). We also found that many of the outliers identified by all of the methods could be rationalized. We suggest that the methods used in this work could be extended to study other types of solar cells, thereby laying the groundwork for further analysis of the photovoltaic (PV) space as well as other regions of materials space.
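The two comparison criteria above (keeping each sample's local environment, and assigning class membership from nearest neighbors) can be quantified in several ways; the paper's exact definitions are not given in the abstract. A minimal sketch, assuming the projection has already been computed elsewhere: neighborhood preservation as the average overlap between each sample's k nearest neighbors before and after projection, and class assignment as leave-one-out kNN majority voting in the projected space. Function names and the choice of Euclidean distance are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

Point = Sequence[float]


def knn_indices(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def neighborhood_preservation(high: Sequence[Point], low: Sequence[Point], k: int) -> float:
    """Mean fraction of each sample's k nearest neighbors in the original space
    that remain among its k nearest neighbors after projection (1.0 = all kept)."""
    total = 0.0
    for i in range(len(high)):
        shared = set(knn_indices(high, i, k)) & set(knn_indices(low, i, k))
        total += len(shared) / k
    return total / len(high)


def knn_class_accuracy(low: Sequence[Point], labels: Sequence[Hashable], k: int) -> float:
    """Leave-one-out accuracy of predicting each sample's class as the majority
    class of its k nearest neighbors in the projection (ties: first class seen)."""
    correct = 0
    for i in range(len(low)):
        votes = Counter(labels[j] for j in knn_indices(low, i, k))
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(low)


if __name__ == "__main__":
    # Hypothetical 3D descriptors; the "projection" simply drops the third axis.
    high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (0.0, 1.0, 0.0),
            (10.0, 10.0, 0.0), (11.0, 10.0, 0.1), (10.0, 11.0, 0.0)]
    low = [p[:2] for p in high]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(neighborhood_preservation(high, low, 2), knn_class_accuracy(low, labels, 2))
```

Projections such as PCA, kernel PCA and Isomap are available in scikit-learn (`sklearn.decomposition.PCA`, `sklearn.decomposition.KernelPCA`, `sklearn.manifold.Isomap`); the rows returned by their `fit_transform` method could be passed as `low`.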
Affiliation(s)
- Omer Kaspi
- Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Abraham Yosipof
- Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak 5110801, Israel
- Hanoch Senderowitz
- Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
21
Wang Y, Yang Y, Jiao J, Wu Z, Yang M. Support Vector Regression Approach to Predict the Design Space for the Extraction Process of Pueraria lobata. Molecules 2018; 23:molecules23102405. [PMID: 30241281 PMCID: PMC6222814 DOI: 10.3390/molecules23102405] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/02/2018] [Revised: 09/11/2018] [Accepted: 09/18/2018] [Indexed: 11/16/2022]
Abstract
A support vector regression (SVR) method was introduced to improve the robustness and predictability of the design space in the implementation of quality by design (QbD), taking the extraction process of Pueraria lobata as a case study. In this paper, extraction time, number of extraction cycles, and liquid–solid ratio were identified as critical process parameters (CPPs), and the yields of puerarin, total isoflavonoids, and extracta sicca were the critical quality attributes (CQAs). Models relating the CQAs to the CPPs were constructed using both a conventional quadratic polynomial model (QPM) and the SVR algorithm. A comparison of the two models showed that the SVR model performed better, with a higher R² and lower root-mean-square error (RMSE) and mean absolute deviation (MAD) than the QPM. Furthermore, the design space was predicted using a grid search technique. The resulting operating ranges were: extraction time, 24–51 min; number of extraction cycles, 3; and liquid–solid ratio, 14–18 mL/g. This study is the first reported work optimizing the design space of the extraction process of P. lobata based on an SVR model. With its better prediction accuracy and generalization ability, SVR modeling could be a reliable tool for predicting the design space and shows great potential for quality control under QbD.
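The grid-search step can be illustrated as follows: evaluate a fitted model over a grid of the three CPPs and keep the points whose predicted CQAs all meet their acceptance limits. In the sketch below, `toy_model`, its coefficients, the grid values and the limit are hypothetical placeholders chosen only so that the code runs; in the study this role is played by the fitted SVR (or QPM) models, and the actual acceptance criteria are not given in the abstract, so the toy output is not meant to reproduce the reported design space.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

# (extraction time in min, number of extraction cycles, liquid-solid ratio in mL/g)
Params = Tuple[float, int, float]


def design_space(predict: Callable[[Params], Dict[str, float]],
                 times: Iterable[float], cycles: Iterable[int],
                 ratios: Iterable[float], limits: Dict[str, float]) -> List[Params]:
    """Grid points whose predicted CQAs all reach their lower acceptance limits."""
    accepted = []
    for params in product(times, cycles, ratios):
        cqas = predict(params)
        if all(cqas[name] >= low for name, low in limits.items()):
            accepted.append(params)
    return accepted


def toy_model(params: Params) -> Dict[str, float]:
    """Hypothetical linear response surface, for illustration only; in the study
    this role is played by the fitted SVR (or QPM) models."""
    time_min, n_cycles, ratio = params
    return {"puerarin_yield": 0.1 * time_min + 2.0 * n_cycles + 0.5 * ratio}


if __name__ == "__main__":
    grid = design_space(toy_model, range(20, 61, 10), [1, 2, 3], [10.0, 14.0, 18.0],
                        {"puerarin_yield": 20.5})
    print(grid)
```

A fitted regressor, such as scikit-learn's `sklearn.svm.SVR`, could be wrapped so that it maps a parameter tuple to its predicted CQA values via its `predict` method; a finer grid narrows the boundaries of the accepted region at the cost of more model evaluations.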
Affiliation(s)
- Yaqi Wang
- College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 610072, China.
- Key Laboratory of Modern Preparation of Traditional Chinese Medicine, Ministry of Education, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China.
- Yuanzhen Yang
- Key Laboratory of Modern Preparation of Traditional Chinese Medicine, Ministry of Education, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China.
- Jiaojiao Jiao
- College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 610072, China.
- Zhenfeng Wu
- Key Laboratory of Modern Preparation of Traditional Chinese Medicine, Ministry of Education, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China.
- Ming Yang
- College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 610072, China.
- Key Laboratory of Modern Preparation of Traditional Chinese Medicine, Ministry of Education, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China.
22
Prediction Methods of Herbal Compounds in Chinese Medicinal Herbs. Molecules 2018; 23:molecules23092303. [PMID: 30201875 PMCID: PMC6225236 DOI: 10.3390/molecules23092303] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/12/2018] [Revised: 09/06/2018] [Accepted: 09/07/2018] [Indexed: 12/12/2022]
Abstract
Chinese herbal medicine has recently gained worldwide attention. Its curative mechanisms have been compared with those of western medicine at the molecular level, yet the treatment mechanism of most Chinese herbal medicines is still unclear. How can Chinese herbal medicine compounds be integrated with modern medicine? Methods for predicting the drug-likeness of Chinese herbal compounds are particularly important here. A growing number of compounds derived from Chinese herbs are now widely used as drug-like candidates, and an important route for pharmaceutical companies to develop drugs is to discover potentially active compounds in Chinese medicinal herbs. Methods for predicting the drug-like properties of Chinese herbal compounds include virtual screening, pharmacophore modeling and machine learning. In this paper, we focus on methods for predicting the medicinal properties of Chinese herbal medicines. We analyze the advantages and disadvantages of these three methods and then introduce the specific steps of virtual screening. Finally, we present the prospects for the joint application of various methods.