1
|
Huang S, Ding Y. Identification of Anticancer and Anti-inflammatory Drugs from Drug-target Interaction Descriptors by Machine Learning. Lett Drug Des Discov 2022. [DOI: 10.2174/1570180819666220114114752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Drug repositioning is an important subject in drug-disease research. Most previous studies simply used drug descriptors as the feature vector to classify drugs or targets, or used qualitative drug-target or drug-disease data to predict drug-target interactions. Such data provide limited information for drug repositioning.
Objective:
Considering both drugs and targets, and constructing quantitative drug-target interaction descriptors to characterize drugs, is of great significance to the study of drug repositioning.
Methods:
Taking anticancer and anti-inflammatory drugs as research objects, the interaction sites between drugs and targets were determined by molecular docking. Sixty-seven drug-target interaction descriptors were calculated to describe the drug-target interactions, and 22 important descriptors were screened for drug classification by SVM, LightGBM and MLP.
Results:
The accuracy of SVM, LightGBM and MLP reached 93.29%, 92.68% and 94.51%, their Matthews correlation coefficients reached 0.852, 0.840 and 0.882, and their areas under the ROC curve reached 0.977, 0.969 and 0.968, respectively.
Conclusion:
Machine learning models built on drug-target interaction descriptors can improve drug classification results. The number of atom pairs, force field, hydrophobic interactions and bSASA are the four key feature types for classifying anticancer and anti-inflammatory drugs.
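The accuracy and Matthews correlation coefficient (MCC) reported in the Results can both be derived from a binary confusion matrix. The following is a minimal sketch of the two standard formulas; the counts are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC for a binary confusion matrix; ranges from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for illustration only (not the paper's data).
tp, tn, fp, fn = 80, 70, 6, 8
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two classes differ in size.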
Affiliation(s)
- Songtao Huang: School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China; Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
- Yanrui Ding: School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China; Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi, Jiangsu, 214122, P.R. China
|
2
|
Liu P, Li H, Li S, Leung KS. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019; 20:408. [PMID: 31357929 PMCID: PMC6664725 DOI: 10.1186/s12859-019-2910-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 10/13/2018] [Accepted: 05/21/2019] [Indexed: 12/11/2022]
Abstract
Background:
Understanding the phenotypic drug response on cancer cell lines plays a vital role in anti-cancer drug discovery and re-purposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to build and test their models. Previously, most research in these areas started from the molecular fingerprints or physicochemical features of drugs, instead of their structures.
Results:
In this paper, a model called twin Convolutional Neural Network for drugs in SMILES format (tCNNS) is introduced for phenotypic screening. tCNNS uses one convolutional network to extract features for drugs from their simplified molecular input line entry specification (SMILES) format and another convolutional network to extract features for cancer cell lines from their genetic feature vectors. A fully connected network then predicts the interaction between the drugs and the cancer cell lines. When the training set and the testing set are divided based on the interaction pairs between drugs and cell lines, tCNNS achieves 0.826 and 0.831 for the mean and top quartile of the coefficient of determination (R²) and 0.909 and 0.912 for the mean and top quartile of the Pearson correlation (Rp), respectively, which are significantly better than those of previous works (Ammad-Ud-Din et al., J Chem Inf Model 54:2347–9, 2014), (Haider et al., PLoS ONE 10:0144490, 2015), (Menden et al., PLoS ONE 8:61318, 2013). However, when the training set and the testing set are divided exclusively based on drugs or cell lines, the performance of tCNNS decreases significantly, and Rp and R² drop to barely above 0.
Conclusions:
Our approach is able to predict the drug effects on cancer cell lines with high accuracy, and its performance remains stable with less but high-quality data, and with fewer features for the cancer cell lines. tCNNS can also solve the problem of outliers in other feature spaces. Besides achieving high scores in these statistical metrics, tCNNS also provides some insights into phenotypic screening. However, the performance of tCNNS drops in the blind test.
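The gap between the pair-based split and the drug-exclusive (blind) split described in the abstract comes down to whether a test drug has ever been seen during training. The sketch below contrasts the two splitting strategies on toy data; the function names and the drug and cell-line identifiers are hypothetical, and this is not the authors' code.

```python
import random

def split_by_pairs(pairs, test_fraction, seed=0):
    """Random split over (drug, cell line) pairs: the same drug may
    appear in both the training and the test set."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def split_by_drug(pairs, test_fraction, seed=0):
    """Blind split: every pair of a held-out drug goes to the test set,
    so no test drug is seen during training."""
    rng = random.Random(seed)
    drugs = sorted({drug for drug, _ in pairs})
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test

# Toy data: 5 hypothetical drugs screened on 4 hypothetical cell lines.
pairs = [(f"drug{d}", f"cell{c}") for d in range(5) for c in range(4)]

pair_train, pair_test = split_by_pairs(pairs, 0.2)
blind_train, blind_test = split_by_drug(pairs, 0.2)

shared = {d for d, _ in blind_train} & {d for d, _ in blind_test}
print("drugs shared between blind train and test:", shared)
```

The same grouping idea applies to a cell-line-exclusive split by grouping on the second element of each pair instead of the first.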
Affiliation(s)
- Pengfei Liu: Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Hongjian Li: SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, N.T., Hong Kong, China; CUHK-SDU Reproductive Genetics Joint Laboratory, School of Biomedical Sciences, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Shuai Li: Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Kwong-Sak Leung: Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
|
3
|
Cao J, Zhang K, Yong H, Lai X, Chen B, Lin Z. Extreme Learning Machine With Affine Transformation Inputs in an Activation Function. IEEE Trans Neural Netw Learn Syst 2019; 30:2093-2107. [PMID: 30442621 DOI: 10.1109/tnnls.2018.2877468] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 06/09/2023]
Abstract
The extreme learning machine (ELM) has attracted much attention over the past decade due to its fast learning speed and convincing generalization performance. However, there still remains a practical issue to be approached when applying the ELM: the randomly generated hidden node parameters without tuning can lead to the hidden node outputs being nonuniformly distributed, thus giving rise to poor generalization performance. To address this deficiency, a novel activation function with an affine transformation (AT) on its input is introduced into the ELM, which leads to an improved ELM algorithm that is referred to as an AT-ELM in this paper. The scaling and translation parameters of the AT activation function are computed based on the maximum entropy principle in such a way that the hidden layer outputs approximately obey a uniform distribution. Application of the AT-ELM algorithm in nonlinear function regression shows its robustness to the range scaling of the network inputs. Experiments on nonlinear function regression, real-world data set classification, and benchmark image recognition demonstrate better performance for the AT-ELM compared with the original ELM, the regularized ELM, and the kernel ELM. Recognition results on benchmark image data sets also reveal that the AT-ELM outperforms several other state-of-the-art algorithms in general.
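For context, the baseline ELM that the abstract sets out to improve can be sketched in a few lines: hidden node parameters are drawn at random and never tuned, and only the output weights are solved for by least squares. The sketch below assumes NumPy and a sigmoid activation; the uniform sampling range, the hidden-layer size and the sine toy target are illustrative assumptions. It does not implement the AT-ELM, whose scaling and translation parameters are computed from the maximum entropy principle.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden=30, seed=0):
    """Baseline ELM: random, untuned hidden parameters; output weights
    obtained from the pseudo-inverse of the hidden-layer output matrix."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = sigmoid(X @ W + b)                                   # hidden outputs
    beta = np.linalg.pinv(H) @ y                             # least-squares fit
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy nonlinear regression task (illustrative only).
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()

W, b, beta = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because `W` and `b` are fixed after sampling, rescaling the inputs changes where the pre-activations fall on the sigmoid, which is the sensitivity to input range that the abstract says the affine transformation is meant to address.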
|
4
|
|
5
|
Rataj K, Czarnecki W, Podlewska S, Pocha A, Bojarski AJ. Substructural Connectivity Fingerprint and Extreme Entropy Machines-A New Method of Compound Representation and Analysis. Molecules 2018; 23:E1242. [PMID: 29789513 PMCID: PMC6100401 DOI: 10.3390/molecules23061242] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/11/2018] [Revised: 05/19/2018] [Accepted: 05/21/2018] [Indexed: 11/16/2022]
Abstract
Key-based substructural fingerprints are an important element of computer-aided drug design techniques. They are invaluable for filtering compound databases, as they allow for the quick rejection of molecules with a low probability of being active. However, this method is flawed, as it does not consider the connections between substructures. After changing the connections between particular chemical moieties, the fingerprint representation of the compound remains the same, which leads to difficulties in distinguishing between active and inactive compounds. In this study, we present a new method of compound representation, substructural connectivity fingerprints (SCFP), which provides information not only about the presence of particular substructures in the molecule but also about the connections between them. This representation was analyzed with a recently developed methodology, extreme entropy machines (EEM). The SCFP can be a valuable addition to virtual screening tools, as it represents compound structure in greater detail and with more specificity, allowing for more accurate classification.
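The limitation described in the abstract can be made concrete with a toy example: two hypothetical molecules built from the same substructures, joined differently, get identical key-based fingerprints, while a representation that also records which substructures are linked tells them apart. The encoding below is an illustrative assumption and is not the SCFP definition from the paper.

```python
# Hypothetical substructure keys, kept in sorted order.
KEYS = ["amine", "carboxyl", "phenyl"]

def key_fingerprint(substructures):
    """One bit per key: is the substructure present in the molecule?"""
    present = set(substructures)
    return [int(k in present) for k in KEYS]

def connectivity_fingerprint(substructures, bonds):
    """One bit per unordered pair of keys: are two such substructures
    directly connected? `bonds` holds index pairs into `substructures`."""
    linked = {
        tuple(sorted((substructures[i], substructures[j]))) for i, j in bonds
    }
    pairs = [tuple(sorted((a, b))) for i, a in enumerate(KEYS) for b in KEYS[i:]]
    return [int(p in linked) for p in pairs]

# Two toy molecules: same substructures, different connections.
mol_a = (["phenyl", "amine", "carboxyl"], [(0, 1), (0, 2)])  # both on phenyl
mol_b = (["phenyl", "amine", "carboxyl"], [(0, 1), (1, 2)])  # chain via amine

print(key_fingerprint(mol_a[0]) == key_fingerprint(mol_b[0]))
print(connectivity_fingerprint(*mol_a))
print(connectivity_fingerprint(*mol_b))
```

A classifier that only sees the key-based bits cannot separate `mol_a` from `mol_b`; the pair bits give it the extra signal.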
Affiliation(s)
- Krzysztof Rataj: Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, Smętna Street 12, 31-343 Kraków, Poland
- Wojciech Czarnecki: Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza Street 6, 30-348 Kraków, Poland
- Sabina Podlewska: Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, Smętna Street 12, 31-343 Kraków, Poland
- Agnieszka Pocha: Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza Street 6, 30-348 Kraków, Poland
- Andrzej J Bojarski: Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, Smętna Street 12, 31-343 Kraków, Poland
|
6
|
Rataj K, Kelemen ÁA, Brea J, Loza MI, Bojarski AJ, Keserű GM. Fingerprint-Based Machine Learning Approach to Identify Potent and Selective 5-HT2BR Ligands. Molecules 2018; 23:1137. [PMID: 29748476 PMCID: PMC6100008 DOI: 10.3390/molecules23051137] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/16/2018] [Revised: 05/05/2018] [Accepted: 05/07/2018] [Indexed: 11/16/2022]
Abstract
The identification of subtype-selective GPCR (G-protein coupled receptor) ligands is a challenging task. In this study, we developed a computational protocol to find compounds with 5-HT2BR versus 5-HT1BR selectivity. Our approach employs the hierarchical combination of machine learning methods, docking, and multiple scoring methods. First, we applied machine learning tools to filter a large database of druglike compounds by the new Neighbouring Substructures Fingerprint (NSFP). This two-dimensional fingerprint contains information on the connectivity of the substructural features of a compound. Preselected subsets of the database were then subjected to docking calculations. The main indicators of compounds’ selectivity were their different interactions with the secondary binding pockets of both target proteins, while binding modes within the orthosteric binding pocket were preserved. The combined methodology of ligand-based and structure-based methods was validated prospectively, resulting in the identification of hits with nanomolar affinity and ten-fold to ten thousand-fold selectivities.
Affiliation(s)
- Krzysztof Rataj: Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, 12 Smętna Street, 31-343 Krakow, Poland
- Ádám Andor Kelemen: Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H1117 Budapest, Hungary
- José Brea: Grupo de Investigación "BioFarma" USC, Centro de Investigación CIMUS, Planta 3ª, Avd. de Barcelona s/n, 15782 Santiago de Compostela, Spain
- María Isabel Loza: Grupo de Investigación "BioFarma" USC, Centro de Investigación CIMUS, Planta 3ª, Avd. de Barcelona s/n, 15782 Santiago de Compostela, Spain
- Andrzej J Bojarski: Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, 12 Smętna Street, 31-343 Krakow, Poland
- György Miklós Keserű: Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H1117 Budapest, Hungary
|
7
|
Pasupa K, Kudisthalert W. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach. PLoS One 2018; 13:e0195478. [PMID: 29652912 PMCID: PMC5898726 DOI: 10.1371/journal.pone.0195478] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/15/2017] [Accepted: 03/24/2018] [Indexed: 12/31/2022]
Abstract
Machine learning techniques are becoming popular in virtual screening tasks. One powerful machine learning algorithm is the Extreme Learning Machine (ELM), which has been used in many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM), which is based on a single-layer feed-forward neural network used in conjunction with 16 different similarity coefficients as activation functions in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms, i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets, the Maximum Unbiased Validation Dataset, which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with the Sokal/Sneath(1) coefficient. Furthermore, the ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6.
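One plausible reading of a similarity coefficient used as a hidden-layer activation is that each hidden node stores a reference fingerprint (for example a cluster centre) and outputs its similarity to the input fingerprint. The sketch below follows that reading with the Tanimoto coefficient, chosen only because it is widely known; the abstract does not list the 16 coefficients, and the prototypes and query here are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Design choice: two all-zero fingerprints are given similarity 0.
    return both / either if either else 0.0

def hidden_layer(fingerprint, prototypes):
    """Each hidden node outputs the similarity between the input and the
    reference fingerprint it stores."""
    return [tanimoto(fingerprint, p) for p in prototypes]

# Hypothetical reference fingerprints, e.g. cluster centres.
prototypes = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
query = [1, 1, 1, 0]

activations = hidden_layer(query, prototypes)
print(activations)
```

The resulting activation vector would then feed a linear output layer, as in a conventional ELM.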
Affiliation(s)
- Kitsuchart Pasupa: Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
- Wasu Kudisthalert: Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
|
8
|
Discriminant document embeddings with an extreme learning machine for classifying clinical narratives. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.01.117] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022]
|
9
|
Tang X, Chen L. A self-adaptive evolutionary weighted extreme learning machine for binary imbalance learning. Prog Artif Intell 2018. [DOI: 10.1007/s13748-017-0136-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/12/2023]
|
10
|
Kozik R. Distributing extreme learning machines with Apache Spark for NetFlow-based malware activity detection. Pattern Recognit Lett 2018. [DOI: 10.1016/j.patrec.2017.11.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]
|
11
|
Zhang L, Zhang D. Evolutionary Cost-Sensitive Extreme Learning Machine. IEEE Trans Neural Netw Learn Syst 2017; 28:3045-3060. [PMID: 27740499 DOI: 10.1109/tnnls.2016.2607757] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 06/06/2023]
Abstract
Conventional extreme learning machines (ELMs) compute the Moore-Penrose generalized inverse of the hidden-layer activation matrix and analytically determine the output weights to achieve generalization performance, assuming the same loss for different types of misclassification. The assumption may not hold in cost-sensitive recognition tasks, such as a face-recognition-based access control system, where misclassifying a stranger as a family member may have more serious consequences than misclassifying a family member as a stranger. Though recent cost-sensitive learning can reduce the total loss with a given cost matrix that quantifies how severe one type of mistake is relative to another, in many realistic cases the cost matrix is unknown to users. Motivated by these concerns, this paper proposes an evolutionary cost-sensitive ELM, with the following merits: 1) to the best of our knowledge, it is the first proposal of an ELM in the evolutionary cost-sensitive classification scenario; 2) it addresses the open issue of how to define the cost matrix in cost-sensitive learning tasks; and 3) an evolutionary backtracking search algorithm is introduced for adaptive cost matrix optimization. Experiments on a variety of cost-sensitive tasks demonstrate the effectiveness of the proposed approaches, with improvements of about 5%-10%.
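The role of a cost matrix in the access-control example can be illustrated with a minimum-expected-cost decision rule. The sketch below is a generic illustration with made-up probabilities and costs; it is not the paper's evolutionary method, whose contribution is to search for the cost matrix rather than assume it is given.

```python
def min_cost_class(probs, cost):
    """Pick the class with the lowest expected cost.

    probs[i]   -- estimated probability that the true class is i
    cost[i][j] -- cost of predicting class j when the true class is i
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected

# Classes: 0 = family member, 1 = stranger (hypothetical numbers).
probs = [0.8, 0.2]

equal_cost = [[0, 1],
              [1, 0]]
# Admitting a stranger as family (true 1, predicted 0) is ten times worse.
asymmetric_cost = [[0, 1],
                   [10, 0]]

print(min_cost_class(probs, equal_cost))       # picks class 0
print(min_cost_class(probs, asymmetric_cost))  # picks class 1
```

With equal costs the more probable class wins; with the asymmetric matrix the safer "stranger" decision is chosen even though it is less likely to be correct.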
|
12
|
A Novel Neutrosophic Weighted Extreme Learning Machine for Imbalanced Data Set. Symmetry (Basel) 2017. [DOI: 10.3390/sym9080142] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
|
13
|
Cao J, Zhang K, Luo M, Yin C, Lai X. Extreme learning machine and adaptive sparse representation for image classification. Neural Netw 2016; 81:91-102. [DOI: 10.1016/j.neunet.2016.06.001] [Citation(s) in RCA: 119] [Impact Index Per Article: 14.9] [Received: 09/21/2015] [Revised: 06/01/2016] [Accepted: 06/06/2016] [Indexed: 10/21/2022]
|
14
|
Extremely Randomized Machine Learning Methods for Compound Activity Prediction. Molecules 2015; 20:20107-17. [PMID: 26569196 PMCID: PMC6332304 DOI: 10.3390/molecules201119679] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/14/2015] [Revised: 08/14/2015] [Accepted: 10/27/2015] [Indexed: 11/24/2022]
Abstract
Speed, a relatively low requirement for computational resources and high effectiveness in evaluating the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, as the amount of data in cheminformatics and related fields has also grown, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of existing methods to obtain results more quickly. In the study, we tested two approaches belonging to the group of so-called ‘extremely randomized methods’, Extreme Entropy Machine and Extremely Randomized Trees, for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their ‘non-extreme’ competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches were found not only to improve the efficiency of the classification of bioactive compounds but also to be less computationally complex, requiring fewer steps to perform an optimization procedure.
|