1
Shamim S, Akhtar M, Gul S. Novel designed analogues of quercetin against SARS-CoV2: an in-silico pharmacokinetic evaluation, molecular modeling, MD simulations based study. J Biomol Struct Dyn 2023:1-19. [PMID: 37798928 DOI: 10.1080/07391102.2023.2265469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 09/24/2023] [Indexed: 10/07/2023]
Abstract
Here we present the design of a series of quercetin analogues and a molecular docking study of the binding of quercetin and its analogues to SARS-CoV2 3CLpro. The scientific literature shows that quercetin has been used successfully against SARS-CoV, inhibiting replication of the virus in respiratory epithelial cells through inhibition of the SARS-CoV main protease (3CLpro). It has been suggested that modification at position 3 of the quercetin structure may produce potent compounds against SARS-CoV2. A series of quercetin analogues was designed and screened for physicochemical and pharmacokinetic parameters. The activities of selected compounds against SARS-CoV2 were screened by molecular modelling, which showed that analogues Q5, Q6 and Q13 have the best docking scores (-8.01 to -8.17 kcal/mol), better than those of quercetin, α-ketoamide and currently available inhibitors of the same target. The structure-activity relationship (SAR) study revealed that introducing an amino group into a designed molecule was highly promising for increasing inhibitory activity against SARS-CoV2 3CLpro. Moreover, to check the stability and orientation of selected compounds inside the binding pocket, molecular dynamics simulations were performed for 100 ns. The results revealed that the designed analogues Q1, Q6 and Q13, which have the lowest binding energies (-8.0, -8.17 and -8.06 kcal/mol, respectively) as well as better physicochemical properties, pharmacokinetics and toxicity profiles, have the potential to be synthesized and developed as therapeutic agents against coronavirus. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sumbul Shamim
  - Department of Pharmacology, Faculty of Pharmaceutical Sciences, Dow College of Pharmacy, Dow University of Health Sciences, Karachi, Pakistan
- Mahwish Akhtar
  - Department of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Dow College of Pharmacy, Dow University of Health Sciences, Karachi, Pakistan
- Somia Gul
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Jinnah University for Women, Karachi, Pakistan
2
Hentabli H, Bengherbia B, Saeed F, Salim N, Nafea I, Toubal A, Nasser M. Convolutional Neural Network Model Based on 2D Fingerprint for Bioactivity Prediction. Int J Mol Sci 2022; 23:13230. [PMID: 36362018 PMCID: PMC9657591 DOI: 10.3390/ijms232113230] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/14/2022] [Revised: 10/22/2022] [Accepted: 10/27/2022] [Indexed: 10/15/2023] Open
Abstract
Determining and modeling the possible behaviour and actions of molecules requires investigating the basic structural features and physicochemical properties that determine their behaviour during chemical, physical, biological, and environmental processes. Computational approaches such as machine learning methods offer an alternative way to predict the physicochemical properties of molecules from their structures. However, the limited accuracy and high error rates of such predictions restrict their use. In this paper, a novel technique based on a deep learning convolutional neural network (CNN) is proposed and developed for predicting the bioactivity of chemical compounds. The molecules are represented in a new matrix format, Mol2mat, a molecular matrix representation adapted from the well-known 2D-fingerprint descriptors. To evaluate the performance of the proposed method, a series of experiments was conducted using two standard datasets, namely the MDL Drug Data Report (MDDR) and Sutherland datasets, comprising 10 homogeneous and 14 heterogeneous activity classes. After analysing eight fingerprints, all the probable combinations of the five best descriptors were investigated. The results showed that a combination of three fingerprints, ECFP4, EPFP4, and ECFC4, together with a CNN activity prediction process, achieved the highest performance, 98% AUC, when compared with the state-of-the-art ML algorithms NaiveB, LSVM, and RBFN.
Affiliation(s)
- Hamza Hentabli
  - Laboratory of Advanced Electronics Systems (LSEA), University of Medea, Medea 26000, Algeria
  - UTM Big Data Centre, Ibnu Sina Institute for Scientific and Industrial Research, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor, Malaysia
- Billel Bengherbia
  - Laboratory of Advanced Electronics Systems (LSEA), University of Medea, Medea 26000, Algeria
- Faisal Saeed
  - UTM Big Data Centre, Ibnu Sina Institute for Scientific and Industrial Research, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor, Malaysia
  - DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Naomie Salim
  - UTM Big Data Centre, Ibnu Sina Institute for Scientific and Industrial Research, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor, Malaysia
- Ibtehal Nafea
  - College of Computer Science and Engineering, Taibah University, Medina 41477, Saudi Arabia
- Abdelmoughni Toubal
  - Laboratory of Advanced Electronics Systems (LSEA), University of Medea, Medea 26000, Algeria
- Maged Nasser
  - School of Computer Sciences, Universiti Sains Malaysia, Gelugor 11800, Penang, Malaysia
3
Feature Reduction for Molecular Similarity Searching Based on Autoencoder Deep Learning. Biomolecules 2022; 12:508. [PMID: 35454097 PMCID: PMC9029813 DOI: 10.3390/biom12040508] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/09/2022] [Revised: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 01/27/2023] Open
Abstract
The concept of molecular similarity is commonly used in rational drug design, where molecular databases are searched for structurally similar molecules in order to retrieve functionally similar ones. Most conventional similarity methods use two-dimensional (2D) fingerprints to evaluate the similarity of molecules to a target query. However, these descriptors include redundant and irrelevant features that might impair the performance of similarity searching methods. This study therefore proposes a new approach for identifying the important features of molecules in chemical datasets, based on representing the molecular features with an autoencoder (AE), with the aim of removing irrelevant and redundant features. The proposed approach was evaluated using the MDL Drug Data Report (MDDR) standard dataset. In the experiments, the proposed approach performed better than several existing benchmark similarity methods, such as the Tanimoto similarity method (TAN), the adapted similarity measure of text processing (ASMTP), and the quantum-based similarity method (SQB). The proposed approach was superior particularly on structurally heterogeneous datasets, where it yielded improved results compared with previously used methods that share the goal of improving molecular similarity searching.
4
Artificial intelligence in drug design: algorithms, applications, challenges and ethics. Future Drug Discovery 2021. [DOI: 10.4155/fdd-2020-0028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] Open
Abstract
The discovery paradigm of drugs is rapidly growing due to advances in machine learning (ML) and artificial intelligence (AI). This review covers myriad faces of AI and ML in drug design. There is a plethora of AI algorithms, the most common of which are summarized in this review. In addition, AI is fraught with challenges that are highlighted along with plausible solutions to them. Examples are provided to illustrate the use of AI and ML in drug discovery and in predicting drug properties such as binding affinities and interactions, solubility, toxicology, blood–brain barrier permeability and chemical properties. The review also includes examples depicting the implementation of AI and ML in tackling intractable diseases such as COVID-19, cancer and Alzheimer’s disease. Ethical considerations and future perspectives of AI are also covered in this review.
5
Nasser M, Salim N, Hamza H, Saeed F, Rabiu I. Improved Deep Learning Based Method for Molecular Similarity Searching Using Stack of Deep Belief Networks. Molecules 2020; 26:E128. [PMID: 33383976 PMCID: PMC7795308 DOI: 10.3390/molecules26010128] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/19/2020] [Revised: 12/24/2020] [Accepted: 12/25/2020] [Indexed: 11/24/2022] Open
Abstract
Virtual screening (VS) is a computational practice applied in drug discovery research. VS is widely applied in computer-based searches for new lead molecules based on molecular similarity searching. In chemical databases, similarity searching is used to identify molecules that are similar to a user-defined reference structure, and it is evaluated by quantitative measures of intermolecular structural similarity. Among existing approaches, 2D fingerprints are widely used. The similarity of a reference structure and a database structure is measured by computing association coefficients. Most classical similarity approaches assume that molecular features carry the same weight whether or not they are related to biological activity. However, based on the chemical structure, some distinguishable features have been found to be more important than others. This difference should therefore be taken into consideration by placing more weight on each important fragment. The main aim of this research is to enhance the performance of similarity searching by using multiple descriptors. In this paper, a deep learning method known as deep belief networks (DBN) is used to reweight the molecular features. Several descriptors, each representing different important features, are used for the MDL Drug Data Report (MDDR) dataset. The proposed method is applied to each descriptor individually to select the important features based on a new weight with a lower error rate, and the new features from all descriptors are then merged to produce a new descriptor for similarity searching. Extensive experiments show that the proposed method outperformed several existing benchmark similarity methods, including Bayesian inference networks (BIN), the Tanimoto similarity method (TAN), the adapted similarity measure of text processing (ASMTP) and the quantum-based similarity method (SQB). The proposed multi-descriptor method based on a stack of deep belief networks (SDBN) demonstrated higher accuracy than existing methods on structurally heterogeneous datasets.
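The Tanimoto baseline (TAN) named in this abstract can be illustrated with a minimal sketch. It assumes each 2D fingerprint is represented as the set of its on-bit indices; the function names and toy fingerprints below are hypothetical and are not taken from the cited paper, and the sketch shows only the unweighted baseline, not the DBN reweighting.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: set, database: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy fingerprints (hypothetical bit positions, for illustration only).
query = {1, 4, 7, 9}
database = {
    "mol_a": {1, 4, 7, 8},
    "mol_b": {2, 3},
    "mol_c": {1, 4, 7, 9},
}
ranking = rank_by_similarity(query, database)
```

In this unweighted form every bit contributes equally to the score, which is the assumption the abstract says the reweighting approach is meant to relax.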
Affiliation(s)
- Maged Nasser
  - School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
- Naomie Salim
  - School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
- Hentabli Hamza
  - School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
- Faisal Saeed
  - College of Computer Science and Engineering, Taibah University, Medina 344, Saudi Arabia
- Idris Rabiu
  - School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
6
Hu S, Chen P, Gu P, Wang B. A Deep Learning-Based Chemical System for QSAR Prediction. IEEE J Biomed Health Inform 2020; 24:3020-3028. [PMID: 32142459 DOI: 10.1109/jbhi.2020.2977009] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 11/06/2022]
Abstract
Research on quantitative structure-activity relationships (QSAR) provides an effective approach for identifying new hits and promising lead compounds during drug discovery. In past decades, various works have achieved good QSAR performance with the development of machine learning. The rise of deep learning, along with massive accessible chemical databases, has further improved QSAR performance. This article proposes a novel deep-learning-based method that implements QSAR prediction by concatenating an end-to-end encoder-decoder model with a convolutional neural network (CNN) architecture. The encoder-decoder model is mainly used to generate fixed-size latent features that represent chemical molecules; these features are then input into the CNN framework to train a robust and stable model and finally to predict active chemicals. Two models with different schemes are investigated to evaluate the validity of our proposed model on the same data sets. Experimental results showed that our proposed method outperforms other state-of-the-art methods in identifying whether a chemical molecule is active.
7
Bioactivity Prediction Using Convolutional Neural Network. Advances in Intelligent Systems and Computing 2020. [DOI: 10.1007/978-3-030-33582-3_33] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
8
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 346] [Impact Index Per Article: 69.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and in particular deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI that have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles of the various machine learning algorithms, alongside some application notes, the review discusses the current state of the art of AI-assisted pharmaceutical discovery, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
9
Ricart E, Leclère V, Flissi A, Mueller M, Pupin M, Lisacek F. rBAN: retro-biosynthetic analysis of nonribosomal peptides. J Cheminform 2019; 11:13. [PMID: 30737579 PMCID: PMC6689883 DOI: 10.1186/s13321-019-0335-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/01/2018] [Accepted: 01/31/2019] [Indexed: 12/19/2022] Open
Abstract
Proteinogenic and non-proteinogenic amino acids, fatty acids and glycans are some of the main building blocks of nonribosomal peptides (NRPs) and as such may give insight into the origin, biosynthesis and bioactivities of the peptides they constitute. Hence, the structural representation of NRPs using monomers provides a biologically interesting skeleton of these secondary metabolites. Databases dedicated to NRPs, such as Norine, already integrate monomer-based annotations in order to facilitate the development of structural analysis tools. In this paper, we present rBAN (retro-biosynthetic analysis of nonribosomal peptides), a new computational tool designed to predict the monomeric graph of NRPs from their atomic structure in SMILES format. This prediction is achieved through the in silico fragmentation of a chemical structure and the matching of the resulting fragments against the monomers of Norine for identification. Structures containing monomers not yet recorded in Norine are processed in a "discovery mode" that uses the RESTful service from PubChem to search for the unidentified substructures and suggest new monomers. rBAN was integrated into a pipeline for the curation of Norine data, in which it was used to check the correspondence between the monomeric graphs annotated in Norine and SMILES-predicted graphs. The process concluded with the validation of 97.26% of the records in Norine, a two-fold extension of its SMILES data and the introduction of 11 new monomers suggested in the discovery mode. The accuracy, robustness and high performance of rBAN were demonstrated by benchmarking it against other tools with the same functionality: Smiles2Monomers and GRAPE.
Affiliation(s)
- Emma Ricart
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland
  - Computer Science Department, University of Geneva, Geneva, Switzerland
- Valérie Leclère
  - EA 7394-ICV- Institut Charles Viollette, University of Lille, INRA, ISA, University of Artois, Univ. Littoral Côte d'Opale, 59000, Lille, France
- Areski Flissi
  - UMR 9189- CRIStAL- Centre de Recherche en Informatique Signal et Automatique de Lille, University of Lille, CNRS, Centrale Lille, 59000, Lille, France
  - Bonsai Team, Inria-Lille Nord Europe, 9655, Villeneuve d'Ascq Cedex, France
- Markus Mueller
  - Vital-IT Group, SIB Swiss Institute of Bioinformatics, Amphipole Building, Quartier Sorge, 1015, Lausanne, Switzerland
- Maude Pupin
  - UMR 9189- CRIStAL- Centre de Recherche en Informatique Signal et Automatique de Lille, University of Lille, CNRS, Centrale Lille, 59000, Lille, France
  - Bonsai Team, Inria-Lille Nord Europe, 9655, Villeneuve d'Ascq Cedex, France
- Frédérique Lisacek
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211, Geneva, Switzerland
  - Computer Science Department, University of Geneva, Geneva, Switzerland
  - Section of Biology, University of Geneva, Geneva, Switzerland
10
Alberga D, Trisciuzzi D, Montaruli M, Leonetti F, Mangiatordi GF, Nicolotti O. A New Approach for Drug Target and Bioactivity Prediction: The Multifingerprint Similarity Search Algorithm (MuSSeL). J Chem Inf Model 2018; 59:586-596. [PMID: 30485097 DOI: 10.1021/acs.jcim.8b00698] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Indexed: 01/13/2023]
Abstract
We present MuSSeL, a multifingerprint similarity search algorithm able to predict putative drug targets for a given query small molecule and to return a quantitative assessment of its bioactivity in terms of Ki or IC50 values. Predictions are made automatically by exploiting a large collection of high-quality experimental bioactivity data available from ChEMBL (version 22.1) and combining, in a consensus-like approach, the predictions resulting from similarity searches performed using 13 different fingerprint definitions. Importantly, the proposed algorithm is also effective in detecting and handling activity cliffs. A calibration set including small molecules present in the latest version of ChEMBL (version 23) was employed to tune the algorithm parameters. Three randomly built external sets were then used to challenge model performance. The potential use of MuSSeL was also tested in a prospective exercise: the prediction of five bioactive compounds taken from articles published in the Journal of Medicinal Chemistry just a few months ago. The paper emphasizes the importance of implementing multifingerprint consensus strategies to increase confidence in the predictions of similarity search algorithms, and it provides a fast and easy-to-run tool for drug target and bioactivity prediction.
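The consensus idea described in this abstract can be sketched in a few lines. This is not the MuSSeL algorithm itself: it is a simplified illustration that assumes a nearest-neighbour vote per fingerprint type, a hypothetical similarity cut-off `min_similarity`, and toy fingerprints stored as sets of on-bit indices. All names below are invented for the example.

```python
from collections import Counter


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    total = len(a) + len(b) - shared
    return shared / total if total else 0.0


def nearest_label(query_fp: set, references: list) -> tuple:
    """Return (label, similarity) of the reference most similar to the query."""
    best_fp, best_label = max(references, key=lambda ref: tanimoto(query_fp, ref[0]))
    return best_label, tanimoto(query_fp, best_fp)


def consensus_prediction(query_fps: dict, reference_sets: dict, min_similarity: float = 0.5):
    """Let each fingerprint type vote for its nearest neighbour's target label."""
    votes = Counter()
    for fp_name, query_fp in query_fps.items():
        label, similarity = nearest_label(query_fp, reference_sets[fp_name])
        if similarity >= min_similarity:  # ignore weak matches (assumed cut-off)
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy data: three hypothetical fingerprint types for one query molecule.
query_fps = {"fp_a": {1, 2, 3}, "fp_b": {10, 11}, "fp_c": {20, 21, 22}}
reference_sets = {
    "fp_a": [({1, 2, 3}, "kinase"), ({7, 8}, "protease")],
    "fp_b": [({10, 11, 12}, "kinase"), ({10}, "protease")],
    "fp_c": [({20, 30}, "kinase"), ({20, 21, 22, 23}, "protease")],
}
predicted = consensus_prediction(query_fps, reference_sets)
```

Here two of the three fingerprint types agree on the same label, so the vote settles on it even though the third type disagrees; that agreement across representations is the source of the added confidence the abstract refers to.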
Affiliation(s)
- Domenico Alberga
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Daniela Trisciuzzi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Michele Montaruli
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Francesco Leonetti
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Giuseppe Felice Mangiatordi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Orazio Nicolotti
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
11
Petinrin OO, Saeed F. Bioactive molecule prediction using majority voting-based ensemble method. Journal of Intelligent & Fuzzy Systems 2018. [DOI: 10.3233/jifs-169596] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Affiliation(s)
- Faisal Saeed
  - College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
  - Department of Information Systems, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
12
Afolabi LT, Saeed F, Hashim H, Petinrin OO. Ensemble learning method for the prediction of new bioactive molecules. PLoS One 2018; 13:e0189538. [PMID: 29329334 PMCID: PMC5766097 DOI: 10.1371/journal.pone.0189538] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/24/2017] [Accepted: 11/27/2017] [Indexed: 12/31/2022] Open
Abstract
Pharmacologically active molecules can provide remedies for a range of different illnesses and infections. Therefore, the search for such bioactive molecules has been an enduring mission. As such, there is a need to employ a more suitable, reliable, and robust classification method for enhancing the prediction of the existence of new bioactive molecules. In this paper, we adopt a recently developed combination of different boosting methods (Adaboost) for the prediction of new bioactive molecules. We conducted the research experiments utilizing the widely used MDL Drug Data Report (MDDR) database. The proposed boosting method generated better results than other machine learning methods. This finding suggests that the method is suitable for inclusion among the in silico tools for use in cheminformatics, computational chemistry and molecular biology.
Affiliation(s)
- Faisal Saeed
  - College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
  - Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
- Haslinda Hashim
  - Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
  - Kolej Yayasan Pelajaran Johor, KM16, Jalan Kulai-Kota Tinggi, Kota Tinggi, Johor, Malaysia
13
Babajide Mustapha I, Saeed F. Bioactive Molecule Prediction Using Extreme Gradient Boosting. Molecules 2016; 21:983. [PMID: 27483216 PMCID: PMC6273295 DOI: 10.3390/molecules21080983] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Received: 05/01/2016] [Revised: 07/19/2016] [Accepted: 07/22/2016] [Indexed: 01/29/2023] Open
Abstract
Following the explosive growth in chemical and biological data, the shift from traditional methods of drug discovery to computer-aided means has made data mining and machine learning methods integral parts of today's drug discovery process. In this paper, extreme gradient boosting (Xgboost), which is an ensemble of Classification and Regression Trees (CART) and a variant of the Gradient Boosting Machine, was investigated for the prediction of biological activity based on a quantitative description of the compound's molecular structure. Seven datasets that are well known in the literature were used, and the experimental results show that Xgboost can outperform machine learning algorithms such as Random Forest (RF), Support Vector Machines (LSVM), Radial Basis Function Neural Network (RBFN) and Naïve Bayes (NB) for the prediction of biological activities. In addition to its ability to detect minority activity classes in highly imbalanced datasets, it showed remarkable performance on both high- and low-diversity datasets.
Affiliation(s)
- Ismail Babajide Mustapha
  - UTM Big Data Centre, Ibnu Sina Institute for Scientific and Industrial Research, Universiti Teknologi Malaysia, Skudai, Johor 81310, Malaysia
- Faisal Saeed
  - Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor 81310, Malaysia
14
Kanavos A, Makris C, Plegas Y, Theodoridis E. Ranking Web Search Results Exploiting Wikipedia. Int J Artif Intell T 2016. [DOI: 10.1142/s0218213016500184] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
It is widely known that search engines are the dominant tools for finding information on the web. In most cases, these engines return web page references in a global ranking that takes into account either the importance of the web site or the relevance of the web pages to the identified topic. In this paper, we focus on the problem of determining distinct thematic groups in the results that existing web search engines provide. We additionally address the problem of dynamically adapting their ranking according to user selections, incorporating user judgments as implicitly registered in their selection of relevant documents. Our system exploits a state-of-the-art semantic web data mining technique that identifies semantic entities of Wikipedia in order to group the result set into different topic groups, according to the various meanings of the provided query. Moreover, we propose a novel probabilistic network scheme that employs this topic identification method to modify the ranking of results as users select documents. We evaluated our implemented prototype in practice through extensive experiments on the ClueWeb09 dataset using TREC's 2009, 2010, 2011 and 2012 Web Tracks, where we observed improved retrieval performance compared with current state-of-the-art re-ranking methods.
Affiliation(s)
- Andreas Kanavos
  - Computer Engineering and Informatics Department, University of Patras, Rio, Patras, Greece, 26504
- Christos Makris
  - Computer Engineering and Informatics Department, University of Patras, Rio, Patras, Greece, 26504
- Yannis Plegas
  - Computer Engineering and Informatics Department, University of Patras, Rio, Patras, Greece, 26504
15
Dufresne Y, Noé L, Leclère V, Pupin M. Smiles2Monomers: a link between chemical and biological structures for polymers. J Cheminform 2015; 7:62. [PMID: 26715946 PMCID: PMC4693424 DOI: 10.1186/s13321-015-0111-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/31/2015] [Accepted: 12/06/2015] [Indexed: 12/17/2022] Open
Abstract
Background: The monomeric composition of polymers is a powerful basis for structure comparison and synthetic biology, among other uses. Many databases give access to the atomic structure of compounds, but the monomeric structure of polymers is often lacking. We have designed a smart algorithm, implemented in the tool Smiles2Monomers (s2m), to infer efficiently and accurately the monomeric structure of a polymer from its chemical structure. Results: Our strategy is divided into two steps: first, monomers are mapped onto the atomic structure by an efficient subgraph-isomorphism algorithm; second, the best tiling is computed so that non-overlapping monomers cover the whole structure of the target polymer. The mapping is based on a Markovian index built by a dynamic programming algorithm. The index enables s2m to search quickly for all the given monomers on a target polymer. Afterwards, a greedy algorithm combines the mapped monomers into a consistent monomeric structure. Finally, a local branch and cut algorithm refines the structure. We tested this method on two manually annotated databases of polymers and reconstructed the structures de novo with a sensitivity over 90%. The average computation time per polymer is 2 s. Conclusion: s2m automatically creates de novo monomeric annotations for polymers, efficiently in terms of computation time and sensitivity. s2m allowed us to detect annotation errors in the tested databases and to easily find the accurate structures. s2m could therefore be integrated into the curation process of databases of small compounds to verify the current entries and accelerate the annotation of new polymers. The full method can be downloaded or accessed via a website for peptide-like polymers at http://bioinfo.lifl.fr/norine/smiles2monomers.jsp. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0111-5) contains supplementary material, which is available to authorized users.
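The tiling step described in this abstract can be illustrated with a minimal sketch. It is not the s2m implementation: it assumes the subgraph-isomorphism mapping has already produced candidate placements, each given as a monomer name plus the set of atom indices it covers, and it applies a simple largest-first greedy rule (an assumption made here) without the Markovian index or the branch and cut refinement. The function name and toy data are hypothetical.

```python
def greedy_tiling(mappings: list, n_atoms: int) -> tuple:
    """Pick non-overlapping monomer placements, largest first.

    mappings: list of (monomer_name, set_of_atom_indices) candidate placements.
    Returns (chosen monomer names, fraction of atoms covered).
    """
    covered = set()
    chosen = []
    # Larger placements are tried first; ties keep their input order.
    for name, atoms in sorted(mappings, key=lambda m: len(m[1]), reverse=True):
        if covered.isdisjoint(atoms):  # accept only if no atom is already tiled
            chosen.append(name)
            covered |= atoms
    coverage = len(covered) / n_atoms if n_atoms else 0.0
    return chosen, coverage


# Toy polymer with 9 atoms and four candidate placements (illustrative only).
candidates = [
    ("Ala", {0, 1, 2}),
    ("Gly", {2, 3}),
    ("Leu", {3, 4, 5, 6}),
    ("Val", {7, 8}),
]
chosen, coverage = greedy_tiling(candidates, n_atoms=9)
```

A greedy pass like this can leave atoms uncovered when an early choice blocks a better combination, which is presumably why the abstract describes a refinement stage after the greedy step.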
Affiliation(s)
- Yoann Dufresne
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France
  - Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
- Laurent Noé
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France
  - Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
- Valérie Leclère
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France
  - Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
  - Univ. Lille, INRA, ISA, Univ. Artois, Univ. Littoral Côte d'Opale, EA 7394 - ICV - Institut Charles Viollette, 59000 Lille, France
- Maude Pupin
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France
  - Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France