151
Cardoso-Silva J, Papageorgiou LG, Tsoka S. Network-based piecewise linear regression for QSAR modelling. J Comput Aided Mol Des 2019; 33:831-844. [PMID: 31628660 PMCID: PMC6825651 DOI: 10.1007/s10822-019-00228-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/25/2019] [Accepted: 09/28/2019] [Indexed: 02/07/2023]
Abstract
Quantitative Structure-Activity Relationship (QSAR) models are critical in various areas of drug discovery, for example in lead optimisation and virtual screening. Recently, the need for models that are not only predictive but also interpretable has been highlighted. In this paper, a new methodology is proposed to build interpretable QSAR models by combining elements of network analysis and piecewise linear regression. The algorithm presented, modSAR, splits data using a two-step procedure. First, compounds associated with a common target are represented as a network in terms of their structural similarity, revealing modules of similar chemical properties. Second, each module is subdivided into subsets (regions), each of which is modelled by an independent linear equation. Comparative analysis of QSAR models across five data sets of protein inhibitors obtained from ChEMBL is reported and it is shown that modSAR offers similar predictive accuracy to popular algorithms, such as Random Forest and Support Vector Machine. Moreover, we show that models built by modSAR are interpretable, capable of evaluating the applicability domain of the compounds, and well suited to tasks such as virtual screening and the development of new drug leads.
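The two-step split described in this abstract can be sketched roughly as follows. This is an illustrative stand-in, not modSAR itself: the abstract does not say how modules or regions are found, so the sketch assumes fingerprints given as sets of on-bits, a Tanimoto similarity threshold, connected components in place of module detection, and a single hand-picked split point on one descriptor. All function names and data are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_modules(fingerprints: dict, threshold: float = 0.5) -> list:
    """Step 1 (stand-in): link compounds whose similarity meets the threshold,
    then return the connected components of that network as 'modules'."""
    neighbours = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            neighbours[u].add(v)
            neighbours[v].add(u)
    modules, seen = [], set()
    for start in fingerprints:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        modules.append(component)
    return modules


def fit_line(points: list) -> tuple:
    """Ordinary least squares for y = slope * x + intercept (needs >= 1 point)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def piecewise_fit(points: list, split_at: float) -> dict:
    """Step 2 (stand-in): one independent line per region of a single descriptor.
    Assumes both regions are non-empty."""
    low = [p for p in points if p[0] <= split_at]
    high = [p for p in points if p[0] > split_at]
    return {"low": fit_line(low), "high": fit_line(high)}


# Hypothetical toy data: four compounds and one module's (descriptor, activity) pairs.
fingerprints = {
    "c1": {1, 2, 3, 4},
    "c2": {1, 2, 3, 5},
    "c3": {10, 11, 12},
    "c4": {10, 11, 13},
}
modules = similarity_modules(fingerprints, threshold=0.5)
regions = piecewise_fit(
    [(0, 1.0), (1, 2.0), (2, 3.0), (3, 3.0), (4, 1.0), (5, -1.0)], split_at=2
)
```

Each region keeps its own slope and intercept, which is what makes a model of this shape easy to read: a prediction can be traced to one module and one linear equation.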
Affiliation(s)
- Jonathan Cardoso-Silva
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, 30 Aldwych, London, WC2B 4BG, UK
- Lazaros G Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Roberts Building, Torrington Place, London, WC1E 7JE, UK
- Sophia Tsoka
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, 30 Aldwych, London, WC2B 4BG, UK
152
Onay A, Onay M. A Drug Decision Support System for Developing a Successful Drug Candidate Using Machine Learning Techniques. Curr Comput Aided Drug Des 2019; 16:407-419. [PMID: 31438830 DOI: 10.2174/1573409915666190716143601] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/02/2019] [Revised: 04/24/2019] [Accepted: 05/06/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Virtual screening of candidate drug molecules using machine learning techniques plays a key role in the pharmaceutical industry in the design and discovery of new drugs. Computational classification methods can determine drug types according to disease groups and distinguish approved drugs from withdrawn ones. INTRODUCTION Classification models developed in this study can be used as a simple filter in drug modelling to eliminate potentially inappropriate molecules in the early stages. In this work, we developed a Drug Decision Support System (DDSS) to classify each drug candidate molecule as a potential drug or non-drug and to predict its disease group. METHODS Molecular descriptors were identified for the determination of a number of rules in drug molecules. They were derived using the ADRIANA.Code program and Lipinski's rule of five. We used an Artificial Neural Network (ANN) to classify drug molecules according to the types of diseases. Closed frequent molecular structures in the form of subgraph fragments were also obtained with the Gaston algorithm included in the ParMol package to find common molecular fragments for withdrawn drugs. RESULTS We observed that TPSA, XlogP, Natoms, HDon_O and TPSA are the most distinctive features in the pool of molecular descriptors. We evaluated the performance of the classifiers on all datasets and found that classification accuracies are very high on all of them. Neural network models achieved 84.6% and 83.3% accuracies on test sets including cardiac therapy, anti-epileptic and anti-parkinson drugs with approved and withdrawn drugs for drug classification problems. CONCLUSION The experimental evaluation shows that the system is promising for identifying potential drug molecules and classifying them correctly according to the types of diseases.
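As a rough illustration of the kind of early-stage filter this abstract mentions, the sketch below applies Lipinski's rule of five to precomputed descriptor values. It is not the authors' DDSS: the `Descriptors` container, the function names, and the example values are hypothetical, and allowing at most one violation is the commonly used reading of the rule rather than something stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptor values for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: a molecule passes with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Illustrative values only, not measurements of real compounds.
small = Descriptors(mol_weight=300.0, log_p=2.5, h_donors=2, h_acceptors=5)
large = Descriptors(mol_weight=650.0, log_p=6.2, h_donors=3, h_acceptors=8)
```

A filter like this is cheap to run before any trained classifier, which is why it is often used as a first pass.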
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, Faculty of Engineering & Architecture, Kafkas University, Kars, 36100, Turkey
- Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Faculty of Engineering, Van Yuzuncu Yil University, 65100, Van, Turkey
153
Wang X, Li Z, Jiang M, Wang S, Zhang S, Wei Z. Molecule Property Prediction Based on Spatial Graph Embedding. J Chem Inf Model 2019; 59:3817-3828. [DOI: 10.1021/acs.jcim.9b00410] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 02/03/2023]
Affiliation(s)
- Xiaofeng Wang
- College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
- Zhen Li
- College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
- Mingjian Jiang
- College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
- Shuang Wang
- College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
- Shugang Zhang
- College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
- Zhiqiang Wei
- College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
154
Liu P, Li H, Li S, Leung KS. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019; 20:408. [PMID: 31357929 PMCID: PMC6664725 DOI: 10.1186/s12859-019-2910-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 10/13/2018] [Accepted: 05/21/2019] [Indexed: 12/11/2022] Open
Abstract
Background Understanding the phenotypic drug response on cancer cell lines plays a vital role in anti-cancer drug discovery and re-purposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to build and test their models. Previously, most research in this area started from the molecular fingerprints or physicochemical features of drugs, instead of their structures. Results In this paper, a model called twin Convolutional Neural Network for drugs in SMILES format (tCNNS) is introduced for phenotypic screening. tCNNS uses one convolutional network to extract features for drugs from their simplified molecular input line entry system (SMILES) format and another convolutional network to extract features for cancer cell lines from the genetic feature vectors. After that, a fully connected network is used to predict the interaction between the drugs and the cancer cell lines. When the training set and the testing set are divided based on the interaction pairs between drugs and cell lines, tCNNS achieves 0.826 and 0.831 for the mean and top quartile of the coefficient of determination (R2), respectively, and 0.909 and 0.912 for the mean and top quartile of the Pearson correlation (Rp), respectively, which are significantly better than those of previous works (Ammad-Ud-Din et al., J Chem Inf Model 54:2347–9, 2014), (Haider et al., PLoS ONE 10:0144490, 2015), (Menden et al., PLoS ONE 8:61318, 2013). However, when the training set and the testing set are divided exclusively based on drugs or cell lines, the performance of tCNNS decreases significantly and Rp and R2 drop to barely above 0. Conclusions Our approach is able to predict the drug effects on cancer cell lines with high accuracy, and its performance remains stable with less but high-quality data, and with fewer features for the cancer cell lines. tCNNS can also solve the problem of outliers in other feature space. Besides achieving high scores in these statistical metrics, tCNNS also provides some insights into the phenotypic screening. However, the performance of tCNNS drops in the blind test. Electronic supplementary material The online version of this article (10.1186/s12859-019-2910-6) contains supplementary material, which is available to authorized users.
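The two metrics reported in this abstract, the coefficient of determination (R2) and the Pearson correlation (Rp), measure different things, and a small worked example makes the difference concrete. The sketch below uses the standard textbook definitions; the function names and the toy response values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def r_squared(y_true: list, y_pred: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson(y_true: list, y_pred: list) -> float:
    """Pearson correlation between observed and predicted values."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


# Toy data: predictions are perfectly correlated with the truth but offset by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
r2 = r_squared(observed, predicted)
rp = pearson(observed, predicted)
```

Here Rp stays at 1 because the ranking and spacing are preserved, while R2 is lower because it penalises the constant offset. That is one reason both are usually reported together.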
Affiliation(s)
- Pengfei Liu
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Hongjian Li
- SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, N.T., Hong Kong, China; CUHK-SDU Reproductive Genetics Joint Laboratory, School of Biomedical Sciences, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Shuai Li
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
- Kwong-Sak Leung
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
155
Error Tolerance of Machine Learning Algorithms across Contemporary Biological Targets. Molecules 2019; 24:2115. [PMID: 31167452 PMCID: PMC6601015 DOI: 10.3390/molecules24112115] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/15/2019] [Revised: 05/31/2019] [Accepted: 06/01/2019] [Indexed: 12/16/2022] Open
Abstract
Machine learning continues to make strides in the prediction of desired properties concerning drug development. Problematically, the efficacy of machine learning in these arenas is reliant upon highly accurate and abundant data. These two limitations, high accuracy and abundance, are often taken together; however, insight into the dataset accuracy limitation of contemporary machine learning algorithms may show whether non-bench experimental sources of data may be used to generate useful machine learning models where there is a paucity of experimental data. We took highly accurate data across six kinase types, one GPCR, one polymerase, a human protease, and HIV protease, and intentionally introduced error at varying population proportions in the datasets for each target. With the generated error in the data, we explored how the retrospective accuracy of a Naïve Bayes Network, a Random Forest Model, and a Probabilistic Neural Network model decayed as a function of error. Additionally, we explored the ability of a training dataset with an error profile resembling that produced by the Free Energy Perturbation method (FEP+) to generate machine learning models with useful retrospective capabilities. The categorical error tolerance was quite high for a Naïve Bayes Network algorithm, with an average of 39% error in the training set required to lose predictivity on the test set. Additionally, a Random Forest tolerated a significant degree of categorical error introduced into the training set, with an average error of 29% required to lose predictivity. However, we found that the Probabilistic Neural Network algorithm did not tolerate as much categorical error, requiring an average of 20% error to lose predictivity. Finally, we found that a Naïve Bayes Network and a Random Forest could both use datasets with an error profile resembling that of FEP+. This work demonstrates that computational methods of known error distribution like FEP+ may be useful in generating machine learning models not based on extensive and expensive in vitro-generated datasets.
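One simple way to picture "introducing error at varying population proportions" is to flip a chosen fraction of binary activity labels in the training set. The sketch below does exactly that. It is an assumption about the procedure, since the abstract does not describe how the error was generated; the function name, the 0/1 label encoding, and the fixed seed are all hypothetical.

```python
import random


def inject_label_error(labels: list, error_fraction: float, seed: int = 0) -> list:
    """Return a copy of binary (0/1) labels with a fixed fraction flipped.

    Assumption: 'categorical error' means swapping active and inactive labels
    for a randomly chosen subset of training compounds.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    n_flip = round(error_fraction * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


# Toy training labels: 100 compounds, half active.
clean = [0, 1] * 50
noisy = inject_label_error(clean, error_fraction=0.39)
n_changed = sum(a != b for a, b in zip(clean, noisy))
```

Repeating this over a range of `error_fraction` values, retraining each time, and scoring on an uncorrupted test set gives a decay curve of the kind the abstract describes.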
156
Kairys V, Baranauskiene L, Kazlauskiene M, Matulis D, Kazlauskas E. Binding affinity in drug design: experimental and computational techniques. Expert Opin Drug Discov 2019; 14:755-768. [DOI: 10.1080/17460441.2019.1623202] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 12/22/2022]
Affiliation(s)
- Visvaldas Kairys
- Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
157
Koulouridi E, Valli M, Ntie-Kang F, Bolzani VDS. A primer on natural product-based virtual screening. Physical Sciences Reviews 2019. [DOI: 10.1515/psr-2018-0105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Databases play an important role in various computational techniques, including virtual screening (VS) and molecular modeling in general. These collections of molecules can contain a large amount of information, making them suitable for several drug discovery applications. For example, vendor, bioactivity data or target type can be found when searching a database. The introduction of these data resources and their characteristics is used for the design of an experiment. The description of the construction of a database can also be a good guide for the creation of a new one. There are freely available databases and commercial virtual libraries of molecules. Furthermore, a computational chemist can find databases for a general purpose or for a specific subset such as natural products (NPs). In this chapter, NP database resources are presented, along with some guidelines for preparing an NP database for drug discovery purposes.
158
Ekins S, Puhl AC, Zorn KM, Lane TR, Russo DP, Klein JJ, Hickey AJ, Clark AM. Exploiting machine learning for end-to-end drug discovery and development. Nature Materials 2019; 18:435-441. [PMID: 31000803 PMCID: PMC6594828 DOI: 10.1038/s41563-019-0338-z] [Citation(s) in RCA: 243] [Impact Index Per Article: 48.6] [Received: 10/19/2018] [Accepted: 03/07/2019] [Indexed: 05/20/2023]
Abstract
A variety of machine learning methods such as naive Bayesian, support vector machines and more recently deep neural networks are demonstrating their utility for drug discovery and development. These leverage the generally bigger datasets created from high-throughput screening data and allow prediction of bioactivities for targets and molecular properties with increased levels of accuracy. We have only just begun to exploit the potential of these techniques but they may already be fundamentally changing the research process for identifying new molecules and/or repurposing old drugs. The integrated application of such machine learning models for end-to-end (E2E) application is broadly relevant and has considerable implications for developing future therapies and their targeting.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- Ana C Puhl
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- Thomas R Lane
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- Daniel P Russo
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, USA
- Anthony J Hickey
- RTI International, Research Triangle Park, NC, USA
- UNC Catalyst for Rare Diseases, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alex M Clark
- Molecular Materials Informatics, Inc., Montreal, Quebec, Canada
159
Willatt MJ, Musil F, Ceriotti M. Atom-density representations for machine learning. J Chem Phys 2019; 150:154110. [DOI: 10.1063/1.5090481] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Indexed: 12/20/2022] Open
Affiliation(s)
- Michael J. Willatt
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Félix Musil
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Lausanne, Switzerland
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
160
Jiménez-Carvelo AM, González-Casado A, Bagur-González MG, Cuadros-Rodríguez L. Alternative data mining/machine learning methods for the analytical evaluation of food quality and authenticity - A review. Food Res Int 2019; 122:25-39. [PMID: 31229078 DOI: 10.1016/j.foodres.2019.03.063] [Citation(s) in RCA: 123] [Impact Index Per Article: 24.6] [Received: 02/25/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/31/2022]
Abstract
In recent years, the variety and volume of data acquired by modern analytical instruments in order to conduct a better authentication of food has dramatically increased. Several pattern recognition tools have been developed to deal with the large volume and complexity of available trial data. The most widely used methods are principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), soft independent modelling by class analogy (SIMCA), k-nearest neighbours (kNN), parallel factor analysis (PARAFAC), and multivariate curve resolution-alternating least squares (MCR-ALS). Nevertheless, there are alternative data treatment methods, such as support vector machine (SVM), classification and regression tree (CART) and random forest (RF), that show a great potential and more advantages compared to conventional ones. In this paper, we explain the background of these methods and review and discuss the reported studies in which these three methods have been applied in the area of food quality and authenticity. In addition, we clarify the technical terminology used in this particular area of research.
Affiliation(s)
- Ana M Jiménez-Carvelo
- Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
- Antonio González-Casado
- Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
- M Gracia Bagur-González
- Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
- Luis Cuadros-Rodríguez
- Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
161
Tang W, Chen J, Wang Z, Xie H, Hong H. Deep learning for predicting toxicity of chemicals: a mini review. Journal of Environmental Science and Health, Part C: Environmental Carcinogenesis & Ecotoxicology Reviews 2019; 36:252-271. [PMID: 30821199 DOI: 10.1080/10590501.2018.1537563] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 06/09/2023]
Abstract
Humans and wildlife inhabit a world with a panoply of natural and synthetic chemicals. Alarmingly, only a limited number of chemicals have undergone comprehensive toxicological evaluation due to limitations of traditional toxicity testing. High-throughput screening assays provide a higher-speed alternative to conventional toxicity testing. Advancement of high-throughput bioassay technology has greatly increased chemical toxicity data volumes in the past decade, pushing toxicology research into a "big data" era. However, traditional data analysis methods fail to effectively process large data volumes, presenting both a challenge and an opportunity for toxicologists. Deep learning, a machine learning method leveraging deep neural networks (DNNs), has proven to be a useful tool for building quantitative structure-activity relationship (QSAR) models for toxicity prediction utilizing these new large datasets. In this mini review, a brief technical background on DNNs is provided, and the current state of chemical toxicity prediction models built with DNNs is reviewed. In addition, relevant toxicity data sources are summarized, possible limitations are discussed, and perspectives on DNN utilization in chemical toxicity prediction are given.
Affiliation(s)
- Weihao Tang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Zhongyu Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Hongbin Xie
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA
162
Abdullah M, Guruprasad L. Computational fragment-based design of Wee1 kinase inhibitors with tricyclic core scaffolds. Struct Chem 2019. [DOI: 10.1007/s11224-018-1176-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]
163
Deng S, Qiu C, Yao Z, Sun X, Wei Z, Zhuang G, Zhong X, Wang J. Multiscale simulation on thermal stability of supported metal nanocatalysts. WIREs Computational Molecular Science 2019. [DOI: 10.1002/wcms.1405] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Affiliation(s)
- Shengwei Deng
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Chenglong Qiu
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Zihao Yao
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Xiang Sun
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Zhongzhe Wei
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Guilin Zhuang
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Xing Zhong
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
- Jian‐guo Wang
- Institute of Industrial Catalysis, College of Chemical Engineering, State Key Laboratory Breeding Base of Green‐Chemical Synthesis Technology, Zhejiang University of Technology, Hangzhou, China
164
Sosnin S, Karlov D, Tetko IV, Fedorov MV. Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space. J Chem Inf Model 2019; 59:1062-1072. [PMID: 30589269 DOI: 10.1021/acs.jcim.8b00685] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 02/07/2023]
Abstract
Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different end points: it can be measured for different species using different types of administration, etc., and it is questionable whether knowledge transfer between end points is possible. We performed a comparative study of multitask toxicity prediction for a broad chemical space using different descriptors and modeling algorithms and applied multitask learning to a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides significant improvement over single-output models and other machine learning methods. Our research reveals that multitask learning can be very useful to improve the quality of acute toxicity modeling and raises a discussion about the usage of multitask approaches for regulation purposes. Our MultiTox models are freely available on the OCHEM platform (ochem.eu/multitox) under a CC-BY-NC license.
Affiliation(s)
- Sergey Sosnin
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Dmitry Karlov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Igor V Tetko
- Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Institute of Structural Biology and BIGCHEM GmbH, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
- Maxim V Fedorov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia; University of Strathclyde, Department of Physics, John Anderson Building, 107 Rottenrow East, Glasgow, U.K. G40NG
165
Zahrt AF, Henle JJ, Rose BT, Wang Y, Darrow WT, Denmark SE. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 2019; 363:eaau5631. [PMID: 30655414 DOI: 10.1126/science.aau5631] [Citation(s) in RCA: 260] [Impact Index Per Article: 52.0] [Received: 06/22/2018] [Accepted: 12/03/2018] [Indexed: 12/18/2022]
Abstract
Catalyst design in asymmetric reaction development has traditionally been driven by empiricism, wherein experimentalists attempt to qualitatively recognize structural patterns to improve selectivity. Machine learning algorithms and chemoinformatics can potentially accelerate this process by recognizing otherwise inscrutable patterns in large datasets. Herein we report a computationally guided workflow for chiral catalyst selection using chemoinformatics at every stage of development. Robust molecular descriptors that are agnostic to the catalyst scaffold allow for selection of a universal training set on the basis of steric and electronic properties. This set can be used to train machine learning methods to make highly accurate predictive models over a broad range of selectivity space. Using support vector machines and deep feed-forward neural networks, we demonstrate accurate predictive modeling in the chiral phosphoric acid-catalyzed thiol addition to N-acylimines.
Affiliation(s)
- Andrew F Zahrt
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Jeremy J Henle
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Brennan T Rose
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Yang Wang
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- William T Darrow
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Scott E Denmark
- Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
166
Tangadpalliwar SR, Vishwakarma S, Nimbalkar R, Garg P. ChemSuite: A package for chemoinformatics calculations and machine learning. Chem Biol Drug Des 2019; 93:960-964. [PMID: 30637953 DOI: 10.1111/cbdd.13479] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/04/2018] [Revised: 12/12/2018] [Accepted: 12/22/2018] [Indexed: 01/18/2023]
Abstract
Prediction of biological and toxicological properties of small molecules using in silico approaches has become widespread practice in pharmaceutical research to lessen cost and enhance productivity. The development of a tool, "ChemSuite," a stand-alone application for chemoinformatics calculations and machine-learning model development, is reported. The availability of multi-functional features makes it widely applicable in various fields. Force fields such as UFF are incorporated in the tool for optimization of molecules. Packages like RDKit, PyDPI and PaDEL help to calculate 1D, 2D and 3D descriptors and more than 10 types of fingerprints. MinMax Scaler and Z-Score algorithms are available to normalize descriptor values. Varied descriptor selection and machine-learning algorithms are available for model development. It allows users to add their own algorithms or extend the software for various scientific purposes. It is free and open source, has a user-friendly graphical interface, and can work on all major platforms.
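The two normalization options named in this abstract, min-max scaling and z-score standardization, can be written out in a few lines. The sketch below uses only the Python standard library and is not ChemSuite's own code; the function names and descriptor values are hypothetical, and using the population standard deviation is an assumption about which variant is intended.

```python
from statistics import mean, pstdev


def min_max_scale(values: list) -> list:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread, so map it to 0.0 instead of dividing by zero.
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values: list) -> list:
    """Centre values on zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical values of one descriptor across three molecules.
descriptor = [2.0, 4.0, 6.0]
scaled = min_max_scale(descriptor)
standardized = z_score(descriptor)
```

Min-max scaling keeps the original bounds visible but is sensitive to outliers, whereas z-scores are less affected by a single extreme value; which one suits a given descriptor set is a modelling choice.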
Affiliation(s)
- Sujit R Tangadpalliwar
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, Punjab, India
- Sachin Vishwakarma
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, Punjab, India
- Rakesh Nimbalkar
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, Punjab, India
- Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, Punjab, India
167
168
Abstract
Drug promiscuity or polypharmacology is the ability of small molecules to interact with multiple protein targets simultaneously. In drug discovery, understanding the polypharmacology of potential drug molecules is crucial to improve their efficacy and safety, and to discover the new therapeutic potentials of existing drugs. Over the past decade, several computational methods have been developed to study the polypharmacology of small molecules, many of which are available as Web services. In this chapter, we review some of these Web tools focusing on ligand based approaches. We highlight in particular our recently developed polypharmacology browser (PPB) and its application for finding the side targets of a new inhibitor of the TRPV6 calcium channel.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Berne, Switzerland.
| |
|
169
|
Evaluation of pyrrole-2,3-dicarboxylate derivatives: Synthesis, DFT analysis, molecular docking, virtual screening and in vitro anti-hepatic cancer study. J Mol Struct 2019. [DOI: 10.1016/j.molstruc.2018.08.049] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|
170
|
Abstract
Drugs modulate disease states through their actions on targets in the body. Determining these targets aids the focused development of new treatments, and helps to better characterize those already employed. One means of accomplishing this is through the deployment of in silico methodologies, harnessing computational analytical and predictive power to produce educated hypotheses for experimental verification. Here, we provide an overview of the current state of the art, describe some of the well-established methods in detail, and reflect on how they, and emerging technologies promoting the incorporation of complex and heterogeneous data-sets, can be employed to improve our understanding of (poly)pharmacology.
Affiliation(s)
- Ryan Byrne
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland.
| |
|
171
|
Duran‐Frigola M, Fernández‐Torras A, Bertoni M, Aloy P. Formatting biological big data for modern machine learning in drug discovery. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2018. [DOI: 10.1002/wcms.1408] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Affiliation(s)
- Miquel Duran‐Frigola
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Adrià Fernández‐Torras
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Martino Bertoni
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Patrick Aloy
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
172
|
Ivanov SM, Huber RG, Alibay I, Warwicker J, Bond PJ. Energetic Fingerprinting of Ligand Binding to Paralogous Proteins: The Case of the Apoptotic Pathway. J Chem Inf Model 2018; 59:245-261. [DOI: 10.1021/acs.jcim.8b00765] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Stefan M. Ivanov
- Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Matrix 07-01, 30 Biopolis Street, Singapore 138671, Singapore
| | - Roland G. Huber
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Matrix 07-01, 30 Biopolis Street, Singapore 138671, Singapore
| | - Irfan Alibay
- Division of Pharmacy and Optometry, School of Health Sciences, The University of Manchester, Oxford Road, Manchester M13 9PT, U.K
| | - Jim Warwicker
- Manchester Institute of Biotechnology, School of Chemistry, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K
| | - Peter J. Bond
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Matrix 07-01, 30 Biopolis Street, Singapore 138671, Singapore
- Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543, Singapore
| |
|
173
|
Liu S, Alnammi M, Ericksen SS, Voter AF, Ananiev GE, Keck JL, Hoffmann FM, Wildman SA, Gitter A. Practical Model Selection for Prospective Virtual Screening. J Chem Inf Model 2018; 59:282-293. [PMID: 30500183 PMCID: PMC6351977 DOI: 10.1021/acs.jcim.8b00363] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein–protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well on public data sets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest.
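To put the reported recovery (37 of 54 actives in the top 250 of 22,434 molecules) in context, the following sketch computes the hit rate, the recall of actives, and an enrichment factor over random selection. The function name and the choice of enrichment factor as a summary are illustrative assumptions; the abstract itself reports only the raw counts.

```python
def enrichment_factor(hits_in_top, top_n, total_actives, library_size):
    """Ratio of the hit rate in the selected subset to the hit rate of the whole library."""
    top_rate = hits_in_top / top_n
    base_rate = total_actives / library_size
    return top_rate / base_rate


# Counts taken from the abstract above.
hits_in_top = 37
top_n = 250
total_actives = 54
library_size = 22434

hit_rate = hits_in_top / top_n                 # fraction of the top 250 that are active
recall_of_actives = hits_in_top / total_actives  # fraction of all actives recovered
ef = enrichment_factor(hits_in_top, top_n, total_actives, library_size)

print(f"hit rate in top {top_n}: {hit_rate:.3f}")
print(f"recall of actives: {recall_of_actives:.3f}")
print(f"enrichment factor: {ef:.1f}")
```

Under these counts, roughly two thirds of the actives are found by testing about 1% of the library, which is the kind of prospective measure the abstract contrasts with benchmark performance.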
Affiliation(s)
- Shengchao Liu
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Morgridge Institute for Research, Madison, Wisconsin 53715, United States
| | - Moayad Alnammi
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Morgridge Institute for Research, Madison, Wisconsin 53715, United States
| | - Spencer S Ericksen
- Small Molecule Screening Facility, University of Wisconsin Carbone Cancer Center, Madison, Wisconsin 53792, United States
| | - Andrew F Voter
- Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin 53706, United States
| | - Gene E Ananiev
- Small Molecule Screening Facility, University of Wisconsin Carbone Cancer Center, Madison, Wisconsin 53792, United States
| | - James L Keck
- Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin 53706, United States
| | - F Michael Hoffmann
- Small Molecule Screening Facility, University of Wisconsin Carbone Cancer Center, Madison, Wisconsin 53792, United States; McArdle Laboratory for Cancer Research, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
| | - Scott A Wildman
- Small Molecule Screening Facility, University of Wisconsin Carbone Cancer Center, Madison, Wisconsin 53792, United States
| | - Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Morgridge Institute for Research, Madison, Wisconsin 53715, United States; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53792, United States
| |
|
174
|
Ståhl N, Falkman G, Karlsson A, Mathiason G, Boström J. Deep Convolutional Neural Networks for the Prediction of Molecular Properties: Challenges and Opportunities Connected to the Data. J Integr Bioinform 2018; 16:/j/jib.ahead-of-print/jib-2018-0065/jib-2018-0065.xml. [PMID: 30517077 PMCID: PMC6798861 DOI: 10.1515/jib-2018-0065] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 10/22/2018] [Indexed: 11/15/2022] Open
Abstract
We present a flexible deep convolutional neural network method for the analysis of arbitrary sized graph structures representing molecules. This method, which makes use of the Lipinski RDKit module, an open-source cheminformatics software, enables the incorporation of any global molecular (such as molecular charge and molecular weight) and local (such as atom hybridization and bond orders) information. In this paper, we show that this method significantly outperforms another recently proposed method based on deep convolutional neural networks on several datasets that are studied. Several best practices for training deep convolutional neural networks on chemical datasets are also highlighted within the article, such as how to select the information to be included in the model, how to prevent overfitting and how unbalanced classes in the data can be handled.
Affiliation(s)
- Niclas Ståhl
- School of Informatics, University of Skövde, Högskolevägen 28, SE 54145, Skövde, Sweden
| | - Göran Falkman
- School of Informatics, University of Skövde, Skövde, Sweden
| | | | | | - Jonas Boström
- Department of Medicinal Chemistry, CVMD iMED, AstraZeneca, Mölndal, Sweden
| |
|
175
|
Suh M, Lee DS. Brain Theranostics and Radiotheranostics: Exosomes and Graphenes In Vivo as Novel Brain Theranostics. Nucl Med Mol Imaging 2018; 52:407-419. [PMID: 30538772 PMCID: PMC6261865 DOI: 10.1007/s13139-018-0550-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2018] [Revised: 09/10/2018] [Accepted: 10/05/2018] [Indexed: 12/17/2022] Open
Abstract
Brain disease is one of the greatest threats to public health. Brain theranostics is recently taking shape, indicating the treatments of stroke, inflammatory brain disorders, psychiatric diseases, neurodevelopmental disease, and neurodegenerative disease. However, several factors, such as lack of endophenotype classification, blood-brain barrier (BBB), target determination, ignorance of biodistribution after administration, and complex intercellular communication between brain cells, make brain theranostics application difficult, especially when it comes to clinical application. So, a more thorough understanding of each aspect is needed. In this review, we focus on recent studies regarding the role of exosomes in intercellular communication of brain cells, therapeutic effect of graphene quantum dots, transcriptomics/epitranscriptomics approach for target selection, and in vitro/in vivo considerations.
Affiliation(s)
- Minseok Suh
- Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 03080 Republic of Korea
| | - Dong Soo Lee
- Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 03080 Republic of Korea
| |
|
176
|
Three-dimensional descriptors for aminergic GPCRs: dependence on docking conformation and crystal structure. Mol Divers 2018; 23:603-613. [PMID: 30484023 PMCID: PMC6682580 DOI: 10.1007/s11030-018-9894-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Accepted: 11/12/2018] [Indexed: 01/01/2023]
Abstract
Three-dimensional descriptors are often used to search for new biologically active compounds, in both ligand- and structure-based approaches, capturing the spatial orientation of molecules. They frequently constitute an input for machine learning-based predictions of compound activity or quantitative structure-activity relationship modeling; however, the distribution of their values and the accuracy of depicting compound orientations might have an impact on the power of the obtained predictive models. In this study, we analyzed the distribution of three-dimensional descriptors calculated for docking poses of active and inactive compounds for all aminergic G protein-coupled receptors with available crystal structures, focusing on the variation in conformations for different receptors and crystals. We demonstrated that the consistency in compound orientation in the binding site is largely uncorrelated with the affinity itself, but is more influenced by other factors, such as the number of rotatable bonds and the crystal structure used for docking studies. Visualizations of the descriptor distributions were prepared and made available online at http://chem.gmum.net/vischem_stability, which enables the investigation of the chemical structures referring to particular data points depicted in the figures. Moreover, the performed analysis can assist in choosing a crystal structure for docking studies, helping to select conditions that provide the best discrimination between active and inactive compounds in machine learning-based experiments.
|
177
|
Neves BJ, Braga RC, Melo-Filho CC, Moreira-Filho JT, Muratov EN, Andrade CH. QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front Pharmacol 2018; 9:1275. [PMID: 30524275 PMCID: PMC6262347 DOI: 10.3389/fphar.2018.01275] [Citation(s) in RCA: 191] [Impact Index Per Article: 31.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2018] [Accepted: 10/18/2018] [Indexed: 02/03/2023] Open
Abstract
Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small molecules for new hits with desired properties that can then be tested experimentally. As with other computational approaches, the intention of VS is not to replace in vitro or in vivo assays, but to speed up the discovery process, to reduce the number of candidates to be tested experimentally, and to rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations because it saves time, cost, resources, and labor. Among the VS approaches, quantitative structure–activity relationship (QSAR) analysis is the most powerful method due to its high and fast throughput and good hit rate. As the first step of QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chemical descriptors are calculated on different levels of representation of molecular structure, ranging from 1D to nD, and correlated with the biological property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biological property of novel compounds. Although the experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desired and should be performed as an ultimate validation of developed models. In this mini-review, we summarize and critically analyze the recent trends of QSAR-based VS in drug discovery and demonstrate successful applications in identifying promising compounds with desired properties. Moreover, we provide some recommendations about the best practices for QSAR-based VS along with the future perspectives of this approach.
Affiliation(s)
- Bruno J Neves
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil; Laboratory of Cheminformatics, Centro Universitário de Anápolis (UniEVANGÉLICA), Anápolis, Brazil
| | - Rodolpho C Braga
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
| | - Cleber C Melo-Filho
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
| | - José Teófilo Moreira-Filho
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
| | - Eugene N Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Chemical Technology, Odessa National Polytechnic University, Odessa, Ukraine
| | - Carolina Horta Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
| |
|
178
|
Halder AK. Finding the structural requirements of diverse HIV-1 protease inhibitors using multiple QSAR modelling for lead identification. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2018; 29:911-933. [PMID: 30332922 DOI: 10.1080/1062936x.2018.1529702] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/06/2018] [Accepted: 09/25/2018] [Indexed: 06/08/2023]
Abstract
Multiple Quantitative Structure-Activity Relationship (QSAR) analysis is widely used in drug discovery for lead identification. Human Immunodeficiency Virus (HIV) protease is one of the key targets for the treatment of Acquired Immunodeficiency Syndrome (AIDS). One of the major challenges in the design of HIV-1 protease inhibitors (HIV PRIs) is to increase the inhibitory activities against the enzyme to a level where the problem associated with drug resistance may be considerably delayed. Herein, chemometric analyses were performed with 346 structurally diverse HIV PRIs with experimental bioactivities against a sub-type B mutant to develop highly predictive QSAR models and also to identify the effective structural determinants for higher affinity against HIV PR. The QSAR models were developed using OCHEM-based machine learning tools (ASNN, FSMLR, KNN, RF, MANN and XGBoost), with descriptors calculated by eight different software packages. Simultaneously, Monte Carlo optimization-based QSAR modelling was performed using SMILES and graph-based descriptors to understand fragment and topochemical contributions. To validate the actual predictability of all these models, an additional set of 104 compounds (also with known experimental activities) covering a slightly different chemical space was employed. This ligand-based study serves as a crucial benchmark for further development of HIV protease inhibitors with improved activities.
Affiliation(s)
- A K Halder
- School of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| |
|
179
|
Tomberg A, Johansson MJ, Norrby PO. A Predictive Tool for Electrophilic Aromatic Substitutions Using Machine Learning. J Org Chem 2018; 84:4695-4703. [PMID: 30336024 DOI: 10.1021/acs.joc.8b02270] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
At the early stages of the drug development process, thousands of compounds are synthesized in order to attain the best possible potency and pharmacokinetic properties. Once successful scaffolds are identified, large libraries of analogues are made, which is a challenging and time-consuming task. Recently, late-stage functionalization (LSF) has become increasingly prominent since these reactions selectively functionalize C-H bonds, allowing analogues to be produced quickly. Classical electrophilic aromatic halogenations are a powerful type of reaction in the LSF toolkit. However, the introduction of an electrophile in a regioselective manner on a drug-like molecule is a challenging task. Herein we present a machine learning model able to predict the reactive site of an electrophilic aromatic substitution with an accuracy of 93% (internal validation set). The model takes as input a SMILES string of a compound and uses six quantum mechanics descriptors to identify its reactive site(s). On an external validation set, 90% of all molecules were correctly predicted.
|
180
|
Schöning V, Krähenbühl S, Drewe J. The hepatotoxic potential of protein kinase inhibitors predicted with Random Forest and Artificial Neural Networks. Toxicol Lett 2018; 299:145-148. [PMID: 30315951 DOI: 10.1016/j.toxlet.2018.10.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2018] [Revised: 10/01/2018] [Accepted: 10/08/2018] [Indexed: 01/29/2023]
Abstract
Protein kinases (PKs) play a role in many pivotal aspects of cellular function. Dysregulation and mutations of protein kinases are involved in the development of different diseases, which might be treated by inhibition of the corresponding kinase. Protein kinase inhibitors (PKIs) are generally well tolerated, but unexpected and serious adverse events on the heart, lung, kidney and liver were observed clinically. In this study, the structure-activity relationship of PKIs in relation to hepatotoxicity was investigated. A dataset of 165 PKIs was compiled and the probability of human hepatotoxicity with two different machine learning algorithms (Random Forest and Artificial Neural Networks) was analysed. The estimated probability of hepatotoxicity was generally high for single PKIs. However, depending on the target kinase of the PKI, a difference in hepatotoxic potential could be observed. The similarity of the PKIs to each other is caused by the conserved site of action of the protein kinases. Hepatotoxicity may therefore always be an issue in PKIs.
Affiliation(s)
- Verena Schöning
- Department of Clinical Pharmacology, University Hospital Basel, CH 4031 Basel, Switzerland
| | - Stephan Krähenbühl
- Department of Clinical Pharmacology, University Hospital Basel, CH 4031 Basel, Switzerland
| | - Jürgen Drewe
- Department of Clinical Pharmacology, University Hospital Basel, CH 4031 Basel, Switzerland.
| |
|
181
|
Makhouri FR, Ghasemi JB. In Silico Studies in Drug Research Against Neurodegenerative Diseases. Curr Neuropharmacol 2018; 16:664-725. [PMID: 28831921 PMCID: PMC6080098 DOI: 10.2174/1570159x15666170823095628] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2017] [Revised: 07/24/2017] [Accepted: 08/16/2017] [Indexed: 01/14/2023] Open
Abstract
Background: Neurodegenerative diseases such as Alzheimer's disease (AD), amyotrophic lateral sclerosis, Parkinson's disease (PD), spinal cerebellar ataxias, and spinal and bulbar muscular atrophy are described by slow and selective degeneration of neurons and axons in the central nervous system (CNS) and constitute one of the major challenges of modern medicine. Computer-aided or in silico drug design methods have matured into powerful tools for reducing the number of ligands that should be screened in experimental assays. Methods: In the present review, the authors provide a basic background about neurodegenerative diseases and in silico techniques in drug research. Furthermore, they review the various in silico studies reported against various targets in neurodegenerative diseases, including homology modeling, molecular docking, virtual high-throughput screening, quantitative structure activity relationship (QSAR), hologram quantitative structure activity relationship (HQSAR), 3D pharmacophore mapping, proteochemometrics modeling (PCM), fingerprints, fragment-based drug discovery, Monte Carlo simulation, molecular dynamics (MD) simulation, quantum-mechanical methods for drug design, support vector machines, and machine learning approaches. Results: Detailed analysis of the recently reported case studies revealed that the majority of them use a sequential combination of ligand- and structure-based virtual screening techniques, with particular focus on pharmacophore models and the docking approach. Conclusion: Neurodegenerative diseases have a multifactorial pathoetiological origin, so scientists have become persuaded that a multi-target therapeutic strategy aimed at the simultaneous targeting of multiple proteins (and therefore etiologies) involved in the development of a disease is recommended in the future.
Affiliation(s)
| | - Jahan B Ghasemi
- Chemistry Department, Faculty of Sciences, University of Tehran, Tehran, Iran
| |
|
182
|
Lane T, Russo DP, Zorn KM, Clark AM, Korotcov A, Tkachenko V, Reynolds RC, Perryman AL, Freundlich JS, Ekins S. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 2018; 15:4346-4360. [PMID: 29672063 PMCID: PMC6167198 DOI: 10.1021/acs.molpharmaceut.8b00083] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million incidences and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis (Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb need to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a combined in vitro and in vivo data Bayesian model at a 100 nM activity cutoff, yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set (n = 153 compounds) published in 2017, and when used to test our model, it showed comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms, showing that Bayesian machine learning models constructed with literature Mtb data generated by different laboratories generally were equivalent to or outperformed deep neural networks with external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.
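The six statistics listed in this abstract can all be derived from a binary confusion matrix. The sketch below shows the standard definitions; the function name is an assumption, and the counts passed to it are hypothetical values for a 153-compound set, chosen for illustration only and not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification statistics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts (153 compounds in total, few actives); not the paper's data.
metrics = classification_metrics(tp=10, fp=27, tn=116, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With few actives in the set, precision can stay low even when accuracy, recall and specificity are high, which is why kappa and MCC are usually reported alongside them for imbalanced data.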
Affiliation(s)
- Thomas Lane
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Daniel P. Russo
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Kimberley M. Zorn
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| | - Alex M. Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
| | - Alexandru Korotcov
- Science Data Software, LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
| | - Valery Tkachenko
- Science Data Software, LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
| | - Robert C. Reynolds
- Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 2nd Avenue South, Birmingham, AL 35294-3300, USA
| | - Alexander L. Perryman
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, USA
| | - Joel S. Freundlich
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, USA
- Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University–New Jersey Medical School, Newark, New Jersey 07103, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| |
|
183
|
Cardoso‐Silva J, Papadatos G, Papageorgiou LG, Tsoka S. Optimal Piecewise Linear Regression Algorithm for QSAR Modelling. Mol Inform 2018; 38:e1800028. [DOI: 10.1002/minf.201800028] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2018] [Accepted: 08/02/2018] [Indexed: 12/20/2022]
Affiliation(s)
- Jonathan Cardoso‐Silva
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, London WC2B 4BG, UK
| | - George Papadatos
- European Molecular Biology Laboratory – European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK
| | - Lazaros G. Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London WC1E 7JE, UK
| | - Sophia Tsoka
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, London WC2B 4BG, UK
| |
|
184
|
Banerjee P, Dehnbostel FO, Preissner R. Prediction Is a Balancing Act: Importance of Sampling Methods to Balance Sensitivity and Specificity of Predictive Models Based on Imbalanced Chemical Data Sets. Front Chem 2018; 6:362. [PMID: 30271769 PMCID: PMC6149243 DOI: 10.3389/fchem.2018.00362] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2018] [Accepted: 07/30/2018] [Indexed: 12/24/2022] Open
Abstract
The increase in the number of new chemicals synthesized in past decades has resulted in constant growth in the development and application of computational models for prediction of the activity as well as the safety profiles of chemicals. Most of the time, such computational models and their applications must deal with imbalanced chemical data. It is indeed a challenge to construct a classifier using an imbalanced data set. In this study, we analyzed and validated the importance of different sampling methods over a non-sampling method, to achieve a well-balanced sensitivity and specificity of a machine learning model trained on imbalanced chemical data. Additionally, this study has achieved an accuracy of 93.00%, an AUC of 0.94, an F1 measure of 0.90, a sensitivity of 96.00% and a specificity of 91.00% using SMOTE sampling and a Random Forest classifier for the prediction of Drug Induced Liver Injury (DILI). Our results suggest that, irrespective of the data set used, sampling methods can have a major influence on reducing the gap between sensitivity and specificity of a model. This study demonstrates the efficacy of different sampling methods for the class imbalance problem using binary chemical data sets.
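The idea behind SMOTE, the sampling method named in this abstract, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified, standard-library-only illustration of that idea; it is neither the authors' pipeline nor the implementation found in any particular library, and the function name, parameters and toy data are assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority-class neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy minority class in a two-descriptor space (hypothetical values).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority, n_new=6)
balanced_minority = minority + new_points
```

Because the synthetic points lie on segments between existing minority samples, they enlarge the minority class without simply duplicating records; oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.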
Affiliation(s)
- Priyanka Banerjee
- Structural Bioinformatics Group, Institute for Physiology, Charité - University Medicine Berlin, Berlin, Germany
| | - Frederic O Dehnbostel
- Structural Bioinformatics Group, Institute for Physiology, Charité - University Medicine Berlin, Berlin, Germany
| | - Robert Preissner
- Structural Bioinformatics Group, Institute for Physiology, Charité - University Medicine Berlin, Berlin, Germany
|
185
|
Majumdar S, Basak SC, Lungu CN, Diudea MV, Grunwald GD. Mathematical structural descriptors and mutagenicity assessment: a study with congeneric and diverse datasets. SAR and QSAR in Environmental Research 2018; 29:579-590. [PMID: 30025481 DOI: 10.1080/1062936x.2018.1496475] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/04/2018] [Accepted: 07/01/2018] [Indexed: 06/08/2023]
Abstract
Quantitative bioactivity and toxicity assessment of chemical compounds plays a central role in drug discovery as it saves a substantial amount of resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors for the activity prediction of candidate compounds. In this paper, we evaluate the utility of two large groups of chemical descriptors by such predictive modelling, as well as chemical structure discovery, through empirical analysis. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens - a homogeneous set of 95 amines, and a diverse set of 508 chemicals. Using calculated descriptors, we model the mutagenic activity of these compounds using a number of methods from the statistics and machine-learning literature, and use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to result in a better predictive model - but that depends on the compounds being modelled and the modelling technique being used.
Affiliation(s)
- S Majumdar
- University of Florida Informatics Institute, Gainesville, USA
| | - S C Basak
- Department of Chemistry and Biochemistry, University of Minnesota, Duluth, MN, USA
| | - C N Lungu
- Department of Chemistry, Babes-Bolyai University, Cluj-Napoca, Romania
| | - M V Diudea
- Department of Chemistry, Babes-Bolyai University, Cluj-Napoca, Romania
| | - G D Grunwald
- Natural Resources Research Institute, University of Minnesota, Duluth, USA
|
186
|
Schöning V, Hammann F, Peinl M, Drewe J. Editor's Highlight: Identification of Any Structure-Specific Hepatotoxic Potential of Different Pyrrolizidine Alkaloids Using Random Forests and Artificial Neural Networks. Toxicol Sci 2018; 160:361-370. [PMID: 28973379 DOI: 10.1093/toxsci/kfx187] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022] Open
Abstract
Pyrrolizidine alkaloids (PAs) are characteristic metabolites of some plant families and form a powerful defense mechanism against herbivores. More than 600 different PAs are known. PAs are ester alkaloids composed of a necine base and a necic acid, which can be used to divide PAs into different structural subcategories. The main target organs for PA metabolism and toxicity are liver and lungs. Additionally, PAs are potentially genotoxic, carcinogenic and exhibit developmental toxicity. In vitro and in vivo investigations have characterized the toxic potential of only very few PAs. However, these investigations suggest that structural differences have an influence on the toxicity of single PAs. To investigate this structural relationship for a large number of PAs, a quantitative structure-activity relationship (QSAR) analysis for hepatotoxicity of over 600 different PAs was performed, using Random Forest and artificial neural network algorithms. These models were trained with a recently established dataset specific for acute hepatotoxicity in humans. Using this dataset, a set of molecular predictors was identified to predict the hepatotoxic potential of each compound in validated QSAR models. Based on these models, the hepatotoxic potential of the 602 PAs was predicted and the following hepatotoxic rank order in 3 main categories was defined: (1) for necine base: otonecine > retronecine > platynecine; (2) for necine base modification: dehydropyrrolizidine ≫ tertiary PA = N-oxide; and (3) for necic acid: macrocyclic diester ≥ open-ring diester > monoester. A further analysis with combined structural features revealed that necic acid has a higher influence on the acute hepatotoxicity than the necine base.
Affiliation(s)
| | - Felix Hammann
- Department of Clinical Pharmacology, University Hospital Basel, CH 4031 Basel, Switzerland
| | - Mark Peinl
- rt-mp Softwaredevelopment, D-63694 Limeshain, Germany
| | - Jürgen Drewe
- Max Zeller Söhne AG, CH 8590 Romanshorn, Switzerland; Department of Clinical Pharmacology, University Hospital Basel, CH 4031 Basel, Switzerland
|
187
|
Kaiser TM, Burger PB, Butch CJ, Pelly SC, Liotta DC. A Machine Learning Approach for Predicting HIV Reverse Transcriptase Mutation Susceptibility of Biologically Active Compounds. J Chem Inf Model 2018; 58:1544-1552. [PMID: 29953819 DOI: 10.1021/acs.jcim.7b00475] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
HIV resistance emerging against antiretroviral drugs represents a great threat to the continued prolongation of the lifespans of HIV-infected patients. Therefore, methods capable of predicting resistance susceptibility in the development of compounds are greatly needed. By targeting the major reverse transcription residues Y181, K103, and L100, we used the biological activities of compounds against these enzymes and the wild-type reverse transcriptase to create Naïve Bayes Networks. Through this machine learning approach, we could predict, with high accuracy, whether a compound would be susceptible to a loss of potency due to resistance. Also, we could perfectly predict retrospectively whether compounds would be susceptible to both a K103 mutant RT and a Y181 mutant RT. In the study presented here, our method outperformed a traditional molecular mechanics approach. This method should be of broad interest beyond drug discovery efforts, and serves to expand the utility of machine learning for the prediction of physical, chemical, or biological properties using the vast information available in the literature.
Affiliation(s)
- Thomas M Kaiser
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
| | - Pieter B Burger
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States; Department of Drug Discovery and Biomedical Sciences, College of Pharmacy, Medical University of South Carolina, 280 Calhoun St., MSC 141, Charleston, South Carolina 29425-1410, United States
| | - Christopher J Butch
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States; Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-IE-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
| | - Stephen C Pelly
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
| | - Dennis C Liotta
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
|
188
|
Abstract
Computer-aided synthesis planning (CASP) is focused on the goal of accelerating the process by which chemists decide how to synthesize small molecule compounds. The ideal CASP program would take a molecular structure as input and output a sorted list of detailed reaction schemes that each connect that target to purchasable starting materials via a series of chemically feasible reaction steps. Early work in this field relied on expert-crafted reaction rules and heuristics to describe possible retrosynthetic disconnections and selectivity rules but suffered from incompleteness, infeasible suggestions, and human bias. With the relatively recent availability of large reaction corpora (such as the United States Patent and Trademark Office (USPTO), Reaxys, and SciFinder databases), consisting of millions of tabulated reaction examples, it is now possible to construct and validate purely data-driven approaches to synthesis planning. As a result, synthesis planning has been opened to machine learning techniques, and the field is advancing rapidly. In this Account, we focus on two critical aspects of CASP and recent machine learning approaches to both challenges. First, we discuss the problem of retrosynthetic planning, which requires a recommender system to propose synthetic disconnections starting from a target molecule. We describe how the search strategy, necessary to overcome the exponential growth of the search space with increasing number of reaction steps, can be assisted through a learned synthetic complexity metric. We also describe how the recursive expansion can be performed by a straightforward nearest neighbor model that makes clever use of reaction data to generate high quality retrosynthetic disconnections. Second, we discuss the problem of anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan (i.e., reduce false positives) to increase the likelihood of experimental success. 
While we introduce this task in the context of reaction validation, its utility extends to the prediction of side products and impurities, among other applications. We describe neural network-based approaches that we and others have developed for this forward prediction task that can be trained on previously published experimental data. Machine learning and artificial intelligence have revolutionized a number of disciplines, not limited to image recognition, dictation, translation, content recommendation, advertising, and autonomous driving. While there is a rich history of using machine learning for structure-activity models in chemistry, it is only now that it is being successfully applied more broadly to organic synthesis and synthesis design. As reported in this Account, machine learning is rapidly transforming CASP, but there are several remaining challenges and opportunities, many pertaining to the availability and standardization of both data and evaluation metrics, which must be addressed by the community at large.
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|
189
|
Nielsen MK, Ahneman DT, Riera O, Doyle AG. Deoxyfluorination with Sulfonyl Fluorides: Navigating Reaction Space with Machine Learning. J Am Chem Soc 2018; 140:5004-5008. [DOI: 10.1021/jacs.8b01523] [Citation(s) in RCA: 134] [Impact Index Per Article: 22.3] [Indexed: 12/23/2022]
Affiliation(s)
- Matthew K. Nielsen
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
| | - Derek T. Ahneman
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
| | - Orestes Riera
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
| | - Abigail G. Doyle
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
|
190
|
Basith S, Cui M, Macalino SJY, Park J, Clavio NAB, Kang S, Choi S. Exploring G Protein-Coupled Receptors (GPCRs) Ligand Space via Cheminformatics Approaches: Impact on Rational Drug Design. Front Pharmacol 2018; 9:128. [PMID: 29593527 PMCID: PMC5854945 DOI: 10.3389/fphar.2018.00128] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 12/08/2017] [Accepted: 02/06/2018] [Indexed: 01/14/2023] Open
Abstract
The primary goal of rational drug discovery is the identification of selective ligands which act on single or multiple drug targets to achieve the desired clinical outcome through the exploration of total chemical space. To identify such desired compounds, computational approaches are necessary in predicting their drug-like properties. G Protein-Coupled Receptors (GPCRs) represent one of the largest and most important integral membrane protein families. These receptors serve as increasingly attractive drug targets due to their relevance in the treatment of various diseases, such as inflammatory disorders, metabolic imbalances, cardiac disorders, cancer, monogenic disorders, etc. In the last decade, multitudes of three-dimensional (3D) structures were solved for diverse GPCRs, so that this period has been referred to as the "golden age for GPCR structural biology." Moreover, accumulation of data about the chemical properties of GPCR ligands has garnered much interest toward the exploration of GPCR chemical space. Due to the steady increase in the structural, ligand, and functional data of GPCRs, several cheminformatics approaches have been implemented in the GPCR drug discovery pipeline. In this review, we mainly focus on the cheminformatics-based paradigms in GPCR drug discovery. We provide a comprehensive view on the ligand- and structure-based cheminformatics approaches which are best illustrated via GPCR case studies. Furthermore, we discuss the appropriate combination of ligand-based and structure-based knowledge, i.e., an integrated approach, which is emerging as a promising strategy for cheminformatics-based GPCR drug design.
Affiliation(s)
| | - Soosung Kang
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, South Korea
| | - Sun Choi
- College of Pharmacy and Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, South Korea
|
191
|
Karlberg M, von Stosch M, Glassey J. Exploiting mAb structure characteristics for a directed QbD implementation in early process development. Crit Rev Biotechnol 2018. [DOI: 10.1080/07388551.2017.1421899] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/17/2022]
Affiliation(s)
- Micael Karlberg
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
| | - Moritz von Stosch
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
| | - Jarka Glassey
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
|
192
|
Piras P, Sheridan R, Sherer EC, Schafer W, Welch CJ, Roussel C. Modeling and predicting chiral stationary phase enantioselectivity: An efficient random forest classifier using an optimally balanced training dataset and an aggregation strategy. J Sep Sci 2018; 41:1365-1375. [PMID: 29383846 DOI: 10.1002/jssc.201701334] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/09/2017] [Revised: 01/17/2018] [Accepted: 01/17/2018] [Indexed: 11/10/2022]
Abstract
Predicting whether a chiral column will be effective is a daily task for many analysts. Moreover, finding the best chiral column for separating a particular racemic compound is mostly a matter of trial and error that may take up to a week in some cases. In this study we have developed a novel prediction approach based on combining a random forest classifier and an optimized discretization method for dealing with enantioselectivity as a continuous variable. Using the optimization results, models were trained on data sets divided into four enantioselectivity classes. The best model performances were achieved by over-sampling the minority classes (α ≤ 1.10 and α ≥ 2.00), down-sampling the majority class (1.2 ≤ α < 2.0), and aggregating multicategory predictions into binary classifications. We tested our method on 41 chiral stationary phases using layered fingerprints as descriptors. Experimental results show that this learning methodology was successful in terms of average area under the Receiver Operating Characteristic curve, Kappa indices and F-measure for structure-based prediction of the enantioselective behavior of 34 chiral columns.
Affiliation(s)
- Patrick Piras
- Aix Marseille Université, CNRS, Centrale Marseille, iSm2, Marseille, France
| | - Robert Sheridan
- Department of Structural Chemistry, Merck Research Laboratories, Rahway, USA
| | - Edward C Sherer
- Modeling and Informatics Process Research and Development, Merck Research Laboratories, Rahway, USA
| | - Wes Schafer
- Department of Process & Analytical Chemistry, Merck Research Laboratories, Rahway, NJ, USA
| | - Christian Roussel
- Aix Marseille Université, CNRS, Centrale Marseille, iSm2, Marseille, France
|
193
|
Ahmed L, Georgiev V, Capuccini M, Toor S, Schaal W, Laure E, Spjuth O. Efficient iterative virtual screening with Apache Spark and conformal prediction. J Cheminform 2018; 10:8. [PMID: 29492726 PMCID: PMC5833896 DOI: 10.1186/s13321-018-0265-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/12/2017] [Accepted: 02/17/2018] [Indexed: 12/02/2022] Open
Abstract
Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as ‘low-scoring’ ligands. Then, another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
Affiliation(s)
- Laeeq Ahmed
- Department of Computational Science and Technology, Royal Institute of Technology (KTH), Lindstedtsvägen 5, 10044, Stockholm, Sweden.
| | - Valentin Georgiev
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
| | - Marco Capuccini
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden; Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden
| | - Salman Toor
- Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden
| | - Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
| | - Erwin Laure
- Department of Computational Science and Technology, Royal Institute of Technology (KTH), Lindstedtsvägen 5, 10044, Stockholm, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
|
194
|
Chen Z, Cao Y, He S, Qiao Y. Development of models for classification of action between heat-clearing herbs and blood-activating stasis-resolving herbs based on theory of traditional Chinese medicine. Chin Med 2018; 13:12. [PMID: 29492098 PMCID: PMC5828388 DOI: 10.1186/s13020-018-0169-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/13/2017] [Accepted: 02/14/2018] [Indexed: 12/18/2022] Open
Abstract
Background: Action (“gongxiao” in Chinese) of traditional Chinese medicine (TCM) is a high-level summary of therapeutic and health-preserving effects under the guidance of TCM theory. TCM-defined herbal properties (“yaoxing” in Chinese) were used in this research. TCM herbal property (TCM-HP) is a high-level generalization and summary of actions, both of which come from long-term effective clinical practice over two thousand years in China. However, the specific relationship between TCM-HP and action of TCM is complex and unclear from a scientific perspective. Research on this relationship helps to explain the meaning of TCM-HP theory and is important for its development. Methods: One hundred and thirty-three herbs, including 88 heat-clearing herbs (HCHs) and 45 blood-activating stasis-resolving herbs (BASRHs), were collected from reputable TCM literature, and their corresponding TCM-HP and action information was collected from the Chinese Pharmacopoeia (2015 edition). The Kennard–Stone (K–S) algorithm was used to split the 133 herbs into 100 calibration samples and 33 validation samples. Then, machine learning methods, namely support vector machine (SVM) and k-nearest neighbor (kNN), and deep learning methods, namely deep belief network (DBN) and convolutional neural network (CNN), were adopted to develop action classification models based on TCM-HP theory. To ensure robustness, these four classification methods were evaluated using tenfold cross validation and 20 external validation samples for prediction. Results: Between 72.7% and 100% of the 33 validation samples, comprising 17 HCHs and 16 BASRHs, were correctly predicted by these four types of methods. Both the DBN and CNN methods gave the best results, and their sensitivity, specificity, precision and accuracy were all 100.00%. In particular, the predicted results for the external validation set showed that the deep learning methods (DBN, CNN) performed better than the traditional machine learning methods (kNN, SVM) in terms of sensitivity, specificity, precision and accuracy. Moreover, the distribution patterns of TCM-HPs of HCHs and BASRHs were also analyzed to detect the featured TCM-HPs of these two types of herbs. The results showed that the featured TCM-HPs of HCHs were cold, bitter, and entering the liver and stomach meridians, while those of BASRHs were warm, bitter and pungent, and entering the liver meridian. Conclusions: On both the validation set and the external validation set, the deep learning methods (DBN, CNN) performed better than the machine learning models (kNN, SVM) in sensitivity, specificity, precision and accuracy, and showed better generalization ability, when predicting the actions of heat-clearing and blood-activating stasis-resolving based on TCM-HP theory. In addition, deep learning methods could help to improve our understanding of the relationship between herbal property and action, and to enrich and develop the theory of TCM-HP scientifically.
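The Kennard–Stone split mentioned in this abstract can be sketched in a few lines of Python. The version below is the textbook form of the algorithm under stated assumptions: Euclidean distance on numeric descriptor vectors, and ties broken by lowest index. The function name `kennard_stone` is hypothetical, and the study's actual descriptors and implementation are not given in the abstract.

```python
import math


def kennard_stone(points, n_select):
    """Split sample indices into (selected, remaining) with Kennard-Stone.

    Seeds the selection with the two most distant samples, then repeatedly
    adds the sample whose nearest already-selected sample is farthest away.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Full pairwise Euclidean distance matrix (fine for small data sets).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed with the most distant pair.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    while len(selected) < n_select:
        # Max-min criterion: farthest from its closest selected neighbour.
        nxt = max(
            sorted(remaining),
            key=lambda k: min(dist[k][s] for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)

    return selected, sorted(remaining)


if __name__ == "__main__":
    samples = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)]
    calibration, validation = kennard_stone(samples, 3)
    print(calibration, validation)  # [0, 3, 4] [1, 2]
```

Because the selected samples span the extremes of descriptor space, they are normally used as the calibration set, which leaves interior samples for validation.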
Affiliation(s)
- Zhao Chen
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China; Research Center of TCM Information Engineering, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China
| | - Yanfeng Cao
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China; Research Center of TCM Information Engineering, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China
| | - Shuaibing He
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China
| | - Yanjiang Qiao
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China; Research Center of TCM Information Engineering, Beijing University of Chinese Medicine, Yangguang South Avenue, Fangshan District, Beijing 102488, China
|
195
|
Simões RS, Maltarollo VG, Oliveira PR, Honorio KM. Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges. Front Pharmacol 2018; 9:74. [PMID: 29467659 PMCID: PMC5807924 DOI: 10.3389/fphar.2018.00074] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 09/22/2017] [Accepted: 01/22/2018] [Indexed: 12/11/2022] Open
Abstract
Medicinal chemistry projects involve several steps aimed at developing a new drug, such as analyzing the biological targets related to a given disease, discovering and developing drug candidates for these targets, and performing parallel biological tests to validate drug effectiveness and side effects. Approaches such as quantitative structure-activity relationship (QSAR) studies involve the construction of predictive models that relate a set of descriptors of a chemical compound series to its biological activities with respect to one or more targets in the human body. Datasets used to perform QSAR analyses are generally characterized by a small number of samples, which makes it harder to build accurate predictive models. In this context, transfer and multi-task learning techniques are well suited, since they bring in information from other QSAR models for the same biological target, reducing the effort and cost of generating new chemical compounds. Therefore, this review presents the main features of transfer and multi-task learning studies, as well as some applications and their potential in drug design projects.
Affiliation(s)
- Rodolfo S Simões
- School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
| | - Vinicius G Maltarollo
- Department of Pharmaceutical Products, Faculty of Pharmacy, Federal University of Minas Gerais, Belo Horizonte, Brazil
| | - Patricia R Oliveira
- School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
| | - Kathia M Honorio
- School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil; Center for Natural and Human Sciences, Federal University of ABC, Santo André, Brazil
|
196
|
Enabling the hypothesis-driven prioritization of ligand candidates in big databases: Screenlamp and its application to GPCR inhibitor discovery for invasive species control. J Comput Aided Mol Des 2018; 32:415-433. [DOI: 10.1007/s10822-018-0100-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/05/2017] [Accepted: 01/17/2018] [Indexed: 01/20/2023]
|
197
|
Segler MHS, Kogej T, Tyrchan C, Waller MP. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Central Science 2018; 4:120-131. [PMID: 29392184 PMCID: PMC5785775 DOI: 10.1021/acscentsci.7b00512] [Citation(s) in RCA: 680] [Impact Index Per Article: 113.3] [Received: 10/24/2017] [Indexed: 05/20/2023]
Abstract
In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
Affiliation(s)
- Marwin H. S. Segler
- Institute of Organic Chemistry & Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
| | - Thierry Kogej
- Hit Discovery, Discovery Sciences, AstraZeneca R&D, Gothenburg, Sweden
| | - Christian Tyrchan
- Department of Medicinal Chemistry, IMED RIA, AstraZeneca R&D, Gothenburg, Sweden
| | - Mark P. Waller
- Department of Physics & International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, China
|
198
|
Guan D, Fan K, Spence I, Matthews S. Combining machine learning models of in vitro and in vivo bioassays improves rat carcinogenicity prediction. Regul Toxicol Pharmacol 2018; 94:8-15. [PMID: 29337192 DOI: 10.1016/j.yrtph.2018.01.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/10/2017] [Revised: 01/09/2018] [Accepted: 01/10/2018] [Indexed: 12/18/2022]
Abstract
In vitro genotoxicity bioassays are cost-efficient methods of assessing potential carcinogens. However, many genotoxicity bioassays are inappropriate for detecting chemicals eliciting non-genotoxic mechanisms, such as tumour promotion, which necessitates the use of in vivo rodent carcinogenicity (IVRC) assays. In silico IVRC modelling could potentially address the low throughput and high cost of this assay. We aimed to develop and combine computational QSAR models of novel bioassays for the prediction of IVRC results and compare them with existing software. QSAR models were generated from existing Ames (n = 6512), Syrian Hamster Embryonic (SHE, n = 410), ISSCAN rodent carcinogenicity (ISC, n = 834) and GreenScreen GADD45a-GFP (n = 1415) chemical datasets. These models mapped the molecular descriptors of each compound to their respective assay result using machine learning algorithms (AdaBoost, k-Nearest Neighbours, C4.5 Decision Tree, Multilayer Perceptron, Random Forest). The best performing models were combined with k-Nearest Neighbours to create a cascade model for IVRC prediction. High QSAR model performance was observed from ten times 10-fold cross-validation, with above 80% accuracy and 0.85 AUC for each assay dataset. The cascade model predicted rat carcinogenicity with 69.3% accuracy and 0.700 AUC. This study demonstrates the novelty of a combined approach for IVRC prediction, with higher performance than existing software.
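The evaluation protocol named in this abstract, ten repetitions of 10-fold cross-validation, can be sketched as an index generator in Python. This is only an illustration of the general scheme: the function name `repeated_kfold` is hypothetical, the folds here are not stratified by class, and the abstract does not state which tooling or fold assignment the authors used.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples.

    Each repeat reshuffles the sample indices and partitions them into
    n_splits folds; every fold serves once as the test set.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Strided slicing gives folds whose sizes differ by at most one.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train, test


if __name__ == "__main__":
    # 10 repeats x 10 folds gives 100 train/test evaluations in total.
    n_evaluations = sum(1 for _ in repeated_kfold(410))
    print(n_evaluations)  # 100
```

Averaging a metric over all 100 train/test pairs reduces the dependence of the estimate on any single random partition.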
Affiliation(s)
- Davy Guan
- Sydney Medical School, The University of Sydney, Australia
- Kevin Fan
- Sydney Medical School, The University of Sydney, Australia
- Ian Spence
- Sydney Medical School, The University of Sydney, Australia
- Slade Matthews
- Sydney Medical School, The University of Sydney, Australia
|
199
|
Abstract
The use of computational toxicology methods within drug discovery began in the early 2000s with applications such as predicting bacterial mutagenicity and hERG inhibition. The field has been continuously expanding ever since and the tasks at hand have become more complex. These approaches are now strategically integrated into the risk assessment process, as a complement to in vitro and in vivo methods. Today, computational toxicology can be used in every phase of drug discovery and development, from profiling large libraries early on, to predicting off-target effects in the mid-discovery phase, to assessing potential mutagenic impurities in development and degradants as part of life-cycle management. This chapter provides an overview of the field and describes the application of computational toxicology throughout the entire discovery and development process.
Affiliation(s)
- Catrin Hasselgren
- PureInfo Discovery Inc., Albuquerque, NM, USA.
- Leadscope Inc., Columbus, OH, USA.
|
200
|
Fianchini M. Synthesis meets theory: Past, present and future of rational chemistry. Physical Sciences Reviews 2017. [DOI: 10.1515/psr-2017-0134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
Chemical synthesis has its roots in the empirical approach of alchemy. Nonetheless, the birth of the scientific method, technical and technological advances (exploiting revolutionary discoveries in physics) and the improved management and sharing of growing databases have greatly contributed to the evolution of chemistry from an esoteric ground into a mature scientific discipline over the last 400 years. Furthermore, thanks to the evolution of computational resources, platforms and media in the last 40 years, theoretical chemistry has supplied the final missing piece of the puzzle in the process of “rationalizing” chemistry. The use of mathematical models of chemical properties, behaviors and reactivities is nowadays ubiquitous in the literature. Theoretical chemistry has been successful in the difficult task of complementing and explaining synthetic results and providing rigorous insights when these are otherwise unattainable by experiment. The first part of this review walks the reader through a concise historical overview of the evolution of the “model” in chemistry. Salient milestones are highlighted and briefly discussed. The second part focuses on a general description of recent state-of-the-art computational techniques currently used worldwide by chemists to produce synergistic models between theory and experiment. Each section is complemented by key examples taken from the literature that illustrate the application of the technique discussed therein.
|