1
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Affiliation(s)
- Andre Nicolle
  - Aramco Fuel Research Center, Rueil-Malmaison 92852, France
- Sili Deng
  - Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States
- Matthias Ihme
  - Stanford University, Stanford 94305, California, United States
- Emad Al Ibrahim
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Aamir Farooq
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
2
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, Varnek A. Meta-GTM: Visualization and Analysis of the Chemical Library Space. J Chem Inf Model 2023; 63:5571-5582. [PMID: 37602843 DOI: 10.1021/acs.jcim.3c00719] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/22/2023]
Abstract
In chemical library analysis, it may be useful to describe libraries as individual items rather than collections of compounds. This is particularly true for ultra-large noncherry-pickable compound mixtures, such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for the management of a portfolio of libraries, just like chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS becomes "crowded" and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of CLS on meta-(μ)GTMs─"meta" to remind that these are maps of the CLS, itself based on responsibility vectors issued by regular CS GTMs. 2,5 K DELs and ChEMBL (reference) were projected on the μGTM, producing landscapes of library-specific properties. These describe both interlibrary similarity and intrinsic library characteristics in the same view, herewith facilitating the selection of the best project-specific libraries.
Affiliation(s)
- Regina Pikalyova
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Yuliana Zabolotna
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
3
López-Pérez K, López-López E, Medina-Franco JL, Miranda-Quintana RA. Sampling and Mapping Chemical Space with Extended Similarity Indices. Molecules 2023; 28:6333. [PMID: 37687162 PMCID: PMC10489020 DOI: 10.3390/molecules28176333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 08/24/2023] [Accepted: 08/26/2023] [Indexed: 09/10/2023] Open
Abstract
Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Examples of notable research areas where the visualization of chemical space has strong applications are drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations at the time of navigating through chemical space. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets based on the selection of satellite compounds that yield a similar mapping of the whole dataset when principal component analysis on a similarity matrix is performed. Here, we show how the recently proposed extended similarity indices can help find regions that are relevant to sample satellites and reduce the amount of high-dimensional data needed to describe a library's chemical space.
Affiliation(s)
- Kenneth López-Pérez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- José L. Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
4
Abstract
DNA-encoded libraries (DELs) are widely used in the discovery of drug candidates, and understanding their design principles is critical for accessing better libraries. Most DELs are combinatorial in nature and are synthesized by assembling sets of building blocks in specific topologies. In this study, different aspects of library topology were explored and their effect on DEL properties and chemical diversity was analyzed. We introduce a descriptor for DEL topological assignment (DELTA) and use it to examine the landscape of possible DEL topologies and their coverage in the literature. A generative topographic mapping analysis revealed that the impact of library topology on chemical space coverage is secondary to building block selection. Furthermore, it became apparent that the descriptor used to analyze chemical space dictates how structures cluster, with the effects of topology being apparent when using three-dimensional descriptors but not with common two-dimensional descriptors. This outcome points to potential challenges of attempts to predict DEL productivity based on chemical space analyses alone. While topology is rather inconsequential for defining the chemical space of encoded compounds, it greatly affects possible interactions with target proteins as illustrated in docking studies using NAD/NADP binding proteins as model receptors.
Affiliation(s)
- William K Weigel
  - Department of Medicinal Chemistry, Skaggs College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, Utah 84112, United States
- Alba L Montoya
  - Department of Medicinal Chemistry, Skaggs College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, Utah 84112, United States
- Raphael M Franzini
  - Department of Medicinal Chemistry, Skaggs College of Pharmacy, University of Utah, 30 S 2000 E, Salt Lake City, Utah 84112, United States
  - Huntsman Cancer Institute, University of Utah, 2000 Circle of Hope Dr., Salt Lake City, Utah 84112, United States
5
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, Varnek A. Chemical Library Space: Definition and DNA-Encoded Library Comparison Study Case. J Chem Inf Model 2023. [PMID: 37368824 DOI: 10.1021/acs.jcim.3c00520] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/29/2023]
Abstract
The development of DNA-encoded library (DEL) technology introduced new challenges for the analysis of chemical libraries. It is often useful to consider a chemical library as a stand-alone chemoinformatic object─represented both as a collection of independent molecules, and yet an individual entity─in particular, when they are inseparable mixtures, like DELs. Herein, we introduce the concept of chemical library space (CLS), in which resident items are individual chemical libraries. We define and compare four vectorial library representations obtained using generative topographic mapping. These allow for an effective comparison of libraries, with the ability to tune and chemically interpret the similarity relationships. In particular, property-tuned CLS encodings enable to simultaneously compare libraries with respect to both property and chemotype distributions. We apply the various CLS encodings for the selection problem of DELs that optimally "match" a reference collection (here ChEMBL28), showing how the choice of the CLS descriptors may help to fine-tune the "matching" (overlap) criteria. Hence, the proposed CLS may represent a new efficient way for polyvalent analysis of thousands of chemical libraries. Selection of an easily accessible compound collection for drug discovery, as a substitute for a difficult to produce reference library, can be tuned for either primary or target-focused screening, also considering property distributions of compounds. Alternatively, selection of libraries covering novel regions of the chemical space with respect to a reference compound subspace may serve for library portfolio enrichment.
Affiliation(s)
- Regina Pikalyova
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Yuliana Zabolotna
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
6
Khodadadi Karimvand S, Mohammad Jafari J, Vali Zade S, Abdollahi H. Practical and comparative application of efficient data reduction - Multivariate curve resolution. Anal Chim Acta 2023; 1243:340824. [PMID: 36697179 DOI: 10.1016/j.aca.2023.340824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 01/11/2023] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
The term 'Big Data' has recently attracted much attention in science. Working with big data sets can be both challenging and rewarding. Their size and complexity make analysis difficult, and the increasing volume of data requires the development of new practical methods for handling it. In this contribution, we explored the efficient data reduction-multivariate curve resolution (EDR-MCR) strategy, based on convex hull theory, for quantitative and qualitative analysis of large chemical data sets. For the quantitative example, the potential of the EDR-MCR method for selecting a representative calibration set was investigated, and the results were compared with the widely used Kennard-Stone (KS) algorithm. The EDR-MCR strategy strongly limits the number of calibration samples while retaining high prediction performance. The advantage of EDR-MCR over KS is its ability to find informative variables and eliminate redundant features. Moreover, the EDR-MCR strategy was also applied for the qualitative analysis of a large-scale metabolomic data set. The comparable analysis results of EDR-MCR with the region of interest (ROI) method confirmed the ability of this method for quantitative analysis of big mass spectrophotometer data sets.
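The Kennard-Stone algorithm named in this abstract as the comparison baseline can be sketched as follows. This is a minimal illustration of the standard max-min selection rule, not the authors' code and not EDR-MCR itself; the function name `kennard_stone`, the Euclidean metric, and the toy points are assumptions made for the example.

```python
import math


def kennard_stone(points, k):
    """Return indices of k points chosen by the Kennard-Stone max-min rule.

    Assumption: points are equal-length numeric tuples and distance is Euclidean.
    """
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(points)")

    # Seed the selection with the two most distant points.
    best = (-1.0, 0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    selected = [best[1], best[2]]

    # Track each point's distance to its nearest already-selected point.
    min_d = [min(math.dist(p, points[s]) for s in selected) for p in points]

    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        # Pick the candidate that is farthest from the current selection.
        nxt = max(candidates, key=lambda i: min_d[i])
        selected.append(nxt)
        for i in range(n):
            min_d[i] = min(min_d[i], math.dist(points[i], points[nxt]))
    return selected


# Hypothetical one-dimensional "samples" laid out on a line.
toy_points = [(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]
calibration_idx = kennard_stone(toy_points, 3)
```

On the toy data the two end points are chosen first and the midpoint third, which shows why the rule tends to spread a calibration set evenly across the sampled range.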
Affiliation(s)
- Jamile Mohammad Jafari
  - Department of Chemistry, Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan, Iran
- Somaye Vali Zade
  - Halal Research Center of IRI, Food and Drug Administration, Ministry of Health and Medical Education, Tehran, Iran
- Hamid Abdollahi
  - Department of Chemistry, Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan, Iran
7
Ruchawapol C, Fu WW, Xu HX. A review on computational approaches that support the researches on traditional Chinese medicines (TCM) against COVID-19. Phytomedicine 2022; 104:154324. [PMID: 35841663 PMCID: PMC9259013 DOI: 10.1016/j.phymed.2022.154324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/21/2022] [Revised: 06/23/2022] [Accepted: 07/05/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND: COVID-19 has caused highly contagious infections and massive deaths worldwide and has disrupted global economies and societies to an unprecedented degree, so the development of new antiviral medications is urgently required. Medicinal herbs are promising resources for the discovery of prophylactic candidates against COVID-19. Considerable experimental effort has been devoted to vaccines and direct-acting antiviral agents (DAAs), but neither was developed quickly or fully.
PURPOSE: This study examined the computational approaches that have played a significant role in drug discovery and development against COVID-19. These computational methods and tools will be helpful for discovering lead compounds from phytochemicals and for understanding the molecular mechanism of action of TCM in the prevention and control of other diseases.
METHODS: A search of scientific databases (PubMed, Science Direct, ResearchGate, Google Scholar, and Web of Science) found a total of 2172 articles, which were retrieved via the web interfaces of these sites. After applying inclusion and exclusion criteria and full-text screening, 292 articles were retained as eligible.
RESULTS: In this review, we highlight three main categories of computational approaches: structure-based, knowledge-mining (artificial intelligence), and network-based approaches. The most commonly used database, molecular docking tool, and MD simulation software are TCMSP, AutoDock Vina, and GROMACS, respectively. Network-based approaches were mainly used to help readers understand the complex mechanisms of multiple TCM ingredients, targets, diseases, and networks.
CONCLUSION: Computational approaches have been broadly applied to research on phytochemicals and TCM against COVID-19 and have played a significant role in drug discovery and development by saving money and time.
Affiliation(s)
- Chattarin Ruchawapol
  - School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Cai Lun Lu 1200, Shanghai 201203, China
  - Engineering Research Centre of Shanghai Colleges for TCM New Drug Discovery, Cai Lun Lu 1200, Shanghai 201203, China
- Wen-Wei Fu
  - School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Cai Lun Lu 1200, Shanghai 201203, China
  - Engineering Research Centre of Shanghai Colleges for TCM New Drug Discovery, Cai Lun Lu 1200, Shanghai 201203, China
- Hong-Xi Xu
  - School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Cai Lun Lu 1200, Shanghai 201203, China
  - Engineering Research Centre of Shanghai Colleges for TCM New Drug Discovery, Cai Lun Lu 1200, Shanghai 201203, China
8
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Horvath D, Gavrilenko KS, Marcou G, Moroz YS, Oksiuta O, Varnek A. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry. J Chem Inf Model 2021; 62:2171-2185. [PMID: 34928600 DOI: 10.1021/acs.jcim.1c00811] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/14/2022]
Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by corresponding synthons─fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapped and library-specific regions.
Affiliation(s)
- Yuliana Zabolotna
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dmitriy M Volochnyuk
  - Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine
  - Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Sergey V Ryabukhin
  - The Institute of High Technologies, Kyiv National Taras Shevchenko University, 64 Volodymyrska Street, Kyiv 01601, Ukraine
  - Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Dragos Horvath
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Konstantin S Gavrilenko
  - Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine
  - Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Gilles Marcou
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Yurii S Moroz
  - Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine
  - Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Oleksandr Oksiuta
  - Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine
  - Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Alexandre Varnek
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
9
Muratov EN, Amaro R, Andrade CH, Brown N, Ekins S, Fourches D, Isayev O, Kozakov D, Medina-Franco JL, Merz KM, Oprea TI, Poroikov V, Schneider G, Todd MH, Varnek A, Winkler DA, Zakharov AV, Cherkasov A, Tropsha A. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem Soc Rev 2021; 50:9121-9151. [PMID: 34212944 PMCID: PMC8371861 DOI: 10.1039/d0cs01065k] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 01/30/2021] [Indexed: 01/18/2023]
Abstract
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. Massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, it appears that drug repurposing has not produced rapid and global solutions. However, several known drugs have been used in the clinic to cure COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.
Affiliation(s)
- Eugene N. Muratov
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Rommie Amaro
  - University of California in San Diego, San Diego, CA, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Raleigh, NC, USA
- Denis Fourches
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
- Olexandr Isayev
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Kenneth M. Merz
  - Department of Chemistry, Michigan State University, East Lansing, MI, USA
- Tudor I. Oprea
  - Department of Internal Medicine and UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, USA
  - Department of Rheumatology and Inflammation Research, Gothenburg University, Sweden
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
- Gisbert Schneider
  - Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology, Zurich, Switzerland
- Alexandre Varnek
  - Department of Chemistry, University of Strasbourg, Strasbourg, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Japan
- David A. Winkler
  - Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia
  - School of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Australia
  - School of Pharmacy, University of Nottingham, Nottingham, UK
- Artem Cherkasov
  - Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Alexander Tropsha
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
10
Baybekov S, Marcou G, Ramos P, Saurel O, Galzi JL, Varnek A. DMSO Solubility Assessment for Fragment-Based Screening. Molecules 2021; 26:3950. [PMID: 34203441 PMCID: PMC8271413 DOI: 10.3390/molecules26133950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/04/2021] [Revised: 06/23/2021] [Accepted: 06/23/2021] [Indexed: 11/16/2022] Open
Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
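The balanced accuracy quoted in this abstract is the mean of the per-class recalls, so a minority class counts as much as a majority one. The sketch below shows that arithmetic only; the function name `balanced_accuracy` and the toy labels are assumptions for illustration and are not taken from the paper or its data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class contributes equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # number of true members of each class
    correct = defaultdict(int)  # correctly predicted members of each class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = soluble, 0 = insoluble.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2
```

Plain accuracy on the same toy labels would be 4/6, which is higher than the balanced value because the larger class dominates it.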
Affiliation(s)
- Shamkhal Baybekov
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France
- Gilles Marcou
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France
- Pascal Ramos
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse CNRS, UPS, 205 Route de Narbonne, 31077 Toulouse, France
- Olivier Saurel
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse CNRS, UPS, 205 Route de Narbonne, 31077 Toulouse, France
- Jean-Luc Galzi
  - Biotechnologie et Signalisation Cellulaire UMR 7242 CNRS, École Supérieure de Biotechnologie de Strasbourg, University of Strasbourg, 300 Boulevard Sébastien Brant, 67412 Illkirch, France
  - ChemBioFrance—Chimiothèque Nationale UAR3035, 8 Rue de L’école Normale, CEDEX 05, 34296 Montpellier, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France
11
Kunkel C, Margraf JT, Chen K, Oberhofer H, Reuter K. Active discovery of organic semiconductors. Nat Commun 2021; 12:2422. [PMID: 33893287 PMCID: PMC8065160 DOI: 10.1038/s41467-021-22611-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 09/29/2020] [Accepted: 03/15/2021] [Indexed: 01/16/2023] Open
Abstract
The versatility of organic molecules generates a rich design space for organic semiconductors (OSCs) considered for electronics applications. Offering unparalleled promise for materials discovery, the vastness of this design space also dictates efficient search strategies. Here, we present an active machine learning (AML) approach that explores an unlimited search space through consecutive application of molecular morphing operations. Evaluating the suitability of OSC candidates on the basis of charge injection and mobility descriptors, the approach successively queries predictive-quality first-principles calculations to build a refining surrogate model. The AML approach is optimized in a truncated test space, providing deep methodological insight by visualizing it as a chemical space network. Significantly outperforming a conventional computational funnel, the optimized AML approach rapidly identifies well-known and hitherto unknown molecular OSC candidates with superior charge conduction properties. Most importantly, it constantly finds further candidates with highest efficiency while continuing its exploration of the endless design space.
Affiliation(s)
- Christian Kunkel
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Johannes T Margraf
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Ke Chen
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Harald Oberhofer
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Karsten Reuter
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
12
Pereira F. Machine Learning Methods to Predict the Terrestrial and Marine Origin of Natural Products. Mol Inform 2021; 40:e2060034. [PMID: 33787065 DOI: 10.1002/minf.202060034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Accepted: 02/04/2021] [Indexed: 12/23/2022]
Abstract
In recent years there has been a growing interest in studying the differences between the chemical and biological space represented by natural products (NPs) of terrestrial and marine origin. In order to learn more about these two chemical spaces, marine natural products (MNPs) and terrestrial natural products (TNPs), a machine learning (ML) approach was developed in the current work to predict three classes: MNPs, TNPs, and a third class of NPs that appear in both the terrestrial and marine environments. In total, 22,398 NPs were retrieved from the Reaxys® database; of those, 10,790 molecules are recorded as MNPs, 10,857 as TNPs, and 761 NPs are registered as both MNPs and TNPs. Several ML algorithms, such as Random Forest, Support Vector Machines, and deep learning Multilayer Perceptron networks, have been benchmarked. The best performance was achieved with a consensus classification model, which predicted the external test set with an overall predictive accuracy of up to 81%. As far as we know, this approach has not been attempted before; it can be used to better understand the chemical space defined by MNPs, TNPs, or both, and also in virtual screening to define the applicability domain of QSAR models of MNPs and TNPs.
Affiliation(s)
- Florbela Pereira
  - LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
13
Kusaba M, Liu C, Koyama Y, Terakura K, Yoshida R. Recreation of the periodic table with an unsupervised machine learning algorithm. Sci Rep 2021; 11:4780. [PMID: 33637773 PMCID: PMC7910619 DOI: 10.1038/s41598-021-81850-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/23/2019] [Accepted: 12/29/2020] [Indexed: 11/24/2022] Open
Abstract
In 1869, the first draft of the periodic table was published by Russian chemist Dmitri Mendeleev. In terms of data science, his achievement can be viewed as a successful example of feature embedding based on human cognition: chemical properties of all known elements at that time were compressed onto the two-dimensional grid system for a tabular display. In this study, we seek to answer the question of whether machine learning can reproduce or recreate the periodic table by using observed physicochemical properties of the elements. To achieve this goal, we developed a periodic table generator (PTG). The PTG is an unsupervised machine learning algorithm based on the generative topographic mapping, which can automate the translation of high-dimensional data into a tabular form with varying layouts on-demand. The PTG autonomously produced various arrangements of chemical symbols, which organized a two-dimensional array such as Mendeleev’s periodic table or three-dimensional spiral table according to the underlying periodicity in the given data. We further showed what the PTG learned from the element data and how the element features, such as melting point and electronegativity, are compressed to the lower-dimensional latent spaces.
Affiliation(s)
- Minoru Kusaba
- The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo, 190-8562, Japan.
| | - Chang Liu
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo, 190-8562, Japan
| | - Yukinori Koyama
- National Institute for Materials Science, Tsukuba, Ibaraki, 305-0047, Japan
| | - Kiyoyuki Terakura
- National Institute of Advanced Industrial Science and Technology, Tsukuba, Ibaraki, 305-8560, Japan
| | - Ryo Yoshida
- The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo, 190-8562, Japan
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo, 190-8562, Japan
- National Institute for Materials Science, Tsukuba, Ibaraki, 305-0047, Japan
| |
|
14
|
Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, Varnek A. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci Rep 2021; 11:3178. [PMID: 33542271 PMCID: PMC7862614 DOI: 10.1038/s41598-021-81889-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2020] [Accepted: 01/06/2021] [Indexed: 12/18/2022] Open
Abstract
The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may just as successfully be taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on specially developed "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, thereby enlarging the synthetic purpose of popular synthetic pathways.
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Igor I Baskin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Department of Materials Science and Engineering, Technion - Israel Institute of Technology, 3200003, Haifa, Israel
| | - Timur Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
| | - Artem Mukanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
| | - Ramil Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Olga Klimchuk
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France.
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan.
| |
|
15
|
Horvath D, Orlov A, Osolodkin DI, Ishmukhametov AA, Marcou G, Varnek A. A Chemographic Audit of anti-Coronavirus Structure-activity Information from Public Databases (ChEMBL). Mol Inform 2020; 39:e2000080. [PMID: 32363750 PMCID: PMC7267182 DOI: 10.1002/minf.202000080] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Accepted: 04/26/2020] [Indexed: 01/30/2023]
Abstract
Discovery of drugs against newly emerged pathogenic agents like the SARS-CoV-2 coronavirus (CoV) must be based on previous research against related species. Scientists need to get acquainted with, and develop a global overview of, the molecules tested so far. Chemography (here, Generative Topographic Mapping in particular) places structures on a human-readable 2D map (obtained by dimensionality reduction of the chemical space of molecular descriptors) and is thus well suited for such an audit. The goal is to map the medicinal chemistry efforts so far targeted against CoVs. This includes comparing libraries tested against various virus species/genera, predicting their polypharmacological profiles and highlighting frequently encountered chemotypes. Maps are challenged to provide predictive activity landscapes against viral proteins. Definition of "anti-CoV" map zones led to the selection of 380 potential anti-CoV agents residing therein, out of a vast pool of 800 M organic compounds.
Affiliation(s)
- Dragos Horvath
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
| | - Alexey Orlov
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
| | - Dmitry I. Osolodkin
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya ul. 8, Moscow 119991, Russia
| | - Aydar A. Ishmukhametov
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya ul. 8, Moscow 119991, Russia
| | - Gilles Marcou
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
| | - Alexandre Varnek
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
| |
|
16
|
Lin A, Baskin II, Marcou G, Horvath D, Beck B, Varnek A. Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling. Mol Inform 2020; 39:e2000009. [PMID: 32347666 PMCID: PMC7757192 DOI: 10.1002/minf.202000009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Accepted: 04/10/2020] [Indexed: 11/12/2022]
Abstract
Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must cover a given chemical space well. Intuitively, the FS size must increase with the size and diversity of the target library. At the same time, GTM training can be very slow or even become technically impossible at FS sizes of the order of 10^5 compounds, which is a very small number compared to today's commercially accessible compounds and, especially, to the theoretically feasible molecules. In order to solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms an FS for the "final" manifold. In order to assess the efficiency of the new algorithm, 80 GTMs were built on FSs of different sizes ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of a few thousand compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: an FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase of FS compound numbers has no beneficial impact on the predictive propensity of the above-mentioned 712 activity classification models.
Parallel GTM may, however, be required to generate maps based on very large FSs, which might improve chemical space cartography of big commercial and virtual libraries, approaching billions of compounds.
Affiliation(s)
- Arkadii Lin
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| | - Igor I. Baskin
- Faculty of Physics, Lomonosov Moscow State University, 1/2, Leninskie Gory str., 119991 Moscow, Russia
| | - Gilles Marcou
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| | - Dragos Horvath
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| | - Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 65, Birkendorfer str., 88397 Biberach an der Riss, Germany
| | - Alexandre Varnek
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| |
|
17
|
Lunghini F, Gilles M, Azam P, Enrici MH, Van Miert E, Varnek A. Visualization and Analysis of the REACH-chemical Space with Generative Topographic Mapping. Mol Inform 2020; 40:e2000232. [PMID: 33231933 DOI: 10.1002/minf.202000232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2020] [Accepted: 11/13/2020] [Indexed: 11/09/2022]
Abstract
In the framework of the REACH (Registration, Evaluation, Authorization and restriction of Chemicals) regulation, industries have generated and reported a huge amount of (eco)toxicological data on substances produced or imported in Europe. The registration procedure initiated the creation of a large REACH database of well-defined (eco)toxicological properties. Here, the data distribution in the REACH chemical space was analyzed with the help of the Generative Topographic Mapping (GTM) approach. GTM generates 2-dimensional maps on which each compound is represented as a data point. The third dimension can be used to display the distribution of a given (eco)toxicological property, which can further be used for property assessment of new compounds projected on the map. We report the "Universal REACH map", which accommodates 11 endpoints, covering environmental fate and (eco)toxicological properties. This map demonstrates acceptable predictive performance: in cross-validation, balanced accuracy ranges from 0.60 to 0.78. The 11-endpoint profile has been computed for each REACH-registered substance. Some concerns related to acute aquatic toxicity have been identified, whereas for environmental fate and human health endpoints the number of compounds predicted to be of concern was much smaller. It has been demonstrated that superposition of several class landscapes allows selection of the zones in the chemical space populated by compounds with a given (eco)toxicological profile.
Affiliation(s)
- Filippo Lunghini
- Laboratory of Chemoinformatics - UMR7140, University of Strasbourg, 4 Rue Blaise Pascal, 67081, Strasbourg, France
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., 85, avenue des Frères Perret, 69192, St. Fons, France
| | - Marcou Gilles
- Laboratory of Chemoinformatics - UMR7140, University of Strasbourg, 4 Rue Blaise Pascal, 67081, Strasbourg, France
| | - Philippe Azam
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., 85, avenue des Frères Perret, 69192, St. Fons, France
| | - Marie-Hélène Enrici
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., 85, avenue des Frères Perret, 69192, St. Fons, France
| | - Erik Van Miert
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., 85, avenue des Frères Perret, 69192, St. Fons, France
| | - Alexandre Varnek
- Laboratory of Chemoinformatics - UMR7140, University of Strasbourg, 4 Rue Blaise Pascal, 67081, Strasbourg, France
| |
|
18
|
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large amount of biological data points for millions of small molecules have been generated and are stored in various databases. These accumulated data, combined with new machine learning (ML) approaches, such as deep learning, have shown great potential to provide insights into relevant chemical structures to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
| | - Heather L Ciallella
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA.
| |
|
19
|
Donmez A, Rifaioglu AS, Acar A, Doğan T, Cetin-Atalay R, Atalay V. iBioProVis: interactive visualization and analysis of compound bioactivity space. Bioinformatics 2020; 36:4227-4230. [PMID: 32407491 PMCID: PMC7454317 DOI: 10.1093/bioinformatics/btaa496] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Revised: 03/25/2020] [Accepted: 05/08/2020] [Indexed: 12/23/2022] Open
Abstract
SUMMARY iBioProVis is an interactive tool for visual analysis of the compound bioactivity space in the context of target proteins, drugs and drug candidate compounds. The iBioProVis tool takes target protein identifiers and, optionally, compound SMILES as input, and uses the state-of-the-art non-linear dimensionality reduction method t-Distributed Stochastic Neighbor Embedding (t-SNE) to plot the distribution of compounds embedded in a 2D map, based on the similarity of structural properties of compounds and in the context of compounds' cognate targets. Similar compounds, which are embedded to proximate points on the 2D map, may bind the same or similar target proteins. Thus, iBioProVis can be used to easily observe the structural distribution of one or two target proteins' known ligands on the 2D compound space, and to infer new binders to the same protein, or to infer new potential target(s) for a compound of interest, based on this distribution. A principal component analysis (PCA) projection of the input compounds is also provided; hence the user can interactively observe the same compound, or a group of selected compounds, both projected by PCA and embedded by t-SNE. iBioProVis also provides detailed information about drugs and drug candidate compounds through cross-references to widely used and well-known databases, in the form of linked table views. Two use-case studies were demonstrated, one of them on the angiotensin-converting enzyme 2 (ACE2) protein, which is the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Spike protein receptor. ACE2-binding compounds and seven antiviral drugs were closely embedded, two of which have been under clinical trial for Coronavirus disease 2019 (COVID-19). AVAILABILITY AND IMPLEMENTATION iBioProVis and its carefully filtered dataset are available at https://ibpv.kansil.org/ for public use. CONTACT vatalay@metu.edu.tr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ataberk Donmez
- Department of Computer Engineering, METU, Ankara 06800, Turkey
| | - Ahmet Sureyya Rifaioglu
- Department of Computer Engineering, METU, Ankara 06800, Turkey
- Department of Computer Engineering, İskenderun Technical University, Hatay 31200, Turkey
| | - Aybar Acar
- Department of Health Informatics, KanSiL, Graduate School of Informatics, METU
| | - Tunca Doğan
- Department of Computer Engineering, Hacettepe University, 06800 Ankara, Turkey
- Institute of Informatics, Hacettepe University, 06800 Ankara, Turkey
| | - Rengul Cetin-Atalay
- Department of Health Informatics, KanSiL, Graduate School of Informatics, METU
- Department of Medicine, Section of Pulmonary and Critical Care Medicine, the University of Chicago, Chicago, IL 60637, USA
| | - Volkan Atalay
- Department of Computer Engineering, METU, Ankara 06800, Turkey
| |
|
20
|
Capecchi A, Zhang A, Reymond JL. Populating Chemical Space with Peptides Using a Genetic Algorithm. J Chem Inf Model 2020; 60:121-132. [PMID: 31868369 DOI: 10.1021/acs.jcim.9b01014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
In drug discovery, one uses chemical space as a concept to organize molecules according to their structures and properties. One often would like to generate new possible molecules at a specific location in the chemical space marked by a molecule of interest. Herein, we report the peptide design genetic algorithm (PDGA, code available at https://github.com/reymond-group/PeptideDesignGA), a computational tool capable of producing peptide sequences of various topologies (linear, cyclic/polycyclic, or dendritic) in proximity of any molecule of interest in a chemical space defined by macromolecule extended atom-pair fingerprint (MXFP), an atom-pair fingerprint describing molecular shape and pharmacophores. We show that the PDGA generates high-similarity analogues of bioactive peptides with diverse peptide chain topologies and of nonpeptide target molecules. We illustrate the chemical space accessible by the PDGA with an interactive 3D map of the MXFP property space available at http://faerun.gdb.tools/. The PDGA should be generally useful to generate peptides at any location in the chemical space.
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Alain Zhang
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
21
|
Pereira F. Machine learning methods to predict the crystallization propensity of small organic molecules. CrystEngComm 2020. [DOI: 10.1039/d0ce00070a] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Machine learning algorithms were explored for the prediction of the crystallization propensity based on molecular descriptors and fingerprints generated from 2D chemical structures and 3D chemical structures optimized with empirical methods.
Affiliation(s)
- Florbela Pereira
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica
| |
|
22
|
Lunghini F, Marcou G, Azam P, Horvath D, Patoux R, Van Miert E, Varnek A. Consensus models to predict oral rat acute toxicity and validation on a dataset coming from the industrial context. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2019; 30:879-897. [PMID: 31607169 DOI: 10.1080/1062936x.2019.1672089] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/12/2019] [Accepted: 09/21/2019] [Indexed: 06/10/2023]
Abstract
We report predictive models of acute oral systemic toxicity representing a follow-up of our previous work in the framework of the NICEATM project. It includes the update of original models through the addition of new data and an external validation of the models using a dataset relevant for the chemical industry context. A regression model for LD50 and a multi-class classification model for toxicity classes according to the Global Harmonized System categories were prepared. ISIDA descriptors were used to encode molecular structures. Machine learning algorithms included support vector machine (SVM), random forest (RF) and naïve Bayesian. Selected individual models were combined in consensus. The different datasets were compared using the generative topographic mapping approach. It appeared that the NICEATM datasets were lacking some chemotypes relevant for the chemical industry. The new models trained on enlarged data sets have applicability domains (AD) sufficiently large to accommodate industrial compounds. The fraction of compounds inside the models' AD increased from 58% (NICEATM model) to 94% (new model). Enlarging the training sets improved the models' prediction performance: RMSE values decreased from 0.56 to 0.47 and balanced accuracies increased from 0.69 to 0.71 for NICEATM and new models, respectively.
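For readers unfamiliar with the two metrics quoted above, the following sketch computes RMSE for a regression model and balanced accuracy (the mean of per-class recalls) for a multi-class classifier. The function names and the toy values are illustrative assumptions; they are not taken from the paper or its data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes that occur in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in members if y_pred[i] == cls)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


# Toy regression values (hypothetical, arbitrary log units).
print(rmse([2.0, 3.0], [2.5, 2.5]))

# Toy toxicity classes (hypothetical): class 1 is more frequent than class 2.
observed = [1, 1, 1, 1, 2, 2]
predicted = [1, 1, 1, 2, 2, 1]
print(balanced_accuracy(observed, predicted))
```

Because each class contributes equally regardless of its size, balanced accuracy is less flattering than plain accuracy on imbalanced toxicity classes, which is presumably why it is the figure reported.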
Affiliation(s)
- F Lunghini
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
| | - G Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| | - P Azam
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
| | - D Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| | - R Patoux
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
| | - E Van Miert
- Toxicological and Environmental Risk Assessment unit, Solvay S.A., St. Fons, France
| | - A Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| |
|
23
|
Horvath D, Marcou G, Varnek A. Generative topographic mapping in drug design. DRUG DISCOVERY TODAY. TECHNOLOGIES 2019; 32-33:99-107. [PMID: 33386101 DOI: 10.1016/j.ddtec.2020.06.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 06/10/2020] [Accepted: 06/18/2020] [Indexed: 06/12/2023]
Abstract
This is a review article on Generative Topographic Mapping (GTM), a non-linear dimensionality reduction technique producing generative 2D maps of high-dimensional vector spaces, and its specific applications in Drug Design (chemical space cartography, compound library design and analysis, virtual screening, pharmacological profiling, de novo drug design, conformational space & docking interaction cartography, etc.). Written by chemoinformaticians for potential users among medicinal chemists and biologists, the article purposely avoids all underlying mathematics. First, the GTM concept is intuitively explained, based on its strong analogies with the rather popular Self-Organizing Maps (SOMs), which are well-established library analysis tools. GTM is basically a fuzzy-logic-based generalization of SOMs. In the second part of the review, some published GTM applications in drug design are briefly revisited.
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France.
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France.
| |
|
24
|
Lin A, Beck B, Horvath D, Marcou G, Varnek A. Diversifying chemical libraries with generative topographic mapping. J Comput Aided Mol Des 2019; 34:805-815. [DOI: 10.1007/s10822-019-00215-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Accepted: 07/15/2019] [Indexed: 01/28/2023]
|
25
|
Esteki M, Shahsavari Z, Simal-Gandara J. Gas Chromatographic Fingerprinting Coupled to Chemometrics for Food Authentication. FOOD REVIEWS INTERNATIONAL 2019. [DOI: 10.1080/87559129.2019.1649691] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Affiliation(s)
- M. Esteki
- Department of Chemistry, University of Zanjan, Zanjan, Iran
| | - Z. Shahsavari
- Department of Chemistry, University of Zanjan, Zanjan, Iran
| | - J. Simal-Gandara
- Nutrition and Bromatology Group, Department of Analytical and Food Chemistry, Faculty of Food Science and Technology, University of Vigo – Ourense Campus, Ourense, Spain
| |
|
26
|
Osypenko A, Dhers S, Lehn JM. Pattern Generation and Information Transfer through a Liquid/Liquid Interface in 3D Constitutional Dynamic Networks of Imine Ligands in Response to Metal Cation Effectors. J Am Chem Soc 2019; 141:12724-12737. [DOI: 10.1021/jacs.9b05438] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Artem Osypenko
- Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
| | - Sébastien Dhers
- Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
| | - Jean-Marie Lehn
- Laboratoire de Chimie Supramoléculaire, Institut de Science et d’Ingénierie Supramoléculaires (ISIS), Université de Strasbourg, 8 allée Gaspard Monge, 67000 Strasbourg, France
| |
|
27
|
Lunghini F, Marcou G, Azam P, Patoux R, Enrici MH, Bonachera F, Horvath D, Varnek A. QSPR models for bioconcentration factor (BCF): are they able to predict data of industrial interest? SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2019; 30:507-524. [PMID: 31244346 DOI: 10.1080/1062936x.2019.1626278] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/09/2019] [Accepted: 05/29/2019] [Indexed: 05/27/2023]
Abstract
The bioconcentration factor (BCF), a key parameter required by the REACH regulation, estimates the tendency for a xenobiotic to concentrate inside living organisms. In silico methods can be valid alternatives to costly data measurements. However, in the industrial context, these theoretical approaches may fail to predict BCF with reasonable accuracy. We analyzed whether models built on public data only perform adequately when challenged to predict industrial compounds. A new set of 1129 compounds has been collected by merging publicly available datasets. Generative Topographic Mapping was employed to compare this chemical space with a set of new compounds issued from the industry. Some new chemotypes absent in the training set (such as siloxanes) have been detected. A new BCF model has been built using ISIDA (In SIlico design and Data Analysis) fragment descriptors, support vector regression and random forest machine-learning methods. It has been externally validated on: (i) collected data from the literature and (ii) industrial data. The latter also served as a benchmark for the freely available tools VEGA, EPISuite, TEST and OPERA. The new model performs (RMSE of 0.58 log BCF units) comparably to existing ones but benefits from an extended applicability, covering the industrial set chemical space (78% data coverage).
Affiliation(s)
- F Lunghini
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
- Solvay S.A., France
| | - G Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| | - F Bonachera
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| | - D Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| | - A Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
| |
|
28
|
Awale M, Sirockin F, Stiefl N, Reymond JL. Medicinal Chemistry Aware Database GDBMedChem. Mol Inform 2019; 38:e1900031. [PMID: 31169974 DOI: 10.1002/minf.201900031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 05/21/2019] [Indexed: 12/17/2022]
Abstract
The generated database GDB17 enumerates 166.4 billion possible molecules of up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset uniformly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules. This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains a higher sp3-carbon fraction and higher natural product likeness scores compared to known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Finton Sirockin
- Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Nikolaus Stiefl
- Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| |
|
29
|
Capecchi A, Awale M, Probst D, Reymond JL. PubChem and ChEMBL beyond Lipinski. Mol Inform 2019; 38:e1900016. [PMID: 30844149 DOI: 10.1002/minf.201900016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/30/2019] [Accepted: 02/18/2019] [Indexed: 12/13/2022]
Abstract
Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP and NLC (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Daniel Probst
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| |
|
30
|
Probabilistic ancestry maps: a method to assess and visualize population substructures in genetics. BMC Bioinformatics 2019; 20:116. [PMID: 30845922 PMCID: PMC6407257 DOI: 10.1186/s12859-019-2680-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/01/2018] [Accepted: 02/14/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Principal component analysis (PCA) is a standard method to correct for population stratification in ancestry-specific genome-wide association studies (GWASs) and is used to cluster individuals by ancestry. Using the 1000 genomes project data, we examine how non-linear dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE) or generative topographic mapping (GTM) can be used to provide improved ancestry maps by accounting for a higher percentage of explained variance in ancestry, and how they can help to estimate the number of principal components necessary to account for population stratification. GTM generates posterior probabilities of class membership which can be used to assess the probability of an individual to belong to a given population - as opposed to t-SNE, GTM can be used for both clustering and classification. RESULTS PCA only partially identifies population clusters and does not separate most populations within a given continent, such as Japanese and Han Chinese in East Asia, or Mende and Yoruba in Africa. t-SNE and GTM, taking into account more data variance, can identify more fine-grained population clusters. GTM can be used to build probabilistic classification models, and is as efficient as support vector machine (SVM) for classifying 1000 Genomes Project populations. CONCLUSION The main interest of probabilistic GTM maps is to attain two objectives with only one map: provide a better visualization that separates populations efficiently, and infer genetic ancestry for individuals or populations. This paper is a first application of GTM for ancestry classification models. Our code ( https://github.com/hagax8/ancestry_viz ) and interactive visualizations ( https://lovingscience.com/ancestries ) are available online.
|
31
|
Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, Varnek A. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J Chem Inf Model 2019; 59:1182-1196. [PMID: 30785751 DOI: 10.1021/acs.jcim.8b00751] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Indexed: 11/28/2022]
Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
Affiliation(s)
- Boris Sattarov
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| | - Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 19991, Russia
| | - Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| | - Esben Jannik Bjerrum
- Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| |
|
32
|
Pros and cons of virtual screening based on public “Big Data”: In silico mining for new bromodomain inhibitors. Eur J Med Chem 2019; 165:258-272. [DOI: 10.1016/j.ejmech.2019.01.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/31/2018] [Revised: 12/24/2018] [Accepted: 01/05/2019] [Indexed: 12/22/2022]
|
33
|
Delalande C, Awale M, Rubin M, Probst D, Ozhathil LC, Gertsch J, Abriel H, Reymond JL. Optimizing TRPM4 inhibitors in the MHFP6 chemical space. Eur J Med Chem 2019; 166:167-177. [DOI: 10.1016/j.ejmech.2019.01.048] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2018] [Revised: 12/18/2018] [Accepted: 01/19/2019] [Indexed: 12/12/2022]
|
34
|
Lin A, Horvath D, Marcou G, Beck B, Varnek A. Multi-task generative topographic mapping in virtual screening. J Comput Aided Mol Des 2019; 33:331-343. [PMID: 30739238 DOI: 10.1007/s10822-019-00188-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/15/2018] [Accepted: 02/02/2019] [Indexed: 12/16/2022]
Abstract
The previously reported procedure to generate "universal" Generative Topographic Maps (GTMs) of the drug-like chemical space is in practice a multi-task learning process, in which both operational GTM parameters (example: map grid size) and hyperparameters (key example: the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit/select "universal" GTM manifolds. After selection (a one-time task aimed at optimizing the compromise in terms of neighborhood behavior compliance, over a large pool of various biological targets), the manifolds are ready to provide "fit-free" predictive models for any further use. Using any structure-activity set, irrespective of whether the associated target served at the map fitting stage or not, the generation or "coloring" of a property landscape enables predicting the property for any external molecule, with zero additional fittable parameters involved. While previous works have signaled the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work explores their behavior in Virtual Screening (VS), here simulated using external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner were intrinsically able to serve as an excellent support for many other VS procedures, ranging from parameter-free similarity searching, to local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France; Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
| | - Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
| | - Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France.
| |
|
35
|
Volochnyuk DM, Ryabukhin SV, Moroz YS, Savych O, Chuprina A, Horvath D, Zabolotna Y, Varnek A, Judd DB. Evolution of commercially available compounds for HTS. Drug Discov Today 2019; 24:390-402. [DOI: 10.1016/j.drudis.2018.10.016] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/20/2018] [Revised: 10/02/2018] [Accepted: 10/30/2018] [Indexed: 12/17/2022]
|
36
|
Karlov DS, Sosnin S, Tetko IV, Fedorov MV. Chemical space exploration guided by deep neural networks. RSC Adv 2019; 9:5151-5157. [PMID: 35514634 PMCID: PMC9060647 DOI: 10.1039/c8ra10182e] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/11/2018] [Accepted: 01/29/2019] [Indexed: 11/21/2022] Open
Abstract
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com).
Affiliation(s)
- Dmitry S. Karlov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
| | - Sergey Sosnin
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Syntelly LLC
| | - Igor V. Tetko
- Helmholtz Zentrum München – Research Center for Environmental Health (GmbH), Institute of Structural Biology, Germany
- BIGCHEM GmbH, Germany
| | - Maxim V. Fedorov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Syntelly LLC
| |
|
37
|
Glavatskikh M, Madzhidov T, Baskin II, Horvath D, Nugmanov R, Gimadiev T, Marcou G, Varnek A. Visualization and Analysis of Complex Reaction Data: The Case of Tautomeric Equilibria. Mol Inform 2018; 37:e1800056. [DOI: 10.1002/minf.201800056] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/24/2018] [Accepted: 06/29/2018] [Indexed: 11/07/2022]
Affiliation(s)
- Marta Glavatskikh
- Laboratoire de Chémoinformatique, UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya str. 18 Kazan Russia
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya str. 18 Kazan Russia
| | - Igor I. Baskin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya str. 18 Kazan Russia
- Faculty of Physics; Lomonosov Moscow State University; Leninskie Gory 1/2 119991 Moscow Russia
| | - Dragos Horvath
- Laboratoire de Chémoinformatique, UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
| | - Ramil Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya str. 18 Kazan Russia
| | - Timur Gimadiev
- Laboratoire de Chémoinformatique, UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry; Kazan Federal University; Kremlevskaya str. 18 Kazan Russia
| | - Gilles Marcou
- Laboratoire de Chémoinformatique, UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique, UMR 7140 CNRS; Université de Strasbourg; 1, rue Blaise Pascal 67000 Strasbourg France
| |
|
38
|
Sidorov P, Davioud-Charvet E, Marcou G, Horvath D, Varnek A. AntiMalarial Mode of Action (AMMA) Database: Data Selection, Verification and Chemical Space Analysis. Mol Inform 2018; 37:e1800021. [DOI: 10.1002/minf.201800021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/22/2018] [Accepted: 04/14/2018] [Indexed: 12/15/2022]
Affiliation(s)
- Pavel Sidorov
- Laboratoire de Chemoinformatique; UMR 7140 CNRS-Univ. Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
| | - Elisabeth Davioud-Charvet
- Laboratoire d'Innovation Moléculaire et Applications (LIMA); UMR7042 CNRS-Unistra-UHA; Bioorganic and Medicinal Chemistry Team, European School of Chemistry, Polymers and Materials (ECPM); 25, rue Becquerel Strasbourg F-67087 France
| | - Gilles Marcou
- Laboratoire de Chemoinformatique; UMR 7140 CNRS-Univ. Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
| | - Dragos Horvath
- Laboratoire de Chemoinformatique; UMR 7140 CNRS-Univ. Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
| | - Alexandre Varnek
- Laboratoire de Chemoinformatique; UMR 7140 CNRS-Univ. Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry; Kazan Federal University; Kazan Russia
| |
|
39
|
Konovalov AI, Antipin IS, Burilov VA, Madzhidov TI, Kurbangalieva AR, Nemtarev AV, Solovieva SE, Stoikov II, Mamedov VA, Zakharova LY, Gavrilova EL, Sinyashin OG, Balova IA, Vasilyev AV, Zenkevich IG, Krasavin MY, Kuznetsov MA, Molchanov AP, Novikov MS, Nikolaev VA, Rodina LL, Khlebnikov AF, Beletskaya IP, Vatsadze SZ, Gromov SP, Zyk NV, Lebedev AT, Lemenovskii DA, Petrosyan VS, Nenaidenko VG, Negrebetskii VV, Baukov YI, Shmigol’ TA, Korlyukov AA, Tikhomirov AS, Shchekotikhin AE, Traven’ VF, Voskresenskii LG, Zubkov FI, Golubchikov OA, Semeikin AS, Berezin DB, Stuzhin PA, Filimonov VD, Krasnokutskaya EA, Fedorov AY, Nyuchev AV, Orlov VY, Begunov RS, Rusakov AI, Kolobov AV, Kofanov ER, Fedotova OV, Egorova AY, Charushin VN, Chupakhin ON, Klimochkin YN, Osyanin VA, Reznikov AN, Fisyuk AS, Sagitullina GP, Aksenov AV, Aksenov NA, Grachev MK, Maslennikova VI, Koroteev MP, Brel’ AK, Lisina SV, Medvedeva SM, Shikhaliev KS, Suboch GA, Tovbis MS, Mironovich LM, Ivanov SM, Kurbatov SV, Kletskii ME, Burov ON, Kobrakov KI, Kuznetsov DN. Modern Trends of Organic Chemistry in Russian Universities. Russian Journal of Organic Chemistry 2018. [DOI: 10.1134/s107042801802001x] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Indexed: 11/23/2022]
|
40
|
Abstract
INTRODUCTION Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.
Affiliation(s)
- Martin Vogt
- a Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| |
|
41
|
Lin A, Horvath D, Afonina V, Marcou G, Reymond JL, Varnek A. Mapping of the Available Chemical Space versus the Chemical Universe of Lead-Like Compounds. ChemMedChem 2018; 13:540-554. [PMID: 29154440 DOI: 10.1002/cmdc.201700561] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/15/2017] [Revised: 11/07/2017] [Indexed: 12/15/2022]
Abstract
This is, to our knowledge, the most comprehensive analysis to date based on generative topographic mapping (GTM) of fragment-like chemical space (40 million molecules with no more than 17 heavy atoms, both from the theoretically enumerated GDB-17 and real-world PubChem/ChEMBL databases). The challenge was to prove that a robust map of fragment-like chemical space can actually be built, in spite of a limited (≪10^5) maximal number of compounds ("frame set") usable for fitting the GTM manifold. An evolutionary map building strategy has been updated with a "coverage check" step, which discards manifolds failing to accommodate compounds outside the frame set. The evolved map has a good propensity to separate actives from inactives for more than 20 external structure-activity sets. It was proven to properly accommodate the entire collection of 40 million compounds. Next, it served as a library comparison tool to highlight biases of real-world molecules (PubChem and ChEMBL) versus the universe of all possible species represented by FDB-17, a fragment-like subset of GDB-17 containing 10 million molecules. Specific patterns, proper to some libraries and absent from others (diversity holes), were highlighted.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
| | - Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
| | - Valentina Afonina
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France; Laboratory of Chemoinformatics and Molecular Modeling, Department of Organic Chemistry, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya str., 420008, Kazan, Russia
| | - Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, 3 Freiestrasse, 3012, Berne, Switzerland
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
| |
|
42
|
Druzhilovskiy DS, Rudik AV, Filimonov DA, Gloriozova TA, Lagunin AA, Dmitriev AV, Pogodin PV, Dubovskaya VI, Ivanov SM, Tarasova OA, Bezhentsev VM, Murtazalieva KA, Semin MI, Maiorov IS, Gaur AS, Sastry GN, Poroikov VV. Computational platform Way2Drug: from the prediction of biological activity to drug repurposing. Russ Chem Bull 2018. [DOI: 10.1007/s11172-017-1954-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/23/2022]
|
43
|
Abstract
Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The application of unsupervised methods, such as Kohonen's self-organizing maps and related approaches, which allow for combining predictions with data analysis and visualization, is also considered. The necessity of applying a wide range of machine learning methods in computational toxicology is underlined.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russian Federation.
- Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russian Federation.
| |
|
44
|
Horvath D, Marcou G, Varnek A. Monitoring of the Conformational Space of Dipeptides by Generative Topographic Mapping. Mol Inform 2017; 37. [DOI: 10.1002/minf.201700115] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/15/2017] [Accepted: 11/08/2017] [Indexed: 12/28/2022]
Affiliation(s)
- Dragos Horvath
- Laboratoire de Chémoinformatique; UMR 7140 CNRS-Université de Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
| | - Gilles Marcou
- Laboratoire de Chémoinformatique; UMR 7140 CNRS-Université de Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique; UMR 7140 CNRS-Université de Strasbourg; 1 rue Blaise Pascal Strasbourg 67000 France
| |
|
45
|
Visini R, Arús-Pous J, Awale M, Reymond JL. Virtual Exploration of the Ring Systems Chemical Universe. J Chem Inf Model 2017; 57:2707-2718. [PMID: 29019686 DOI: 10.1021/acs.jcim.7b00457] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
Here, we explore the chemical space of all virtually possible organic molecules focusing on ring systems, which represent the cyclic cores of organic molecules obtained by removing all acyclic bonds and converting all remaining atoms to carbon. This approach circumvents the combinatorial explosion encountered when enumerating the molecules themselves. We report the chemical universe database GDB4c containing 916 130 ring systems up to four saturated or aromatic rings and maximum ring size of 14 atoms and GDB4c3D containing the corresponding 6 555 929 stereoisomers. Almost all (98.6%) of these ring systems are unknown and represent chiral 3D-shaped macrocycles containing small rings and quaternary centers reminiscent of polycyclic natural products. We envision that GDB4c can serve to select new ring systems from which to design analogs of such natural products. The database is available for download at www.gdb.unibe.ch together with interactive visualization and search tools as a resource for molecular design.
Affiliation(s)
- Ricardo Visini
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Josep Arús-Pous
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
46
|
Predictive cartography of metal binders using generative topographic mapping. J Comput Aided Mol Des 2017; 31:701-714. [DOI: 10.1007/s10822-017-0033-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/02/2017] [Accepted: 06/11/2017] [Indexed: 12/27/2022]
|
47
|
Horvath D, Baskin I, Marcou G, Varnek A. Generative Topographic Mapping of Conformational Space. Mol Inform 2017; 36. [PMID: 28421706 DOI: 10.1002/minf.201700036] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/21/2017] [Accepted: 03/31/2017] [Indexed: 12/17/2022]
Abstract
Herein, Generative Topographic Mapping (GTM) was challenged to produce planar projections of the high-dimensional conformational space of complex molecules (the 1LE1 peptide). GTM is a probability-based mapping strategy, and its capacity to support property prediction models serves to objectively assess map quality (in terms of regression statistics). The properties to predict were total, non-bonded and contact energies, surface area and fingerprint darkness. Map building and selection were controlled by a previously introduced evolutionary strategy, which was allowed to choose the best-suited conformational descriptors, with options including classical terms and novel atom-centric autocorrelograms. The latter condense interatomic distance patterns into descriptors of rather low dimensionality, yet precise enough to differentiate between close favorable contacts and atom clashes. A subset of 20 K conformers of the 1LE1 peptide, randomly selected from a pool of 2 M geometries (generated by the S4MPLE tool), was employed for map building and cross-validation of property regression models. The GTM build-up challenge reached robust three-fold cross-validated determination coefficients of Q² = 0.7…0.8 for all modeled properties. Mapping of the full 2 M conformer set produced intuitive and information-rich property landscapes. Functional and folding subspaces appear as well-separated zones, even though RMSD with respect to the PDB structure was never used as a selection criterion of the maps.
Affiliation(s)
- Dragos Horvath
- Laboratoire de Chémoinformatique, UMR 7140 CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
| | | | - Gilles Marcou
- Laboratoire de Chémoinformatique, UMR 7140 CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique, UMR 7140 CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
| |
|
48
|
Kontijevskis A. Mapping of Drug-like Chemical Universe with Reduced Complexity Molecular Frameworks. J Chem Inf Model 2017; 57:680-699. [DOI: 10.1021/acs.jcim.7b00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/03/2023]
|
49
|
González-Medina M, Owen JR, El-Elimat T, Pearce CJ, Oberlies NH, Figueroa M, Medina-Franco JL. Scaffold Diversity of Fungal Metabolites. Front Pharmacol 2017; 8:180. [PMID: 28420994 PMCID: PMC5376591 DOI: 10.3389/fphar.2017.00180] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 02/14/2017] [Accepted: 03/17/2017] [Indexed: 11/26/2022] Open
Abstract
Many drug discovery projects rely on commercial compounds to discover active leads. However, current commercial libraries, with mostly synthetic compounds, access a small fraction of the possible chemical diversity. Natural products, in contrast, possess a vast structural diversity and have proven to be an outstanding source of new drugs. Several chemoinformatic analyses of natural products have demonstrated their diversity and structural complexity. However, to our knowledge, the scaffold content and structural diversity of fungal secondary metabolites have never been studied. Herein, the scaffold diversity of 223 fungal metabolites was measured and compared to the diversity of approved drugs and commercial libraries for HTS containing natural, synthetic, and semi-synthetic compounds. In addition, the global diversity of the fungal isolates was assessed and compared to other reference data sets using Consensus Diversity Plots, a chemoinformatic tool recently developed. It was concluded that fungal secondary metabolites are cyclic systems with few ramifications and more diverse than the commercial libraries with natural products and semi-synthetic compounds. The fungal metabolites data set was one of the most structurally diverse, containing a large proportion of different and unique scaffolds not found in the other compound data sets including ChEMBL. Therefore, fungal metabolites offer a rich source of molecules suited for identifying diverse candidates for drug discovery.
Affiliation(s)
- Mariana González-Medina
- Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico
- John R Owen
- High-Performance Computing Research Group, ECIT Institute, Northern Ireland Science Park, Belfast, UK
- Tamam El-Elimat
- Department of Medicinal Chemistry and Pharmacognosy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
- Nicholas H Oberlies
- Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, USA
- Mario Figueroa
- Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico
- José L Medina-Franco
- Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico
|
50
|
Awale M, Probst D, Reymond JL. WebMolCS: A Web-Based Interface for Visualizing Molecules in Three-Dimensional Chemical Spaces. J Chem Inf Model 2017; 57:643-649. [PMID: 28316236 DOI: 10.1021/acs.jcim.6b00690] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Abstract
The concept of chemical space provides a convenient framework to analyze large collections of molecules by placing them in property spaces where distances represent similarities. Here we report webMolCS, a new type of web-based interface visualizing up to 5000 user-defined molecules in six different three-dimensional (3D) chemical spaces obtained by principal component analysis or similarity mapping of multidimensional property spaces describing composition (MQN: 42D molecular quantum numbers, SMIfp: 34D SMILES fingerprint), shapes and pharmacophores (APfp: 20D atom pair fingerprint, Xfp: 55D category extended atom pair fingerprint), and substructures (Sfp: 1024D binary substructure fingerprint, ECfp4: 1024D extended connectivity fingerprint). Each molecule is shown as a sphere, and its structure appears on mouse over. The sphere is color-coded by similarity to the first compound in the list, by the list rank, or by a user-defined value, which reveals the relationship between any property encoded by these values and structural similarities. WebMolCS is freely available at www.gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Daniel Probst
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
|