1
Akgüller Ö, Balcı MA, Cioca G. Clustering Molecules at a Large Scale: Integrating Spectral Geometry with Deep Learning. Molecules 2024; 29:3902. [PMID: 39202980 PMCID: PMC11357287 DOI: 10.3390/molecules29163902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024] Open
Abstract
This study conducts an in-depth analysis of clustering small molecules using spectral geometry and deep learning techniques. We applied a spectral geometric approach to convert molecular structures into triangulated meshes and used the Laplace-Beltrami operator to derive significant geometric features. By examining the eigenvectors of these operators, we captured the intrinsic geometric properties of the molecules, aiding their classification and clustering. The research utilized four deep learning methods: Deep Belief Network, Convolutional Autoencoder, Variational Autoencoder, and Adversarial Autoencoder, each paired with k-means clustering at different cluster sizes. Clustering quality was evaluated using the Calinski-Harabasz and Davies-Bouldin indices, Silhouette Score, and standard deviation. Nonparametric tests were used to assess the impact of topological descriptors on clustering outcomes. Our results show that the DBN + k-means combination is the most effective, particularly at lower cluster counts, demonstrating significant sensitivity to structural variations. This study highlights the potential of integrating spectral geometry with deep learning for precise and efficient molecular clustering.
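One of the clustering-quality measures named in this abstract, the Silhouette Score, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function name `silhouette_score`, the Euclidean metric, and the toy 2D points are assumptions made for illustration only, and a real workflow would apply it to learned molecular embeddings.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Toy data: two tight, well-separated groups (hypothetical values).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # sensible assignment
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # deliberately wrong assignment
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster, which is why the score is useful for comparing runs at different cluster counts.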
Affiliation(s)
- Ömer Akgüller
  - Faculty of Science, Department of Mathematics, Mugla Sitki Kocman University, Muğla 48000, Turkey
- Mehmet Ali Balcı
  - Faculty of Science, Department of Mathematics, Mugla Sitki Kocman University, Muğla 48000, Turkey
- Gabriela Cioca
  - Faculty of Medicine, Preclinical Department, Lucian Blaga University of Sibiu, 550024 Sibiu, Romania
2
Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 01/13/2024] Open
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, as similarly used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for the derivation of novel protein descriptors that are more informative.
Affiliation(s)
- J. Emonts
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
- J.F. Buyel
  - University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
  - Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
3
Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of Artificial Intelligence and Machine Learning Algorithms to Crystallization. Chem Rev 2022; 122:13006-13042. [PMID: 35759465 DOI: 10.1021/acs.chemrev.2c00141] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Artificial intelligence and specifically machine learning applications are nowadays used in a variety of scientific applications and cutting-edge technologies, where they have a transformative impact. Such an assembly of statistical and linear algebra methods making use of large data sets is becoming more and more integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, as well as contribute to high throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness of issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, as well as the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Affiliation(s)
- Christos Xiouras
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
- Fabio Cameli
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Gustavo Lunardon Quilló
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
  - Chemical and BioProcess Technology and Control, Department of Chemical Engineering, Faculty of Engineering Technology, KU Leuven, Gebroeders de Smetstraat 1, 9000 Ghent, Belgium
- Mihail E Kavousanakis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
- Dionisios G Vlachos
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Georgios D Stefanidis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
  - Laboratory for Chemical Technology, Ghent University; Tech Lane Ghent Science Park 125, B-9052 Ghent, Belgium
4
Kyrylchuk A, Kravets I, Cherednichenko A, Tararina V, Kapeliukha A, Dudenko D, Protopopov M. Creation of targeted compound libraries based on 3D shape recognition. Mol Divers 2022; 27:939-949. [PMID: 35608807 DOI: 10.1007/s11030-022-10447-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 04/19/2022] [Indexed: 11/30/2022]
Abstract
In the emerging field of drug discovery, rapid virtual screening methods become extremely valuable, especially when dealing with ultra-large databases of small bioactive organic molecules. In this work, we present a fast, computationally resource-efficient, and simple workflow for screening targeted compound libraries generated from ultra-large virtual chemical space. This workflow aims to find compounds whose molecular 3D shapes are similar to those of reference compounds, while expanding chemical diversity and identifying new and potentially active scaffolds. This pipeline enriches the generated libraries with novel chemotypes. It was also shown that careful tailoring of the physicochemical parameters of the search set ensures that all library compounds possess the desired property distributions. Visual inspection showed that the retrieved structures bind to the receptor in the same way as the reference ones. Using our screening workflow, we have created a number of conventional protein-targeted libraries: the GPCRs Targeted Library (531 K compounds) and the Protein Kinases Targeted Library (113 K compounds). The described pipeline and scripts are freely accessible at: https://github.com/ChemSpace-LLC/usrcat_sim
Affiliation(s)
- Andrii Kyrylchuk
  - Chemspace LLC, Kiev, Ukraine
  - Institute of Organic Chemistry, National Academy of Sciences, Kiev, Ukraine
- Iryna Kravets
  - Chemspace LLC, Kiev, Ukraine
  - Taras Shevchenko National University of Kyiv, Kiev, Ukraine
- Anton Cherednichenko
  - Chemspace LLC, Kiev, Ukraine
  - Taras Shevchenko National University of Kyiv, Kiev, Ukraine
- Valentyna Tararina
  - Chemspace LLC, Kiev, Ukraine
  - Taras Shevchenko National University of Kyiv, Kiev, Ukraine
- Anna Kapeliukha
  - Chemspace LLC, Kiev, Ukraine
  - Taras Shevchenko National University of Kyiv, Kiev, Ukraine
5
Wassenaar PNH, Rorije E, Vijver MG, Peijnenburg WJGM. ZZS similarity tool: The online tool for similarity screening to identify chemicals of potential concern. J Comput Chem 2022; 43:1042-1052. [PMID: 35403727 PMCID: PMC9322536 DOI: 10.1002/jcc.26859] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/10/2021] [Revised: 02/15/2022] [Accepted: 03/22/2022] [Indexed: 11/16/2022]
Abstract
Screening and prioritization of chemicals is essential to ensure that available evaluation capacity is invested in those substances that are of highest concern. We, therefore, recently developed structural similarity models that evaluate the structural similarity of substances with unknown properties to known Substances of Very High Concern (SVHC), which could be an indication of comparable effects. In the current study the performance of these models is improved by (1) separating known SVHCs in more specific subgroups, (2) (re‐)optimizing similarity models for the various SVHC‐subgroups, and (3) improving interpretability of the predicted outcomes by providing a confidence score. The improvements are directly incorporated in a freely accessible web‐based tool, named the ZZS similarity tool: https://rvszoeksysteem.rivm.nl/ZzsSimilarityTool. Accordingly, this tool can be used by risk assessors, academia and industrial partners to screen and prioritize chemicals for further action and evaluation within varying frameworks, and could support the identification of tomorrow's substances of concern.
Affiliation(s)
- Pim N. H. Wassenaar
  - National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
  - Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
- Emiel Rorije
  - National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Martina G. Vijver
  - Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
- Willie J. G. M. Peijnenburg
  - National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
  - Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
6
Santana K, do Nascimento LD, Lima e Lima A, Damasceno V, Nahum C, Braga RC, Lameira J. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. Front Chem 2021; 9:662688. [PMID: 33996755 PMCID: PMC8117418 DOI: 10.3389/fchem.2021.662688] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/01/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022] Open
Abstract
Natural products are continually explored in the development of new bioactive compounds with industrial applications, attracting the attention of scientific research efforts due to their pharmacophore-like structures, pharmacokinetic properties, and unique chemical space. The systematic search for natural sources to obtain valuable molecules to develop products with commercial value and industrial purposes remains the most challenging task in bioprospecting. Virtual screening strategies have innovated the discovery of novel bioactive molecules assessing in silico large compound libraries, favoring the analysis of their chemical space, pharmacodynamics, and their pharmacokinetic properties, thus leading to the reduction of financial efforts, infrastructure, and time involved in the process of discovering new chemical entities. Herein, we discuss the computational approaches and methods developed to explore the chemo-structural diversity of natural products, focusing on the main paradigms involved in the discovery and screening of bioactive compounds from natural sources, placing particular emphasis on artificial intelligence, cheminformatics methods, and big data analyses.
Affiliation(s)
- Kauê Santana
  - Instituto de Biodiversidade, Universidade Federal do Oeste do Pará, Santarém, Brazil
- Anderson Lima e Lima
  - Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Vinícius Damasceno
  - Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Claudio Nahum
  - Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Jerônimo Lameira
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Brazil
7
Zarnecka J, Lukac I, Messham SJ, Hussin A, Coppola F, Enoch SJ, Dossetter AG, Griffen EJ, Leach AG. Mapping Ligand-Shape Space for Protein-Ligand Systems: Distinguishing Key-in-Lock and Hand-in-Glove Proteins. J Chem Inf Model 2021; 61:1859-1874. [PMID: 33755448 DOI: 10.1021/acs.jcim.1c00089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Many of the recently developed methods to study the shape of molecules permit one conformation of one molecule to be compared to another conformation of the same or a different molecule: a relative shape. Other methods provide an absolute description of the shape of a conformation that does not rely on comparisons or overlays. Any absolute description of shape can be used to generate a self-organizing map (shape map) that places all molecular shapes relative to one another; in the studies reported here, the shape fingerprint and ultrafast shape recognition methods are employed to create such maps. In the shape maps, molecules that are near one another have similar shapes, and the maps for the 102 targets in the DUD-E set have been generated. By examining the distribution of actives in comparison with their physical-property-matched decoys, we show that the proteins of key-in-lock type (relatively rigid receptor and ligand) can be distinguished from those that are more of a hand-in-glove type (more flexible receptor and ligand). These are linked to known differences in protein flexibility and binding-site size.
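The idea of an absolute, overlay-free shape description can be made concrete with a small sketch in the spirit of ultrafast shape recognition (USR): atom distances to four reference points are summarized by three moments each, giving a 12-number descriptor. This is a hedged illustration rather than the implementation used in the study. The function names and toy coordinates are hypothetical, and the moment convention (mean, standard deviation, signed cube root of the third central moment) is an assumption, since implementations differ on this point.

```python
import math

def _moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    cbrt = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), cbrt]

def usr_descriptor(coords):
    """12-number shape descriptor from a list of (x, y, z) atom positions."""
    n = len(coords)
    # Reference point 1: molecular centroid.
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from reference point 3.
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(c, ref) for c in coords]))
    return desc

def usr_similarity(a, b):
    """Inverse of one plus the mean absolute difference; 1.0 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))

# Hypothetical four-atom conformation, a translated copy, and a stretched copy.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0), (3.6, 1.1, 0.4)]
shifted = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in mol]
stretched = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in mol]

same = usr_similarity(usr_descriptor(mol), usr_descriptor(shifted))
different = usr_similarity(usr_descriptor(mol), usr_descriptor(stretched))
```

Because the descriptor depends only on interatomic distances, it is unchanged by translation and rotation, so conformations can be compared, or placed on a map, without any alignment step.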
Affiliation(s)
- Joanna Zarnecka
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, U.K.
- Iva Lukac
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, U.K.
- Stephen J Messham
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, U.K.
- Alhusein Hussin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, U.K.
- Francesco Coppola
  - Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, U.K.
- Steven J Enoch
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, U.K.
- Edward J Griffen
  - MedChemica Limited, Biohub, Mereside, Alderley Park, Macclesfield SK10 4TG, U.K.
- Andrew G Leach
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, U.K.
  - MedChemica Limited, Biohub, Mereside, Alderley Park, Macclesfield SK10 4TG, U.K.
  - Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, U.K.
8
Poitevin F, Kushner A, Li X, Dao Duc K. Structural Heterogeneities of the Ribosome: New Frontiers and Opportunities for Cryo-EM. Molecules 2020; 25:E4262. [PMID: 32957592 PMCID: PMC7570653 DOI: 10.3390/molecules25184262] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/25/2020] [Revised: 09/11/2020] [Accepted: 09/15/2020] [Indexed: 12/18/2022] Open
Abstract
The extent of ribosomal heterogeneity has caught increasing interest over the past few years, as recent studies have highlighted the presence of structural variations of the ribosome. More precisely, the heterogeneity of the ribosome covers multiple scales, including the dynamical aspects of ribosomal motion at the single particle level, specialization at the cellular and subcellular scale, or evolutionary differences across species. Upon solving the ribosome atomic structure at medium to high resolution, cryogenic electron microscopy (cryo-EM) has enabled investigating all these forms of heterogeneity. In this review, we present some recent advances in quantifying ribosome heterogeneity, with a focus on the conformational and evolutionary variations of the ribosome and their functional implications. These efforts highlight the need for new computational methods and comparative tools, to comprehensively model the continuous conformational transition pathways of the ribosome, as well as its evolution. While developing these methods presents some important challenges, it also provides an opportunity to extend our interpretation and usage of cryo-EM data, which would more generally benefit the study of molecular dynamics and evolution of proteins and other complexes.
Affiliation(s)
- Frédéric Poitevin
  - Department of LCLS Data Analytics, Linac Coherent Light Source, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
- Artem Kushner
  - Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Xinpei Li
  - Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Khanh Dao Duc
  - Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Zoology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
9
Pracht P, Bohle F, Grimme S. Automated exploration of the low-energy chemical space with fast quantum chemical methods. Phys Chem Chem Phys 2020; 22:7169-7192. [PMID: 32073075 DOI: 10.1039/c9cp06869d] [Citation(s) in RCA: 972] [Impact Index Per Article: 243.0] [Indexed: 12/19/2022]
Abstract
We propose and discuss an efficient scheme for the in silico sampling for parts of the molecular chemical space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm. The focus of this work is set on the generation of proper thermodynamic ensembles at a quantum chemical level for conformers, but similar procedures for protonation states, tautomerism and non-covalent complex geometries are also discussed. The conformational ensembles consisting of all significantly populated minimum energy structures normally form the basis of further, mostly DFT computational work, such as the calculation of spectra or macroscopic properties. By using basic quantum chemical methods, electronic effects or possible bond breaking/formation are accounted for and a very reasonable initial energetic ranking of the candidate structures is obtained. Due to the huge computational speedup gained by the fast low-cost quantum chemical methods, overall short computation times even for systems with hundreds of atoms (typically drug-sized molecules) are achieved. Furthermore, specialized applications, such as sampling with implicit solvation models or constrained conformational sampling for transition-states, metal-, surface-, or noncovalently bound complexes are discussed, opening many possible applications in modern computational chemistry and drug discovery. The procedures have been implemented in a freely available computer code called CREST, that makes use of the fast and reliable GFNn-xTB methods.
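What "significantly populated" means for a conformer ensemble can be illustrated with Boltzmann weights computed from relative energies. The sketch below is not CREST's algorithm. The function names, the 1% population threshold, the temperature, and the example energies are all assumptions chosen for illustration, and energies are taken to be in kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_populations(energies_kcal, temperature=298.15):
    """Fractional populations p_i = exp(-dE_i / RT) / sum_j exp(-dE_j / RT)."""
    e_min = min(energies_kcal)  # shift to the lowest energy for numerical stability
    weights = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    z = sum(weights)
    return [w / z for w in weights]

def populated_conformers(energies_kcal, threshold=0.01, temperature=298.15):
    """Indices of conformers whose population is at least `threshold` (assumed cutoff)."""
    pops = boltzmann_populations(energies_kcal, temperature)
    return [i for i, p in enumerate(pops) if p >= threshold]

# Hypothetical relative conformer energies in kcal/mol.
pops = boltzmann_populations([0.0, 1.0])
kept = populated_conformers([0.0, 1.0, 6.0])
```

At room temperature a conformer only 1 kcal/mol above the minimum still carries roughly 15% of the population, whereas one 6 kcal/mol higher is negligible, which is why a reliable energetic ranking matters when building such ensembles.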
Affiliation(s)
- Philipp Pracht
  - Mulliken Center for Theoretical Chemistry, Universität Bonn, Beringstr. 4, 53115 Bonn, Germany