1
Leniak A, Pietruś W, Kurczab R. From NMR to AI: Designing a Novel Chemical Representation to Enhance Machine Learning Predictions of Physicochemical Properties. J Chem Inf Model 2024; 64:3302-3321. [PMID: 38529877 DOI: 10.1021/acs.jcim.3c02039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
A novel approach to the utilization of nuclear magnetic resonance (NMR) spectroscopy data in the prediction of logD through machine learning algorithms is shown. In the analysis, a data set of 754 chemical compounds, organized into 30 clusters, was evaluated using advanced machine learning models, such as Support Vector Regression (SVR), Gradient Boosting, and AdaBoost, and comprehensive validation and testing methods were employed, including 10-fold cross-validation, bootstrapping, and leave-one-out. The study revealed the superior performance of the Bucket Integration method for dimensionality reduction, consistently yielding the lowest root mean square error (RMSE) across all data sets and normalization schemes. The SVR prediction models demonstrated remarkable computational efficiency and low cost, with the best RMSE value reaching 0.66. Our best model outperformed existing tools like JChem Suite's logD Predictor (0.91) and CplogD (1.27), and a comparison with traditional molecular representations yielded a comparable RMSE (0.50), emphasizing the robustness of our NMR data integration. The widespread availability of NMR data in pharmaceutical and industrial research presents an untapped resource for predictive modeling, highlighting the need for accessible methodologies like ours that complement the analytical toolbox beyond conventional 2D approaches. Our approach, designed to leverage the rich spatial data from NMR spectroscopy, provides additional insights and enriches drug discovery and computational chemistry with a freely accessible tool.
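As a rough illustration of the two quantities named in the abstract above, the sketch below bins a sampled spectrum into equal-width chemical-shift buckets and computes an RMSE. It is not the authors' implementation: the function names, the 0 to 10 ppm window, and the 0.5 ppm bucket width are assumptions chosen for the example, and summing sampled intensities is only a stand-in for integration over a uniformly sampled spectrum.

```python
import math


def bucket_integrate(ppm, intensity, lo=0.0, hi=10.0, width=0.5):
    """Sum sampled intensities into equal-width chemical-shift buckets.

    lo, hi and width are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    n_buckets = int(round((hi - lo) / width))
    buckets = [0.0] * n_buckets
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp guards against floating-point edge cases near `hi`.
            idx = min(int((shift - lo) / width), n_buckets - 1)
            buckets[idx] += value
    return buckets


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    features = bucket_integrate([0.1, 0.4, 7.2], [1.0, 2.0, 3.0])
    print(len(features), features[0], features[14])
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

The resulting fixed-length vector is the kind of input a regressor such as SVR could consume; any normalization step would be applied to the buckets before training.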
Affiliation(s)
- Arkadiusz Leniak
  - Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland
- Wojciech Pietruś
  - Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Kraków, Poland
- Rafał Kurczab
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Kraków, Poland

2
Sun H, Xue X, Liu X, Hu HY, Deng Y, Wang X. Cross-Modal Retrieval Between 13C NMR Spectra and Structures Based on Focused Libraries. Anal Chem 2024; 96:5763-5770. [PMID: 38564366 DOI: 10.1021/acs.analchem.3c04294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Library matching by comparing carbon-13 nuclear magnetic resonance (13C NMR) spectra with spectral data in the library is a crucial method for compound identification. In our previous paper, we introduced a deep contrastive learning system called CReSS, which used a library that contained more structures. However, CReSS has two limitations: the library contains no unknown structures, and a redundant library reduces the structure-elucidation accuracy. Herein, we replaced the oversized traditional libraries with focused libraries containing a small number of molecules. A previously reported generative model, CMGNet, was used to generate focused libraries for CReSS. The combined model achieved a Top-10 accuracy of 54.03% when tested on 6,471 13C NMR spectra. In comparison, CReSS with a random reference structure library achieved an accuracy of only 9.17%. Furthermore, to expand the advantages of the focused libraries, we proposed SAmpRNN, which is a recurrent neural network (RNN). With the large focused library amplified by SAmpRNN, the structure-identification accuracy of the model increased in 70.0% of the 30 random example cases. In general, cross-modal retrieval between 13C NMR spectra and structures based on focused libraries (CFLS) achieved high accuracy and provided more accurate candidate structures than traditional libraries for compound identification.
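The Top-10 accuracy quoted in the abstract above can be read as the fraction of query spectra whose correct structure appears among the ten highest-ranked library candidates. The sketch below shows that metric under stated assumptions: spectra and structures are taken to be already embedded as plain vectors, ranking uses cosine similarity, and all function names are hypothetical. It does not reproduce CReSS, CMGNet, or SAmpRNN.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_hit(query_vec, library, true_id, k=10):
    """True if the correct structure ranks within the top k candidates.

    `library` maps a structure identifier to its (assumed) embedding.
    """
    ranked = sorted(
        library,
        key=lambda sid: cosine(query_vec, library[sid]),
        reverse=True,
    )
    return true_id in ranked[:k]


def top_k_accuracy(queries, library, k=10):
    """Fraction of (query_vec, true_id) pairs retrieved within the top k."""
    hits = sum(top_k_hit(vec, library, true_id, k) for vec, true_id in queries)
    return hits / len(queries)


if __name__ == "__main__":
    library = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    queries = [([1.0, 0.1], "a"), ([0.1, 1.0], "a")]
    print(top_k_accuracy(queries, library, k=1))
```

A smaller library lowers the number of competing candidates per query, which is one way to read the motivation for the focused libraries described in the abstract.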
Affiliation(s)
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
  - Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
  - Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China

3
Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/01/2023]
Abstract
Covering: up to March 2023
Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review presents a summary of the recent advancements in ML-assisted mass spectrometry (MS) and nuclear magnetic resonance (NMR) data analysis to establish the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, which involve the use of ML algorithms to calculate similarity, predict MS/MS fragments, and form molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Furthermore, the cases of ML algorithms in assisting structural studies of NPs based on NMR are discussed from four perspectives: NMR prediction, functional group identification, structural categorization and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and the trends associated with the structural establishment of NPs based on ML algorithms.
Affiliation(s)
- Guilin Hu
  - State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
  - University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Minghua Qiu
  - State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
  - University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China

4
Dias-Silva JR, Oliveira VM, Sanches-Neto FO, Wilhelms RZ, Queiroz Júnior LHK. SpectraFP: a new spectra-based descriptor to aid in cheminformatics, molecular characterization and search algorithm applications. Phys Chem Chem Phys 2023. [PMID: 37378661 DOI: 10.1039/d3cp00734k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2023]
Abstract
We have developed an algorithm to generate a new spectra-based descriptor, called SpectraFP, in order to digitalize the chemical shifts of 13C NMR spectra, as well as potentially important data from other spectroscopic techniques. This descriptor is a fingerprint vector with defined sizes and values of 0 and 1, with the ability to correct chemical shift fluctuations. To explore the applicability of SpectraFP, we outlined two application scenarios: (1) the prediction of six functional groups by machine learning (ML) models and (2) the search for structures based on the similarity between the query spectrum and spectra in an experimental database, both in the SpectraFP format. For each functional group, five ML models were built and validated following the OECD principles: internal and external validations, applicability domains, and mechanistic interpretations. All the models resulted in high goodness-of-fit for the training and test sets with MCC respectively between 0.626 and 0.909 and 0.653 and 0.917, and J ranging from 0.812 to 0.957 and 0.825 to 0.961. Using the SHAP (SHapley Additive exPlanations) approach, the mechanistic interpretations of the models were explored; the results indicated that the most important variables for model decision making were coherent with the expected chemical shifts for each functional group. Several metrics, including Tanimoto, geometric, arithmetic, and Tversky, can be used to perform the similarity calculation for the search algorithm. This algorithm can also incorporate additional variables, such as the correction parameter and the difference between the amount of signals in the query spectrum and the database spectra, while preserving its high performance speed. We hope that our descriptor can link information from spectroscopic/spectrometric techniques with ML models to expand the possibilities in understanding the field of cheminformatics. 
All databases and algorithms developed for this work are open source and freely accessible.
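The abstract above describes a binary fingerprint built from 13C chemical shifts and a similarity search using metrics such as Tanimoto. A minimal sketch of that idea follows. It is not the SpectraFP code: the 0 to 240 ppm range, the 1 ppm bit width, and the `tolerance` parameter (a hypothetical way to absorb small shift fluctuations by also setting neighbouring bits) are all assumptions made for illustration.

```python
def shifts_to_fingerprint(shifts, lo=0.0, hi=240.0, step=1.0, tolerance=0):
    """Encode chemical shifts (ppm) as a fixed-length vector of 0s and 1s.

    Each bit covers `step` ppm. `tolerance` is a hypothetical correction:
    it also sets that many neighbouring bits on each side of a peak.
    """
    n_bits = int(round((hi - lo) / step))
    bits = [0] * n_bits
    for shift in shifts:
        if not lo <= shift < hi:
            continue  # ignore peaks outside the encoded range
        center = int((shift - lo) / step)
        first = max(0, center - tolerance)
        last = min(n_bits, center + tolerance + 1)
        for idx in range(first, last):
            bits[idx] = 1
    return bits


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


if __name__ == "__main__":
    query = shifts_to_fingerprint([20.3, 128.7, 170.1])
    candidate = shifts_to_fingerprint([20.9, 128.2, 55.0])
    print(tanimoto(query, candidate))
```

Ranking database entries by this score against a query fingerprint gives a simple form of the similarity search mentioned in the abstract; the other metrics it lists (geometric, arithmetic, Tversky) would replace only the `tanimoto` function.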
Affiliation(s)
- Vitor M Oliveira
  - Instituto de Química, Universidade Federal de Goiás, Goiânia, Brazil.
- Flávio O Sanches-Neto
  - Instituto de Química, Universidade Federal de Goiás, Goiânia, Brazil.
  - Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Valparaíso de Goiás, Goiania, GO, CEP: 72876-601, Brazil
- Renan Z Wilhelms
  - Instituto de Química, Universidade Federal de Goiás, Goiânia, Brazil.

5
Dandawate M, Choudhury R, Krishna GR, Reddy DS. Total Synthesis and Absolute Configuration Determination of the α-Glycosidase Inhibitor (3S,4R)-6-Acetyl-3-hydroxy-2,2-dimethylchroman-4-yl (Z)-2-Methylbut-2-enoate from Ageratina grandifolia. JOURNAL OF NATURAL PRODUCTS 2023. [PMID: 37316456 DOI: 10.1021/acs.jnatprod.3c00236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Herein, we report the first total synthesis of α-glycosidase inhibitor (3R, 4S)-6-acetyl-3-hydroxy-2,2-dimethylchroman-4-yl (Z)-2-methylbut-2-enoate as well as its enantiomer. Our synthesis confirms the chromane structure separately proposed by Navarro-Vazquez and Mata, on the basis of DFT computations. Furthermore, our synthesis allowed us to determine the absolute configuration of the natural compound as (3S, 4R) and not (3R, 4S).
Affiliation(s)
- Monica Dandawate
  - Organic Chemistry Division, CSIR-National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
  - CSIR-Indian Institute of Integrative Medicine, Canal Road, Jammu 180001, India
- Rahul Choudhury
  - Organic Chemistry Division, CSIR-National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Gamidi Rama Krishna
  - Organic Chemistry Division, CSIR-National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India
- D Srinivasa Reddy
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
  - CSIR-Indian Institute of Integrative Medicine, Canal Road, Jammu 180001, India
  - CSIR-Indian Institute of Chemical Technology (CSIR-IICT), Hyderabad 500007, India

6
Gaudêncio SP, Bayram E, Lukić Bilela L, Cueto M, Díaz-Marrero AR, Haznedaroglu BZ, Jimenez C, Mandalakis M, Pereira F, Reyes F, Tasdemir D. Advanced Methods for Natural Products Discovery: Bioactivity Screening, Dereplication, Metabolomics Profiling, Genomic Sequencing, Databases and Informatic Tools, and Structure Elucidation. Mar Drugs 2023; 21:md21050308. [PMID: 37233502 DOI: 10.3390/md21050308] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 03/23/2023] [Revised: 05/11/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023] Open
Abstract
Natural Products (NP) are essential for the discovery of novel drugs and products for numerous biotechnological applications. The NP discovery process is expensive and time-consuming, having as major hurdles dereplication (early identification of known compounds) and structure elucidation, particularly the determination of the absolute configuration of metabolites with stereogenic centers. This review comprehensively focuses on recent technological and instrumental advances, highlighting the development of methods that alleviate these obstacles, paving the way for accelerating NP discovery towards biotechnological applications. Herein, we emphasize the most innovative high-throughput tools and methods for advancing bioactivity screening, NP chemical analysis, dereplication, metabolite profiling, metabolomics, genome sequencing and/or genomics approaches, databases, bioinformatics, chemoinformatics, and three-dimensional NP structure elucidation.
Affiliation(s)
- Susana P Gaudêncio
  - Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal
  - UCIBIO-Applied Molecular Biosciences Unit, Chemistry Department, NOVA School of Science and Technology, NOVA University of Lisbon, 2819-516 Caparica, Portugal
- Engin Bayram
  - Institute of Environmental Sciences, Room HKC-202, Hisar Campus, Bogazici University, Bebek, Istanbul 34342, Turkey
- Lada Lukić Bilela
  - Department of Biology, Faculty of Science, University of Sarajevo, 71000 Sarajevo, Bosnia and Herzegovina
- Mercedes Cueto
  - Instituto de Productos Naturales y Agrobiología-CSIC, 38206 La Laguna, Spain
- Ana R Díaz-Marrero
  - Instituto de Productos Naturales y Agrobiología-CSIC, 38206 La Laguna, Spain
  - Instituto Universitario de Bio-Orgánica (IUBO), Universidad de La Laguna, 38206 La Laguna, Spain
- Berat Z Haznedaroglu
  - Institute of Environmental Sciences, Room HKC-202, Hisar Campus, Bogazici University, Bebek, Istanbul 34342, Turkey
- Carlos Jimenez
  - CICA- Centro Interdisciplinar de Química e Bioloxía, Departamento de Química, Facultade de Ciencias, Universidade da Coruña, 15071 A Coruña, Spain
- Manolis Mandalakis
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, HCMR Thalassocosmos, 71500 Gournes, Crete, Greece
- Florbela Pereira
  - LAQV, REQUIMTE, Chemistry Department, NOVA School of Science and Technology, NOVA University of Lisbon, 2819-516 Caparica, Portugal
- Fernando Reyes
  - Fundación MEDINA, Avda. del Conocimiento 34, 18016 Armilla, Spain
- Deniz Tasdemir
  - GEOMAR Centre for Marine Biotechnology (GEOMAR-Biotech), Research Unit Marine Natural Products Chemistry, GEOMAR Helmholtz Centre for Ocean Research Kiel, Am Kiel-Kanal 44, 24106 Kiel, Germany
  - Faculty of Mathematics and Natural Science, Kiel University, Christian-Albrechts-Platz 4, 24118 Kiel, Germany

7
Baxter JR, Holland DC, Gavranich B, Nicolle D, Hayton JB, Avery VM, Carroll AR. NMR Fingerprints of Formyl Phloroglucinol Meroterpenoids and Their Application to the Investigation of Eucalyptus gittinsii subsp. gittinsii. JOURNAL OF NATURAL PRODUCTS 2023; 86:1317-1334. [PMID: 37171174 DOI: 10.1021/acs.jnatprod.3c00139] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/13/2023]
Abstract
NMR fingerprints provide powerful tools to identify natural products in complex mixtures. Principal component analysis and machine learning using 1H and 13C NMR data, alongside structural information from 180 published formyl phloroglucinols, have generated diagnostic NMR fingerprints to categorize subclasses within this group. This resulted in the reassignment of 167 NMR chemical shifts ascribed to 44 compounds. Three pyrano-diformyl phloroglucinols, euglobal In-1 and psiguadiols E and G, contained 1H and 13C NMR data inconsistent with their predicted phloroglucinol subclass. Subsequent reinterpretation of their 2D NMR data combined with DFT 13C NMR chemical shift and ECD calculations led to their structure revisions. Direct covariance processing of HMBC data permitted 1H resonances for individual compounds in mixtures to be associated, and analysis of their 1H/13C HMBC correlations using the fingerprint tool further classified components into phloroglucinol subclasses. NMR fingerprinting HMBC data obtained for six eucalypt flower extracts identified three subclasses of pyrano-acyl-formyl phloroglucinols from Eucalyptus gittinsii subsp. gittinsii. New, eucalteretial F and (+)-eucalteretial B, and known, (-)-euglobal VII and eucalrobusone C, compounds, each belonging to predicted subclasses, were isolated and characterized. Staphylococcus aureus and Plasmodium falciparum screening revealed eucalrobusone C as the most potent antiplasmodial formyl phloroglucinol to date.
Affiliation(s)
- James R Baxter
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Darren C Holland
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Brody Gavranich
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Dean Nicolle
  - Currency Creek Arboretum, PO Box 808, Melrose Park, SA 5039, Australia
- Joshua B Hayton
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Vicky M Avery
  - Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia
  - Discovery Biology, Griffith University, Brisbane, QLD 4111, Australia
- Anthony R Carroll
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
  - Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia

8
Yao L, Yang M, Song J, Yang Z, Sun H, Shi H, Liu X, Ji X, Deng Y, Wang X. Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on 13C NMR Spectra and Prior Knowledge. Anal Chem 2023; 95:5393-5401. [PMID: 36926883 DOI: 10.1021/acs.analchem.2c05817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Structure elucidation of unknown compounds based on nuclear magnetic resonance (NMR) remains a challenging problem in both synthetic organic and natural product chemistry. Library matching has been an efficient method to assist structure elucidation. However, it is limited by the coverage of libraries. In addition, prior knowledge such as molecular fragments is neglected. To solve the problem, we propose a conditional molecular generation net (CMGNet) to allow input of multiple sources of information. CMGNet not only uses 13C NMR spectrum data as input but molecular formulas and fragments of molecules are also employed as input conditions. Our model applies large-scale pretraining for molecular understanding and fine-tuning on two NMR spectral data sets of different granularity levels to accommodate structure elucidation tasks. CMGNet generates structures based on 13C NMR data, molecular formula, and fragment information, with a recovery rate of 94.17% in the top 10 recommendations. In addition, the generative model performed well in the generation of various classes of compounds and in the structural revision task. CMGNet has a deep understanding of molecular connectivities from 13C NMR, molecular formula, and fragments, paving the way for a new paradigm of deep learning-assisted inverse problem-solving.
Affiliation(s)
- Lin Yao
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Jianfei Song
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Zhuo Yang
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hui Shi
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xiangyang Ji
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China

9
Zhao JX, Yue JM. Frontier studies on natural products: moving toward paradigm shifts. Sci China Chem 2023. [DOI: 10.1007/s11426-022-1512-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]

10
Ogawa K, Sakamoto D, Hosoki R. Computer Science Technology in Natural Products Research: A Review of Its Applications and Implications. Chem Pharm Bull (Tokyo) 2023; 71:486-494. [PMID: 37394596 DOI: 10.1248/cpb.c23-00039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Computational approaches to drug development are rapidly growing in popularity and have been used to produce significant results. Recent developments in information science have expanded databases and chemical informatics knowledge relating to natural products. Natural products have long been well-studied, and a large number of unique structures and remarkable active substances have been reported. Analyzing accumulated natural product knowledge using emerging computational science techniques is expected to yield more new discoveries. In this article, we discuss the current state of natural product research using machine learning. The basic concepts and frameworks of machine learning are summarized. Natural product research that utilizes machine learning is described in terms of the exploration of active compounds, automatic compound design, and application to spectral data. In addition, efforts to develop drugs for intractable diseases will be addressed. Lastly, we discuss key considerations for applying machine learning in this field. This paper aims to promote progress in natural product research by presenting the current state of computational science and chemoinformatics approaches in terms of its applications, strengths, limitations, and implications for the field.
Affiliation(s)
- Keiko Ogawa
  - Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
- Daiki Sakamoto
  - Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
- Rumiko Hosoki
  - Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University

11
Sahayasheela VJ, Lankadasari MB, Dan VM, Dastager SG, Pandian GN, Sugiyama H. Artificial intelligence in microbial natural product drug discovery: current and emerging role. Nat Prod Rep 2022; 39:2215-2230. [PMID: 36017693 PMCID: PMC9931531 DOI: 10.1039/d2np00035k] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/30/2022]
Abstract
Covering: up to the end of 2022
Microorganisms are exceptional sources of a wide array of unique natural products and play a significant role in drug discovery. During the golden era, several life-saving antibiotics and anticancer agents were isolated from microbes; moreover, they are still widely used. However, difficulties in the isolation methods and repeated discoveries of the same molecules have caused a setback in the past. Artificial intelligence (AI) has had a profound impact on various research fields, and its application allows the effective performance of data analyses and predictions. With the advances in omics, it is possible to obtain a wealth of information for the identification, isolation, and target prediction of secondary metabolites. In this review, we discuss drug discovery based on natural products from microorganisms with the help of AI and machine learning.
Affiliation(s)
- Vinodh J Sahayasheela
  - Department of Chemistry, Graduate School of Science, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo-Ku, Kyoto 606-8502, Japan.
- Manendra B Lankadasari
  - Thoracic Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Vipin Mohan Dan
  - Microbiology Division, Jawaharlal Nehru Tropical Botanic Garden and Research Institute, Thiruvananthapuram, Kerala, India
- Syed G Dastager
  - NCIM Resource Centre, Division of Biochemical Sciences, CSIR - National Chemical Laboratory, Pune, Maharashtra, India
- Ganesh N Pandian
  - Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Yoshida-Ushinomaecho, Sakyo-Ku, Kyoto 606-8501, Japan
- Hiroshi Sugiyama
  - Department of Chemistry, Graduate School of Science, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo-Ku, Kyoto 606-8502, Japan.
  - Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Yoshida-Ushinomaecho, Sakyo-Ku, Kyoto 606-8501, Japan

12
Moshawih S, Goh HP, Kifli N, Idris AC, Yassin H, Kotra V, Goh KW, Liew KB, Ming LC. Synergy between machine learning and natural products cheminformatics: Application to the lead discovery of anthraquinone derivatives. Chem Biol Drug Des 2022; 100:185-217. [PMID: 35490393 DOI: 10.1111/cbdd.14062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 04/15/2022] [Accepted: 04/23/2022] [Indexed: 11/28/2022]
Abstract
Cheminformatics utilizing machine learning (ML) techniques have opened up a new horizon in drug discovery. This is owing to vast chemical space expansion with rocketing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular from natural compounds. Due to the natural products' (NP) structural complexity, uniqueness, and diversity, they could occupy a bigger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanic and dynamic simulations, induced docking, and free energy perturbations are essential in predicting best binding poses, binding free energy values, and molecular mechanics force fields. Those applications have leveraged from artificial intelligence (AI), which decreases the computational costs required for such costly simulations. This review aimed to describe chemical space and compound libraries related to NPs. High-throughput screening utilized for fractionating NPs and high-throughput virtual screening and their strategies, and significance, are reviewed. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially in drug discovery of macrocyclic compounds and approaches in computer-aided and ML-based drug discovery. Anthraquinone derivatives were discussed as a source of new lead compounds that can be developed using ML tools for diverse medicinal uses such as cancer, infectious diseases, and metabolic disorders. Furthermore, the power of principal component analysis in understanding relevant protein conformations, and molecular modeling of protein-ligand interaction were also presented. 
Apart from being a concise reference for cheminformatics, this review is a useful text to understand the application of ML-based algorithms to molecular dynamics simulation and in silico absorption, distribution, metabolism, excretion, and toxicity prediction.
Affiliation(s)
- Said Moshawih
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hui Poh Goh
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Nurolaini Kifli
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Azam Che Idris
  - Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hayati Yassin
  - Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Vijay Kotra
  - Faculty of Pharmacy, Quest International University, Perak, Malaysia
- Khang Wen Goh
  - Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
- Kai Bin Liew
  - Faculty of Pharmacy, University of Cyberjaya, Cyberjaya, Malaysia
- Long Chiau Ming
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam

13
Yokoyama D, Suzuki S, Asakura T, Kikuchi J. Chemometric Analysis of NMR Spectra and Machine Learning to Investigate Membrane Fouling. ACS OMEGA 2022; 7:12654-12660. [PMID: 35474825 PMCID: PMC9025983 DOI: 10.1021/acsomega.1c06891] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/06/2021] [Accepted: 03/02/2022] [Indexed: 05/26/2023]
Abstract
Efficient membrane filtration requires an understanding of the membrane foulants and the functional properties of different membrane types in water purification. In this study, dead-end filtration of aquaculture system effluents was performed and the membrane foulants were investigated via nuclear magnetic resonance (NMR) spectroscopy. Several machine learning models (Random Forest, RF; Extreme Gradient Boosting, XGBoost; Support Vector Machine, SVM; and Neural Network, NN) were constructed: one to predict the maximum transmembrane pressure, for revealing the chemical compounds causing fouling, and the other to classify the membrane materials based on chemometric analysis of NMR spectra, for determining their effect on the properties of the different membrane types tested. In particular, the RF models exhibited high accuracy; the important chemical shifts observed in both the regression and classification models suggested that the proportional patterns of sugars and proteins are key factors in the fouling progress and the classification of membrane types. Therefore, the proposed strategy of chemometric analysis of NMR spectra is suitable for membrane research that aims to comprehensively investigate the fouling phenomenon and how the foulants and environmental conditions vary according to the filtration systems.
Affiliation(s)
- Daiki Yokoyama
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Sosei Suzuki
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Taiga Asakura
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Jun Kikuchi
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
14
A Brief Review of Machine Learning-Based Bioactive Compound Research. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12062906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
Bioactive compounds are often used as initial substances for many therapeutic agents. In recent years, both theoretical and practical innovations in hardware-assisted and fast-evolving machine learning (ML) have made it possible to identify desired bioactive compounds in chemical spaces, such as those in natural products (NPs). This review introduces how machine learning approaches can be used for the identification and evaluation of bioactive compounds. It also provides an overview of recent research trends in machine learning-based prediction and the evaluation of bioactive compounds by listing real-world examples along with various input data. In addition, several ML-based approaches to identify specific bioactive compounds for cardiovascular and metabolic diseases are described. Overall, these approaches are important for the discovery of novel bioactive compounds and provide new insights into the machine learning basis for various traditional applications of bioactive compound-related research.
15
Accurate predictions of drugs aqueous solubility via deep learning tools. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2021.131562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
16
Yang Z, Song J, Yang M, Yao L, Zhang J, Shi H, Ji X, Deng Y, Wang X. Cross-Modal Retrieval between 13C NMR Spectra and Structures for Compound Identification Using Deep Contrastive Learning. Anal Chem 2021; 93:16947-16955. [PMID: 34841854 DOI: 10.1021/acs.analchem.1c04307] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Library matching using carbon-13 nuclear magnetic resonance (13C NMR) spectra has been a popular method adopted in compound identification systems. However, the usability of existing approaches has been restricted as enlarging a library containing both a chemical structure and spectrum is a costly and time-consuming process. Therefore, we propose a fundamentally different, novel approach to match 13C NMR spectra directly against a molecular structure library. We develop a cross-modal retrieval between spectrum and structure (CReSS) system using deep contrastive learning, which allows us to search a molecular structure library using the 13C NMR spectrum of a compound. In the test of searching 41,494 13C NMR spectra against a reference structure library containing 10.4 million compounds, CReSS reached a recall@10 accuracy of 91.64% and a processing speed of 0.114 s per query spectrum. When further incorporating a filter with a molecular weight tolerance of 5 Da, CReSS achieved a new remarkable recall@10 of 98.39%. Furthermore, CReSS has potential in detecting scaffolds of novel structures and demonstrates great performance for the task of structural revision. CReSS is built and developed to bridge the gap between 13C NMR spectra and structures and could be generally applicable in compound identification.
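The recall@10 figure quoted in this abstract can be read as the fraction of query spectra whose correct structure appears among the top 10 retrieved candidates. The Python sketch below illustrates that metric, together with a simple molecular-weight pre-filter of the kind the abstract mentions. It is not the CReSS implementation: the function names `recall_at_k` and `filter_by_mass`, the data layout, and the toy identifiers are all assumptions made for illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def recall_at_k(
    rankings: Sequence[Sequence[Hashable]],
    true_ids: Sequence[Hashable],
    k: int = 10,
) -> float:
    """Fraction of queries whose true item is among the top-k ranked candidates.

    rankings[i] is the candidate list for query i, best match first
    (hypothetical layout); true_ids[i] is the correct answer for query i.
    """
    if len(rankings) != len(true_ids):
        raise ValueError("rankings and true_ids must have the same length")
    if not true_ids:
        raise ValueError("at least one query is required")
    hits = sum(
        1 for ranking, truth in zip(rankings, true_ids) if truth in ranking[:k]
    )
    return hits / len(true_ids)


def filter_by_mass(
    candidates: Sequence[Tuple[Hashable, float]],
    query_mass: float,
    tolerance: float = 5.0,
) -> List[Hashable]:
    """Keep candidates whose molecular weight lies within +/- tolerance (Da).

    candidates is a ranked list of (identifier, molecular_weight) pairs;
    the ranking order is preserved.
    """
    return [
        cid for cid, mass in candidates if abs(mass - query_mass) <= tolerance
    ]
```

In this sketch the mass filter would be applied to each ranked candidate list before `recall_at_k` is computed, so that candidates with an implausible molecular weight no longer occupy top-k positions.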
Affiliation(s)
- Zhuo Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Jianfei Song
  - Institute of Artificial Intelligence Research, Qihoo of Beijing Science and Technology Co. Ltd., Beijing 100015, China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Lin Yao
  - Institute of Artificial Intelligence Research, Qihoo of Beijing Science and Technology Co. Ltd., Beijing 100015, China
- Jiahua Zhang
  - Institute of Artificial Intelligence Research, Qihoo of Beijing Science and Technology Co. Ltd., Beijing 100015, China
- Hui Shi
  - The Pharmacy Informatics Branch of China International Exchange and Promotive Association for Medical and Health Care, Beijing 100005, China
- Xiangyang Ji
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Yafeng Deng
  - Institute of Artificial Intelligence Research, Qihoo of Beijing Science and Technology Co. Ltd., Beijing 100015, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
17
Kim HW, Wang M, Leber CA, Nothias LF, Reher R, Kang KB, van der Hooft JJJ, Dorrestein PC, Gerwick WH, Cottrell GW. NPClassifier: A Deep Neural Network-Based Structural Classification Tool for Natural Products. JOURNAL OF NATURAL PRODUCTS 2021; 84:2795-2807. [PMID: 34662515 PMCID: PMC8631337 DOI: 10.1021/acs.jnatprod.1c00399] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Received: 04/23/2021] [Indexed: 05/04/2023]
Abstract
Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures and also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or their biological properties. Thus, a holistic and automatic NP classification framework could have considerable value to comprehensively navigate the relatedness of NPs, and especially so when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.
Affiliation(s)
- Hyun Woo Kim
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States
- Mingxun Wang
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
  - Ometa Laboratories LLC, San Diego, California 92121, United States
- Christopher A. Leber
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States
- Louis-Félix Nothias
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Raphael Reher
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States
  - Institute of Pharmacy Martin-Luther-University Halle-Wittenberg, Universitätsplatz 10, 06108 Halle (Saale), Germany
- Kyo Bin Kang
  - Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women’s University, Seoul 04310, Korea
- Pieter C. Dorrestein
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- William H. Gerwick
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Garrison W. Cottrell
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States
18
Cech NB, Medema MH, Clardy J. Benefiting from big data in natural products: importance of preserving foundational skills and prioritizing data quality. Nat Prod Rep 2021; 38:1947-1953. [PMID: 34734219 PMCID: PMC8597707 DOI: 10.1039/d1np00061f] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/16/2021] [Indexed: 12/02/2022]
Abstract
Systematic, large-scale studies at the genomic, metabolomic, and functional level have transformed the natural product sciences. Improvements in technology and reduction in cost for obtaining spectroscopic, chromatographic, and genomic data, coupled with the creation of readily accessible curated and functionally annotated data sets, have altered the practices of virtually all natural product research laboratories. Gone are the days when natural products researchers were expected to devote themselves exclusively to the isolation, purification, and structure elucidation of small molecules. We now also engage with big data in taxonomic, genomic, proteomic, and/or metabolomic collections, and use these data to generate and test hypotheses. While the oft-stated aim for the use of large-scale -omics data in the natural products sciences is to achieve a rapid increase in the rate of discovery of new drugs, this has not yet come to pass. At the same time, new technologies have provided unexpected opportunities for natural products chemists to ask and answer new and different questions. With this viewpoint, we discuss the evolution of big data as a part of natural products research and provide a few examples of how discoveries have been enabled by access to big data. We also draw attention to some of the limitations in our existing engagement with large datasets and consider what would be necessary to overcome them.
Affiliation(s)
- Nadja B Cech
  - Chemistry, University of North Carolina Greensboro, USA
- Jon Clardy
  - Biological Chemistry and Molecular Pharmacology, Harvard Medical School, USA
19
Beniddir MA, Kang KB, Genta-Jouve G, Huber F, Rogers S, van der Hooft JJJ. Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches. Nat Prod Rep 2021; 38:1967-1993. [PMID: 34821250 PMCID: PMC8597898 DOI: 10.1039/d1np00023c] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 04/07/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to the end of 2020. Recently introduced computational metabolome mining tools have started to positively impact the chemical and biological interpretation of untargeted metabolomics analyses. We believe that these current advances make it possible to start decomposing complex metabolite mixtures into substructure and chemical class information, thereby supporting pivotal tasks in metabolomics analysis including metabolite annotation, the comparison of metabolic profiles, and network analyses. In this review, we highlight and explain key tools and emerging strategies covering 2015 up to the end of 2020. The majority of these tools aim at processing and analyzing liquid chromatography coupled to mass spectrometry fragmentation data. We start with defining what substructures are, how they relate to molecular fingerprints, and how recognizing them helps to decompose complex mixtures. We continue with chemical classes that are based on the presence or absence of particular molecular scaffolds and/or functional groups and are thus intrinsically related to substructures. We discuss novel tools to mine substructures, annotate chemical compound classes, and create mass spectral networks from metabolomics data and demonstrate them using two case studies. We also review and speculate about the opportunities that NMR spectroscopy-based metabolome mining of complex metabolite mixtures offers to discover substructures and chemical classes. Finally, we will describe the main benefits and limitations of the current tools and strategies that rely on them, and our vision on how this exciting field can develop toward repository-scale-sized metabolomics analyses. Complementary sources of structural information from genomics analyses and well-curated taxonomic records are also discussed.
Many research fields such as natural products discovery, pharmacokinetic and drug metabolism studies, and environmental metabolomics increasingly rely on untargeted metabolomics to gain biochemical and biological insights. The technical advances described here will benefit all those metabolomics disciplines by transforming spectral data into knowledge that can answer biological questions.
Affiliation(s)
- Mehdi A Beniddir
  - Université Paris-Saclay, CNRS, BioCIS, 5 rue J.-B Clément, 92290 Châtenay-Malabry, France
- Kyo Bin Kang
  - Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women's University, Seoul 04310, Republic of Korea
- Grégory Genta-Jouve
  - Laboratoire de Chimie-Toxicologie Analytique et Cellulaire (C-TAC), UMR CNRS 8038, CiTCoM, Université de Paris, 4, Avenue de l'Observatoire, 75006, Paris, France
  - Laboratoire Ecologie, Evolution, Interactions des Systèmes Amazoniens (LEEISA), USR 3456, Université De Guyane, CNRS Guyane, 275 Route de Montabo, 97334 Cayenne, French Guiana, France
- Florian Huber
  - Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK
20
Breitling R, Avbelj M, Bilyk O, Carratore F, Filisetti A, Hanko EKR, Iorio M, Redondo RP, Reyes F, Rudden M, Severi E, Slemc L, Schmidt K, Whittall DR, Donadio S, García AR, Genilloud O, Kosec G, De Lucrezia D, Petković H, Thomas G, Takano E. Synthetic biology approaches to actinomycete strain improvement. FEMS Microbiol Lett 2021; 368:6289918. [PMID: 34057181 PMCID: PMC8195692 DOI: 10.1093/femsle/fnab060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/17/2021] [Accepted: 05/28/2021] [Indexed: 12/17/2022] [Open Access]
Abstract
Their biochemical versatility and biotechnological importance make actinomycete bacteria attractive targets for ambitious genetic engineering using the toolkit of synthetic biology. But their complex biology also poses unique challenges. This mini review discusses some of the recent advances in synthetic biology approaches from an actinomycete perspective and presents examples of their application to the rational improvement of industrially relevant strains.
Affiliation(s)
- Rainer Breitling
  - Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Martina Avbelj
  - Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia
- Oksana Bilyk
  - Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Francesco Del Carratore
  - Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Erik K R Hanko
  - Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Fernando Reyes
  - Fundación MEDINA, Centro de Excelencia en Investigación de Medicamentos Innovadores en Andalucía, Avenida del Conocimiento 34, Parque Tecnologico de Ciencias de la Salud, 18016 Armilla, Granada, Spain
- Michelle Rudden
  - Department of Biology, University of York, Wentworth Way, York, YO10 5DD, UK
- Lucija Slemc
  - Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia
- Kamila Schmidt
  - Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Dominic R Whittall
  - Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Olga Genilloud
  - Fundación MEDINA, Centro de Excelencia en Investigación de Medicamentos Innovadores en Andalucía, Avenida del Conocimiento 34, Parque Tecnologico de Ciencias de la Salud, 18016 Armilla, Granada, Spain
- Gregor Kosec
  - Acies Bio d.o.o., Tehnološki Park 21, 1000, Ljubljana, Slovenia
- Davide De Lucrezia
  - Explora Biotech Srl, Doulix business unit, Via Torino 107, 30133 Venice, Italy
- Hrvoje Petković
  - Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia
- Gavin Thomas
  - Department of Biology, University of York, Wentworth Way, York, YO10 5DD, UK
- Eriko Takano
  - Corresponding author: Department of Chemistry, Manchester Institute of Biotechnology, Manchester Synthetic Biology Research Centre SYNBIOCHEM, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
21
Jeon J, Kang S, Kim HU. Predicting biochemical and physiological effects of natural products from molecular structures using machine learning. Nat Prod Rep 2021; 38:1954-1966. [PMID: 34047331 DOI: 10.1039/d1np00016k] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Abstract
Covering: 2016 to 2021. Discovery of novel natural products has been greatly facilitated by advances in genome sequencing, genome mining and analytical techniques. As a result, the volume of data for natural products has increased over the years, and these data have started to serve as ingredients for developing machine learning models. In the past few years, a number of machine learning models have been developed to examine various aspects of a molecule by effectively processing its molecular structure. Understanding of the biological effects of natural products can benefit from such machine learning approaches. In this context, this Highlight reviews recent studies on machine learning models developed to infer various biological effects of molecules. Particular attention is paid to molecular featurization, or computational representation of a molecular structure, which is an essential process during the development of a machine learning model. Technical challenges associated with the use of machine learning for natural products are further discussed.
Affiliation(s)
- Junhyeok Jeon
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Seongmo Kang
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Hyun Uk Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
  - KAIST Institute for Artificial Intelligence, KAIST, Daejeon 34141, Republic of Korea
  - BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon 34141, Republic of Korea
22
Santana K, do Nascimento LD, Lima e Lima A, Damasceno V, Nahum C, Braga RC, Lameira J. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. Front Chem 2021; 9:662688. [PMID: 33996755 PMCID: PMC8117418 DOI: 10.3389/fchem.2021.662688] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/01/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022] [Open Access]
Abstract
Natural products are continually explored in the development of new bioactive compounds with industrial applications, attracting the attention of scientific research efforts due to their pharmacophore-like structures, pharmacokinetic properties, and unique chemical space. The systematic search for natural sources to obtain valuable molecules to develop products with commercial value and industrial purposes remains the most challenging task in bioprospecting. Virtual screening strategies have transformed the discovery of novel bioactive molecules by assessing large compound libraries in silico, favoring the analysis of their chemical space, pharmacodynamics, and pharmacokinetic properties, and thus reducing the financial effort, infrastructure, and time involved in discovering new chemical entities. Herein, we discuss the computational approaches and methods developed to explore the chemo-structural diversity of natural products, focusing on the main paradigms involved in the discovery and screening of bioactive compounds from natural sources, placing particular emphasis on artificial intelligence, cheminformatics methods, and big data analyses.
Affiliation(s)
- Kauê Santana
  - Instituto de Biodiversidade, Universidade Federal do Oeste do Pará, Santarém, Brazil
- Anderson Lima e Lima
  - Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Vinícius Damasceno
  - Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Claudio Nahum
  - Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Jerônimo Lameira
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Brazil
23
Specht T, Münnemann K, Hasse H, Jirasek F. Automated Methods for Identification and Quantification of Structural Groups from Nuclear Magnetic Resonance Spectra Using Support Vector Classification. J Chem Inf Model 2021; 61:143-155. [PMID: 33405926 DOI: 10.1021/acs.jcim.0c01186] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a powerful tool for elucidating the structure of unknown components and the composition of liquid mixtures. However, these tasks are often tedious and challenging, especially if complex samples are considered. In this work, we introduce automated methods for the identification and quantification of structural groups in pure components and mixtures from NMR spectra using support vector classification. As input, a 1H NMR spectrum and a 13C NMR spectrum of the liquid sample (pure component or mixture) that is to be analyzed are needed. The first method, called the group-identification method, yields qualitative information on the structural groups in the sample. The second method, called the group-assignment method, provides the basis for a quantitative analysis of the sample by identifying the structural groups and assigning them to signals in the 13C NMR spectrum of the sample; quantitative information can then be obtained with readily available tools by simple integration. We demonstrate that both methods, after being trained on NMR spectra of nearly 1000 pure components, yield excellent predictions for pure components that were not part of the training set as well as for mixtures. The structural group-specific information obtained with the presented methods can, e.g., be used in combination with thermodynamic group-contribution methods to predict fluid properties of unknown samples.
Affiliation(s)
- Thomas Specht
  - Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
- Kerstin Münnemann
  - Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
- Hans Hasse
  - Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
- Fabian Jirasek
  - Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
24
Zhang R, Li X, Zhang X, Qin H, Xiao W. Machine learning approaches for elucidating the biological effects of natural products. Nat Prod Rep 2021; 38:346-361. [PMID: 32869826 DOI: 10.1039/d0np00043d] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 12/15/2022]
Abstract
Covering: 2000 to 2020. Machine learning (ML) is an efficient tool for the prediction of bioactivity and the study of structure-activity relationships. Over the past decade, an emerging trend for combining these approaches with the study of natural products (NPs) has developed in order to manage the challenge of the discovery of bioactive NPs. In the present review, we will introduce the basic principles and protocols for using the ML approach to investigate the bioactivity of NPs, citing a series of practical examples regarding the study of anti-microbial, anti-cancer, and anti-inflammatory NPs, etc. ML algorithms manage a variety of classification and regression problems associated with bioactive NPs, from those that are linear to non-linear and from pure compounds to plant extracts. Inspired by cases reported in the literature and our own experience, a number of key points have been emphasized for reducing modeling errors, including dataset preparation and applicability domain analysis.
Affiliation(s)
- Ruihan Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xiaoli Li
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xingjie Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Huayan Qin
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Weilie Xiao
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
25
Medema MH. The year 2020 in natural product bioinformatics: an overview of the latest tools and databases. Nat Prod Rep 2021; 38:301-306. [PMID: 33533785 DOI: 10.1039/d0np00090f] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 12/12/2022]
Abstract
Covering: 2020. Bioinformatic approaches to document and analyse chemical structures, biosynthetic gene clusters and analytical data play an important role in the study of natural products. Every year, such a large number of new algorithms, tools and databases are released that it is difficult to keep track of all the latest developments. The aim of this short article is to provide a concise overview of and reference to the major tools, methods and databases that have been released in the past year.
Affiliation(s)
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
26
Chen Y, Kirchmair J. Cheminformatics in Natural Product-based Drug Discovery. Mol Inform 2020; 39:e2000171. [PMID: 32725781 PMCID: PMC7757247 DOI: 10.1002/minf.202000171] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 05/12/2020] [Accepted: 07/28/2020] [Indexed: 12/20/2022]
Abstract
This review seeks to provide a timely survey of the scope and limitations of cheminformatics methods in natural product-based drug discovery. Following an overview of data resources of chemical, biological and structural information on natural products, we discuss, among other aspects, in silico methods for (i) data curation and natural products dereplication, (ii) analysis, visualization, navigation and comparison of the chemical space, (iii) quantification of natural product-likeness, (iv) prediction of the bioactivities (virtual screening, target prediction), ADME and safety profiles (toxicity) of natural products, (v) natural products-inspired de novo design and (vi) prediction of natural products prone to cause interference with biological assays. Among the many methods discussed are rule-based, similarity-based, shape-based, pharmacophore-based and network-based approaches, docking and machine learning methods.
Affiliation(s)
- Ya Chen
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany
- Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany
- Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
|
27
|
Gao P, Zhang J, Sun Y, Yu J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys Chem Chem Phys 2020; 22:23766-23772. [PMID: 33063077 DOI: 10.1039/d0cp03596c] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
|
28
|
Capecchi A, Reymond JL. Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning. Biomolecules 2020; 10:E1385. [PMID: 32998475 PMCID: PMC7600738 DOI: 10.3390/biom10101385] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Revised: 09/22/2020] [Accepted: 09/25/2020] [Indexed: 12/20/2022]
Abstract
Microbial natural products (NPs) are an important source of drugs, however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
Affiliation(s)
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland;
|