1
Emsley L. Spiers Memorial Lecture: NMR crystallography. Faraday Discuss 2024. [PMID: 39405130 PMCID: PMC11477664 DOI: 10.1039/d4fd00151f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Accepted: 09/03/2024] [Indexed: 10/19/2024]
Abstract
Chemical function is directly related to the spatial arrangement of atoms. Consequently, the determination of atomic-level three-dimensional structures has transformed molecular and materials science over the past 60 years. In this context, solid-state NMR has emerged to become the method of choice for atomic-level characterization of complex materials in powder form. In the following we present an overview of current methods for chemical shift driven NMR crystallography, illustrated with applications to complex materials.
Affiliation(s)
- Lyndon Emsley
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
2
Gao X, Baimacheva N, Aires-de-Sousa J. Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators. Molecules 2024; 29:3969. [PMID: 39203047 PMCID: PMC11357237 DOI: 10.3390/molecules29163969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 08/04/2024] [Accepted: 08/06/2024] [Indexed: 09/03/2024]
Abstract
A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict 19F NMR chemical shifts. An R² of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was summed to the LSVs of 4135 new molecules with no fluorine atom and decoded into SMILES, yielding 99% of valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change.
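The two figures of merit quoted in this abstract can be illustrated with a minimal Python sketch. It assumes that the reported R2 denotes the coefficient of determination (the abstract does not say whether it is that or a squared correlation coefficient); the function names and the toy shift values are hypothetical and are not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical 19F chemical shifts in ppm, for illustration only.
observed = [-110.0, -120.0, -130.0, -140.0]
predicted = [-112.0, -118.0, -133.0, -139.0]

print(r_squared(observed, predicted))
print(mean_absolute_error(observed, predicted))
```

Both metrics are computed on a held-out test set in the study; the sketch only shows the arithmetic, not the models.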
Affiliation(s)
- Xinyue Gao
  - Faculty of Sciences, Université Paris Cité, 75013 Paris, France
- Natalia Baimacheva
  - Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081 Strasbourg, France
- Joao Aires-de-Sousa
  - LAQV and REQUIMTE, Chemistry Department, NOVA School of Science and Technology, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
3
Stumpe B, Stuhrmann N, Jostmeier A, Marschner B. Urban cemeteries: The forgotten but powerful cooling islands. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 934:173167. [PMID: 38761931 DOI: 10.1016/j.scitotenv.2024.173167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 04/23/2024] [Accepted: 05/10/2024] [Indexed: 05/20/2024]
Abstract
Urban parks play a key role in urban heat island (UHI) mitigation. However, the role of other prominent types of urban green infrastructure has not been comprehensively studied. Thus, the main objective of this study was to evaluate the role of cemeteries and allotments as cooling islands compared to the well-studied park areas. We assessed the land surface temperature (LST) of cemeteries, allotments and parks based on Landsat 8 TM images across the five largest German cities during summertime. Random forest regressions explain the LST spatial variability of the different urban green spaces (UGS) with spectral indices (NDVI, NDMI, NDBaI) as well as with tree characteristics (tree type, tree age, trunk circumferences, trunk height or canopy density). As a result, allotments were identified as the hottest UGS with the city means varying between 23.1 and 26.9 °C, since they contain a relatively high proportion of sealed surfaces. The LST spatial variability of allotment gardens was best explained by the NDVI indicating that fields with a higher percentage of flowering shrubs and trees reveal lower LST values than those covered by annual crops. Interestingly, cemeteries were characterized as the coolest UGS, with city means between 20.4 and 24.7 °C. Despite their high proportion of sealed surfaces, they are dominated by old trees resulting in intensive transpiration processes. Parks show heterogeneous LST patterns which could not be systematically explained by spectral indices due to the variability of park functionality and shape. Compared to parks, the tree-covered areas of cemeteries have a higher cooling potential since cemeteries as cultural heritage sites are well-protected allowing old tree growth with intensive transpiration. These findings underline the relevance of cemeteries as cooling islands and deepen the understanding of the role of tree characteristics in the cooling process.
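Two of the spectral indices named in this abstract are normalized band differences, which the following Python sketch computes from per-pixel reflectances. The formulas are the standard NDVI and NDMI definitions; the function names and example reflectance values are illustrative assumptions, and the abstract does not state which bands or preprocessing the authors used. NDBaI is left out because its band definition varies between sources.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    denominator = band_a + band_b
    if denominator == 0:
        return float("nan")
    return (band_a - band_b) / denominator


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index from NIR and SWIR1 reflectance."""
    return normalized_difference(nir, swir1)


# Hypothetical surface reflectances for one vegetated pixel.
print(ndvi(nir=0.5, red=0.1))
print(ndmi(nir=0.4, swir1=0.2))
```

In a study like this one, such per-pixel index values would serve as predictor variables for a regression on LST.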
Affiliation(s)
- Britta Stumpe
  - Department of General Geography/Human-Environment Research, Institute of Geography, University of Wuppertal, 42119 Wuppertal, Germany
- Niklas Stuhrmann
  - Department of General Geography/Human-Environment Research, Institute of Geography, University of Wuppertal, 42119 Wuppertal, Germany
- Anna Jostmeier
  - Department of General Geography/Human-Environment Research, Institute of Geography, University of Wuppertal, 42119 Wuppertal, Germany
- Bernd Marschner
  - Department of Soil Science and Soil Ecology, Geographical Institute, Ruhr-University Bochum, Universitaetsstr. 150, 44801 Bochum, Germany
4
Han C, Zhang D, Xia S, Zhang Y. Accurate Prediction of NMR Chemical Shifts: Integrating DFT Calculations with Three-Dimensional Graph Neural Networks. J Chem Theory Comput 2024; 20:5250-5258. [PMID: 38842505 PMCID: PMC11209944 DOI: 10.1021/acs.jctc.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/25/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024]
Abstract
Computer prediction of NMR chemical shifts plays an increasingly important role in molecular structure assignment and elucidation for organic molecule studies. Density functional theory (DFT) and gauge-including atomic orbital (GIAO) have established a framework to predict NMR chemical shifts but often at a significant computational expense with a limited prediction accuracy. Recent advancements in deep learning methods, especially graph neural networks (GNNs), have shown promise in improving the accuracy of predicting experimental chemical shifts, either by using 2D molecular topological features or 3D conformational representation. This study presents a new 3D GNN model to predict 1H and 13C chemical shifts, CSTShift, that combines atomic features with DFT-calculated shielding tensor descriptors, capturing both isotropic and anisotropic shielding effects. Utilizing the NMRShiftDB2 data set and conducting DFT optimization and GIAO calculations at the B3LYP/6-31G(d) level, we prepared the NMRShiftDB2-DFT data set of high-quality 3D structures and shielding tensors with corresponding experimentally measured 1H and 13C chemical shifts. The developed CSTShift models achieve the state-of-the-art prediction performance on both the NMRShiftDB2-DFT test data set and external CHESHIRE data set. Further case studies on identifying correct structures from two groups of constitutional isomers show its capability for structure assignment and elucidation. The source code and data are accessible at https://yzhang.hpc.nyu.edu/IMA.
Affiliation(s)
- Chao Han
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Dongdong Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
5
Ai WJ, Li J, Cao D, Liu S, Yuan YY, Li Y, Tan GS, Xu KP, Yu X, Kang F, Zou ZX, Wang WX. A Very Deep Graph Convolutional Network for 13C NMR Chemical Shift Calculations with Density Functional Theory Level Performance for Structure Assignment. JOURNAL OF NATURAL PRODUCTS 2024; 87:743-752. [PMID: 38359467 DOI: 10.1021/acs.jnatprod.3c00862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2024]
Abstract
Nuclear magnetic resonance (NMR) chemical shift calculations are powerful tools for structure elucidation and have been extensively employed in both natural product and synthetic chemistry. However, density functional theory (DFT) NMR chemical shift calculations are usually time-consuming, while fast data-driven methods often lack reliability, making it challenging to apply them to computationally intensive tasks with a high requirement on quality. Herein, we have constructed a 54-layer-deep graph convolutional network for 13C NMR chemical shift calculations, which achieved high accuracy with low time-cost and performed competitively with DFT NMR chemical shift calculations on structure assignment benchmarks. Our model utilizes a semiempirical method, GFN2-xTB, and is compatible with a broad variety of organic systems, including those composed of hundreds of atoms or elements ranging from H to Rn. We used this model to resolve the controversial J/K ring junction problem of maitotoxin, which is the largest whole molecule assigned by NMR calculations to date. This model has been developed into user-friendly software, providing a useful tool for routine rapid structure validation and assignation as well as a new approach to elucidate the large structures that were previously unsuitable for NMR calculations.
Affiliation(s)
- Wen-Jing Ai
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Jing Li
  - Department of Pharmacy, National Clinical Research Center for Geriatric Disorder, in Xiangya Hospital, Central South University, Changsha, Hunan 410013, People's Republic of China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Shao Liu
  - Department of Pharmacy, National Clinical Research Center for Geriatric Disorder, in Xiangya Hospital, Central South University, Changsha, Hunan 410013, People's Republic of China
- Yi-Yun Yuan
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Yan Li
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Gui-Shan Tan
  - Department of Pharmacy, National Clinical Research Center for Geriatric Disorder, in Xiangya Hospital, Central South University, Changsha, Hunan 410013, People's Republic of China
- Kang-Ping Xu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Xia Yu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Fenghua Kang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Zhen-Xing Zou
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
- Wen-Xuan Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, People's Republic of China
  - Hunan Prima Drug Research Center Co., Ltd, Hunan Research Center for Drug Safety Evaluation, Hunan Key Laboratory of Pharmacodynamics and Safety Evaluation of New Drugs, Changsha, Hunan 410331, People's Republic of China
6
Leniak A, Pietruś W, Kurczab R. From NMR to AI: Designing a Novel Chemical Representation to Enhance Machine Learning Predictions of Physicochemical Properties. J Chem Inf Model 2024; 64:3302-3321. [PMID: 38529877 DOI: 10.1021/acs.jcim.3c02039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
A novel approach to the utilization of nuclear magnetic resonance (NMR) spectroscopy data in the prediction of logD through machine learning algorithms is shown. In the analysis, a data set of 754 chemical compounds, organized into 30 clusters, was evaluated using advanced machine learning models, such as Support Vector Regression (SVR), Gradient Boosting, and AdaBoost, and comprehensive validation and testing methods were employed, including 10-fold cross-validation, bootstrapping, and leave-one-out. The study revealed the superior performance of the Bucket Integration method for dimensionality reduction, consistently yielding the lowest root mean square error (RMSE) across all data sets and normalization schemes. The SVR prediction models demonstrated remarkable computational efficiency and low cost, with the best RMSE value reaching 0.66. Our best model outperformed existing tools like JChem Suite's logD Predictor (0.91) and CplogD (1.27), and a comparison with traditional molecular representations yielded a comparable RMSE (0.50), emphasizing the robustness of our NMR data integration. The widespread availability of NMR data in pharmaceutical and industrial research presents an untapped resource for predictive modeling, highlighting the need for accessible methodologies like ours that complement the analytical toolbox beyond conventional 2D approaches. Our approach, designed to leverage the rich spatial data from NMR spectroscopy, provides additional insights and enriches drug discovery and computational chemistry with a freely accessible tool.
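The validation scheme and error metric mentioned in this abstract can be sketched in plain Python as follows. This is an illustrative assumption about how k-fold splitting and RMSE are typically computed, not the authors' pipeline: the helper names are hypothetical, the folds are contiguous and unshuffled, and no model is trained here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 10-fold split of a data set the size of the one in the abstract (754 compounds).
folds = list(k_fold_indices(754, 10))
print([len(test) for _, test in folds])
```

Setting k equal to the number of samples gives the leave-one-out scheme also named in the abstract; in practice the sample order would usually be shuffled before splitting.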
Affiliation(s)
- Arkadiusz Leniak
  - Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland
- Wojciech Pietruś
  - Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Kraków, Poland
- Rafał Kurczab
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Kraków, Poland
7
Daniel DT, Mitra S, Eichel RA, Diddens D, Granwehr J. Machine Learning Isotropic g Values of Radical Polymers. J Chem Theory Comput 2024; 20:2592-2604. [PMID: 38456629 DOI: 10.1021/acs.jctc.3c01252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Abstract
Methods for electronic structure computations, such as density functional theory (DFT), are routinely used for the calculation of spectroscopic parameters to establish and validate structure-parameter correlations. DFT calculations, however, are computationally expensive for large systems such as polymers. This work explores the machine learning (ML) of isotropic g values, g_iso, obtained from electron paramagnetic resonance (EPR) experiments of an organic radical polymer. An ML model based on regression trees is trained on DFT-calculated g values of poly(2,2,6,6-tetramethylpiperidinyloxy-4-yl methacrylate) (PTMA) polymer structures extracted from different time frames of a molecular dynamics trajectory. The DFT-derived g values, g_iso^calc, for different radical densities of PTMA, are compared against experimentally derived g values obtained from in operando EPR measurements of a PTMA-based organic radical battery. The ML-predicted g_iso values, g_iso^pred, were compared with g_iso^calc to evaluate the performance of the model. Mean deviations of g_iso^pred from g_iso^calc were found to be on the order of 0.0001. Furthermore, a performance evaluation on test structures from a separate MD trajectory indicated that the model is sensitive to the radical density and efficiently learns to predict g_iso values even for radical densities that were not part of the training data set. Since our trained model can reproduce the changes in g_iso along the MD trajectory and is sensitive to the extent of equilibration of the polymer structure, it is a promising alternative to computationally more expensive DFT methods, particularly for large systems that cannot be easily represented by a smaller model system.
Affiliation(s)
- Davis Thomas Daniel
  - Institute of Energy and Climate Research (IEK-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Institute of Technical and Macromolecular Chemistry, RWTH Aachen University, 52056 Aachen, Germany
- Souvik Mitra
  - Institute of Physical Chemistry, University of Münster, 48149 Münster, Germany
- Rüdiger-A Eichel
  - Institute of Energy and Climate Research (IEK-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Institute of Physical Chemistry, RWTH Aachen University, Aachen 52056, Germany
- Diddo Diddens
  - Helmholtz Institute Münster (IEK-12), Forschungszentrum Jülich GmbH, 48149 Münster, Germany
- Josef Granwehr
  - Institute of Energy and Climate Research (IEK-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Institute of Technical and Macromolecular Chemistry, RWTH Aachen University, 52056 Aachen, Germany
8
Kuhn S, Kolshorn H, Steinbeck C, Schlörer N. Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2024; 62:74-83. [PMID: 38112483 DOI: 10.1002/mrc.5418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/10/2023] [Accepted: 11/10/2023] [Indexed: 12/21/2023]
Abstract
In October 2003, 20 years ago, the open-source and open-content database NMRshiftDB was announced. Since then, the database, renamed as nmrshiftdb2 later, has been continuously available and is one of the longer-running projects in the field of open data in chemistry. After 20 years, we evaluate the success of the project and present lessons learnt for similar projects.
Affiliation(s)
- Stefan Kuhn
  - Institute of Computer Science, University of Tartu, Tartu, Estonia, and School of Computer Science and Informatics, De Montfort University, Leicester, UK
- Heinz Kolshorn
  - Department Chemie, Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-Universität Jena, Jena, Germany
- Nils Schlörer
  - NMR-Plattform, Friedrich-Schiller-Universität Jena, Jena, Germany
9
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
10
Venetos MC, Wen M, Persson KA. Machine Learning Full NMR Chemical Shift Tensors of Silicon Oxides with Equivariant Graph Neural Networks. J Phys Chem A 2023; 127:2388-2398. [PMID: 36862997 PMCID: PMC10026072 DOI: 10.1021/acs.jpca.2c07530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
The nuclear magnetic resonance (NMR) chemical shift tensor is a highly sensitive probe of the electronic structure of an atom and furthermore its local structure. Recently, machine learning has been applied to NMR in the prediction of isotropic chemical shifts from a structure. Current machine learning models, however, often ignore the full chemical shift tensor for the easier-to-predict isotropic chemical shift, effectively ignoring a multitude of structural information available in the NMR chemical shift tensor. Here we use an equivariant graph neural network (GNN) to predict full 29Si chemical shift tensors in silicate materials. The equivariant GNN model predicts full tensors to a mean absolute error of 1.05 ppm and is able to accurately determine the magnitude, anisotropy, and tensor orientation in a diverse set of silicon oxide local structures. When compared with other models, the equivariant GNN model outperforms the state-of-the-art machine learning models by 53%. The equivariant GNN model also outperforms historic analytical models by 57% for isotropic chemical shift and 91% for anisotropy. The software is available as a simple-to-use open-source repository, allowing similar models to be created and trained with ease.
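To make concrete what a full tensor adds over the isotropic shift alone, the following Python sketch derives the isotropic shift, reduced anisotropy, and asymmetry from the three principal components of a shift tensor. It assumes the Haeberlen convention for ordering the components; the abstract does not state which convention the authors use, and the function name and example values are hypothetical. The tensor orientation, which the model in the paper also predicts, is not covered here.

```python
def haeberlen_parameters(principal_components):
    """Return isotropic shift, reduced anisotropy and asymmetry (Haeberlen convention).

    principal_components: the three eigenvalues of the symmetric shift tensor, in ppm.
    """
    d1, d2, d3 = principal_components
    iso = (d1 + d2 + d3) / 3.0
    # Haeberlen ordering: |zz - iso| >= |xx - iso| >= |yy - iso|,
    # so sorting by absolute deviation (ascending) gives yy, xx, zz.
    yy, xx, zz = sorted(principal_components, key=lambda d: abs(d - iso))
    reduced_anisotropy = zz - iso
    if reduced_anisotropy == 0:
        asymmetry = 0.0  # perfectly isotropic tensor; asymmetry is undefined, report 0
    else:
        asymmetry = (yy - xx) / reduced_anisotropy
    return {
        "isotropic": iso,
        "reduced_anisotropy": reduced_anisotropy,
        "asymmetry": asymmetry,
    }


# Hypothetical principal components in ppm, for illustration only.
params = haeberlen_parameters((10.0, 20.0, 60.0))
print(params)
```

Two tensors with the same isotropic value can differ in anisotropy and asymmetry, which is the structural information lost when only the isotropic shift is predicted.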
Affiliation(s)
- Maxwell C Venetos
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Mingjian Wen
  - Department of Chemical and Biomolecular Engineering, University of Houston, Houston, Texas 77204, United States
- Kristin A Persson
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
11
Jonas E, Kuhn S, Schlörer N. Prediction of chemical shift in NMR: A review. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2022; 60:1021-1031. [PMID: 34787335 DOI: 10.1002/mrc.5234] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 09/07/2021] [Revised: 11/10/2021] [Accepted: 11/11/2021] [Indexed: 06/13/2023]
Abstract
Calculation of solution-state NMR parameters, including chemical shift values and scalar coupling constants, is often a crucial step for unambiguous structure assignment. Data-driven (sometimes called empirical) methods leverage databases of known parameter values to estimate parameters for unknown or novel molecules. This is in contrast to popular ab initio techniques that use detailed quantum computational chemistry calculations to arrive at parameter estimates. Data-driven methods have the potential to be considerably faster than ab initio techniques and have been the subject of renewed interest over the past decade with the rise of high-quality databases of NMR parameters and novel machine learning methods. Here, we review these methods, their strengths and pitfalls, and the databases they are built on.
Affiliation(s)
- Eric Jonas
  - Department of Computer Science, University of Chicago, Chicago, Illinois, 60637, USA
- Stefan Kuhn
  - Cyber Technology Institute, De Montfort University, Leicester, LE1 9BH, UK
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Nils Schlörer
  - NMR Core facility, Department of Chemistry, University of Cologne, Cologne, D-50939, Germany
12
Prediction of Resin Production in Copal Trees (Bursera spp.) Using a Random Forest Model. SUSTAINABILITY 2022. [DOI: 10.3390/su14138047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Non-timber forest products (NTFPs) are essential for community development, but their enormous demand has posed a serious threat to trees growing in their natural habitat. Copal resin is one of these products, which has a great deal of religious and ceremonial significance in Mexico and around the world. Resin extraction from a tree depends on its morphological and physiological characteristics, as well as its physical and sanitary condition. In this study, a methodology was proposed for determining the yield and health status of Copal trees, and a random forest (RF) model was developed to explain their resin production based on their morphological and condition characteristics. The experiment was conducted in the Agua Escondida watershed in Puebla, Mexico. With the training data, the average accuracy of the model was 99%, with a Kappa index of 98%, which is considered an excellent level of agreement beyond chance, and with the validation data, the average accuracy was 71%, with a Kappa index of 47%, which is considered a good level of agreement beyond chance. Tree condition was the most important factor affecting resin production in Copal trees, followed by stem diameter (33 and 38 cm), height (2 and 2.5 m), and diameter of secondary branches (from 8 to 15, 22 and 32 cm).
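The agreement statistic reported alongside accuracy in this abstract can be illustrated with a short Python sketch. It assumes that "Kappa index" refers to Cohen's kappa computed from a confusion matrix, which the abstract does not state explicitly; the function name and the example matrix are hypothetical and are not the study's data.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal (this is the accuracy).
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Agreement expected by chance from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class confusion matrix: accuracy 0.70, chance agreement 0.50.
example = [[20, 5], [10, 15]]
print(cohen_kappa(example))
```

Because kappa discounts agreement expected by chance, it can sit well below the raw accuracy, as in the validation figures quoted above.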
13
Vulpetti A, Lingel A, Dalvit C, Schiering N, Oberer L, Henry C, Lu Y. Efficient Screening of Target-Specific Selected Compounds in Mixtures by 19F NMR Binding Assay with Predicted 19F NMR Chemical Shifts. ChemMedChem 2022; 17:e202200163. [PMID: 35475323 DOI: 10.1002/cmdc.202200163] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 04/26/2022] [Indexed: 11/06/2022]
Abstract
Ligand-based 19F NMR screening is a highly effective and well-established hit-finding approach. The high sensitivity to protein binding makes it particularly suitable for fragment screening. Different criteria can be considered for generating fluorinated fragment libraries. One common strategy is to assemble a large, diverse, well-designed and characterized fragment library which is screened in mixtures, generated based on experimental 19F NMR chemical shifts. Here, we introduce a complementary knowledge-based 19F NMR screening approach, named 19Focused screening, enabling the efficient screening of putative active molecules selected by computational hit finding methodologies, in mixtures assembled and on-the-fly deconvoluted based on predicted 19F NMR chemical shifts. In this study, we developed a novel approach, named LEFshift, for 19F NMR chemical shift prediction using rooted topological fluorine torsion fingerprints in combination with a random forest machine learning method. A demonstration of this approach to a real test case is reported.
Affiliation(s)
- Anna Vulpetti
  - Novartis Pharma AG, Global Discovery Chemistry, Novartis Campus, 4002, Basel, SWITZERLAND
- Andreas Lingel
  - Novartis Institutes for BioMedical Research Basel, Global Discovery Chemistry, SWITZERLAND
- Claudio Dalvit
  - Novartis Institutes for BioMedical Research Basel, Protease Platform, SWITZERLAND
- Nikolaus Schiering
  - Novartis Institutes for BioMedical Research Basel, Protease Platform, SWITZERLAND
- Lukas Oberer
  - Novartis Institutes for BioMedical Research Basel, Global Discovery Chemistry, SWITZERLAND
- Chrystelle Henry
  - Novartis Institutes for BioMedical Research Basel, Protein Science, SWITZERLAND
- Yipin Lu
  - Novartis Institutes for BioMedical Research Basel, Global Discovery Chemistry, SWITZERLAND
14
|
Krämer J, Kang R, Grimm LM, De Cola L, Picchetti P, Biedermann F. Molecular Probes, Chemosensors, and Nanosensors for Optical Detection of Biorelevant Molecules and Ions in Aqueous Media and Biofluids. Chem Rev 2022; 122:3459-3636. [PMID: 34995461 PMCID: PMC8832467 DOI: 10.1021/acs.chemrev.1c00746] [Citation(s) in RCA: 131] [Impact Index Per Article: 65.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Indexed: 02/08/2023]
Abstract
Synthetic molecular probes, chemosensors, and nanosensors used in combination with innovative assay protocols hold great potential for the development of robust, low-cost, and fast-responding sensors that are applicable in biofluids (urine, blood, and saliva). Particularly, the development of sensors for metabolites, neurotransmitters, drugs, and inorganic ions is highly desirable due to a lack of suitable biosensors. In addition, the monitoring and analysis of metabolic and signaling networks in cells and organisms by optical probes and chemosensors is becoming increasingly important in molecular biology and medicine. Thus, new perspectives for personalized diagnostics, theranostics, and biochemical/medical research will be unlocked when standing limitations of artificial binders and receptors are overcome. In this review, we survey synthetic sensing systems that have promising (future) application potential for the detection of small molecules, cations, and anions in aqueous media and biofluids. Special attention was given to sensing systems that provide a readily measurable optical signal through dynamic covalent chemistry, supramolecular host-guest interactions, or nanoparticles featuring plasmonic effects. This review shall also enable the reader to evaluate the current performance of molecular probes, chemosensors, and nanosensors in terms of sensitivity and selectivity with respect to practical requirement, and thereby inspiring new ideas for the development of further advanced systems.
Affiliation(s)
- Joana Krämer
- Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Rui Kang
- Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Laura M. Grimm
- Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Luisa De Cola
- Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Dipartimento DISFARM, University of Milano, via Camillo Golgi 19, 20133 Milano, Italy
- Department of Molecular Biochemistry and Pharmacology, Instituto di Ricerche Farmacologiche Mario Negri, IRCCS, 20156 Milano, Italy
- Pierre Picchetti
- Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Frank Biedermann
- Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
|
15
|
Wang L, Wang K, Ma W, Abdel-Aty M, Li L. Real-time safety analysis for expressways considering the heterogeneity of different segment types. JOURNAL OF SAFETY RESEARCH 2022; 80:349-361. [PMID: 35249615 DOI: 10.1016/j.jsr.2021.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 07/19/2021] [Accepted: 12/11/2021] [Indexed: 06/14/2023]
Abstract
INTRODUCTION Studies have proven that the crash possibility and crash type are not the same among different expressway segment types. However, few studies have conducted real-time safety analysis considering different segment types. This study aimed to explore the crash mechanism's heterogeneity for different segment types (i.e., merge, diverge, weaving, and basic segments). METHOD To enable in-depth exploration, this study used detailed traffic data, which were 0-10 min before crash, at 1-min intervals, and from five detectors of both the upstream and downstream to the target segment. This study analyzed the crash mechanism's heterogeneity from the following aspects: crash characteristics, significant crash contributing variables, and variables' importance. Based on this, a variables selection method was proposed to solve the huge dimension scale in modeling. Then, a nested logit model was built, which could consider the crash mechanism's heterogeneity, to quantitatively analyze the impact of crash contributing factors on the crash risk. RESULTS The results revealed that there are statistically significant differences in crash characteristics between each segment type. Additionally, the sources of most crash contributing factors were found to be significantly different in the spatial-temporal dimension between each segment type. Moreover, this study found that the weather parameter, indicating pavement's wet condition, had a similar effect on crash risk between different segment types. However, the geometry and traffic parameters had significantly different impacts between different segment types. Moreover, when the number of target segments' upstream ramps increases or when the distance between ramps and the target segment decreases, the crash risk would increase. Practical Applications: This study can be applied in the intelligent transportation system to improve traffic safety performance, especially in active traffic management systems.
Affiliation(s)
- Ling Wang
- The Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, 4800 Cao'an Road, Shanghai 201804, PR China.
- Kang Wang
- The Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, 4800 Cao'an Road, Shanghai 201804, PR China.
- Wanjing Ma
- The Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, 4800 Cao'an Road, Shanghai 201804, PR China.
- Mohamed Abdel-Aty
- Department of Civil, Environmental, and Construction Engineering, University of Central Florida, Orlando, FL 32816-2450, United States.
- Lin Li
- Tsinghua University, 30 Shuangqing Road, Beijing 201804, PR China; Shanghai International Automobile City Corporation, 888 Moyu South Road, Shanghai 201804, PR China.
|
16
|
Investigation of the factors affecting reverse osmosis membrane performance using machine-learning techniques. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.107669] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
17
|
Huang Z, Chen MS, Woroch CP, Markland TE, Kanan MW. A framework for automated structure elucidation from routine NMR spectra. Chem Sci 2021; 12:15329-15338. [PMID: 34976353 PMCID: PMC8635205 DOI: 10.1039/d1sc04105c] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 11/08/2021] [Indexed: 12/25/2022] Open
Abstract
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional 1H and/or 13C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.
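The top-1 and top-ten figures in this abstract are top-k accuracies over ranked candidate lists. The sketch below shows one way such a metric could be computed; the function name, the use of SMILES strings as identifiers, and the example rankings are all assumptions made for illustration and do not come from the paper.

```python
def top_k_accuracy(ranked_candidates, correct_structures, k):
    """Fraction of cases whose correct structure appears among the first k candidates.

    ranked_candidates: one list per unknown compound, best-ranked candidate first.
    correct_structures: the true structure identifier for each compound.
    """
    if len(ranked_candidates) != len(correct_structures):
        raise ValueError("need exactly one ranking per correct structure")
    if not correct_structures:
        raise ValueError("no cases to evaluate")
    hits = sum(
        1
        for ranking, truth in zip(ranked_candidates, correct_structures)
        if truth in ranking[:k]
    )
    return hits / len(correct_structures)


# Hypothetical rankings (SMILES strings used purely as identifiers).
rankings = [
    ["CCO", "COC"],      # correct answer ranked first
    ["CC=O", "C1CO1"],   # correct answer ranked second
    ["CCN", "CNC"],      # correct answer missing from the list
]
truths = ["CCO", "C1CO1", "CCNC"]
print(top_k_accuracy(rankings, truths, 1), top_k_accuracy(rankings, truths, 2))
```

Top-k accuracy can only stay the same or rise as k grows, which is consistent with the gap between the highest-ranking and top-ten results reported above.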
Affiliation(s)
- Zhaorui Huang
- Department of Chemistry, Stanford University Stanford CA 94305 USA
- Michael S Chen
- Department of Chemistry, Stanford University Stanford CA 94305 USA
- Matthew W Kanan
- Department of Chemistry, Stanford University Stanford CA 94305 USA
|
18
|
Guan Y, Shree Sowndarya SV, Gallegos LC, St John PC, Paton RS. Real-time prediction of 1H and 13C chemical shifts with DFT accuracy using a 3D graph neural network. Chem Sci 2021; 12:12012-12026. [PMID: 34667567 PMCID: PMC8457395 DOI: 10.1039/d1sc03343c] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Accepted: 07/19/2021] [Indexed: 11/23/2022] Open
Abstract
Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, the computational demands of calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly-interconverting conformations, are expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict 1H and 13C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed 13C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/. 
We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution. From quantum chemical and experimental NMR data, a 3D graph neural network, CASCADE, has been developed to predict carbon and proton chemical shifts. Stereoisomers and conformers of organic molecules can be correctly distinguished.
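The abstract describes comparing predicted shifts for several candidate structures with experimental values to find the best match. A minimal sketch of that comparison step follows, using mean absolute error as the score. It assumes the predicted and experimental shifts are already paired atom by atom, and the candidate names and shift values are hypothetical; it is not the CASCADE implementation.

```python
def mean_absolute_error(predicted, experimental):
    """Mean absolute deviation (ppm) between paired predicted and experimental shifts."""
    if len(predicted) != len(experimental):
        raise ValueError("shift lists must have the same length")
    if not predicted:
        raise ValueError("shift lists are empty")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def best_candidate(candidate_shifts, experimental):
    """Return the name of the candidate whose predicted shifts best match experiment."""
    return min(
        candidate_shifts,
        key=lambda name: mean_absolute_error(candidate_shifts[name], experimental),
    )


# Hypothetical 13C shifts in ppm (illustrative values only).
experimental_shifts = [20.0, 60.0, 130.0]
candidates = {
    "candidate_A": [21.0, 61.5, 129.0],
    "candidate_B": [25.0, 70.0, 120.0],
}
print(best_candidate(candidates, experimental_shifts))
```

In practice the assignment of peaks to atoms is itself uncertain, so a real workflow would need a matching step before this comparison.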
Affiliation(s)
- Yanfei Guan
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- S V Shree Sowndarya
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Liliana C Gallegos
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Peter C St John
- Biosciences Center, National Renewable Energy Laboratory Golden CO 80401 USA
- Robert S Paton
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
|
19
|
Muro S, Ishida M, Horie Y, Takeuchi W, Nakagawa S, Ban H, Nakagawa T, Kitamura T. Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study. JMIR Med Inform 2021; 9:e24796. [PMID: 34255684 PMCID: PMC8293159 DOI: 10.2196/24796] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2020] [Revised: 11/17/2020] [Accepted: 04/11/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk. However, not all long-term smokers develop COPD, meaning that other risk factors exist. OBJECTIVE This study aimed to predict the risk factors for COPD diagnosis using machine learning in an annual medical check-up database. METHODS In this retrospective observational cohort study (ARTDECO [Analysis of Risk Factors to Detect COPD]), annual medical check-up records for all Hitachi Ltd employees in Japan collected from April 1998 to March 2019 were analyzed. Employees who provided informed consent via an opt-out model were screened and those aged 30 to 75 years without a prior diagnosis of COPD/asthma or a history of cancer were included. The database included clinical measurements (eg, pulmonary function tests) and questionnaire responses. To predict the risk factors for COPD diagnosis within a 3-year period, the Gradient Boosting Decision Tree machine learning (XGBoost) method was applied as a primary approach, with logistic regression as a secondary method. A diagnosis of COPD was made when the ratio of the prebronchodilator forced expiratory volume in 1 second (FEV1) to prebronchodilator forced vital capacity (FVC) was <0.7 during two consecutive examinations. RESULTS Of the 26,101 individuals screened, 1213 met the exclusion criteria, and thus, 24,815 individuals were included in the analysis. The top 10 predictors for COPD diagnosis were FEV1/FVC, smoking status, allergic symptoms, cough, pack years, hemoglobin A1c, serum albumin, mean corpuscular volume, percent predicted vital capacity, and percent predicted value of FEV1. The areas under the receiver operating characteristic curves of the XGBoost model and the logistic regression model were 0.956 and 0.943, respectively. 
CONCLUSIONS Using a machine learning model in this longitudinal database, we identified a number of parameters as risk factors other than smoking exposure or lung function to support general practitioners and occupational health physicians to predict the development of COPD. Further research to confirm our results is warranted, as our analysis involved a database used only in Japan.
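The diagnostic rule stated in the methods (FEV1/FVC below 0.7 at two consecutive examinations) can be written as a short check. This is a sketch under the assumption that each person's ratios are available as a chronologically ordered list; the function name and example values are hypothetical.

```python
def meets_spirometric_criterion(fev1_fvc_ratios, threshold=0.7):
    """True if the FEV1/FVC ratio is below the threshold at two consecutive examinations.

    fev1_fvc_ratios: prebronchodilator FEV1/FVC values in chronological order.
    """
    consecutive_pairs = zip(fev1_fvc_ratios, fev1_fvc_ratios[1:])
    return any(first < threshold and second < threshold
               for first, second in consecutive_pairs)


# Hypothetical check-up histories (illustrative values only).
print(meets_spirometric_criterion([0.75, 0.68, 0.69]))  # two low readings in a row
print(meets_spirometric_criterion([0.68, 0.72, 0.69]))  # low readings are not consecutive
```

A single low reading, or low readings separated by a normal one, does not satisfy the rule as worded in the abstract.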
Affiliation(s)
- Shigeo Muro
- Department of Respiratory Medicine, Nara Medical University, Nara, Japan
- Masato Ishida
- Department of Respiratory and Immunology, Medical, AstraZeneca KK, Osaka, Japan
- Yoshiharu Horie
- Department of Data Science, Medical, AstraZeneca KK, Osaka, Japan
- Wataru Takeuchi
- Center for Technology Innovation-Artificial Intelligence, Research & Development Group, Hitachi, Ltd, Tokyo, Japan
- Shunki Nakagawa
- Center for Technology Innovation-Artificial Intelligence, Research & Development Group, Hitachi, Ltd, Tokyo, Japan
- Hideyuki Ban
- Center for Technology Innovation-Artificial Intelligence, Research & Development Group, Hitachi, Ltd, Tokyo, Japan
- Tohru Nakagawa
- Hitachi Health Care Center, Hitachi, Ltd, Ibaraki, Japan
- Tetsuhisa Kitamura
- Division of Environmental Medicine and Population Sciences, Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Osaka, Japan
|
20
|
Zhang F, Feng Y, Song S, Cai Q, Ji C, Zhu J. Temperature sensitivity of plant litter decomposition rate in China's forests. Ecosphere 2021. [DOI: 10.1002/ecs2.3541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Affiliation(s)
- Fan Zhang
- State Key Laboratory of Grassland Agro‐Ecosystems College of Pastoral Agriculture Science and Technology Lanzhou University Lanzhou 730020 China
- Yuhao Feng
- Department of Ecology Key Laboratory for Earth Surface Processes of the Ministry of Education College of Urban and Environmental Sciences Peking University Beijing 100871 China
- Shanshan Song
- State Key Laboratory of Grassland Agro‐Ecosystems College of Pastoral Agriculture Science and Technology Lanzhou University Lanzhou 730020 China
- Qiong Cai
- Department of Ecology Key Laboratory for Earth Surface Processes of the Ministry of Education College of Urban and Environmental Sciences Peking University Beijing 100871 China
- Chengjun Ji
- Department of Ecology Key Laboratory for Earth Surface Processes of the Ministry of Education College of Urban and Environmental Sciences Peking University Beijing 100871 China
- Jianxiao Zhu
- State Key Laboratory of Grassland Agro‐Ecosystems College of Pastoral Agriculture Science and Technology Lanzhou University Lanzhou 730020 China
|
21
|
Kurotani A, Kakiuchi T, Kikuchi J. Solubility Prediction from Molecular Properties and Analytical Data Using an In-phase Deep Neural Network (Ip-DNN). ACS OMEGA 2021; 6:14278-14287. [PMID: 34124451 PMCID: PMC8190808 DOI: 10.1021/acsomega.1c01035] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Accepted: 04/28/2021] [Indexed: 06/12/2023]
Abstract
Materials informatics is an emerging field that allows us to predict the properties of materials and has been applied in various research and development fields, such as materials science. In particular, solubility factors such as the Hansen and Hildebrand solubility parameters (HSPs and SP, respectively) and Log P are important values for understanding the physical properties of various substances. In this study, we succeeded in establishing a solubility prediction tool using a unique machine learning method called the in-phase deep neural network (ip-DNN), which starts exclusively from the analytical input data (e.g., NMR information, refractive index, and density) to predict solubility by predicting intermediate elements, such as molecular components and molecular descriptors, in a multiple-step method. To improve the accuracy of the prediction, intermediate regression models were employed when performing in-phase machine learning. In addition, we developed a website dedicated to the established solubility prediction method, which is freely available at "http://dmar.riken.jp/matsolca/".
Affiliation(s)
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Toshifumi Kakiuchi
- AGC Yokohama Technical Center, 1-1 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Jun Kikuchi
- RIKEN Center for Sustainable Resource Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
|
22
|
Han H, Choi S. Transfer Learning from Simulation to Experimental Data: NMR Chemical Shift Predictions. J Phys Chem Lett 2021; 12:3662-3668. [PMID: 33826849 DOI: 10.1021/acs.jpclett.1c00578] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Accurate prediction of chemical shifts (δ) to elucidate molecular structures has been a challenging problem. Recently, novel machine learning architectures have achieved accurate prediction performance, but the difficulty of building a huge chemical database limits the applicability of machine learning approaches. In this work, we demonstrate that the prior knowledge gained from the simulation database is successfully transferred into the problem of predicting an experimentally measured δ. Although the simulation and experimental databases differ vastly from a chemical perspective, reliable accuracy for δ is achieved by additional training with a small number of randomly sampled experimental data points. Furthermore, the prior knowledge allows us to successfully train the model on the more focused chemical space that the experimental database sparsely covers. The proposed approach, the knowledge transfer from the simulation database, can be utilized to enhance the usability of the local experimental database.
Affiliation(s)
- Herim Han
- Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-Ro, Yuseong-Gu, Daejeon 34141, Republic of Korea
- Department of Polymer Science and Engineering, Dankook University, 152 Jukjeon-Ro, Suji-Gu, Yongin, Gyeonggi 16890, Republic of Korea
- Sunghwan Choi
- Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-Ro, Yuseong-Gu, Daejeon 34141, Republic of Korea
|
23
|
Anti-TROVE2 Antibody Determined by Immune-Related Array May Serve as a Predictive Marker for Adalimumab Immunogenicity and Effectiveness in RA. J Immunol Res 2021; 2021:6656121. [PMID: 33763493 PMCID: PMC7963899 DOI: 10.1155/2021/6656121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Revised: 02/20/2021] [Accepted: 02/23/2021] [Indexed: 11/18/2022] Open
Abstract
Anti-drug antibody (ADAb) development is associated with secondary therapeutic failure in biologic-treated rheumatoid arthritis (RA) patients. With a treat-to-target goal, we aimed to identify biomarkers for predicting ADAb development and therapeutic response in adalimumab-treated patients. Three independent cohorts were enrolled. In Cohort-1, 24 plasma samples (6 ADAb-positive and 6 ADAb-negative patients at baseline and week 24 of adalimumab therapy, respectively) were assayed with immune-related microarray containing 1,636 correctly folded functional proteins. Next, we executed statistically powered autoantibody profiling analysis of 50 samples in Cohort-2 (24 ADAb-positive and 26 ADAb-negative patients). Subsequently, immunofluorescence assay was performed on 48 samples in Cohort-3 to correlate with ADAb titers and drug levels. The biomarkers were identified for predicting ADAb development and therapeutic response using the immune-related microarray and machine learning approach. ADAb-positive patients had lower drug levels at week 24 (median = 0.024 μg/ml) compared with ADAb-negative patients (median = 6.38 μg/ml, p < 0.001). ROC analysis based on the ADAb status revealed the top 20 autoantibodies with AUC ≥ 0.7 in differentiating both groups in Cohort-1. Analysis of Cohort-2 dataset identified a panel of 8 biomarkers (TROVE2, SSB, NDE1, ZHX2, SH3GL1, CARD9, PTPN20, and KLHL12) with 80.6% specificity, 77.4% sensitivity, and 79.0% accuracy in discriminating poor from EULAR responders. Immunofluorescence assay validated that anti-TROVE2 antibody could highly predict ADAb development and poor EULAR response (AUC 0.79 and 0.89, respectively). Multivariate regression analysis proved anti-TROVE2 antibody to be an independent predictor for developing ADAb. Immune-related protein microarray and replication analysis identified anti-TROVE2 antibody as a useful biomarker for predicting ADAb development and therapeutic response in adalimumab-treated patients.
|
24
|
Khan SR, Al Rijjal D, Piro A, Wheeler MB. Integration of AI and traditional medicine in drug discovery. Drug Discov Today 2021; 26:982-992. [PMID: 33476566 DOI: 10.1016/j.drudis.2021.01.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 12/01/2020] [Accepted: 01/11/2021] [Indexed: 11/24/2022]
Abstract
AI integration in plant-based traditional medicine could be used to overcome drug discovery challenges.
Affiliation(s)
- Saifur R Khan
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
- Dana Al Rijjal
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
- Anthony Piro
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
- Michael B Wheeler
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
|
25
|
Gao P, Zhang J, Peng Q, Zhang J, Glezakou VA. General Protocol for the Accurate Prediction of Molecular 13C/1H NMR Chemical Shifts via Machine Learning Augmented DFT. J Chem Inf Model 2020; 60:3746-3754. [DOI: 10.1021/acs.jcim.0c00388] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, NSW 2500, Australia
- Jun Zhang
- Physical Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington 99352, United States
- Qian Peng
- State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, China
- Jie Zhang
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Science Park, Guangzhou 510530, China
- Vassiliki-Alexandra Glezakou
- Physical Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington 99352, United States
|
26
|
Martínez-Treviño SH, Uc-Cetina V, Fernández-Herrera MA, Merino G. Prediction of Natural Product Classes Using Machine Learning and 13C NMR Spectroscopic Data. J Chem Inf Model 2020; 60:3376-3386. [DOI: 10.1021/acs.jcim.0c00293] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Saúl H. Martínez-Treviño
- Departamento de Física Aplicada, Centro de Investigación y de Estudios Avanzados, Km. 6 Antigua carretera a Progreso Apdo. Postal 73, Cordemex, 97310 Mérida, Mexico
- Víctor Uc-Cetina
- Facultad de Matemáticas, Universidad Autónoma de Yucatán, Av. Industrias no contaminantes, S/N, 97119 Mérida, Yucatán, Mexico
- María A. Fernández-Herrera
- Departamento de Física Aplicada, Centro de Investigación y de Estudios Avanzados, Km. 6 Antigua carretera a Progreso Apdo. Postal 73, Cordemex, 97310 Mérida, Mexico
- Gabriel Merino
- Departamento de Física Aplicada, Centro de Investigación y de Estudios Avanzados, Km. 6 Antigua carretera a Progreso Apdo. Postal 73, Cordemex, 97310 Mérida, Mexico
|
27
|
Cobas C. NMR signal processing, prediction, and structure verification with machine learning techniques. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2020; 58:512-519. [PMID: 31912547 DOI: 10.1002/mrc.4989] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Revised: 01/02/2020] [Accepted: 01/03/2020] [Indexed: 05/25/2023]
Abstract
Machine learning (ML) methods have been present in the field of NMR for decades, but they have experienced tremendous growth in the last few years, especially thanks to the emergence of deep learning (DL) techniques that take advantage of the increased amounts of data and available computing power. These algorithms are successfully employed for classification, regression, clustering, or dimensionality reduction tasks on large data sets and have been intensively applied in different areas of NMR including metabonomics, clinical diagnosis, or relaxometry. In this article, we concentrate on the various applications of ML/DL in the areas of NMR signal processing and analysis of small molecules, including automatic structure verification and prediction of NMR observables in solution.
Affiliation(s)
- Carlos Cobas
- Mestrelab Research, Santiago de Compostela, Spain
|
28
|
Stiegler J, Hoermann C, Müller J, Benbow ME, Heurich M. Carcass provisioning for scavenger conservation in a temperate forest ecosystem. Ecosphere 2020. [DOI: 10.1002/ecs2.3063] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Affiliation(s)
- Jonas Stiegler
- Department of Plant Ecology and Nature Conservation University of Potsdam Potsdam Germany
- Christian Hoermann
- Department of Wildlife Ecology and Management University of Freiburg Freiburg Germany
- Jörg Müller
- Department of Animal Ecology and Tropical Biology University of Würzburg Würzburg Germany
- M. Eric Benbow
- Department of Entomology Michigan State University East Lansing Michigan USA
- Department of Osteopathic Medical Specialties Michigan State University East Lansing Michigan USA
- Ecology, Evolutionary Biology and Behavior Program Michigan State University East Lansing Michigan USA
- Marco Heurich
- Department of Wildlife Ecology and Management University of Freiburg Freiburg Germany
|
29
|
Kuhn S, Colreavy-Donnelly S, Santana de Souza J, Borges RM. An integrated approach for mixture analysis using MS and NMR techniques. Faraday Discuss 2020; 218:339-353. [PMID: 31114813 DOI: 10.1039/c8fd00227d] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/13/2022]
Abstract
We suggest an improved software pipeline for mixture analysis. The improvements include combining tandem MS and 2D NMR data for a reliable identification of the constituents in an algorithm based on network analysis aiming for a robust and reliable identification routine. An important part of this pipeline is the use of open-data repositories, although it is not totally reliant on them. The NMR identification step emphasizes robustness and is less sensitive towards changes in data acquisition and processing than existing methods. The process starts with LC-ESI-MSMS based molecular network dereplication using data from the GNPS collaborative collection. We identify closely related structures by propagating structure elucidation through edges in the network. Those identified compounds are added on top of a candidate list for the following NMR filtering method that predicts HSQC and HMBC NMR data. The similarity of the predicted spectra of the set of closely related structures to the measured spectra of the mixture sample is taken as one indication of the most likely candidates for its compounds. The other indication is the match of the spectra to clusters built by a network analysis from the spectra of the mixture. The sensitivity gap between NMR and MS is anticipated and it will be reflected naturally by the eventual identification of fewer compounds, but with a higher confidence level, after the NMR analysis step. The contributions of the paper are an algorithm combining MS and NMR spectroscopy and a robust nJCH network analysis to explore the complementary aspects of both techniques. This delivers good results, even if a perfect computational separation of the compounds in the mixture is not possible. All of the scripts are freely available to aid studies such as with plants, marine organisms, and microorganism natural product chemistry and metabolomics, as those are the driving forces for this project.
Affiliation(s)
- Stefan Kuhn
- De Montfort University, School of Computer Science and Informatics, The Gateway, Leicester LE1 9BH, UK.
|
30
|
Giri S, Zhang Z, Krasnuk D, Lathrop RG. Evaluating the impact of land uses on stream integrity using machine learning algorithms. The Science of the Total Environment 2019; 696:133858. [PMID: 31465920 DOI: 10.1016/j.scitotenv.2019.133858] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/12/2019] [Revised: 08/05/2019] [Accepted: 08/08/2019] [Indexed: 06/10/2023]
Abstract
A general pattern of declining aquatic ecological integrity with increasing urban land use has been well established for a number of watersheds worldwide. A more nuanced characterization of the influence of different urban land uses and the determination of cumulative thresholds will further inform watershed planning and management. To this end, we investigated the utility of two machine learning algorithms (Random Forests (RF) and Boosted Regression Trees (BRT)) to model stream impairment through a multimetric macroinvertebrate index known as the High Gradient Macroinvertebrate Index (HGMI) in an urbanizing watershed located in north-central New Jersey, United States. These machine learning algorithms were able to explain at least 50% of the variability of stream integrity based on watershed land use/land cover. While comparable in results, RF was found to be easier to train and was somewhat more robust to model overfitting compared to BRT. Our results document the negative influence that increasing high-medium density (> 30% Impervious Surface cover (ISC)) urban, low density (15-30% ISC) urban, and transitional/barren land had on stream biological integrity. The thresholds generated by partial plots suggest that stream integrity decreased abruptly when the percentage of high-medium density urban, low density urban, and transitional/barren land went above 10%, 8%, and 2% of the watershed, respectively. Additionally, when rural residential land surpassed a 30% threshold, it behaved similarly to low density urban land with respect to stream integrity. Identification of such cumulative thresholds can help watershed managers and policymakers to craft land use zoning regulations and design restoration programs that are grounded in objective scientific criteria.
Affiliation(s)
- Subhasis Giri
- Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of New Jersey, New Brunswick NJ-08901, USA.
- Zhen Zhang
- Data Science and Informatics, DowDuPont, Indianapolis IN-46268, USA
- Daryl Krasnuk
- Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of New Jersey, New Brunswick NJ-08901, USA
- Richard G Lathrop
- Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of New Jersey, New Brunswick NJ-08901, USA
|
31
|
Razzaq A, Sadia B, Raza A, Khalid Hameed M, Saleem F. Metabolomics: A Way Forward for Crop Improvement. Metabolites 2019; 9:E303. [PMID: 31847393 PMCID: PMC6969922 DOI: 10.3390/metabo9120303] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 09/13/2019] [Revised: 12/02/2019] [Accepted: 12/11/2019] [Indexed: 12/15/2022] Open
Abstract
Metabolomics is an emerging branch of "omics" and it involves identification and quantification of metabolites and chemical footprints of cellular regulatory processes in different biological species. The metabolome is the total metabolite pool in an organism, which can be measured to characterize genetic or environmental variations. Metabolomics plays a significant role in exploring environment-gene interactions, mutant characterization, phenotyping, identification of biomarkers, and drug discovery. Metabolomics is a promising approach to decipher various metabolic networks that are linked with biotic and abiotic stress tolerance in plants. In this context, metabolomics-assisted breeding enables efficient screening for yield and stress tolerance of crops at the metabolic level. Advanced metabolomics analytical tools, like non-destructive nuclear magnetic resonance spectroscopy (NMR), liquid chromatography mass-spectroscopy (LC-MS), gas chromatography-mass spectrometry (GC-MS), high performance liquid chromatography (HPLC), and direct flow injection (DFI) mass spectrometry, have sped up metabolic profiling. Presently, integrating metabolomics with post-genomics tools has enabled efficient dissection of genetic and phenotypic association in crop plants. This review provides insight into the state-of-the-art plant metabolomics tools for crop improvement. Here, we describe the workflow of plant metabolomics research focusing on the elucidation of biotic and abiotic stress tolerance mechanisms in plants. Furthermore, the potential of metabolomics-assisted breeding for crop improvement and its future applications in speed breeding are also discussed. Mention has also been made of possible bottlenecks and future prospects of plant metabolomics.
Affiliation(s)
- Ali Razzaq
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad 38040, Pakistan
- Bushra Sadia
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad 38040, Pakistan
- Ali Raza
- Oil Crops Research Institute, Chinese Academy of Agricultural Sciences (CAAS), Wuhan 430062, China
- Muhammad Khalid Hameed
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Fozia Saleem
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad 38040, Pakistan
|
32
|
Kuhn S, Johnson SR. Stereo-Aware Extension of HOSE Codes. ACS Omega 2019; 4:7323-7329. [PMID: 31459832 PMCID: PMC6648302 DOI: 10.1021/acsomega.9b00488] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/21/2019] [Accepted: 04/02/2019] [Indexed: 05/16/2023]
Abstract
Descriptions of molecular environments have many applications in chemoinformatics, including chemical shift prediction. Hierarchically ordered spherical environment (HOSE) codes are the most popular such descriptions. We developed a method to extend these with stereochemistry information. It enables distinguishing atoms which would be considered identical in traditional HOSE codes. The use of our method is demonstrated by chemical shift predictions for molecules in the nmrshiftdb2 database. We give a full specification and an implementation.
Affiliation(s)
- Stefan Kuhn
- School of Computer Science and Informatics, De Montfort University, The Gateway LE1 9BH, Leicester, U.K.
- Sean R. Johnson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
|
33
|
Müller A, Parzych D. Determinants of Entrepreneurial Intentions at Universities. Warsaw University of Technology Case. Problemy Zarzadzania 2019. [DOI: 10.7172/1644-9584.77.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
34
|
Chaker Z, Salanne M, Delaye JM, Charpentier T. NMR shifts in aluminosilicate glasses via machine learning. Phys Chem Chem Phys 2019; 21:21709-21725. [DOI: 10.1039/c9cp02803j] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022]
Abstract
Machine learning (ML) approaches are investigated for the prediction of nuclear magnetic resonance (NMR) shifts in aluminosilicate glasses, for which NMR has proven to be a cutting-edge method over the last decade.
Affiliation(s)
- Ziyad Chaker
- NIMBE, CEA, CNRS, Université Paris-Saclay, F-91191 Gif-sur-Yvette Cedex
- Jean-Marc Delaye
- CEA, DEN, Service d'études de vitrification et procédés hautes températures, 30207 Bagnols-sur-Cèze, France
|
35
|
Retrieval of Daily PM2.5 Concentrations Using Nonlinear Methods: A Case Study of the Beijing–Tianjin–Hebei Region, China. Remote Sensing 2018. [DOI: 10.3390/rs10122006] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/09/2023]
Abstract
Exposure to fine particulate matter (PM2.5) is associated with adverse health impacts on the population. Satellite observations and machine learning algorithms have been applied to improve the accuracy of the prediction of PM2.5 concentrations. In this study, we developed a PM2.5 retrieval approach using machine-learning methods, based on aerosol products from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the NASA Earth Observation System (EOS) Terra and Aqua polar-orbiting satellites, near-ground meteorological variables from the NASA Goddard Earth Observing System (GEOS), and ground-based PM2.5 observation data. Four models, which are orthogonal regression (OR), regression tree (Rpart), random forests (RF), and support vector machine (SVM), were tested and compared in the Beijing–Tianjin–Hebei (BTH) region of China in 2015. Aerosol products derived from the Terra and Aqua satellite sensors were also compared. The 10-repeat 5-fold cross-validation (10 × 5 CV) method was subsequently used to evaluate the performance of the different aerosol products and the four models. The results show that the performance of the Aqua dataset was better than that of the Terra dataset, and that the RF algorithm has the best predictive performance (Terra: R = 0.77, RMSE = 43.51 μg/m3; Aqua: R = 0.85, RMSE = 33.90 μg/m3). This study shows promise for predicting the spatiotemporal distribution of PM2.5 using the RF model and Aqua aerosol product with the assistance of PM2.5 site data.
|
36
|
Ito K, Obuchi Y, Chikayama E, Date Y, Kikuchi J. Exploratory machine-learned theoretical chemical shifts can closely predict metabolic mixture signals. Chem Sci 2018; 9:8213-8220. [PMID: 30542569 PMCID: PMC6240814 DOI: 10.1039/c8sc03628d] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/15/2018] [Accepted: 08/23/2018] [Indexed: 12/30/2022] Open
Abstract
Various chemical shift predictive methodologies have been studied and developed, but there remains the problem of prediction accuracy. Assigning the NMR signals of metabolic mixtures requires high predictive performance owing to the complexity of the signals. Here we propose a new predictive tool that combines quantum chemistry and machine learning. A scaling factor as the objective variable to correct the errors of 2355 theoretical chemical shifts was optimized by exploring 91 machine learning algorithms and using the partial structure of 150 compounds as explanatory variables. The optimal predictive model gave RMSDs between experimental and predicted chemical shifts of 0.2177 ppm for δ 1H and 3.3261 ppm for δ 13C in the test data; thus, better accuracy was achieved compared with existing empirical and quantum chemical methods. The utility of the predictive model was demonstrated by applying it to assignments of experimental NMR signals of a complex metabolic mixture.
Affiliation(s)
- Kengo Ito
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yuka Obuchi
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Eisuke Chikayama
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Department of Information Systems, Niigata University of International and Information Studies, 3-1-1 Mizukino, Nishi-ku, Niigata-shi, Niigata 950-2292, Japan
- Yasuhiro Date
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Jun Kikuchi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
|
37
|
Paruzzo FM, Hofstetter A, Musil F, De S, Ceriotti M, Emsley L. Chemical shifts in molecular solids by machine learning. Nat Commun 2018; 9:4501. [PMID: 30374021 PMCID: PMC6206069 DOI: 10.1038/s41467-018-06972-x] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Received: 08/22/2018] [Accepted: 09/26/2018] [Indexed: 02/02/2023] Open
Abstract
Due to their strong dependence on local atomic environments, NMR chemical shifts are among the most powerful tools for structure elucidation of powdered solids or amorphous materials. Unfortunately, using them for structure determination depends on the ability to calculate them, which comes at the cost of high accuracy first-principles calculations. Machine learning has recently emerged as a way to overcome the need for quantum chemical calculations, but for chemical shifts in solids it is hindered by the chemical and combinatorial space spanned by molecular solids, the strong dependency of chemical shifts on their environment, and the lack of an experimental database of shifts. We propose a machine learning method based on local environments to accurately predict chemical shifts of molecular solids and their polymorphs to within DFT accuracy. We also demonstrate that the trained model is able to determine, based on the match between experimentally measured and ML-predicted shifts, the structures of cocaine and the drug 4-[4-(2-adamantylcarbamoyl)-5-tert-butylpyrazol-1-yl]benzoic acid. Solid-state nuclear magnetic resonance combined with quantum chemical shift predictions is limited by high computational cost. Here, the authors use machine learning based on local atomic environments to predict experimental chemical shifts in molecular solids with accuracy similar to density functional theory.
Affiliation(s)
- Federico M Paruzzo
- Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
- Albert Hofstetter
- Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
- Félix Musil
- Institut des Sciences et Génie Matériaux, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
- Sandip De
- Institut des Sciences et Génie Matériaux, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
- Michele Ceriotti
- Institut des Sciences et Génie Matériaux, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland.
- Lyndon Emsley
- Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland.
|
38
|
Alexandri E, Ahmed R, Siddiqui H, Choudhary MI, Tsiafoulis CG, Gerothanassis IP. High Resolution NMR Spectroscopy as a Structural and Analytical Tool for Unsaturated Lipids in Solution. Molecules 2017; 22:E1663. [PMID: 28981459 PMCID: PMC6151582 DOI: 10.3390/molecules22101663] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 09/05/2017] [Revised: 09/28/2017] [Accepted: 10/01/2017] [Indexed: 12/13/2022] Open
Abstract
Mono- and polyunsaturated lipids are widely distributed in nature, and are structurally and functionally a diverse class of molecules with a variety of physicochemical, biological, medicinal and nutritional properties. High resolution NMR spectroscopic techniques including 1H-, 13C- and 31P-NMR have been successfully employed as a structural and analytical tool for unsaturated lipids. The objective of this review article is to provide: (i) an overview of the critical 1H-, 13C- and 31P-NMR parameters for structural and analytical investigations; (ii) an overview of various 1D and 2D NMR techniques that have been used for resonance assignments; (iii) selected analytical and structural studies with emphasis on the identification of major and minor unsaturated fatty acids in complex lipid extracts without the need for the isolation of the individual components; (iv) selected investigations of oxidation products of lipids; (v) applications in the emerging field of lipidomics; (vi) studies of protein-lipid interactions at a molecular level; (vii) practical considerations and (viii) an overview of future developments in the field.
Affiliation(s)
- Eleni Alexandri
- Section of Organic Chemistry and Biochemistry, Department of Chemistry, University of Ioannina, GR-45110 Ioannina, Greece.
- Raheel Ahmed
- H.E.J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi 75270, Pakistan.
- Hina Siddiqui
- H.E.J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi 75270, Pakistan.
- Muhammad I Choudhary
- Department of Biochemistry, Faculty of Science, King Abdulaziz University, Jeddah 214412, Saudi Arabia.
- Ioannis P Gerothanassis
- Section of Organic Chemistry and Biochemistry, Department of Chemistry, University of Ioannina, GR-45110 Ioannina, Greece.
- H.E.J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi 75270, Pakistan.
39
|
Lang F, Aravamudhan S, Nolte H, Türk C, Hölper S, Müller S, Günther S, Blaauw B, Braun T, Krüger M. Dynamic changes in the mouse skeletal muscle proteome during denervation-induced atrophy. Dis Model Mech 2017; 10:881-896. [PMID: 28546288 PMCID: PMC5536905 DOI: 10.1242/dmm.028910] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 11/30/2016] [Accepted: 05/16/2017] [Indexed: 01/07/2023] Open
Abstract
Loss of neuronal stimulation enhances protein breakdown and reduces protein synthesis, causing rapid loss of muscle mass. To elucidate the pathophysiological adaptations that occur in atrophying muscles, we used stable isotope labelling and mass spectrometry to quantify protein expression changes accurately during denervation-induced atrophy after sciatic nerve section in the mouse gastrocnemius muscle. Additionally, mice were fed a stable isotope labelling of amino acids in cell culture (SILAC) diet containing 13C6-lysine for 4, 7 or 11 days to calculate relative levels of protein synthesis in denervated and control muscles. Ubiquitin remnant peptides (K-ε-GG) were profiled by immunoaffinity enrichment to identify potential substrates of the ubiquitin-proteasomal pathway. Of the 4279 skeletal muscle proteins quantified, 850 were differentially expressed significantly within 2 weeks after denervation compared with control muscles. Moreover, pulse labelling identified Lys6 incorporation in 4786 proteins, of which 43 had differential Lys6 incorporation between control and denervated muscle. Enrichment of diglycine remnants identified 2100 endogenous ubiquitination sites and revealed a metabolic and myofibrillar protein diglycine signature, including myosin heavy chains, myomesins and titin, during denervation. Comparative analysis of these proteomic data sets with known atrogenes using a random forest approach identified 92 proteins subject to atrogene-like regulation that have not previously been associated directly with denervation-induced atrophy. Comparison of protein synthesis and proteomic data indicated that upregulation of specific proteins in response to denervation is mainly achieved by protein stabilization. This study provides the first integrated analysis of protein expression, synthesis and ubiquitin signatures during muscular atrophy in a living animal. 
Summary: Comprehensive proteomic profiling of protein expression, synthesis and ubiquitination during skeletal muscle atrophy reveals that complex regulatory networks are activated during muscle wasting.
Affiliation(s)
- Franziska Lang
- Institute for Genetics, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), 50931 Cologne, Germany
- Sriram Aravamudhan
- Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- Hendrik Nolte
- Institute for Genetics, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), 50931 Cologne, Germany
- Clara Türk
- Institute for Genetics, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), 50931 Cologne, Germany
- Soraya Hölper
- Institute of Biochemistry II, Goethe University Medical School, 60590 Frankfurt, Germany
- Stefan Müller
- Center for Molecular Medicine (CMMC), University of Cologne, 50931 Cologne, Germany
- Stefan Günther
- Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- Bert Blaauw
- Venetian Institute of Molecular Medicine (VIMM), Department of Biomedical Sciences Padova, University of Padova, 35137 Padova, Italy
- Thomas Braun
- Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- Marcus Krüger
- Institute for Genetics, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), 50931 Cologne, Germany
- Center for Molecular Medicine (CMMC), University of Cologne, 50931 Cologne, Germany
40
|
Gauba H, Kumar P, Roy PP, Singh P, Dogra DP, Raman B. Prediction of advertisement preference by fusing EEG response and sentiment analysis. Neural Netw 2017; 92:77-88. [PMID: 28254237 DOI: 10.1016/j.neunet.2017.01.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 09/01/2016] [Revised: 01/28/2017] [Accepted: 01/30/2017] [Indexed: 10/20/2022]
Abstract
This paper presents a novel approach to predicting the rating of video advertisements based on a multimodal framework that combines physiological analysis of the user with the global sentiment rating available on the internet. We fused the electroencephalogram (EEG) waves of the user with the corresponding global textual comments on the video to understand the user's preference more precisely. In our framework, users were asked to watch the video advertisement while their EEG signals were recorded. Valence scores were obtained using self-report for each video. A higher valence corresponds to intrinsic attractiveness of the user. Furthermore, the multimedia data, comprising the comments posted by global viewers, were retrieved and processed using a Natural Language Processing (NLP) technique for sentiment analysis. Textual content from review comments was analyzed to obtain a score reflecting the sentiment of the video. A regression technique based on Random Forest was used to predict the rating of an advertisement using EEG data. Finally, the EEG-based rating was combined with the NLP-based sentiment score to improve the overall prediction. The study was carried out using 15 video clips of advertisements available online. Twenty-five participants were involved in the evaluation of the proposed system. The results are encouraging and suggest that the proposed multimodal approach can achieve a lower RMSE in rating prediction than prediction using only EEG data.
Affiliation(s)
- Himaanshu Gauba
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India.
- Pradeep Kumar
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
- Partha Pratim Roy
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
- Priyanka Singh
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
- Debi Prosad Dogra
- School of Electrical Sciences, Indian Institute of Technology, Bhubaneswar, India
- Balasubramanian Raman
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
41
|
Álvarez-Cabria M, González-Ferreras AM, Peñas FJ, Barquín J. Modelling macroinvertebrate and fish biotic indices: From reaches to entire river networks. The Science of the Total Environment 2017; 577:308-318. [PMID: 27802888 DOI: 10.1016/j.scitotenv.2016.10.186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/15/2016] [Revised: 09/12/2016] [Accepted: 10/25/2016] [Indexed: 06/06/2023]
Abstract
We modelled three macroinvertebrate (IASPT, EPT number of families and LIFE) and one fish (percentage of salmonid biomass) biotic indices for river networks draining a large region (110,000 km²) located in Northern and Eastern Spain. Models were developed using Random Forest and 26 predictor variables (19 predictors to model macroinvertebrate indices and 22 predictors to model the fish index). Predictor variables were related to different environmental characteristics (water quality, physical habitat characteristics, hydrology, topography, geology and human pressures). The importance and effect of predictors on the 4 biotic indices were evaluated with the IncNodePurity index and partial dependence plots, respectively. Results indicated that the spatial variability of macroinvertebrate and fish indices was mostly dependent on the same environmental variables. The indices decreased in river reaches affected by high mean annual nitrate concentration (> 4 mg/l) and temperature (> 12 °C), with low flow water velocity (< 0.4 m/s) and aquatic plant communities dominated by macrophytes. These indices were higher in the Atlantic region than in the Mediterranean. This study provides a continuous image of the river biological communities used as indicators, which is very useful for identifying the main sources of change in the ecological status of water bodies and for assisting both integrated catchment management and the identification of river reaches for recovery.
Affiliation(s)
- Mario Álvarez-Cabria
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
- Alexia M González-Ferreras
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
- Francisco J Peñas
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
- José Barquín
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
42
|
Chikayama E, Shimbo Y, Komatsu K, Kikuchi J. The Effect of Molecular Conformation on the Accuracy of Theoretical (1)H and (13)C Chemical Shifts Calculated by Ab Initio Methods for Metabolic Mixture Analysis. J Phys Chem B 2016; 120:3479-87. [PMID: 26963288 DOI: 10.1021/acs.jpcb.5b12748] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 01/04/2023]
Abstract
NMR spectroscopy is a powerful method for analyzing metabolic mixtures. The information obtained from an NMR spectrum is in the form of physical parameters, such as chemical shifts, and construction of databases for many metabolites will be useful for data interpretation. To increase the accuracy of theoretical chemical shifts for development of a database for a variety of metabolites, the effects of sets of conformations (structural ensembles) and the levels of theory on computations of theoretical chemical shifts were systematically investigated for a set of 29 small molecules in the present study. For each of the 29 compounds, 101 structures were generated by classical molecular dynamics at 298.15 K, and then theoretical chemical shifts for 164 (1)H and 123 (13)C atoms were calculated by ab initio quantum chemical methods. Six levels of theory were used by pairing Hartree-Fock, B3LYP (density functional theory), or second order Møller-Plesset perturbation with 6-31G or aug-cc-pVDZ basis set. The six average fluctuations in the (1)H chemical shift were ±0.63, ±0.59, ±0.70, ±0.62, ±0.75, and ±0.66 ppm for the structural ensembles, and the six average errors were ±0.34, ±0.27, ±0.32, ±0.25, ±0.32, and ±0.25 ppm. The results showed that chemical shift fluctuations with changes in the conformation because of molecular motion were larger than the differences between computed and experimental chemical shifts for all six levels of theory. In conclusion, selection of an appropriate structural ensemble should be performed before theoretical chemical shift calculations for development of an accurate database for a variety of metabolites.
Affiliation(s)
- Eisuke Chikayama
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Department of Information Systems, Niigata University of International and Information Studies, 3-1-1 Mizukino, Nishi-ku, Niigata, Niigata 950-2292, Japan
- Yudai Shimbo
- NEC Solution Innovators, Ltd., 2-2-41 Ekimae, Kashiwazaki, Niigata 945-0055, Japan
- Keiko Komatsu
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Jun Kikuchi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
|
43
|
Álvarez-Cabria M, Barquín J, Peñas FJ. Modelling the spatial and seasonal variability of water quality for entire river networks: Relationships with natural and anthropogenic factors. Sci Total Environ 2016; 545-546:152-162. [PMID: 26745301 DOI: 10.1016/j.scitotenv.2015.12.109] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 09/28/2015] [Revised: 12/22/2015] [Accepted: 12/22/2015] [Indexed: 06/05/2023]
Abstract
We model the spatial and seasonal variability of three key water quality variables (water temperature and concentration of nitrates and phosphates) for entire river networks in a large area in northern Spain. Models were developed with the Random Forest technique, using 12 (water temperature and nitrate concentration) and 15 (phosphate concentration) predictor variables as descriptors of several environmental attributes (climate, topography, land-uses, hydrology and anthropogenic pressures). The effect of the different predictors on the response variables was assessed with partial dependence plots and partial correlation analysis. Results indicated that land-uses were important predictors in defining the spatial and seasonal patterns of these three variables. Water temperature was positively related with air temperature and the upstream drainage area, whereas increases in forest cover decreased water temperature. Nitrate concentration was mainly related to the area covered by agricultural land-uses, increasing in winter, probably because of catchment run-off processes. On the other hand, phosphate concentration was highly related to the area covered by urban land-uses in the upstream catchment and to the proximity of the closest upstream effluent. Phosphate concentration increased notably during the low flow period (summer), probably due to the reduction of the dilution capacity. These results provide a large-scale continuous picture of water quality, which could help identify the main sources of change in water quality and assist in the prioritization of river reaches for restoration projects.
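The partial dependence analysis mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a fitted Random Forest, its coefficients are invented, and the feature names `air_temp` and `forest_cover` are illustrative labels rather than the study's actual predictors.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`.

    `predict` maps one row (a dict of feature values) to a number.
    """
    curve = []
    for value in grid:
        # Overwrite the feature in every row, keep all other features as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical water-temperature model (not the study's fitted model)."""
    return 0.8 * row["air_temp"] - 2.0 * row["forest_cover"] + 1.0


# Hypothetical river reaches.
rows = [
    {"air_temp": 10.0, "forest_cover": 0.2},
    {"air_temp": 14.0, "forest_cover": 0.6},
    {"air_temp": 18.0, "forest_cover": 0.4},
]

curve = partial_dependence(toy_model, rows, "forest_cover", [0.0, 0.5, 1.0])
print(curve)
```

The same function works with any callable model, so a trained ensemble could be passed in place of `toy_model`; the toy curve decreases with forest cover only because the invented coefficient is negative.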
Affiliation(s)
- Mario Álvarez-Cabria
- Environmental Hydraulics Institute (IH Cantabria), University of Cantabria, C/Isabel Torres n° 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
- José Barquín
- Environmental Hydraulics Institute (IH Cantabria), University of Cantabria, C/Isabel Torres n° 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
- Francisco J Peñas
- Environmental Hydraulics Institute (IH Cantabria), University of Cantabria, C/Isabel Torres n° 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
|
44
|
|
45
|
Flavor Profile of Chinese Liquor Is Altered by Interactions of Intrinsic and Extrinsic Microbes. Appl Environ Microbiol 2015; 82:422-30. [PMID: 26475111 DOI: 10.1128/aem.02518-15] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 08/03/2015] [Accepted: 10/14/2015] [Indexed: 12/23/2022] Open
Abstract
The flavor profile of Chinese liquor is the result of the metabolic activity of its microbial community. Given the importance of the microbial interaction, a novel way to control the liquor's flavor is by regulating the composition of the community. In this study, we efficiently improved the liquor's flavor by perturbing the intrinsic microbial metabolism with extrinsic microbes. We first constructed a basic microbial group (intrinsic) containing Saccharomyces cerevisiae, Wickerhamomyces anomalus, and Issatchenkia orientalis and added special flavor producers (extrinsic), Saccharomyces uvarum and Saccharomyces servazzii, to this intrinsic group. Upon the addition of the extrinsic microbes, the maximum specific growth rates of S. cerevisiae and I. orientalis increased from 6.19 to 43.28/day and from 1.15 to 14.32/day, respectively, but that of W. anomalus changed from 1.00 to 0.96/day. In addition, most volatile compounds known to be produced by the extrinsic strains were not produced. However, more esters, alcohols, and acids were produced by S. cerevisiae and I. orientalis. Six compounds were significantly different by random forest analysis after perturbation. Among them, increases in ethyl hexanoate, isobutanol, and 3-methylbutyric acid were correlated with S. cerevisiae and I. orientalis, and a decrease in geranyl acetone was correlated with W. anomalus. Variations in ethyl acetate and 2-phenylethanol might be due to the varied activity of W. anomalus and S. cerevisiae. This work showed the effect of the interaction between the intrinsic and extrinsic microbes on liquor flavor, which would be beneficial for improving the quality of Chinese liquor.
|
46
|
Bernal A, Castillo AM, González F, Patiny L, Wist J. Improving the efficiency of branch-and-bound complete-search NMR assignment using the symmetry of molecules and spectra. J Chem Phys 2015; 142:074103. [PMID: 25701998 DOI: 10.1063/1.4907898] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
Nuclear magnetic resonance (NMR) assignment of small molecules is presented as a typical example of a combinatorial optimization problem in chemical physics. Three strategies that help improve the efficiency of solution search by the branch-and-bound method are presented: (1) reduction of the size of the solution space by resorting to a condensed structure formula, wherein symmetric nuclei are grouped together; (2) partitioning of the solution space based on symmetry, which becomes the basis for an efficient branching procedure; and (3) a criterion for selecting input restrictions that leads to increased gaps between branches and thus faster pruning of non-viable solutions. Although the examples chosen to illustrate this work focus on small-molecule NMR assignment, the results are generic and might help solve other combinatorial optimization problems.
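A generic branch-and-bound assignment with symmetric nuclei pre-grouped can be sketched as below. This is a minimal illustration, not the paper's algorithm: it covers only the idea of grouping equivalent nuclei, uses a simple optimistic bound for pruning, and assumes a cost of absolute shift deviation with the constraint that a group's size must match a signal's integral. The function name `assign`, the group labels, and all shift values are hypothetical.

```python
import math


def assign(groups, peaks):
    """Match groups of equivalent nuclei to observed signals.

    groups: list of (label, n_nuclei, predicted_ppm)
    peaks:  list of (integral, observed_ppm)
    Returns (best_cost, {label: peak_index}); cost is the summed |pred - obs|.
    """
    def cost(group, peak):
        # A group can only take a signal whose integral equals its size.
        return abs(group[2] - peak[1]) if group[1] == peak[0] else math.inf

    # Optimistic cost per group, ignoring clashes between groups.
    cheapest = [min((cost(g, p) for p in peaks), default=math.inf) for g in groups]
    best = {"cost": math.inf, "mapping": None}

    def branch(i, used, partial, mapping):
        # Bound: cost so far plus the optimistic cost of the remaining groups.
        if partial + sum(cheapest[i:]) >= best["cost"]:
            return  # prune: this branch cannot beat the incumbent
        if i == len(groups):
            best["cost"], best["mapping"] = partial, dict(mapping)
            return
        label = groups[i][0]
        options = sorted(
            (cost(groups[i], p), j) for j, p in enumerate(peaks) if j not in used
        )
        for c, j in options:  # try the most promising signal first
            if math.isinf(c):
                break
            mapping[label] = j
            branch(i + 1, used | {j}, partial + c, mapping)
            del mapping[label]

    branch(0, frozenset(), 0.0, {})
    return best["cost"], best["mapping"]


# Hypothetical condensed formula: two methyl groups and one methylene group.
groups = [("A", 3, 0.90), ("B", 3, 1.10), ("C", 2, 2.40)]
peaks = [(3, 1.05), (3, 0.95), (2, 2.35)]

best_cost, best_mapping = assign(groups, peaks)
print(best_cost, best_mapping)
```

Treating each methyl group as one unit means three groups are assigned instead of eight individual protons, which is the kind of search-space reduction the first strategy describes.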
Affiliation(s)
- Andrés Bernal
- École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Andrés M Castillo
- Chemistry Department, Universidad del Valle, AA 25360 Cali, Valle, Colombia
- Fabio González
- MindLab Research Group, Universidad Nacional de Colombia, Bogotá D.C., Colombia
- Luc Patiny
- École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Julien Wist
- DARMN Research Group, Universidad del Valle, Cali, Valle, Colombia
|
47
|
Gaudêncio SP, Pereira F. Dereplication: racing to speed up the natural products discovery process. Nat Prod Rep 2015; 32:779-810. [PMID: 25850681 DOI: 10.1039/c4np00134f] [Citation(s) in RCA: 162] [Impact Index Per Article: 18.0] [Indexed: 12/23/2022]
Abstract
Covering: 1993-2014 (July). To alleviate the dereplication holdup, which is a major bottleneck in natural products discovery, scientists have been directing their research efforts toward adding tools to their "bag of tricks", aiming to achieve faster, more accurate and efficient ways to accelerate the pace of the drug discovery process. Consequently, dereplication has become a hot topic, with a huge publication boom since 2012, blending multidisciplinary fields in new ways that provide important conceptual and/or methodological advances and open up pioneering research prospects in this field.
Affiliation(s)
- Susana P Gaudêncio
- LAQV, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
|
48
|
Mohamed A, Nguyen CH, Mamitsuka H. Current status and prospects of computational resources for natural product dereplication: a review. Brief Bioinform 2015; 17:309-21. [PMID: 26153512 DOI: 10.1093/bib/bbv042] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 04/27/2015] [Indexed: 01/08/2023] Open
Abstract
Research in natural products has always enhanced drug discovery by providing new and unique chemical compounds. Recently, however, drug discovery from natural products has been slowed down by the increasing chance of re-isolating known compounds. Rapid identification of previously isolated compounds in an automated manner, called dereplication, steers researchers toward novel findings, thereby reducing the time and effort needed to identify new drug leads. Dereplication identifies compounds by comparing processed experimental data with those of known compounds, so diverse computational resources, such as databases and tools to process and compare compound data, are necessary. Automating the dereplication process through the integration of computational resources has long been a goal of natural product researchers. To increase the utilization of current computational resources for natural products, we first provide an overview of the dereplication process, and then list useful resources, categorizing them into databases, methods and software tools and further explaining them from a dereplication perspective. Finally, we discuss the current challenges to automating dereplication and proposed solutions.
|
49
|
Kessler N, Bonte A, Albaum SP, Mäder P, Messmer M, Goesmann A, Niehaus K, Langenkämper G, Nattkemper TW. Learning to Classify Organic and Conventional Wheat - A Machine Learning Driven Approach Using the MeltDB 2.0 Metabolomics Analysis Platform. Front Bioeng Biotechnol 2015; 3:35. [PMID: 25853128 PMCID: PMC4371749 DOI: 10.3389/fbioe.2015.00035] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/04/2014] [Accepted: 03/03/2015] [Indexed: 11/13/2022] Open
Abstract
We present results of our machine learning approach to the problem of classifying GC-MS data originating from wheat grains of different farming systems. The aim is to investigate the potential of learning algorithms to classify GC-MS data as coming from either conventionally grown or organically grown samples, while considering different cultivars. The motivation of our work is rather obvious nowadays: increased demand for organic food in post-industrialized societies and the necessity to prove organic food authenticity. Our data set comprises up to 11 wheat cultivars that were cultivated in both farming systems, organic and conventional, over 3 years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, which is briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (SVM) methods can be applied for sample visualization and classification. Our results clearly show that the year has the most and the wheat cultivar the second-most influence on the metabolic composition of a sample. We can also show that for a given year and cultivar, organic and conventional cultivation can be distinguished by machine-learning algorithms.
Affiliation(s)
- Nikolas Kessler
- Biodata Mining Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany; Bioinformatics Resource Facility, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Anja Bonte
- Department of Safety and Quality of Cereals, Max Rubner-Institut, Detmold, Germany
- Stefan P Albaum
- Bioinformatics Resource Facility, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Paul Mäder
- Department of Soil Sciences, Research Institute of Organic Agriculture (FiBL), Frick, Switzerland
- Monika Messmer
- Department of Crop Sciences, Research Institute of Organic Agriculture (FiBL), Frick, Switzerland
- Alexander Goesmann
- Bioinformatics and Systems Biology, Justus-Liebig-University Gießen, Gießen, Germany
- Karsten Niehaus
- Department of Proteome and Metabolome Research, Faculty of Biology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Georg Langenkämper
- Department of Safety and Quality of Cereals, Max Rubner-Institut, Detmold, Germany
- Tim W Nattkemper
- Biodata Mining Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany
|
50
|
Chen R, Deng Z, Song Z. The prediction of malignant middle cerebral artery infarction: a predicting approach using random forest. J Stroke Cerebrovasc Dis 2015; 24:958-64. [PMID: 25804564 DOI: 10.1016/j.jstrokecerebrovasdis.2014.12.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/02/2014] [Revised: 12/04/2014] [Accepted: 12/12/2014] [Indexed: 10/23/2022] Open
Abstract
BACKGROUND Malignant middle cerebral artery infarction (MMI) is always associated with high mortality rates. Early decompressive craniectomy is crucial to its treatment. The purpose of this study was to establish a reliable model for the early prediction of MMI. METHODS Using a retrospective survey, we collected the data of 132 patients with middle cerebral artery infarction. According to prognosis, the patients were divided into the MMI group (n = 36) and the non-MMI group (n = 96). All the patients were represented by their clinical, biochemical, and imaging features. A random forest (RF) prediction model was then established on the clinical data. Meanwhile, 3 traditional prediction models, including a univariate linear discriminant analysis (LDA) model, a multivariate LDA model, and binary logistic regression analysis (BLRA), were built for comparison with the RF model. The prediction performance of the different models was assessed by the area under the receiver operating characteristic curve (AUC). RESULTS Four parameters, Glasgow Coma Scale, midline shifting, area, and volume of focus, were selected as predictors in all models. As independent predictors, their AUCs are .72-.80, and when the sensitivities are high (.91-.95), the specificities are low (.32-.53). The AUC of the RF model is .96, 95% confidence interval (CI) is (.93-.99), sensitivity is 1, and specificity is .85. The AUC of the multivariate LDA model is .87 and 95% CI is (.80-.93). The AUC of the BLRA model is .86 and 95% CI is (.80-.93). CONCLUSIONS The RF performs very well in the given clinical data set, which indicates that the RF is applicable to the early prediction of MMI.
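The evaluation metrics quoted in this abstract (AUC, sensitivity, specificity) can be computed directly from classifier scores, as in the sketch below. This is a minimal illustration with hypothetical scores, not the study's data; the AUC is computed through its rank interpretation, and the function names `auc` and `sens_spec` are illustrative.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive case scores higher than a
    random negative case, with ties counted as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sens_spec(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in scores_pos)
    true_neg = sum(s < threshold for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical predicted probabilities for MMI and non-MMI patients.
mmi = [0.9, 0.8, 0.6, 0.4]
non_mmi = [0.7, 0.3, 0.2, 0.1, 0.5]

print("AUC:", auc(mmi, non_mmi))
print("threshold 0.4:", sens_spec(mmi, non_mmi, 0.4))
print("threshold 0.6:", sens_spec(mmi, non_mmi, 0.6))
```

Lowering the threshold in this toy data raises sensitivity at the cost of specificity, the same trade-off the abstract reports for the single-parameter predictors.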
Affiliation(s)
- Ru Chen
- Neurological Department, The Third Xiangya Hospital of Central South University, Hunan, China
- Zelin Deng
- Department of Software Engineering, School of Computer and Communication Engineering, Changsha University of Science and Technology, Hunan, China
- Zhi Song
- Neurological Department, The Third Xiangya Hospital of Central South University, Hunan, China.
|