1
Du H, Zhang Y, Ma Y, Jiao W, Lei T, Su H. Rapid Determination of Crude Protein Content in Alfalfa Based on Fourier Transform Infrared Spectroscopy. Foods 2024; 13:2187. [PMID: 39063271 PMCID: PMC11276440 DOI: 10.3390/foods13142187] [Received: 06/14/2024] [Revised: 07/08/2024] [Accepted: 07/10/2024] [Indexed: 07/28/2024]
Abstract
The crude protein (CP) content is an important determinant of alfalfa quality, and its accurate and rapid evaluation is a challenge for the industry. A model was developed by combining Fourier transform infrared spectroscopy (FTIS) and chemometric analysis. Spectra were collected over the range 4000-400 cm⁻¹. Adaptive iteratively reweighted penalized least squares (airPLS) and Savitzky-Golay (SG) smoothing were used to preprocess the spectral data; competitive adaptive reweighted sampling (CARS) and the characteristic peaks of CP functional groups and moieties were used for feature selection; partial least squares regression (PLSR) and random forest regression (RFR) were used for quantitative prediction modelling. Among the combinations compared for predicting CP content, airPLST-cars-PLSR-CV performed best, with a prediction-set coefficient of determination (R²P) of 0.99 and a root-mean-square error of prediction (RMSEP) of 0.053, which makes it suitable for establishing a small-sample prediction model. The results show that the PLSR-based combination can accurately predict the crude protein content of alfalfa forage, providing a reliable and effective new detection method.
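The two figures of merit quoted in this abstract can be illustrated with a short sketch. It assumes the conventional definitions of RMSEP and of R² on a held-out prediction set; the abstract does not state the formulas, and the function names and sample CP values below are hypothetical, not data from the paper.

```python
import math


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over a held-out set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_prediction(y_true, y_pred):
    """Coefficient of determination on the prediction set: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted CP values (illustration only).
cp_reference = [16.0, 18.0, 20.0, 22.0]
cp_predicted = [16.1, 17.9, 20.1, 21.9]

print(f"RMSEP = {rmsep(cp_reference, cp_predicted):.3f}")
print(f"R2_P  = {r2_prediction(cp_reference, cp_predicted):.3f}")
```

A low RMSEP together with an R² close to 1 on samples not used for calibration is what the abstract reports for the best model.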
Affiliation(s)
- Haijun Du
- College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, No. 36 Zhaowuda Road, Hohhot 010018, China
- Yaru Zhang
- College of Horticulture and Plant Protection, Inner Mongolia Agricultural University, No. 36 Zhaowuda Road, Hohhot 010018, China
- Yanhua Ma
- College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, No. 36 Zhaowuda Road, Hohhot 010018, China
- Wei Jiao
- The China Academy of Grassland Research, No. 120 Wulanchabu East Street, Saihan District, Hohhot 010018, China
- Ting Lei
- College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, No. 36 Zhaowuda Road, Hohhot 010018, China
- He Su
- College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, No. 36 Zhaowuda Road, Hohhot 010018, China
2
Stienstra CMK, Hebert L, Thomas P, Haack A, Guo J, Hopkins WS. Graphormer-IR: Graph Transformers Predict Experimental IR Spectra Using Highly Specialized Attention. J Chem Inf Model 2024; 64:4613-4629. [PMID: 38845400 DOI: 10.1021/acs.jcim.4c00378] [Indexed: 06/25/2024]
Abstract
Infrared (IR) spectroscopy is an important analytical tool in various chemical and forensic domains, and a great deal of effort has gone into developing in silico methods for predicting experimental spectra. A key challenge in this regard is generating highly accurate spectra quickly to enable real-time feedback between computation and experiment. Here, we employ Graphormer, a graph neural network (GNN) transformer, to predict IR spectra using only simplified molecular-input line-entry system (SMILES) strings. Our data set includes 53,528 high-quality spectra, measured in five different experimental media (i.e., phases), for molecules containing the elements H, C, N, O, F, Si, S, P, Cl, Br, and I. When using only atomic numbers for node encodings, Graphormer-IR achieved a mean test spectral information similarity (SISμ) value of 0.8449 ± 0.0012 (n = 5), which surpasses that of the current state-of-the-art model Chemprop-IR (SISμ = 0.8409 ± 0.0014, n = 5) with only 36% of the encoded information. Augmenting node embeddings with additional node-level descriptors in learned embeddings generated through a multilayer perceptron improves scores to SISμ = 0.8523 ± 0.0006, a total improvement of 19.7σ (t = 19). These improved scores show how Graphormer-IR excels in capturing long-range interactions like hydrogen bonding, anharmonic peak positions in experimental spectra, and stretching frequencies of uncommon functional groups. Scaling our architecture to 210 attention heads demonstrates specialist-like behavior for distinct IR frequencies that improves model performance. Our model utilizes novel architectures, including a global node for phase encoding, learned node feature embeddings, and a one-dimensional (1D) smoothing convolutional neural network (CNN). Graphormer-IR's innovations underscore its value over traditional message-passing neural networks (MPNNs) due to its expressive embeddings and ability to capture long-range intramolecular relationships.
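The abstract reports spectral information similarity (SIS) scores without giving the formula. The sketch below assumes the definition commonly used in this line of work, SIS = 1 / (1 + SID), where SID is the symmetric spectral information divergence between two spectra normalized to unit sum. It omits the Gaussian broadening step that published pipelines typically apply first; the function names and the `eps` floor are illustrative assumptions, not the authors' code.

```python
import math


def normalize(spectrum, eps=1e-12):
    """Floor intensities at eps (to keep logs finite) and scale to unit sum."""
    floored = [max(v, eps) for v in spectrum]
    total = sum(floored)
    return [v / total for v in floored]


def sid(p, q):
    """Symmetric spectral information divergence between two normalized spectra."""
    return sum(a * math.log(a / b) + b * math.log(b / a) for a, b in zip(p, q))


def sis(predicted, target):
    """Spectral information similarity: 1 for identical shapes, tending to 0 as they diverge."""
    p, q = normalize(predicted), normalize(target)
    return 1.0 / (1.0 + sid(p, q))


# Hypothetical four-point spectra on a shared wavenumber grid (illustration only).
target = [0.10, 0.50, 0.30, 0.10]
close_guess = [0.12, 0.48, 0.30, 0.10]
poor_guess = [0.50, 0.10, 0.10, 0.30]

print(sis(close_guess, target))
print(sis(poor_guess, target))
```

Because both spectra are normalized, the score depends only on spectral shape, not on absolute intensity.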
Affiliation(s)
- Cailum M K Stienstra
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Liam Hebert
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Patrick Thomas
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Alexander Haack
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Jason Guo
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Scott Hopkins
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Watermine Innovation, Waterloo, Ontario N0B 2T0, Canada
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
3
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth in spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing this evolving landscape, conventional chemometrics is ill-equipped, and deep learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and the fundamentals of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, along with potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
- Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
- School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
4
Liu S. Harvesting Chemical Understanding with Machine Learning and Quantum Computers. ACS Phys Chem Au 2024; 4:135-142. [PMID: 38560751 PMCID: PMC10979482 DOI: 10.1021/acsphyschemau.3c00067] [Received: 12/06/2023] [Revised: 12/29/2023] [Accepted: 01/02/2024] [Indexed: 04/04/2024]
Abstract
It is tenable to argue that nobody can predict the future with certainty, yet one can learn from the past and make informed projections for the years ahead. In this Perspective, we review how theory and computation can be exploited to obtain chemical understanding from wave function theory and density functional theory, and then look ahead to the likely impact of machine learning (ML) and quantum computers (QC) on how traditional chemical concepts are appreciated in the decades to come. It is maintained that the development and maturation of ML and QC methods in theoretical and computational chemistry represent two paradigm shifts in how the Schrödinger equation can be solved. New chemical understanding can be harnessed in these two new paradigms by making respective use of ML features and QC qubits. Before that happens, however, we still have hurdles to face and obstacles to overcome in both the ML and QC arenas. Possible pathways to tackle these challenges are proposed. We anticipate that hierarchical modeling, in contrast to multiscale modeling, will emerge and thrive, becoming the workhorse of in silico simulations in the next few decades.
5
Díaz-Sánchez F, García-Castro MA, Amador-Ramírez MP, Arzola-Flores JA, Limón-Aguilar X. Experimental Determination of the Standard Enthalpy of Formation of Trimellitic Acid and Its Prediction by Supervised Learning. J Phys Chem A 2024; 128:2200-2209. [PMID: 38445978 PMCID: PMC10961834 DOI: 10.1021/acs.jpca.3c05235] [Received: 08/02/2023] [Revised: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 03/07/2024]
Abstract
The standard molar enthalpy of formation for trimellitic acid (TMAc) in the crystalline phase at 298.15 K, ΔfHm°(cr), was determined experimentally from the enthalpy of combustion through combustion calorimetry experiments. Likewise, the standard molar enthalpy of sublimation was determined from the standard molar enthalpy of fusion and from the standard molar enthalpy of vaporization, obtained by differential scanning calorimetry and thermogravimetry, respectively. Subsequently, the standard molar enthalpies of formation in the gas phase at 298.15 K, ΔfHm°(g), were calculated. The enthalpies of formation for TMAc, hemimellitic, and trimesic acids were predicted using multiple linear regression (MLR) with a nonreplacement evaluation technique. MLR was applied to the data set, which allowed these thermochemical properties to be estimated with an R² greater than 0.99. This model was used to compare the predicted and experimental results for benzene carboxylic acids.
Affiliation(s)
- Fausto Díaz-Sánchez
- Facultad de Ingeniería Química de la Benemérita Universidad Autónoma de Puebla, 18 Sur y Av. San Claudio, C.P. 72570 Puebla Pue, Mexico
- Miguel Angel García-Castro
- Facultad de Ingeniería Química de la Benemérita Universidad Autónoma de Puebla, 18 Sur y Av. San Claudio, C.P. 72570 Puebla Pue, Mexico
- María Patricia Amador-Ramírez
- Facultad de Ciencias Químicas de la Benemérita Universidad Autónoma de Puebla, 14 Sur y Av. San Claudio, C.P. 72570 Puebla Pue, Mexico
- Jesús Andrés Arzola-Flores
- Facultad de Ingeniería Química de la Benemérita Universidad Autónoma de Puebla, 18 Sur y Av. San Claudio, C.P. 72570 Puebla Pue, Mexico
- Ximena Limón-Aguilar
- Facultad de Ingeniería Química de la Benemérita Universidad Autónoma de Puebla, 18 Sur y Av. San Claudio, C.P. 72570 Puebla Pue, Mexico
6
Ye K, Wang S, Huang Y, Hu M, Zhou D, Luo Y, Ye S, Zhang G, Jiang J. Machine Learning Prediction of Molecular Binding Profiles on Metal-Porphyrin via Spectroscopic Descriptors. J Phys Chem Lett 2024; 15:1956-1961. [PMID: 38346267 DOI: 10.1021/acs.jpclett.3c03002] [Indexed: 02/23/2024]
Abstract
The study of molecular adsorption is crucial for understanding various chemical processes. Spectroscopy offers a convenient and non-invasive way of probing structures of adsorbed states and can be used for real-time observation of molecular binding profiles, including both structural and energetic information. However, deciphering atomic structures from spectral information using the first-principles approach is computationally expensive and time-consuming because of the sophistication of recording spectra, chemical structures, and their relationship. Here, we demonstrate the feasibility of a data-driven machine learning approach for predicting binding energy and structural information directly from vibrational spectra of the adsorbate by using CO adsorption on iron porphyrin as an example. Our trained machine learning model is not only interpretable but also readily transferred to similar metal-nitrogen-carbon systems with comparable accuracy. This work shows the potential of using structure-encoded spectroscopic descriptors in machine learning models for the study of adsorbed states of molecules on transition metal complexes.
Affiliation(s)
- Ke Ye
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Song Wang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Yan Huang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Min Hu
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Donglai Zhou
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Yi Luo
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, Anhui 230088, P. R. China
- Sheng Ye
- School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, P. R. China
- Guozhen Zhang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, Anhui 230088, P. R. China
- Jun Jiang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, Anhui 230088, P. R. China
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
7
Ye S, Zhong K, Huang Y, Zhang G, Sun C, Jiang J. Artificial Intelligence-based Amide-II Infrared Spectroscopy Simulation for Monitoring Protein Hydrogen Bonding Dynamics. J Am Chem Soc 2024; 146:2663-2672. [PMID: 38240637 DOI: 10.1021/jacs.3c12258] [Indexed: 02/01/2024]
Abstract
The structurally sensitive amide II infrared (IR) bands of proteins provide valuable information about the hydrogen bonding of protein secondary structures, which is crucial for understanding protein dynamics and associated functions. However, deciphering protein structures from experimental amide II spectra relies on time-consuming quantum chemical calculations on tens of thousands of representative configurations in solvent water. Currently, the accurate simulation of amide II spectra for whole proteins remains a challenge. Here, we present a machine learning (ML)-based protocol designed to efficiently simulate the amide II IR spectra of various proteins with an accuracy comparable to experimental results. This protocol stands out as a cost-effective and efficient alternative for studying protein dynamics, including the identification of secondary structures and the monitoring of protein hydrogen bonding dynamics under different pH conditions and during the protein folding process. Our method provides a valuable tool for protein research, focusing on the study of dynamic properties of proteins, especially those related to hydrogen bonding, using amide II IR spectroscopy.
Affiliation(s)
- Sheng Ye
- School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, People's Republic of China
- Kai Zhong
- Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, Groningen 9747AG, Netherlands
- Yan Huang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Guozhen Zhang
- Hefei National Research Center of Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China
- Changyin Sun
- School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, People's Republic of China
- Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China
8
Baronio CM, Barth A. Refining protein amide I spectrum simulations with simple yet effective electrostatic models for local wavenumbers and dipole derivative magnitudes. Phys Chem Chem Phys 2024; 26:1166-1181. [PMID: 38099625 DOI: 10.1039/d3cp02018e] [Indexed: 01/04/2024]
Abstract
Analysis of the amide I band of proteins is probably the most widespread application of bioanalytical infrared spectroscopy. Although highly desirable for a more detailed structural interpretation, a quantitative description of this absorption band is still difficult. This work optimized several electrostatic models with the aim of reproducing the effect of the protein environment on the intrinsic wavenumber of a local amide I oscillator. We considered the main secondary structures (α-helices, parallel and antiparallel β-sheets) with a maximum of 21 amide groups. The models were based on the electric potential and/or the electric field component along the CO bond at up to four atoms in an amide group. They were benchmarked by comparison to Hessian matrices reconstructed from density functional theory calculations at the BPW91/6-31G** level. The performance of the electrostatic models depended on the charge set used to calculate the electric field and potential. Gromos and DSSP charge sets, used in common force fields, were not optimal for the better performing models. A good compromise between performance and the stability of model parameters was achieved by a model that considered the electric field at the positions of the oxygen, nitrogen, and hydrogen atoms of the considered amide group. The model also describes some aspects of the local conformation effect and performs similarly on its own and in combination with an explicit implementation of the local conformation effect. It is better than a combination of a local hydrogen bonding model with the local conformation effect. Even though the short-range hydrogen bonding model performs worse, it captures important aspects of the local wavenumber sensitivity to the molecular surroundings. We also improved the description of the coupling between local amide I oscillators by developing an electrostatic model for the dependency of the dipole derivative magnitude on the protein environment.
Affiliation(s)
- Cesare M Baronio
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden.
- Andreas Barth
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden.
9
Yang T, Zhou D, Ye S, Li X, Li H, Feng Y, Jiang Z, Yang L, Ye K, Shen Y, Jiang S, Feng S, Zhang G, Huang Y, Wang S, Jiang J. Catalytic Structure Design by AI Generating with Spectroscopic Descriptors. J Am Chem Soc 2023. [PMID: 38019281 DOI: 10.1021/jacs.3c09299] [Indexed: 11/30/2023]
Abstract
Generative artificial intelligence has depicted a beautiful blueprint for on-demand design in chemical research. However, the few successful chemical generations have only been able to implement a few special property values because most chemical descriptors are mathematically discrete or discontinuously adjustable. Herein, we use spectroscopic descriptors with machine learning to establish a quantitative spectral structure-property relationship for adsorbed molecules on metal monatomic catalysts. Besides catalytic properties such as adsorption energy and charge transfer, the complete spatial relative coordinates of the adsorbed molecule were successfully inverted. The spectroscopic descriptors and prediction models are generalized, allowing them to be transferred to several different systems. Due to the continuous tunability of the spectroscopic descriptors, the design of catalytic structures with continuous adsorption states generated by AI in the catalytic process has been achieved. This work paves the way for using spectroscopy to enable real-time monitoring of the catalytic process and continuous customization of catalytic performance, which will lead to profound changes in catalytic research.
Affiliation(s)
- Tongtong Yang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Institute of Intelligent Innovation, Henan Academy of Sciences, Zhengzhou, Henan 451162, P. R. China
- Donglai Zhou
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Sheng Ye
- School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, China
- Xiyu Li
- Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
- Huirong Li
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yi Feng
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Zifan Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Li Yang
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Ke Ye
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yixi Shen
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shuang Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shuo Feng
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Guozhen Zhang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yan Huang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Song Wang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
10
Ma H, Yan S, Lu X, Bao YF, Liu J, Liao L, Dai K, Cao M, Zhao X, Yan H, Wang HL, Peng X, Chen N, Feng H, Zhu L, Yao G, Fan C, Wu DY, Wang B, Wang X, Ren B. Rapidly determining the 3D structure of proteins by surface-enhanced Raman spectroscopy. Sci Adv 2023; 9:eadh8362. [PMID: 37992170 PMCID: PMC10665000 DOI: 10.1126/sciadv.adh8362] [Received: 03/20/2023] [Accepted: 10/23/2023] [Indexed: 11/24/2023]
Abstract
Despite great advances in protein structure analysis, label-free and ultrasensitive methods to obtain the natural and dynamic three-dimensional (3D) structures are still urgently needed. Surface-enhanced Raman spectroscopy (SERS) is a good candidate, but the complexity originating from the interactions between the protein and the gradient surface electric field makes it extremely challenging to determine the protein structure. Here, we propose a deciphering strategy for accurate determination of 3D protein structure from experimental SERS spectra in seconds by simply summing the SERS spectra of isolated amino acids in electric fields of different strengths, taking into account their orientations in the protein. The 3D protein structure can be reconstructed by comparing the experimental spectra obtained in a well-defined gap-mode SERS configuration with the simulated spectra. The gradient electric field endows SERS with a unique advantage to section biomolecules with atomic precision, which makes SERS a competent tool for monitoring biomolecular events under physiological conditions.
Affiliation(s)
- Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
- Sen Yan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
- Xinyu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
- Yi-Fan Bao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Jia Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Langxing Liao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Kun Dai
- School of Chemistry and Chemical Engineering, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Maofeng Cao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Xiaojiao Zhao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Hao Yan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Hai-Long Wang
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Xiaohui Peng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Ningyu Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Huishu Feng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Lilin Zhu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Guangbao Yao
- School of Chemistry and Chemical Engineering, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Chunhai Fan
- School of Chemistry and Chemical Engineering, Frontiers Science Center for Transformative Molecules, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, China
| | - De-Yin Wu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Xiang Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| | - Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, China
| |
|
11
|
Zou Z, Zhang Y, Liang L, Wei M, Leng J, Jiang J, Luo Y, Hu W. A deep learning model for predicting selected organic molecular spectra. Nature Computational Science 2023; 3:957-964. [PMID: 38177591 DOI: 10.1038/s43588-023-00550-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/27/2023] [Accepted: 10/06/2023] [Indexed: 01/06/2024]
Abstract
Accurate and efficient molecular spectra simulations are crucial for substance discovery and structure identification. However, the conventional approach of relying on quantum chemistry calculations is computationally expensive, which hampers efficiency. Here we develop DetaNet, a deep-learning model combining E(3) group equivariance and a self-attention mechanism to predict molecular spectra with improved efficiency and accuracy. By passing high-order geometric tensorial messages, DetaNet is able to generate a wide variety of molecular properties, including scalars, vectors, and second- and third-order tensors, all at the accuracy of quantum chemistry calculations. Based on this, we developed generalized modules to predict four important types of molecular spectra, namely infrared, Raman, ultraviolet-visible, and 1H and 13C nuclear magnetic resonance, taking the QM9S dataset containing 130,000 molecular species as an example. By speeding up the prediction of molecular spectra at quantum chemical accuracy, DetaNet could help progress toward real-time structural identification using spectroscopic measurements.
Affiliation(s)
- Zihan Zou
- School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan, China
| | - Yujin Zhang
- School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan, China.
| | - Lijun Liang
- College of Automation, Hangzhou Dianzi University, Hangzhou, China
| | - Mingzhi Wei
- School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan, China
| | - Jiancai Leng
- School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan, China
| | - Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, China.
- Hefei National Laboratory, University of Science and Technology of China, Hefei, China.
| | - Yi Luo
- Hefei National Laboratory, University of Science and Technology of China, Hefei, China.
- Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, China.
| | - Wei Hu
- School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Science), Jinan, China.
| |
|
12
|
Hu W, Zhang L. First-principles, machine learning and symbolic regression modelling for organic molecule adsorption on two-dimensional CaO surface. J Mol Graph Model 2023; 124:108530. [PMID: 37321063 DOI: 10.1016/j.jmgm.2023.108530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 05/15/2023] [Accepted: 05/22/2023] [Indexed: 06/17/2023]
Abstract
Data-driven methods have received significant attention in recent years in chemical and materials research; however, more work is needed to leverage this new paradigm to model and analyze the adsorption of organic molecules on low-dimensional surfaces, beyond traditional simulation methods. In this manuscript, we employ machine learning and symbolic regression coupled with density functional theory (DFT) calculations to investigate the adsorption of atmospheric organic molecules on a low-dimensional metal oxide mineral system. The starting dataset, consisting of the atomic structures of the organic/metal oxide interfaces, is obtained via DFT calculations, and different machine learning algorithms are compared, with the random forest algorithm achieving high accuracy for the target output. The feature ranking step identifies the polarizability and bond type of the organic adsorbates as the key descriptors for the adsorption energy output. In addition, symbolic regression coupled with genetic programming automatically identifies a series of new hybrid descriptors displaying improved relevance to the target output, suggesting the viability of symbolic regression to complement traditional machine learning techniques for descriptor design and fast modeling. This manuscript provides a framework for effectively modeling and analyzing the adsorption of organic molecules on low-dimensional surfaces via comprehensive data-driven approaches.
Affiliation(s)
- Wenguang Hu
- Department of Materials Physics, School of Chemistry and Materials Science, Nanjing University of Information Science & Technology, 210044, Nanjing, China
| | - Lei Zhang
- Department of Materials Physics, School of Chemistry and Materials Science, Nanjing University of Information Science & Technology, 210044, Nanjing, China.
| |
|
13
|
Yang J, Cong Y, Li Y, Li H. Machine Learning Approach Based on a Range-Corrected Deep Potential Model for Efficient Vibrational Frequency Computation. J Chem Theory Comput 2023; 19:6366-6374. [PMID: 37652890 DOI: 10.1021/acs.jctc.3c00386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Because a vibrational spectrum is an ensemble average, its simulation can be time-consuming with high-accuracy methods. We present a machine learning approach based on the range-corrected deep potential (DPRc) model to improve the computing efficiency. The DPRc method divides the system into a "probe region" and a "solvent region"; "solvent-solvent" interactions are not counted in the neural network. We applied the approach to two systems: the formic acid C═O stretching and the MeCN C≡N stretching vibrational frequency shifts in water. All data sets were prepared using the quantum vibration perturbation approach. The effects of different region divisions, one-body correction, cut range, and training data size were tested. The model with a single-molecule "probe region" showed stable accuracy; it ran roughly 10 times faster than the regular deep potential and reduced the training time by a factor of about four. The approach is efficient, easy to apply, and extendable to calculating various spectra.
Affiliation(s)
- Jitai Yang
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, 2519 Jiefang Road, Changchun 130023, P. R. China
| | - Yang Cong
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, 2519 Jiefang Road, Changchun 130023, P. R. China
| | - You Li
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, 2519 Jiefang Road, Changchun 130023, P. R. China
| | - Hui Li
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, 2519 Jiefang Road, Changchun 130023, P. R. China
| |
|
14
|
Guo S, Jiang J, Ren H, Wang S. Fusion of Multiple Spectra for Investigating Chemical Bonding Properties via Machine Learning. J Phys Chem Lett 2023; 14:7461-7468. [PMID: 37579021 DOI: 10.1021/acs.jpclett.3c01709] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/16/2023]
Abstract
Chemical bonding properties are crucial to understanding the chemical behavior of molecules. Spectroscopy is a versatile tool for studying various microscopic properties, but its interpretation suffers from human biases and the loss of high-dimensional information. Here, we present a machine learning approach to predict diverse bonding properties, including the bond dissociation energy (BDE), bond length, and α-C connectivity of hydroxyls in organic molecules, by fusing multiple spectra with different physical mechanisms. Combining nuclear magnetic resonance and vibrational spectroscopy yields higher prediction accuracy than either achieves separately. On the hold-out test data set, the models achieve mean absolute errors of 1.243 kcal/mol for BDE and 1.041 × 10⁻⁴ Å for bond length, and an accuracy of 95.09% for hydroxyl α-C connectivity. Our models demonstrate strong extrapolation capabilities when they are transferred to different molecules, external electric fields, and solvation environments. These end-to-end models pave the way to investigating chemical bonding properties by using spectroscopic observables.
Affiliation(s)
- Sibei Guo
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
| | - Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei, Anhui 230088, China
| | - Hao Ren
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580, China
| | - Song Wang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
| |
|
15
|
Qi W, Zhai D, Song D, Liu C, Yang J, Sun L, Li Y, Li X, Deng W. Optimized synthesis of anti-COVID-19 drugs aided by retrosynthesis software. RSC Med Chem 2023; 14:1254-1259. [PMID: 37484565 PMCID: PMC10357945 DOI: 10.1039/d2md00444e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Accepted: 03/21/2023] [Indexed: 07/25/2023] Open
Abstract
Considering the millions of COVID-19 patients worldwide, a global critical challenge of low-cost and efficient anti-COVID-19 drug production has emerged. Favipiravir is one of the potential anti-COVID-19 drugs, but its original synthetic route with 7 harsh steps gives a low product yield (0.8%) and has a high cost ($68 per g). Herein, we demonstrated a low-cost and efficient synthesis route for favipiravir designed using improved retrosynthesis software, which involves only 3 steps under safe and near-ambient air conditions. A yield of 32% and cost of $1.54 per g were achieved by this synthetic route. We also used the same strategy to optimize the synthesis of sabizabulin. We anticipate that these synthetic routes will contribute to the prevention and treatment of COVID-19.
Affiliation(s)
- Wentao Qi
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Dong Zhai
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Danna Song
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Chengcheng Liu
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Junxia Yang
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Lei Sun
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Youyong Li
- Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University Suzhou 215123 P. R. China
| | - Xingwei Li
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| | - Weiqiao Deng
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University Qingdao 266237 P. R. China
| |
|
16
|
Vermeyen T, Cunha A, Bultinck P, Herrebout W. Impact of conformation and intramolecular interactions on vibrational circular dichroism spectra identified with machine learning. Commun Chem 2023; 6:148. [PMID: 37438485 DOI: 10.1038/s42004-023-00944-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023] Open
Abstract
Vibrational circular dichroism (VCD) spectra often differ strongly from one conformer to another, even within the same absolute configuration of a molecule. Simulated molecular VCD spectra typically require expensive quantum chemical calculations for all conformers to generate a Boltzmann-averaged total spectrum. This paper examines whether machine learning (ML) can partly replace these quantum chemical calculations by capturing the intricate connection between a conformer geometry and its VCD spectrum. Three hypotheses concerning the added value of ML are tested. First, it is shown that for a single stereoisomer, ML can predict the VCD spectrum of a conformer solely from the conformer geometry. Second, it is found that the ML approach results in important time savings. Third, the ML model produced is unfortunately hardly transferable from one stereoisomer to another.
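As an illustration of the Boltzmann averaging mentioned in this abstract, the sketch below weights per-conformer spectra by their relative energies. It is not the authors' code: the function names, the choice of kcal/mol as the energy unit, and the default temperature of 298.15 K are assumptions made here for the example.

```python
import math

# Gas constant in kcal/(mol*K); assumes energies are given in kcal/mol.
R_KCAL = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_average(spectra, energies, temperature=298.15):
    """Weighted sum of conformer spectra sampled on a common frequency grid."""
    weights = boltzmann_weights(energies, temperature)
    n_points = len(spectra[0])
    return [
        sum(w * spectrum[i] for w, spectrum in zip(weights, spectra))
        for i in range(n_points)
    ]
```

Because VCD intensities are signed, conformers with opposite-sign bands can partly cancel in the averaged spectrum, which is one reason every conformer normally has to be computed.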
Affiliation(s)
- Tom Vermeyen
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, Antwerpen, 2020, Belgium.
- Department of Chemistry, Ghent University, Krijgslaan 281, Gent, 9000, Belgium.
| | - Ana Cunha
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, Antwerpen, 2020, Belgium
| | - Patrick Bultinck
- Department of Chemistry, Ghent University, Krijgslaan 281, Gent, 9000, Belgium.
| | - Wouter Herrebout
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, Antwerpen, 2020, Belgium
| |
|
17
|
Coppola F, Frigau L, Markelj J, Malešič J, Conversano C, Strlič M. Near-Infrared Spectroscopy and Machine Learning for Accurate Dating of Historical Books. J Am Chem Soc 2023. [PMID: 37216468 DOI: 10.1021/jacs.3c02835] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/24/2023]
Abstract
Non-destructive, fast, and accurate methods of dating are highly desirable for many heritage objects. Here, we present and critically evaluate the use of near-infrared (NIR) spectroscopic data combined with three supervised machine learning methods to predict the publication year of paper books dated between 1851 and 2000. These methods provide different accuracies; however, we demonstrate that the underlying processes refer to common spectral features. Regardless of the machine learning method used, the most informative wavelength ranges can be associated with C-H and O-H stretching first overtone, typical of the cellulose structure, and N-H stretching first overtone from amide/protein structures. We find that the expected influence of degradation on the accuracy of prediction is not meaningful. The variance-bias decomposition of the reducible error reveals some differences among the three machine learning methods. Our results show that two out of the three methods allow predictions of publication dates in the period 1851-2000 from NIR spectroscopic data with an unprecedented accuracy of up to 2 years, better than any other non-destructive method applied to a real heritage collection.
Affiliation(s)
- Floriana Coppola
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, Ljubljana 1000, Slovenia
| | - Luca Frigau
- Department of Business and Economics, University of Cagliari, Via Sant'Ignazio da Laconi 17, Cagliari 09123, Italy
| | - Jernej Markelj
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, Ljubljana 1000, Slovenia
| | - Jasna Malešič
- National and University Library of Slovenia, Turjaška ulica 1, Ljubljana 1000, Slovenia
| | - Claudio Conversano
- Department of Business and Economics, University of Cagliari, Via Sant'Ignazio da Laconi 17, Cagliari 09123, Italy
| | - Matija Strlič
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, Ljubljana 1000, Slovenia
- Institute for Sustainable Heritage, University College London, 14 Upper Woburn Place, London WC1H 0NN, U.K.
| |
|
18
|
Mou LH, Han T, Smith PES, Sharman E, Jiang J. Machine Learning Descriptors for Data-Driven Catalysis Study. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023:e2301020. [PMID: 37191279 PMCID: PMC10401178 DOI: 10.1002/advs.202301020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 04/07/2023] [Indexed: 05/17/2023]
Abstract
Traditional trial-and-error experiments and theoretical simulations have difficulty optimizing catalytic processes and developing new, better-performing catalysts. Machine learning (ML) provides a promising approach for accelerating catalysis research due to its powerful learning and predictive abilities. The selection of appropriate input features (descriptors) plays a decisive role in improving the predictive accuracy of ML models and uncovering the key factors that influence catalytic activity and selectivity. This review introduces tactics for the utilization and extraction of catalytic descriptors in ML-assisted experimental and theoretical research. In addition to the effectiveness and advantages of various descriptors, their limitations are also discussed. Highlighted are both 1) newly developed spectral descriptors for catalytic performance prediction and 2) a novel research paradigm combining computational and experimental ML models through suitable intermediate descriptors. Current challenges and future perspectives on the application of descriptors and ML techniques to catalysis are also presented.
Affiliation(s)
- Li-Hui Mou
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - TianTian Han
- Hefei JiShu Quantum Technology Co. Ltd., Hefei, 230026, China
| | - Pieter E S Smith
- YDS Pharmatech, ETEC, 1220 Washington Ave., Albany, NY, 12203, USA
| | - Edward Sharman
- Department of Neurology, University of California, Irvine, CA, 92697, USA
| | - Jun Jiang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui, 230026, China
| |
|
19
|
Qin J, Guo J, Tang G, Li L, Yao SQ. Multiplex Identification of Post-Translational Modifications at Point-of-Care by Deep Learning-Assisted Hydrogel Sensors. Angew Chem Int Ed Engl 2023; 62:e202218412. [PMID: 36815677 DOI: 10.1002/anie.202218412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 02/02/2023] [Accepted: 02/23/2023] [Indexed: 02/24/2023]
Abstract
Multiplex detection of protein post-translational modifications (PTMs), especially at the point of care (POC), is of great significance in cancer diagnosis. Herein, we report a machine learning-assisted photonic crystal hydrogel (PCH) sensor for multiplex detection of PTMs. With closely related PCH sensors microfabricated on a single chip, our design achieved not only rapid screening of PTMs at specific protein sites using only the naked eye or a cellphone, but also real-time monitoring of phosphorylation reactions. By taking advantage of multiplex sensor chips and a neural network algorithm, accurate prediction of PTMs by both their types and concentrations was enabled. This approach was ultimately used to detect and differentiate up- and down-regulation of different phosphorylation sites within the same protein in live mammalian cells. Our method thus holds potential for POC identification of various PTMs in early-stage diagnosis of protein-related diseases.
Affiliation(s)
- Junjie Qin
- Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
| | - Jia Guo
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
| | - Guanghui Tang
- Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
| | - Lin Li
- The Institute of Flexible Electronics (IFE, Future Technologies), Xiamen University, Xiamen, 361005, Fujian, China
| | - Shao Q Yao
- Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
| |
|
20
|
Choi WJ, Lee SH, Park BC, Kotov NA. Terahertz Circular Dichroism Spectroscopy of Molecular Assemblies and Nanostructures. J Am Chem Soc 2022; 144:22789-22804. [DOI: 10.1021/jacs.2c04817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Affiliation(s)
- Won Jin Choi
- Biointerfaces Institute, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109, United States
- Physical and Life Sciences, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
| | - Sang Hyun Lee
- Biointerfaces Institute, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Bum Chul Park
- Biointerfaces Institute, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Nicholas A. Kotov
- Biointerfaces Institute, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan 48109, United States
- Program in Macromolecular Science and Engineering, University of Michigan, Ann Arbor, Michigan 48109, United States
| |
|
21
|
Wang PS, Ma H, Yan S, Lu X, Tang H, Xi XH, Peng XH, Huang Y, Bao YF, Cao MF, Wang H, Huang J, Liu G, Wang X, Ren B. Correlation coefficient-directed label-free characterization of native proteins by surface-enhanced Raman spectroscopy. Chem Sci 2022; 13:13829-13835. [PMID: 36544733 PMCID: PMC9710310 DOI: 10.1039/d2sc04775f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 10/30/2022] [Indexed: 12/24/2022] Open
Abstract
Investigation of proteins in their native state is the core of proteomics towards better understanding of their structures and functions. Surface-enhanced Raman spectroscopy (SERS) has shown its unique advantages in protein characterization with fingerprint information and high sensitivity, which makes it a promising tool for proteomics. It is still challenging to obtain SERS spectra of proteins in the native state and evaluate the native degree. Here, we constructed 3D physiological hotspots for a label-free dynamic SERS characterization of a native protein with iodide-modified 140 nm Au nanoparticles. We further introduced the correlation coefficient to quantitatively evaluate the variation of the native degree, whose quantitative nature allows us to explicitly investigate the Hofmeister effect on the protein structure. We realized the classification of a protein of SARS-CoV-2 variants in 15 min, which has not been achieved before. This study offers an effective tool for tracking the dynamic structure of proteins and biomedical research.
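The abstract does not say which correlation coefficient is used or how the spectra are preprocessed. Purely as an illustration, the sketch below assumes the Pearson coefficient between two spectra sampled on the same wavenumber grid; the function name and the idea of comparing a measured spectrum against a reference spectrum are assumptions for this example, not the authors' procedure.

```python
import math


def pearson_correlation(spectrum_a, spectrum_b):
    """Pearson correlation between two spectra on a common wavenumber grid."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must have the same number of points")
    n = len(spectrum_a)
    mean_a = sum(spectrum_a) / n
    mean_b = sum(spectrum_b) / n
    # Sums of products of deviations from the mean (unnormalized covariance/variances).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(spectrum_a, spectrum_b))
    var_a = sum((a - mean_a) ** 2 for a in spectrum_a)
    var_b = sum((b - mean_b) ** 2 for b in spectrum_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return cov / math.sqrt(var_a * var_b)
```

A value close to 1 means the two spectra have nearly the same shape regardless of overall intensity scaling or a constant offset, which is why a coefficient of this kind can serve as a single-number similarity score between spectra.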
Affiliation(s)
- Ping-Shi Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Sen Yan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Xinyu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Hui Tang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Xiao-Han Xi
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Xiao-Hui Peng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Yajun Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Yi-Fan Bao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Mao-Feng Cao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Huimeng Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Jinglin Huang
- Laser Fusion Research Center, China Academy of Engineering PhysicsMianyang 621900China
| | - Guokun Liu
- State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Xiamen UniversityXiamen 361005China
| | - Xiang Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| | - Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen UniversityXiamen 361005China
| |
|
22
|
Zhang Y, Lin Q, Jiang B. Atomistic neural network representations for chemical dynamics simulations of molecular, condensed phase, and interfacial systems: Efficiency, representability, and generalization. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Affiliation(s)
- Yaolong Zhang
- Department of Chemical Physics, School of Chemistry and Materials Science, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, China
| | - Qidong Lin
- Department of Chemical Physics, School of Chemistry and Materials Science, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, China
| | - Bin Jiang
- Department of Chemical Physics, School of Chemistry and Materials Science, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, China
| |
|
23
|
Kuntz D, Wilson AK. Machine learning, artificial intelligence, and chemistry: how smart algorithms are reshaping simulation and the laboratory. PURE APPL CHEM 2022. [DOI: 10.1515/pac-2022-0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Machine learning and artificial intelligence are increasingly gaining in prominence through image analysis, language processing, and automation, to name a few applications. Machine learning is also making profound changes in chemistry. From revisiting decades-old analytical techniques for the purpose of creating better calibration curves, to assisting and accelerating traditional in silico simulations, to automating entire scientific workflows, to being used as an approach to deduce underlying physics of unexplained chemical phenomena, machine learning and artificial intelligence are reshaping chemistry, accelerating scientific discovery, and yielding new insights. This review provides an overview of machine learning and artificial intelligence from a chemist’s perspective and focuses on a number of examples of the use of these approaches in computational chemistry and in the laboratory.
Affiliation(s)
- David Kuntz
- Department of Chemistry, University of North Texas, Denton, TX 76201, USA
| | - Angela K. Wilson
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| |
|
24
|
Ni S, Yang Q, Huang J, Zhou M, Wei L, Yang Y, Wen J, Mo W, Le W, Qi D, Jin L, Li B, Zhao Z, Du K. Constructing high-accuracy theoretical Raman spectra of SARS-CoV-2 spike proteins based on a large fragment method. Chem Phys Lett 2022; 800:139663. [PMID: 35529782 PMCID: PMC9055380 DOI: 10.1016/j.cplett.2022.139663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2022] [Revised: 03/27/2022] [Accepted: 04/26/2022] [Indexed: 01/17/2023]
Abstract
To control COVID-19, rapid and accurate detection of the pathogen, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is an urgent task. The target spike proteins of SARS-CoV-2 have been detected experimentally via Raman spectroscopy. However, high-accuracy theoretical Raman spectra of the spike proteins, which could serve as a standard reference for clinical diagnosis, are lacking. In this paper, we propose a large fragment method to construct high-precision Raman spectra for the spike proteins. The large fragment method not only reduces the calculation error but also improves the accuracy of the protein Raman spectra by completely calculating the interactions within the large fragment. The Pearson correlation coefficient of the theoretical Raman spectra is greater than 0.929. Compared with the experimental spectra, the characteristic patterns are easily visible. This work provides a detection standard for the spike proteins, which should bring us a step closer to the fast recognition of SARS-CoV-2 via Raman spectroscopy.
Affiliation(s)
- Shuang Ni
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Qiang Yang
- China Academy of Engineering Physics, 621900 Mianyang, China
| | - Jinling Huang
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Minjie Zhou
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China; corresponding author
| | - Lai Wei
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Yue Yang
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Jiaxin Wen
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China; Department of Engineering Physics, Tsinghua University, 100084 Beijing, China
| | - Wenbo Mo
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China; Department of Engineering Physics, Tsinghua University, 100084 Beijing, China
| | - Wei Le
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Daojian Qi
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Lei Jin
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Bo Li
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Zongqin Zhao
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| | - Kai Du
- Laser Fusion Research Center, China Academy of Engineering Physics, 621900 Mianyang, China
| |
|
25
|
Ren H, Zhang Q, Wang Z, Zhang G, Liu H, Guo W, Mukamel S, Jiang J. Machine learning recognition of protein secondary structures based on two-dimensional spectroscopic descriptors. Proc Natl Acad Sci U S A 2022; 119:e2202713119. [PMID: 35476517 PMCID: PMC9171355 DOI: 10.1073/pnas.2202713119] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Accepted: 03/28/2022] [Indexed: 11/29/2022] Open
Abstract
Protein secondary structure discrimination is crucial for understanding their biological function. It is not generally possible to invert spectroscopic data to yield the structure. We present a machine learning protocol which uses two-dimensional UV (2DUV) spectra as pattern recognition descriptors, aiming at automated protein secondary structure determination from spectroscopic features. Accurate secondary structure recognition is obtained for homologous (97%) and nonhomologous (91%) protein segments, randomly selected from simulated model datasets. The advantage of 2DUV descriptors over one-dimensional linear absorption and circular dichroism spectra lies in the cross-peak information that reflects interactions between local regions of the protein. Thanks to their ultrafast (∼200 fs) nature, 2DUV measurements can be used in the future to probe conformational variations in the course of protein dynamics.
Affiliation(s)
- Hao Ren
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
| | - Qian Zhang
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
| | - Zhengjie Wang
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
| | - Guozhen Zhang
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Hongzhang Liu
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
| | - Wenyue Guo
- School of Materials Science and Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China
| | - Shaul Mukamel
- Department of Chemistry and Physics & Astronomy, University of California, Irvine, CA 92697
| | - Jun Jiang
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| |
|
26
|
Joung JF, Han M, Jeong M, Park S. Beyond Woodward-Fieser Rules: Design Principles of Property-Oriented Chromophores Based on Explainable Deep Learning Optical Spectroscopy. J Chem Inf Model 2022; 62:2933-2942. [PMID: 35476584 DOI: 10.1021/acs.jcim.2c00173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
An adequate understanding of molecular structure-property relationships is important for developing new molecules with desired properties. Although deep learning optical spectroscopy (DLOS) has been successfully applied to predict the optical and photophysical properties of organic chromophores, how specific functional groups and solvents affect the optical properties is not clearly understood. Here, we employed an explainable DLOS method by applying the integrated gradients method to DLOS. The integrated gradients method allows us to obtain attributions, indicating how much the functional group contributes to the optical properties including the absorption wavelength and bandwidth, extinction coefficients, emission wavelength and bandwidth, photoluminescence quantum yield, and lifetime. The attributions of 54 functional groups and 9 solvent molecules to seven optical properties are quantified and can be used to estimate the optical properties of chromophores as in the Woodward-Fieser rule. Unlike the Woodward-Fieser rule for only the absorption wavelength, the attributions obtained in this work can be applied to estimate all seven optical properties, which makes a significant extension of the Woodward-Fieser rules. In addition, we demonstrated a strategy for utilizing the attributions in the design of molecules and in tuning the optical properties of the molecules. The design of molecular structures using attributions can revolutionize the development of optimal molecules.
Affiliation(s)
- Joonyoung F Joung
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| | - Minhi Han
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| | - Minseok Jeong
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| | - Sungnam Park
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| |
|
27
|
Wang Y, Zhao L, Zhou X, Zhang J, Jiang J, Dong H. Global Fold Switching of the RfaH Protein: Diverse Structures with a Conserved Pathway. J Phys Chem B 2022; 126:2979-2989. [PMID: 35438983 DOI: 10.1021/acs.jpcb.1c10965] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
It is generally believed that a protein's sequence uniquely determines its structure, the basis for a protein to perform biological functions. However, as a representative metamorphic protein, RfaH can be encoded by a single amino acid sequence into two distinct native state structures. Its C-terminal domain (CTD) either takes an all-α-helical configuration to pack tightly with its N-terminal domain (NTD), or the CTD disassociates from the NTD, transforms into an all-β-barrel fold, and further attaches to the ribosome, leaving the NTD exposed to bind RNA polymerases. Therefore, the RfaH protein couples transcription and translation processes. Although previous studies have provided a preliminary understanding of its function, the full course of the conformational change of RfaH-CTD at the atomic level is elusive. We used teDA2, a feature space-based enhanced sampling protocol, to explore the transformation of RfaH-CTD. We found that it undergoes a large-scale structural rearrangement, with characteristic spectra as the fingerprint, and a global unfolding transition with a tighter and energetically moderate molten globule-like nucleus formed in between. The formation of this nucleus limits the possible intermediate conformations, facilitates the formation of secondary and tertiary structures, and thus ensures the efficiency of transformation. The key features along the transition path disclosed from this work are likely associated with the evolution of RfaH, such that encoding a single sequence into multiple folds with distinct biological functions is energetically unhindered.
Affiliation(s)
- Yiqiao Wang
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China; School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China
| | - Luyuan Zhao
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Xuejie Zhou
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
| | - Jian Zhang
- School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China; Institute for Brain Sciences, Nanjing University, Nanjing 210023, China
| | - Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Hao Dong
- Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China; Institute for Brain Sciences, Nanjing University, Nanjing 210023, China; State Key Laboratory of Analytical Chemistry for Life Science, Nanjing University, Nanjing 210023, China; Engineering Research Center of Protein and Peptide Medicine of Ministry of Education, Nanjing University, Nanjing 210023, China
| |
|
28
|
Meuwly M. Atomistic Simulations for Reactions and Vibrational Spectroscopy in the Era of Machine Learning: Quo Vadis? J Phys Chem B 2022; 126:2155-2167. [PMID: 35286087 DOI: 10.1021/acs.jpcb.2c00212] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Atomistic simulations using accurate energy functions can provide molecular-level insight into functional motions of molecules in the gas and in the condensed phase. This Perspective delineates the present status of the field from the efforts of others and some of our own work and discusses open questions and future prospects. The combination of physics-based long-range representations using multipolar charge distributions and kernel representations for the bonded interactions is shown to provide realistic models for the exploration of the infrared spectroscopy of molecules in solution. For reactions, empirical models connecting dedicated energy functions for the reactant and product states allow statistically meaningful sampling of conformational space whereas machine-learned energy functions are superior in accuracy. The future combination of physics-based models with machine-learning techniques and integration into all-purpose molecular simulation software provides a unique opportunity to bring such dynamics simulations closer to reality.
Affiliation(s)
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
| |
|
29
|
Fan J, Lan H, Ning W, Zhong R, Chen F, Yan G, Cai K. Modeling amide-I vibrations of alanine dipeptide in solution by using neural network protocol. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 268:120675. [PMID: 34890871 DOI: 10.1016/j.saa.2021.120675] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/05/2021] [Revised: 10/27/2021] [Accepted: 11/26/2021] [Indexed: 06/13/2023]
Abstract
Infrared spectroscopy is a powerful tool for understanding the molecular structure and function of polypeptides. Theoretical interpretation of IR spectra that relies on ab initio calculations can be very costly in computational resources. Herein, we developed a neural network (NN) modeling protocol to evaluate a model dipeptide's backbone amide-I spectra. DFT calculations were performed for the amide-I vibrational motions and structural parameters of alanine dipeptide (ALAD) conformers in different micro-environments ranging from polar to non-polar ones. The obtained backbone dihedrals, C=O bond lengths and amide-I frequencies of ALAD were gathered together to build the NN architecture. The built NN protocols predict the amide-I frequencies of ALAD in other solvation conditions quite satisfactorily, at much lower computational cost than electronic structure calculations. The results show that this cost-effective approach enables us to decipher the polypeptide's dynamic secondary structures and biological functions with their backbone vibrational probes.
Affiliation(s)
- Jianping Fan
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China; Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
| | - Huaying Lan
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China
| | - Wenfeng Ning
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China
| | - Rongzhen Zhong
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China
| | - Feng Chen
- Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
| | - Guiyang Yan
- Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
| | - Kaicong Cai
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China; Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
| |
|
30
|
Han R, Ketkaew R, Luber S. A Concise Review on Recent Developments of Machine Learning for the Prediction of Vibrational Spectra. J Phys Chem A 2022; 126:801-812. [PMID: 35133168 DOI: 10.1021/acs.jpca.1c10417] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Machine learning has become more and more popular in computational chemistry, as well as in the important field of spectroscopy. In this concise review, we walk the reader through a short summary of machine learning algorithms and a comprehensive discussion on the connection between machine learning methods and vibrational spectroscopy, particularly for the case of infrared and Raman spectroscopy. We also briefly discuss state-of-the-art molecular representations which serve as meaningful inputs for machine learning to predict vibrational spectra. In addition, this review provides an overview of the transferability and best practices of machine learning in the prediction of vibrational spectra as well as possible future research directions.
Affiliation(s)
- Ruocheng Han
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| |
|
31
|
Saini V. A machine learning approach for predicting the fluorination strength of electrophilic fluorinating reagents. Phys Chem Chem Phys 2022; 24:26802-26812. [DOI: 10.1039/d2cp03281c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
A neural network algorithm utilizing SMILES encoding of organic molecules was successfully employed for predicting the fluorination strength of a wide range of N–F fluorinating reagents.
Affiliation(s)
- Vaneet Saini
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh 160014, India
| |
|
32
|
Jindal S, Hsu PJ, Phan HT, Tsou PK, Kuo JL. Capturing the potential energy landscape of large size molecular clusters from atomic interactions up to a 4-body system using deep learning. Phys Chem Chem Phys 2022; 24:27263-27276. [DOI: 10.1039/d2cp04441b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
We propose a new method that utilizes the database of stable conformers and borrows the fragmentation concept of many-body-expansion (MBE) methods from ab initio approaches to train a deep-learning machine learning (ML) model using SchNet.
Affiliation(s)
- Shweta Jindal
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Po-Jen Hsu
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Huu Trong Phan
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
- Molecular Science and Technology Program, Taiwan International Graduate Program, Academia Sinica, Taipei, 11529, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu 30013, Taiwan
| | - Pei-Kang Tsou
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Jer-Lai Kuo
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu 30013, Taiwan
- Molecular Science and Technology, National Taiwan University, Section 4, Daan District, Taipei City 10617, Taiwan
| |
|
33
|
Saini V, Kumar R. A machine learning approach for predicting the empirical polarity of organic solvents. NEW J CHEM 2022. [DOI: 10.1039/d2nj02513b] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
A neural network architecture was found to efficiently predict the empirical polarity parameter ET(30) using six simple-to-compute and interpretable quantum mechanical, topological and categorical descriptors.
Affiliation(s)
- Vaneet Saini
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh 160014, India
| | - Ranjeet Kumar
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh 160014, India
| |
|
34
|
Zhao L, Zhang J, Zhang Y, Ye S, Zhang G, Chen X, Jiang B, Jiang J. Accurate Machine Learning Prediction of Protein Circular Dichroism Spectra with Embedded Density Descriptors. JACS AU 2021; 1:2377-2384. [PMID: 34977905 PMCID: PMC8715543 DOI: 10.1021/jacsau.1c00449] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/08/2021] [Indexed: 05/08/2023]
Abstract
A data-driven approach to simulate circular dichroism (CD) spectra is appealing for fast protein secondary structure determination, yet the challenge of predicting electric and magnetic transition dipole moments poses a substantial barrier for the goal. To address this problem, we designed a new machine learning (ML) protocol in which ordinary pure geometry-based descriptors are replaced with alternative embedded density descriptors and electric and magnetic transition dipole moments are successfully predicted with an accuracy comparable to first-principle calculation. The ML model is able to not only simulate protein CD spectra nearly 4 orders of magnitude faster than conventional first-principle simulation but also obtain CD spectra in good agreement with experiments. Finally, we predicted a series of CD spectra of the Trp-cage protein associated with continuous changes of protein configuration along its folding path, showing the potential of our ML model for supporting real-time CD spectroscopy study of protein dynamics.
Affiliation(s)
- Luyuan Zhao
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
| | - Jinxiao Zhang
- Guangxi Key Laboratory of Electrochemical and Magneto-chemical Functional Materials, College of Chemistry and Bioengineering, Guilin University of Technology, Guilin 541006, P. R. China
| | - Yaolong Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
| | - Sheng Ye
- School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, P. R. China
| | - Guozhen Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
| | - Xin Chen
- Gusu Laboratory of Materials, Suzhou, Jiangsu 215123, P. R. China
| | - Bin Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
| | - Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Collaborative Innovation Center of Chemistry for Energy Materials, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P. R. China
| |
|
35
|
Li W, Ma H, Li S, Ma J. Computational and data driven molecular material design assisted by low scaling quantum mechanics calculations and machine learning. Chem Sci 2021; 12:14987-15006. [PMID: 34909141 PMCID: PMC8612375 DOI: 10.1039/d1sc02574k] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Accepted: 10/12/2021] [Indexed: 12/11/2022] Open
Abstract
Electronic structure methods based on quantum mechanics (QM) are widely employed in the computational predictions of the molecular properties and optoelectronic properties of molecular materials. The computational costs of these QM methods, ranging from density functional theory (DFT) or time-dependent DFT (TDDFT) to wave-function theory (WFT), usually increase sharply with the system size, causing the curse of dimensionality and hindering the QM calculations for large systems such as long polymer oligomers and complex molecular aggregates. In such cases, low scaling QM methods and machine learning (ML) techniques have been adopted in recent years to reduce the computational costs and thus assist computational and data driven molecular material design. In this review, we illustrate low scaling ground-state and excited-state QM approaches and their applications to long oligomers, self-assembled supramolecular complexes, stimuli-responsive materials, mechanically interlocked molecules, and excited state processes in molecular aggregates. Variable electrostatic parameters were also introduced in the modified force fields with the polarization model. On the basis of QM computational or experimental datasets, several ML algorithms, including explainable models, deep learning, and on-line learning methods, have been employed to predict the molecular energies, forces, electronic structure properties, and optical or electrical properties of materials. Low scaling algorithms with periodic boundary conditions are expected to be further applicable to functional materials, perhaps in combination with machine learning to rapidly predict the lattice energy, crystal structures, and spectroscopic properties of periodic functional materials.
Affiliation(s)
- Wei Li
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
| | - Haibo Ma
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
- Jiangsu Key Laboratory of Advanced Organic Materials, Jiangsu Key Laboratory of Vehicle Emissions Control, Nanjing University, Nanjing 210023, China
| | - Shuhua Li
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
| | - Jing Ma
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
- Jiangsu Key Laboratory of Advanced Organic Materials, Jiangsu Key Laboratory of Vehicle Emissions Control, Nanjing University, Nanjing 210023, China
|
36
|
Abstract
Numerous linear and non-linear spectroscopic techniques have been developed to elucidate structural and functional information of complex systems ranging from natural systems, such as proteins and light-harvesting systems, to synthetic systems, such as solar cell materials and light-emitting diodes. The obtained experimental data can be challenging to interpret due to the complexity and potential overlapping spectral signatures. Therefore, computational spectroscopy plays a crucial role in the interpretation and understanding of spectral observables of complex systems. Computational modeling of various spectroscopic techniques has seen significant developments in the past decade, when it comes to the systems that can be addressed, the size and complexity of the sample types, the accuracy of the methods, and the spectroscopic techniques that can be addressed. In this Perspective, I will review the computational spectroscopy methods that have been developed and applied for infrared and visible spectroscopies in the condensed phase. I will discuss some of the questions that this has allowed answering. Finally, I will discuss current and future challenges and how these may be addressed.
Affiliation(s)
- Thomas L C Jansen
- Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
|
37
|
Kwac K, Freedman H, Cho M. Machine Learning Approach for Describing Water OH Stretch Vibrations. J Chem Theory Comput 2021; 17:6353-6365. [PMID: 34498885 DOI: 10.1021/acs.jctc.1c00540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
A machine learning approach employing neural networks is developed to calculate the vibrational frequency shifts and transition dipole moments of the symmetric and antisymmetric OH stretch vibrations of a water molecule surrounded by water molecules. We employed the atom-centered symmetry functions (ACSFs), polynomial functions, and Gaussian-type orbital-based density vectors as descriptor functions and compared their performances in predicting vibrational frequency shifts using the trained neural networks. The ACSFs perform best in modeling the frequency shifts of the OH stretch vibration of water among the types of descriptor functions considered in this paper. However, the differences in performance among these three descriptors are not significant. We also tried a feature selection method called CUR matrix decomposition to assess the importance and leverage of the individual functions in the set of selected descriptor functions. We found that a significant number of those functions included in the set of descriptor functions give redundant information in describing the configuration of the water system. We here show that the predicted vibrational frequency shifts by trained neural networks successfully describe the solvent-solute interaction-induced fluctuations of OH stretch frequencies.
Affiliation(s)
- Kijeong Kwac
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science (IBS), Seoul 02841, Republic of Korea
| | - Holly Freedman
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science (IBS), Seoul 02841, Republic of Korea
| | - Minhaeng Cho
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science (IBS), Seoul 02841, Republic of Korea
- Department of Chemistry, Korea University, Seoul 02841, Republic of Korea
|
38
|
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interactions, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed to predict transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs, as well as in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
| | - Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
|
39
|
AI-based spectroscopic monitoring of real-time interactions between SARS-CoV-2 and human ACE2. Proc Natl Acad Sci U S A 2021; 118:2025879118. [PMID: 34185681 PMCID: PMC8256048 DOI: 10.1073/pnas.2025879118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022] Open
Abstract
COVID-19, caused by the SARS-CoV-2 virus, has posed a tremendous threat to human health. The interactions between human angiotensin-converting enzyme 2 and the spike glycoprotein of SARS-CoV-2 hold the key to understanding the molecular mechanism to develop treatment and vaccines. However, the simulation of these interactions in fluctuating surroundings is challenging because it requires many electronic structure calculations at the quantum mechanics level for a large number of representative configurations. We report a machine learning protocol that can efficiently predict the IR spectra of SARS-CoV-2 and characterize fine changes in IR spectra associated with variations of protein secondary structures. Machine learning provides a cost-effective tool for monitoring of real-time interactions between the SARS-CoV-2 and human ACE2. The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), invades a human cell via human angiotensin-converting enzyme 2 (hACE2) as the entry, causing the severe coronavirus disease (COVID-19). The interactions between hACE2 and the spike glycoprotein (S protein) of SARS-CoV-2 hold the key to understanding the molecular mechanism to develop treatment and vaccines, yet the dynamic nature of these interactions in fluctuating surroundings is very challenging to probe by those structure determination techniques requiring the structures of samples to be fixed. Here we demonstrate, by a proof-of-concept simulation of infrared (IR) spectra of S protein and hACE2, that time-resolved spectroscopy may monitor the real-time structural information of the protein-protein complexes of interest, with the help of machine learning. Our machine learning protocol is able to identify fine changes in IR spectra associated with variation of the secondary structures of S protein of the coronavirus. Further, it is three to four orders of magnitude faster than conventional quantum chemistry calculations. We expect our machine learning protocol would accelerate the development of real-time spectroscopy study of protein dynamics.
|
40
|
Zhang H, Yang Y, Zhang C, Farid SS, Dalby PA. Machine learning reveals hidden stability code in protein native fluorescence. Comput Struct Biotechnol J 2021; 19:2750-2760. [PMID: 34093990 PMCID: PMC8131987 DOI: 10.1016/j.csbj.2021.04.047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/27/2021] [Revised: 04/19/2021] [Accepted: 04/22/2021] [Indexed: 12/15/2022] Open
Abstract
Conformational stability of a protein is usually obtained by spectroscopically measuring the unfolding melting temperature. However, optical spectra under native conditions are considered to contain too little resolution to probe protein stability. Here, we have built and trained a neural network model to take the temperature-dependence of intrinsic fluorescence emission under native-only conditions as inputs, and then predict the spectra at the unfolding transition and denatured state. Application to a therapeutic antibody fragment demonstrates that thermal transitions obtained from the predicted spectra correlate highly with those measured experimentally. Crucially, this work reveals that the temperature-dependence of native fluorescence spectra contains a high degree of previously hidden information relating native ensemble features to stability. This could lead to rapid screening of therapeutic protein variants and formulations based on spectroscopic measurements under non-denaturing temperatures only.
Affiliation(s)
- Hongyu Zhang
- Department of Biochemical Engineering, UCL, London WC1E 6BT, UK
- EPSRC Future Targeted Healthcare Manufacturing Hub, UCL, London WC1E 6BT, UK
| | - Yang Yang
- Department of Biochemical Engineering, UCL, London WC1E 6BT, UK
- EPSRC Future Targeted Healthcare Manufacturing Hub, UCL, London WC1E 6BT, UK
| | - Cheng Zhang
- Department of Biochemical Engineering, UCL, London WC1E 6BT, UK
| | - Suzanne S Farid
- Department of Biochemical Engineering, UCL, London WC1E 6BT, UK
- EPSRC Future Targeted Healthcare Manufacturing Hub, UCL, London WC1E 6BT, UK
| | - Paul A Dalby
- Department of Biochemical Engineering, UCL, London WC1E 6BT, UK
- EPSRC Future Targeted Healthcare Manufacturing Hub, UCL, London WC1E 6BT, UK
|
41
|
Joung J, Han M, Hwang J, Jeong M, Choi DH, Park S. Deep Learning Optical Spectroscopy Based on Experimental Database: Potential Applications to Molecular Design. JACS AU 2021; 1:427-438. [PMID: 34467305 PMCID: PMC8395663 DOI: 10.1021/jacsau.1c00035] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 01/29/2021] [Indexed: 06/13/2023]
Abstract
Accurate and reliable prediction of the optical and photophysical properties of organic compounds is important in various research fields. Here, we developed deep learning (DL) optical spectroscopy using a DL model and experimental database to predict seven optical and photophysical properties of organic compounds, namely, the absorption peak position and bandwidth, extinction coefficient, emission peak position and bandwidth, photoluminescence quantum yield (PLQY), and emission lifetime. Our DL model included the chromophore-solvent interaction to account for the effect of local environments on the optical and photophysical properties of organic compounds and was trained using an experimental database of 30 094 chromophore/solvent combinations. Our DL optical spectroscopy made it possible to reliably and quickly predict the aforementioned properties of organic compounds in solution, gas phase, film, and powder with the root mean squared errors of 26.6 and 28.0 nm for absorption and emission peak positions, 603 and 532 cm⁻¹ for absorption and emission bandwidths, and 0.209, 0.371, and 0.262 for the logarithm of the extinction coefficient, PLQY, and emission lifetime, respectively. Finally, we demonstrated how a blue emitter with desired optical and photophysical properties could be efficiently virtually screened and developed by DL optical spectroscopy. DL optical spectroscopy can be efficiently used for developing chromophores and fluorophores in various research areas.
Affiliation(s)
- Jinhyo Hwang
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| | - Minseok Jeong
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| | - Dong Hoon Choi
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
| | - Sungnam Park
- Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
|
42
|
He H, Yan S, Lyu D, Xu M, Ye R, Zheng P, Lu X, Wang L, Ren B. Deep Learning for Biospectroscopy and Biospectral Imaging: State-of-the-Art and Perspectives. Anal Chem 2021; 93:3653-3665. [PMID: 33599125 DOI: 10.1021/acs.analchem.0c04671] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Indexed: 12/18/2022]
Abstract
With the advances in instrumentation and sampling techniques, there is an explosive growth of data from molecular and cellular samples. The call to extract more information from the large data sets has greatly challenged conventional chemometric methods. Deep learning, which utilizes very large data sets for finding hidden features therein and for making accurate predictions for a wide range of applications, has been applied at an unbelievable pace in biospectroscopy and biospectral imaging in the past 3 years. In this Feature, we first introduce the background and basic knowledge of deep learning. We then focus on the emerging applications of deep learning in the data preprocessing, feature detection, and modeling of the biological samples for spectral analysis and spectroscopic imaging. Finally, we highlight the challenges and limitations in deep learning and the outlook for future directions.
Affiliation(s)
- Hao He
- School of Aerospace Engineering, Xiamen University, Xiamen, 361000, China
| | - Sen Yan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Danya Lyu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Mengxi Xu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Ruiqian Ye
- School of Aerospace Engineering, Xiamen University, Xiamen, 361000, China
| | - Peng Zheng
- School of Aerospace Engineering, Xiamen University, Xiamen, 361000, China
| | - Xinyu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Lei Wang
- School of Aerospace Engineering, Xiamen University, Xiamen, 361000, China
| | - Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|