1. Wu W, Leonardis A, Jiao J, Jiang J, Chen L. Transformer-Based Models for Predicting Molecular Structures from Infrared Spectra Using Patch-Based Self-Attention. J Phys Chem A 2025; 129:2077-2085. [PMID: 39951543 DOI: 10.1021/acs.jpca.4c05665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2025]
Abstract
Infrared (IR) spectroscopy, a type of vibrational spectroscopy, provides extensive molecular structure details and is a highly effective technique for chemists to determine molecular structures. However, analyzing experimental spectra has always been challenging due to the specialized knowledge required and the variability of spectra under different experimental conditions. Here, we propose a transformer-based model with a patch-based self-attention spectrum embedding layer, designed to prevent the loss of spectral information while maintaining simplicity and effectiveness. To further enhance the model's understanding of IR spectra, we introduce a data augmentation approach, which selectively introduces vertical noise only at absorption peaks. Our approach not only achieves state-of-the-art performance on simulated data sets but also attains a top-1 accuracy of 55% on real experimental spectra, surpassing the previous state-of-the-art by approximately 10%. Additionally, our model demonstrates proficiency in analyzing intricate and variable fingerprint regions, effectively extracting critical structural information.
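The two ideas named in this abstract, noise applied only at absorption peaks and splitting a spectrum into patches, can be illustrated with a minimal sketch. This is not the authors' implementation: the peak rule (a simple local maximum above a threshold), the multiplicative Gaussian noise, and the names `find_peaks`, `augment_peaks`, `to_patches`, `noise_scale`, and `min_height` are all assumptions made for illustration.

```python
import random


def find_peaks(spectrum, min_height=0.1):
    """Return indices of simple local maxima at or above min_height (assumed peak rule)."""
    return [
        i for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1]
        and spectrum[i] >= spectrum[i + 1]
        and spectrum[i] >= min_height
    ]


def augment_peaks(spectrum, noise_scale=0.05, min_height=0.1, rng=None):
    """Perturb intensities only at detected peaks; every other point is left unchanged."""
    rng = rng or random.Random()
    out = list(spectrum)
    for i in find_peaks(spectrum, min_height):
        # Assumed noise model: multiplicative Gaussian jitter on the peak height.
        factor = 1.0 + rng.gauss(0.0, noise_scale)
        out[i] = max(0.0, spectrum[i] * factor)
    return out


def to_patches(spectrum, patch_size):
    """Split a spectrum into non-overlapping patches, zero-padding the last one."""
    padded = list(spectrum) + [0.0] * (-len(spectrum) % patch_size)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


spec = [0.0, 0.2, 0.9, 0.3, 0.0, 0.05, 0.0, 0.5, 0.1]
augmented = augment_peaks(spec, rng=random.Random(0))
patches = to_patches(augmented, patch_size=4)
```

In a transformer pipeline each patch would then be projected to an embedding vector and treated as one token; that projection is omitted here.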
Affiliation(s)
- Wenjin Wu
  - State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei 230026, China
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
- Aleš Leonardis
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
- Jianbo Jiao
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
- Jun Jiang
  - State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei 230026, China
- Linjiang Chen
  - State Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei 230026, China
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
  - School of Chemistry, University of Birmingham, Birmingham B15 2TT, U.K.
2. Li XK, Tang LJ, Li ZY, Qiu D, Yang ZL, Zhang XY, Zhang XZ, Guo JJ, Li BQ. Geographical origin discrimination of Chenpi using machine learning and enhanced mid-level data fusion. NPJ Sci Food 2025; 9:17. [PMID: 39910100 PMCID: PMC11799441 DOI: 10.1038/s41538-025-00376-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 01/06/2025] [Indexed: 02/07/2025] [Open Access]
Abstract
Chenpi, or dried tangerine peel, is a traditional Chinese ingredient valued both as a medicine and as a food for its digestive and respiratory benefits. The geographical origin of Chenpi is important, as it can affect its quality, active compounds and market value. This study develops a strategy to distinguish Chenpi samples by their origin. Thirty-nine samples from eight regions in Xinhui district (Guangdong, China) are analyzed by gas chromatography (GC) and mid-infrared (MIR) spectroscopy. Four machine learning methods are employed to establish discrimination models based on GC and MIR data, with two mid-level data fusion strategies to combine the data. The results show that data fusion significantly improves Chenpi origin discrimination. The K-nearest neighbors and artificial neural network models, using modified mid-level data fusion, provide the best performance, misclassifying only one sample. Machine learning in combination with a modified mid-level data fusion strategy provides effective classification of Chenpi samples from different geographical origins.
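As a rough illustration of mid-level data fusion followed by K-nearest-neighbors classification, the sketch below extracts a few features from each data block, concatenates them, and classifies by majority vote. It does not reproduce the study's modified fusion strategy. The block-mean feature extractor is only a stand-in for whatever feature extraction is actually used, and the function names, block sizes, and toy signals are hypothetical.

```python
import math
from collections import Counter


def block_means(signal, block_size):
    """Compress a signal into block averages (placeholder feature extraction)."""
    return [
        sum(signal[i:i + block_size]) / len(signal[i:i + block_size])
        for i in range(0, len(signal), block_size)
    ]


def fuse(gc_signal, mir_signal, gc_block=2, mir_block=2):
    """Mid-level fusion: extract features per block, then concatenate them."""
    return block_means(gc_signal, gc_block) + block_means(mir_signal, mir_block)


def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training samples."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy (GC, MIR) signal pairs for two made-up regions "A" and "B".
samples = [
    ([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], "A"),
    ([1.1, 0.9, 0.0, 0.1], [0.1, 0.0, 1.0, 0.9], "A"),
    ([0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0], "B"),
    ([0.0, 0.1, 0.9, 1.0], [1.0, 0.9, 0.1, 0.0], "B"),
]
train_x = [fuse(gc, mir) for gc, mir, _ in samples]
train_y = [label for _, _, label in samples]
prediction = knn_predict(train_x, train_y, fuse([0.9, 1.0, 0.1, 0.0], [0.0, 0.1, 0.9, 1.0]))
```

Fusing at the feature level keeps each instrument's contribution to a comparable size, which matters when one block (for example a full MIR spectrum) has far more raw variables than the other.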
Affiliation(s)
- Xin Kang Li
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Li Jun Tang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
- Ze Ying Li
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
- Dian Qiu
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
- Zhuo Ling Yang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
- Xiao Yi Zhang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
- Xiang-Zhi Zhang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
- Jing Jing Guo
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Bao Qiong Li
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen, 529020, PR China
3. Doan VHM, Ly CD, Mondal S, Truong TT, Nguyen TD, Choi J, Lee B, Oh J. Fcg-Former: Identification of Functional Groups in FTIR Spectra Using Enhanced Transformer-Based Model. Anal Chem 2024. [PMID: 39008658 DOI: 10.1021/acs.analchem.4c01622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Deep learning (DL) is becoming more popular as a useful tool in various scientific domains, especially in chemistry applications. In the infrared spectroscopy field, where identifying functional groups in unknown compounds poses a significant challenge, there is a growing need for innovative approaches to streamline and enhance analysis processes. This study introduces a transformative approach leveraging a DL methodology based on transformer attention models. With a data set containing approximately 8677 spectra, our model utilizes self-attention mechanisms to capture complex spectral features and precisely predict 17 functional groups, outperforming conventional architectures in both functional group prediction accuracy and compound-level precision. The success of our approach underscores the potential of transformer-based methodologies in enhancing spectral analysis techniques.
Affiliation(s)
- Vu Hoang Minh Doan
  - Smart Gym-Based Translational Research Center for Active Senior's Healthcare, Pukyong National University, Busan 48513, Republic of Korea
- Cao Duong Ly
  - Research and Development Department, Senior AI Research Engineer, Vision-in Inc., Seoul 08505, Republic of Korea
- Sudip Mondal
  - Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea
- Thi Thuy Truong
  - Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Tan Dung Nguyen
  - Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Jaeyeop Choi
  - Smart Gym-Based Translational Research Center for Active Senior's Healthcare, Pukyong National University, Busan 48513, Republic of Korea
- Byeongil Lee
  - Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea
  - Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Junghwan Oh
  - Smart Gym-Based Translational Research Center for Active Senior's Healthcare, Pukyong National University, Busan 48513, Republic of Korea
  - Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea
  - Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
  - Ohlabs Corp., Busan 48513, Republic of Korea
4. Srinivasan K, Puliyanda A, Prasad V. Identification of Reaction Network Hypotheses for Complex Feedstocks from Spectroscopic Measurements with Minimal Human Intervention. J Phys Chem A 2024; 128:4714-4729. [PMID: 38836378 DOI: 10.1021/acs.jpca.4c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
In this work, we detail an automated reaction network hypothesis generation protocol for processes involving complex feedstocks where information about the species and reactions involved is unknown. Our methodology is process agnostic and can be utilized in any reactive process with spectroscopic measurements that provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1-D convolutional neural network to automatically identify the functional groups they indicate. We employ atom-atom mapping to automatically recover reaction rules, which are applied to candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexity. Our methodology is able to identify reaction network hypotheses containing reaction networks close to the ground truth in the case of synthetic data, and we are also able to recover candidate molecules and reaction networks close to those reported in previous literature studies for biomass pyrolysis.
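The candidate-retrieval step mentioned in this abstract (finding database molecules by fingerprint similarity) can be sketched as follows. The abstract does not say which similarity measure is used; the Tanimoto coefficient shown here is a common choice and is an assumption, as are the function names and the toy fingerprints, which are represented simply as sets of "on" bit positions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(query_fp, database, top_n=3):
    """Return the top_n (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical database of fingerprints keyed by made-up molecule names.
database = {"x": {1, 2, 3, 4}, "y": {1, 2, 5}, "z": {7, 8}}
hits = rank_candidates({1, 2, 3, 4}, database, top_n=2)
```

In practice the fingerprints would come from a cheminformatics toolkit and the query would be built from the functional groups predicted for each decomposed component; both steps are outside this sketch.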
Affiliation(s)
- Karthik Srinivasan
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Anjana Puliyanda
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Vinay Prasad
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
5. Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth in spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing this evolving landscape, conventional chemometrics is ill-equipped, and deep learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and fundamental knowledge of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, along with potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
  - Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
  - School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
  - Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
6. Wang D, Xia Z, Wang L, Yan J, Yin H. Gas Graph Convolutional Transformer for Robust Generalization in Adaptive Gas Mixture Concentration Estimation. ACS Sens 2024; 9:1927-1937. [PMID: 38513127 DOI: 10.1021/acssensors.3c02654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Gas concentration estimation has tremendous research significance in various fields. However, existing methods for estimating the concentration of mixed gases generally depend on specific data-preprocessing methods and generalize poorly to diverse types of gases. This paper proposes a graph neural network-based gas graph convolutional transformer model (GGCT) that incorporates the information propagation properties and the physical characteristics of temporal sensor data. GGCT accurately predicts mixed gas concentrations and enhances its generalizability by analyzing the concentration tokens. The experimental results highlight the GGCT's robust performance, achieving exceptional levels of accuracy across most tested gas components and underscoring its strong potential for practical applications in mixed gas analysis.
Affiliation(s)
- Ding Wang
  - College of Electronics and Information Engineering, Tongji University, 4800 Cao'an Highway, Shanghai 201804, P. R. China
- Ziyuan Xia
  - College of Electronics and Information Engineering, Tongji University, 4800 Cao'an Highway, Shanghai 201804, P. R. China
- Lei Wang
  - College of Electronics and Information Engineering, Tongji University, 4800 Cao'an Highway, Shanghai 201804, P. R. China
- Jun Yan
  - College of Electronics and Information Engineering, Tongji University, 4800 Cao'an Highway, Shanghai 201804, P. R. China
- Huilin Yin
  - College of Electronics and Information Engineering, Tongji University, 4800 Cao'an Highway, Shanghai 201804, P. R. China