1
Srinivasan K, Puliyanda A, Prasad V. Identification of Reaction Network Hypotheses for Complex Feedstocks from Spectroscopic Measurements with Minimal Human Intervention. J Phys Chem A 2024; 128:4714-4729. [PMID: 38836378 DOI: 10.1021/acs.jpca.4c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
In this work, we detail an automated reaction network hypothesis generation protocol for processes involving complex feedstocks where information about the species and reactions involved is unknown. Our methodology is process agnostic and can be utilized in any reactive process with spectroscopic measurements that provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1-D convolutional neural network to automatically identify functional groups indicated by them. We employ atom-atom mapping to automatically recover reaction rules that are applied on candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexities. Our methodology is able to identify reaction network hypotheses containing reaction networks close to the ground truth in the case of synthetic data, and we are also able to recover candidate molecules and reaction networks close to the ones reported in the previous literature studies for biomass pyrolysis.
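The candidate-retrieval step mentioned in the abstract (ranking database molecules by fingerprint similarity) can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of "on" bit indices and uses the Tanimoto coefficient, a common choice for this task; the abstract does not state which fingerprint or similarity measure the authors used, and the names `tanimoto`, `rank_candidates`, and the toy library are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(query, library, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (assumed, not from the paper): bit indices stand in for substructure features.
query = {1, 4, 7, 9}
library = {
    "candidate_A": {1, 4, 7, 9},
    "candidate_B": {1, 4, 8},
    "candidate_C": {2, 3},
}
ranked = rank_candidates(query, library)
```

In this toy case `candidate_A` matches the query exactly and ranks first, while `candidate_C` shares no bits and scores zero.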
Affiliation(s)
- Karthik Srinivasan
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Anjana Puliyanda
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Vinay Prasad
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
2
Nitika N, Keerthiveena B, Thakur G, Rathore AS. Convolutional Neural Networks Guided Raman Spectroscopy as a Process Analytical Technology (PAT) Tool for Monitoring and Simultaneous Prediction of Monoclonal Antibody Charge Variants. Pharm Res 2024; 41:463-479. [PMID: 38366234 DOI: 10.1007/s11095-024-03663-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/18/2024] [Indexed: 02/18/2024]
Abstract
BACKGROUND: Charge-related heterogeneities of monoclonal antibody (mAb) based therapeutic products are increasingly being considered as a critical quality attribute (CQA). They are typically estimated using analytical cation exchange chromatography (CEX), which is time consuming and not suitable for real-time control. Raman spectroscopy coupled with artificial intelligence (AI) tools offers an opportunity for real-time monitoring and control of charge variants. OBJECTIVE: We present a process analytical technology (PAT) tool for on-line and real-time charge variant determination during process-scale CEX based on Raman spectroscopy employing machine learning techniques. METHOD: Raman spectra are collected from a reference library of samples with a distribution of acidic, main, and basic species from 0-100% in a mAb concentration range of 0-20 g/L generated from process-scale CEX. The performance of different machine learning techniques for spectral processing is compared for predicting different charge variant species. RESULT: A convolutional neural network (CNN) based model was successfully calibrated for quantification of acidic species, main species, basic species, and total protein concentration with R² values of 0.94, 0.99, 0.96, and 0.99, respectively, and root mean squared errors (RMSE) of 0.1846, 0.1627, and 0.1029 g/L for the three charge variant species, respectively, and 0.2483 g/L for the total protein concentration. CONCLUSION: We demonstrate that Raman spectroscopy combined with AI-ML frameworks can deliver rapid and accurate determination of product-related impurities. This approach can be used for real-time CEX pooling decisions in mAb production processes, thus enabling consistent charge variant profiles to be achieved.
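The two calibration metrics reported in this abstract, R² and RMSE, follow standard definitions, sketched below in plain Python. The reference and predicted concentrations are made-up toy values (assumed to be in g/L), not data from the study, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy concentrations (assumed values, not from the paper).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
error = rmse(reference, predicted)
fit = r_squared(reference, predicted)
```

RMSE carries the units of the measured quantity, which is why the abstract quotes it in g/L, whereas R² is dimensionless.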
Affiliation(s)
- Nitika Nitika
  - Department of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
- B Keerthiveena
  - School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India
- Garima Thakur
  - Department of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
- Anurag S Rathore
  - Department of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
  - School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India
3
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
4
Wang T, Tan Y, Chen YZ, Tan C. Infrared Spectral Analysis for Prediction of Functional Groups Based on Feature-Aggregated Deep Learning. J Chem Inf Model 2023; 63:4615-4622. [PMID: 37531205 DOI: 10.1021/acs.jcim.3c00749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Infrared (IR) spectroscopy is a powerful and versatile tool for analyzing functional groups in organic compounds. A complex and time-consuming interpretation of massive unknown spectra usually requires knowledge of chemistry and spectroscopy. This paper presents a new deep learning method for transforming IR spectral features into intuitive imagelike feature maps and prediction of major functional groups. We obtained 8272 gas-phase IR spectra from the NIST Chemistry WebBook. Feature maps are constructed using the intrinsic correlation of spectral data, and prediction models are developed based on convolutional neural networks. Twenty-one major functional groups for each molecule are successfully identified using binary and multilabel models without expert guidance and feature selection. The multilabel classification model can produce all prediction results simultaneously for rapid characterization. Further analysis of the detailed substructures indicates that our model is capable of obtaining abundant structural information from IR spectra for a comprehensive investigation. The interpretation of our model reveals that the peaks of most interest are similar to those often considered by spectroscopists. In addition to demonstrating great potential for spectral identification, our method may contribute to the development of automated analyses in many fields.
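The idea of a multilabel model producing all functional-group predictions at once can be sketched as follows. This assumes the network emits one raw score (logit) per functional group and that each score is passed through an independent sigmoid and thresholded at 0.5. The abstract does not describe the output layer, so this is an illustrative assumption; the label names and logits are invented.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_multilabel(logits, labels, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical label set and model outputs for a single spectrum.
labels = ["alcohol", "carbonyl", "amine"]
logits = [2.0, -1.0, 0.3]
present = decode_multilabel(logits, labels)
```

Unlike a softmax classifier, each label is decided independently here, so a molecule can be assigned several functional groups or none.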
Affiliation(s)
- Tianyi Wang
  - The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
  - Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Ying Tan
  - The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
  - Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Yu Zong Chen
  - The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
  - Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen 518132, P. R. China
- Chunyan Tan
  - The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
  - Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
5
Li C, Cong Y, Deng W. Identifying molecular functional groups of organic compounds by deep learning of NMR data. Magn Reson Chem 2022; 60:1061-1069. [PMID: 35674984 DOI: 10.1002/mrc.5292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 06/02/2022] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
We preprocess the raw nuclear magnetic resonance (NMR) spectrum and extract key features by using two different methodologies, called equidistant sampling and peak sampling for subsequent substructure pattern recognition. We also provide a strategy to address the imbalance issue frequently encountered in statistical modeling of NMR data set and establish two conventional support vector machine (SVM) and K-nearest neighbor (KNN) models to assess the capability of two feature selections, respectively. Our results in this study show that the models using the selected features of peak sampling outperform those using equidistant sampling. Then we build the recurrent neural network (RNN) model trained by data collected from peak sampling. Furthermore, we illustrate the easier optimization of hyperparameters and the better generalization ability of the RNN deep learning model by detailed comparison with traditional machine learning SVM and KNN models.
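The difference between the two feature-extraction strategies named in this abstract can be illustrated with a small sketch. It assumes "equidistant sampling" means reading intensities at evenly spaced positions and "peak sampling" means keeping the strongest local maxima; the paper's exact procedures are not given in the abstract, and the function names and toy spectrum are hypothetical.

```python
def equidistant_sample(intensities, n_points):
    """Read intensities at n_points evenly spaced indices across the spectrum."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    last = len(intensities) - 1
    return [intensities[round(i * last / (n_points - 1))] for i in range(n_points)]


def peak_sample(intensities, n_peaks):
    """Return (index, intensity) for the n_peaks strongest local maxima, in index order."""
    peaks = [
        (i, intensities[i])
        for i in range(1, len(intensities) - 1)
        if intensities[i] > intensities[i - 1] and intensities[i] >= intensities[i + 1]
    ]
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]
    return sorted(strongest)


# Toy spectrum (assumed values) with local maxima at indices 2, 5, and 8.
spectrum = [0.0, 0.2, 1.0, 0.3, 0.1, 0.6, 0.1, 0.0, 0.4, 0.0]
grid_features = equidistant_sample(spectrum, 4)
peak_features = peak_sample(spectrum, 2)
```

On this toy spectrum the evenly spaced grid lands between the peaks and misses all three maxima, while peak sampling retains the two strongest. That is one plausible reason a peak-based representation could carry more information per feature.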
Affiliation(s)
- Chongcan Li
  - School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
- Yong Cong
  - College of Chemistry and Chemical Engineering, State Key Laboratory of Applied Organic Chemistry, Key Laboratory of Nonferrous Metals Chemistry and Resources Utilization, Lanzhou University, Lanzhou, China
- Weihua Deng
  - School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
6
Sridharan B, Goel M, Priyakumar UD. Modern Machine Learning for Tackling Inverse Problems in Chemistry: Molecular Design to Realization. Chem Commun (Camb) 2022; 58:5316-5331. [DOI: 10.1039/d1cc07035e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
The discovery of new molecules and materials helps expand the horizons of novel and innovative real-life applications. In the pursuit of finding molecules with desired properties, chemists have traditionally relied...
7
Huang Z, Chen MS, Woroch CP, Markland TE, Kanan MW. A framework for automated structure elucidation from routine NMR spectra. Chem Sci 2021; 12:15329-15338. [PMID: 34976353 PMCID: PMC8635205 DOI: 10.1039/d1sc04105c] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/27/2021] [Accepted: 11/08/2021] [Indexed: 12/25/2022] [Open Access]
Abstract
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional ¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.
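The top-1 and top-ten figures quoted in this abstract are instances of top-k accuracy, which can be computed as in the sketch below. The ranked candidate lists and true structures are invented SMILES strings used purely for illustration, and `top_k_accuracy` is a hypothetical helper, not the authors' code.

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of cases whose true structure appears among the first k ranked candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


# Toy ranked candidates (most likely first) and true structures, as SMILES strings.
ranked_predictions = [
    ["CCO", "COC"],
    ["CC=O", "C=CO"],
    ["CCN", "CNC"],
]
truths = ["CCO", "C=CO", "CCC"]
top1 = top_k_accuracy(ranked_predictions, truths, k=1)
top2 = top_k_accuracy(ranked_predictions, truths, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, which is consistent with the reported top-ten rate exceeding the top-1 rate.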
Affiliation(s)
- Zhaorui Huang
  - Department of Chemistry, Stanford University Stanford CA 94305 USA
- Michael S Chen
  - Department of Chemistry, Stanford University Stanford CA 94305 USA
- Matthew W Kanan
  - Department of Chemistry, Stanford University Stanford CA 94305 USA
8
Johnson JL, Polavarapu PL. Chiral Molecular Structure Determination for a Desired Compound Just from Its Molecular Formula and Vibrational Optical Activity Spectra. J Phys Chem A 2021; 125:8000-8013. [PMID: 34478311 DOI: 10.1021/acs.jpca.1c06369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
A novel proof-of-concept model for chiral molecular structure determination using just the molecular formula and vibrational optical activity (VOA) spectra is presented. To verify this concept, the molecular formula of a desired compound is used to generate all possible chiral structural isomers and their VOA spectra are predicted. The similarity analyses of predicted VOA spectra were then carried out in two different ways: (a) similarity between VOA spectrum of one structural isomer with those of the rest, referred to as cross-correlations; (b) similarity between VOA spectra of all chiral structural isomers with the experimental VOA spectra of the desired compound. Three different molecular formulae, C4H8O, C3H5ClO, and C6H10O, and their chiral structural isomers (6, 9, and 75 respectively), were investigated. In each case, the correct chiral molecular structure of the desired compound was identified without ambiguity. Cross-correlation analysis revealed the uniqueness of VOA spectra in deducing the chiral molecular structure solely from its molecular formula. Different chiral structural isomers associated with the molecular formula CH3NO2 were also found to have no significant cross-correlations between their VOA spectra, opening a pathway to detect and identify the elusive chiral N-hydroxyoxaziridine from its VOA spectra.
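The matching step described in this abstract, comparing predicted spectra of candidate isomers against an experimental spectrum, can be sketched with a generic similarity score. Cosine similarity is used here only as a stand-in, since the abstract does not name the similarity measure the authors applied. The spectra are short invented vectors of signed intensities on a shared frequency grid, and the isomer names are placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two signed intensity vectors on the same grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(experimental, predicted):
    """Score every candidate spectrum and return the best name plus all scores."""
    scores = {name: cosine_similarity(experimental, s) for name, s in predicted.items()}
    return max(scores, key=scores.get), scores


# Toy signed spectra (assumed values). isomer_2 is the exact mirror image of the
# experimental spectrum, as expected for the opposite enantiomer.
experimental = [1.0, -0.5, 0.2]
predicted = {
    "isomer_1": [0.9, -0.6, 0.1],
    "isomer_2": [-1.0, 0.5, -0.2],
    "isomer_3": [0.1, 0.8, 0.3],
}
name, scores = best_match(experimental, predicted)
```

Because VOA intensities are signed, a mirror-image spectrum scores -1 rather than +1, so a signed similarity measure can separate a structure from its enantiomer as well as from other isomers.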
Affiliation(s)
- Jordan L Johnson
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Prasad L Polavarapu
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States