1
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
- Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
- P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
- Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
2
Jin T, Zhao Q, Schofield AB, Savoie BM. Deductive machine learning models for product identification. Chem Sci 2024; 15:11995-12005. [PMID: 39092129 PMCID: PMC11290435 DOI: 10.1039/d3sc04909d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Accepted: 06/09/2024] [Indexed: 08/04/2024] Open
Abstract
Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or more generally wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from example, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation. Here, a general strategy is described for designing and training machine learning models capable of deduction that consists of combining individual inductive models into a larger deductive network. The training and testing of these models are demonstrated on the task of deducing reaction products from a mixture of spectral sources. The resulting models can distinguish between intended and unintended reaction outcomes and identify starting material based on a mixture of spectral sources. The models also perform well on tasks that they were not directly trained on, like performing structural inference using real rather than simulated spectral inputs, predicting minor products from named organic chemistry reactions, identifying reagents and isomers as plausible impurities, and handling missing or conflicting information. A new dataset of 1 124 043 simulated spectra that were generated to train these models is also distributed with this work. These findings demonstrate that deductive bottlenecks for chemical problems are not fundamentally insuperable for ML models.
Affiliation(s)
- Tianfan Jin
- Department of Chemical Engineering, Purdue University West Lafayette USA
- Qiyuan Zhao
- Department of Chemical Engineering, Purdue University West Lafayette USA
- Andrew B Schofield
- Department of Chemical Engineering, Purdue University West Lafayette USA
- Brett M Savoie
- Department of Chemical Engineering, Purdue University West Lafayette USA
3
Doan VHM, Ly CD, Mondal S, Truong TT, Nguyen TD, Choi J, Lee B, Oh J. Fcg-Former: Identification of Functional Groups in FTIR Spectra Using Enhanced Transformer-Based Model. Anal Chem 2024. [PMID: 39008658 DOI: 10.1021/acs.analchem.4c01622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Deep learning (DL) is becoming more popular as a useful tool in various scientific domains, especially in chemistry applications. In the infrared spectroscopy field, where identifying functional groups in unknown compounds poses a significant challenge, there is a growing need for innovative approaches to streamline and enhance analysis processes. This study introduces a transformative approach leveraging a DL methodology based on transformer attention models. With a data set containing approximately 8677 spectra, our model utilizes self-attention mechanisms to capture complex spectral features and precisely predict 17 functional groups, outperforming conventional architectures in both functional group prediction accuracy and compound-level precision. The success of our approach underscores the potential of transformer-based methodologies in enhancing spectral analysis techniques.
Affiliation(s)
- Vu Hoang Minh Doan
- Smart Gym-Based Translational Research Center for Active Senior's Healthcare, Pukyong National University, Busan 48513, Republic of Korea
- Cao Duong Ly
- Research and Development Department, Senior AI Research Engineer, Vision-in Inc., Seoul 08505, Republic of Korea
- Sudip Mondal
- Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea
- Thi Thuy Truong
- Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Tan Dung Nguyen
- Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Jaeyeop Choi
- Smart Gym-Based Translational Research Center for Active Senior's Healthcare, Pukyong National University, Busan 48513, Republic of Korea
- Byeongil Lee
- Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea
- Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Junghwan Oh
- Smart Gym-Based Translational Research Center for Active Senior's Healthcare, Pukyong National University, Busan 48513, Republic of Korea
- Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea
- Industry 4.0 Convergence Bionics Engineering, Department of Biomedical Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Ohlabs Corp., Busan 48513, Republic of Korea
4
Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS Measurement Science Au 2024; 4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]
Abstract
Statistical analysis and modeling of mass spectrometry (MS) data have a long and rich history with several modern MS-based applications using statistical and chemometric methods. Recently, machine learning (ML) has experienced a renaissance due to advents in computational hardware and the development of new algorithms for artificial neural networks (ANN) and deep learning architectures. Moreover, recent successes of new ANN and deep learning architectures in several areas of science, engineering, and society have further strengthened the ML field. Importantly, modern ML methods and architectures have enabled new approaches for tasks related to MS that are now widely adopted in several popular MS-based subdisciplines, such as mass spectrometry imaging and proteomics. Herein, we aim to provide an introductory summary of the practical aspects of ML methodology relevant to MS. Additionally, we seek to provide an up-to-date review of the most recent developments in ML integration with MS-based techniques while also providing critical insights into the future direction of the field.
Affiliation(s)
- Armen G. Beck
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Matthew Muhoberac
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Caitlin E. Randolph
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Connor H. Beveridge
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Prageeth R. Wijewardhane
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Hilkka I. Kenttämaa
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Gaurav Chopra
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Department of Computer Science (by courtesy), Purdue University, West Lafayette, Indiana 47907, United States
- Purdue Institute for Drug Discovery, Purdue Institute for Cancer Research, Regenstrief Center for Healthcare Engineering, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, Indiana 47907 United States
5
Srinivasan K, Puliyanda A, Prasad V. Identification of Reaction Network Hypotheses for Complex Feedstocks from Spectroscopic Measurements with Minimal Human Intervention. J Phys Chem A 2024; 128:4714-4729. [PMID: 38836378 DOI: 10.1021/acs.jpca.4c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
In this work, we detail an automated reaction network hypothesis generation protocol for processes involving complex feedstocks where information about the species and reactions involved is unknown. Our methodology is process agnostic and can be utilized in any reactive process with spectroscopic measurements that provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1-D convolutional neural network to automatically identify functional groups indicated by them. We employ atom-atom mapping to automatically recover reaction rules that are applied on candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexities. Our methodology is able to identify reaction network hypotheses containing reaction networks close to the ground truth in the case of synthetic data, and we are also able to recover candidate molecules and reaction networks close to the ones reported in the previous literature studies for biomass pyrolysis.
Affiliation(s)
- Karthik Srinivasan
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Anjana Puliyanda
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Vinay Prasad
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
6
Jang HD, Kwon S, Nam H, Chang DE. Semi-Supervised Autoencoder for Chemical Gas Classification with FTIR Spectrum. Sensors (Basel, Switzerland) 2024; 24:3601. [PMID: 38894390 PMCID: PMC11175179 DOI: 10.3390/s24113601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/13/2024] [Accepted: 05/31/2024] [Indexed: 06/21/2024]
Abstract
Chemical warfare agents pose a serious threat due to their extreme toxicity, necessitating the swift identification of chemical gases and individual responses to the identified threats. Fourier transform infrared (FTIR) spectroscopy offers a method for remote material analysis, particularly in detecting colorless and odorless chemical agents. In this paper, we propose a deep neural network utilizing a semi-supervised autoencoder (SSAE) for the classification of chemical gases based on FTIR spectra. In contrast to traditional methods, the SSAE concurrently trains an autoencoder and a classifier attached to a latent vector of the autoencoder, enhancing feature extraction for classification. The SSAE was evaluated on laboratory-collected FTIR spectra, demonstrating a superior classification performance compared to existing methods. The efficacy of the SSAE lies in its ability to generate denser cluster distributions in latent vectors, thereby enhancing gas classification. This study established a consistent experimental environment for hyperparameter optimization, offering valuable insights into the influence of latent vectors on classification performance.
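As a rough illustration of the joint training idea described in this abstract, the sketch below adds a reconstruction term to a classification term computed for the same sample. The function names, the choice of mean squared error and cross-entropy, and the weighting factor `alpha` are all assumptions made for illustration; the abstract does not state which loss the paper uses.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between a spectrum and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def joint_loss(x, x_hat, logits, label=None, alpha=1.0):
    """Hypothetical joint objective: reconstruction loss plus a weighted
    classification loss. Unlabeled samples (label=None) contribute only
    the reconstruction term (an assumption, not taken from the paper)."""
    loss = mse(x, x_hat)
    if label is not None:
        loss += alpha * cross_entropy(logits, label)
    return loss

# Made-up values: a 4-point "spectrum", its reconstruction, and 3-class logits
spectrum = [0.10, 0.40, 0.90, 0.30]
reconstruction = [0.12, 0.38, 0.85, 0.33]
class_logits = [2.0, 0.5, -1.0]
print(joint_loss(spectrum, reconstruction, class_logits, label=0))
print(joint_loss(spectrum, reconstruction, class_logits, label=None))
```

In an actual network both terms would be backpropagated through a shared encoder, which is how a classifier attached to the latent vector can shape the features the autoencoder learns.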
Affiliation(s)
- Hee-Deok Jang
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Seokjoon Kwon
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Hyunwoo Nam
- Chem-Bio Technology Center, Advanced Defense Science and Technology Research Institute, Agency for Defense Development, Daejeon 34186, Republic of Korea
- Dong Eui Chang
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
7
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth in spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing the evolving landscape of spectrum-structure correlation, conventional chemometrics has become ill-equipped, and deep learning assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and fundamental knowledge of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, along with potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
- Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
- School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
8
Nitika N, Keerthiveena B, Thakur G, Rathore AS. Convolutional Neural Networks Guided Raman Spectroscopy as a Process Analytical Technology (PAT) Tool for Monitoring and Simultaneous Prediction of Monoclonal Antibody Charge Variants. Pharm Res 2024; 41:463-479. [PMID: 38366234 DOI: 10.1007/s11095-024-03663-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/18/2024] [Indexed: 02/18/2024]
Abstract
BACKGROUND: Charge-related heterogeneities of monoclonal antibody (mAb) based therapeutic products are increasingly being considered as a critical quality attribute (CQA). They are typically estimated using analytical cation exchange chromatography (CEX), which is time-consuming and not suitable for real-time control. Raman spectroscopy coupled with artificial intelligence (AI) tools offers an opportunity for real-time monitoring and control of charge variants. OBJECTIVE: We present a process analytical technology (PAT) tool for on-line and real-time charge variant determination during process-scale CEX based on Raman spectroscopy employing machine learning techniques. METHOD: Raman spectra are collected from a reference library of samples with distribution of acidic, main, and basic species from 0-100% in a mAb concentration range of 0-20 g/L generated from process-scale CEX. The performance of different machine learning techniques for spectral processing is compared for predicting different charge variant species. RESULT: A convolutional neural network (CNN) based model was successfully calibrated for quantification of acidic species, main species, basic species, and total protein concentration, with R² values of 0.94, 0.99, 0.96, and 0.99 and root mean squared error (RMSE) values of 0.1846, 0.1627, 0.1029, and 0.2483 g/L, respectively. CONCLUSION: We demonstrate that Raman spectroscopy combined with AI-ML frameworks can deliver rapid and accurate determination of product-related impurities. This approach can be used for real-time CEX pooling decisions in mAb production processes, thus enabling consistent charge variant profiles to be achieved.
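For readers who want to see how the two calibration metrics reported in this abstract are obtained, here is a minimal sketch using the standard definitions of RMSE and the coefficient of determination. The function names and the sample concentrations are hypothetical and are not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the inputs (e.g. g/L)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference (CEX) and predicted (model) concentrations in g/L
reference = [2.0, 5.0, 10.0, 15.0]
predicted = [2.1, 4.8, 10.3, 14.9]
print("RMSE (g/L):", rmse(reference, predicted))
print("R^2:", r_squared(reference, predicted))
```

RMSE is expressed in the units of the measured quantity, whereas R² is dimensionless, which is why the abstract attaches g/L only to the RMSE values.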
Affiliation(s)
- Nitika Nitika
- Department of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
- B Keerthiveena
- School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India
- Garima Thakur
- Department of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
- Anurag S Rathore
- Department of Chemical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
- School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India.
9
Goldman S, Li J, Coley CW. Generating Molecular Fragmentation Graphs with Autoregressive Neural Networks. Anal Chem 2024; 96:3419-3428. [PMID: 38349970 DOI: 10.1021/acs.analchem.3c04654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2024]
Abstract
The accurate prediction of tandem mass spectra from molecular structures has the potential to unlock new metabolomic discoveries by augmenting the community's libraries of experimental reference standards. Cheminformatic spectrum prediction strategies use a "bond-breaking" framework to iteratively simulate mass spectrum fragmentations, but these methods are (a) slow due to the need to exhaustively and combinatorially break molecules and (b) inaccurate as they often rely upon heuristics to predict the intensity of each resulting fragment; neural network alternatives mitigate computational cost but are black-box and not inherently more accurate. We introduce a physically grounded neural approach that learns to predict each breakage event and score the most relevant subset of molecular fragments quickly and accurately. We evaluate our model by predicting spectra from both public and private standard libraries, demonstrating that our hybrid approach offers state-of-the-art prediction accuracy, improved metabolite identification from a database of candidates, and higher interpretability when compared to previous breakage methods and black-box neural networks. The grounding of our approach in physical fragmentation events shows especially great promise for elucidating natural product molecules with more complex scaffolds.
Affiliation(s)
- Samuel Goldman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Janet Li
- Harvard College, Harvard University, Cambridge, Massachusetts 02138, United States
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
10
Lu XY, Wang CY, Tang H, Qin YF, Cui L, Wang X, Liu GK, Ren B. Patch-Based Convolutional Encoder: A Deep Learning Algorithm for Spectral Classification Balancing the Local and Global Information. Anal Chem 2024. [PMID: 38324760 DOI: 10.1021/acs.analchem.3c03889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Molecular vibrational spectroscopies, including infrared absorption and Raman scattering, provide molecular fingerprint information and are powerful tools for qualitative and quantitative analysis. They benefit from the recent development of deep-learning-based algorithms to improve the spectral, spatial, and temporal resolutions. Although a variety of deep-learning-based algorithms, including those to simultaneously extract the global and local spectral features, have been developed for spectral classification, the classification accuracy is still far from satisfactory when the difference becomes very subtle. Here, we developed a lightweight algorithm named patch-based convolutional encoder (PACE), which effectively improved the accuracy of spectral classification by extracting spectral features while balancing local and global information. The local information was captured well by segmenting the spectrum into patches with an appropriate patch size. The global information was extracted by constructing the correlation between different patches with depthwise separable convolutions. In the five open-source spectral data sets, PACE achieved a state-of-the-art performance. The more difficult the classification, the better the performance of PACE, compared with that of residual neural network (ResNet), vision transformer (ViT), and other commonly used deep learning algorithms. PACE helped improve the accuracy to 92.1% in Raman identification of pathogen-derived extracellular vesicles at different physiological states, which is much better than those of ResNet (85.1%) and ViT (86.0%). In general, the precise recognition and extraction of subtle differences offered by PACE are expected to facilitate vibrational spectroscopy to be a powerful tool toward revealing the relevant chemical reaction mechanisms in surface science or realizing the early diagnosis in life science.
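The patching step mentioned in this abstract can be pictured with the short sketch below, which cuts a one-dimensional spectrum into fixed-size segments. It assumes non-overlapping patches and drops any leftover points at the end; both choices, and the function name `split_into_patches`, are illustrative assumptions rather than details taken from the paper. The depthwise separable convolutions that relate patches to one another are not shown.

```python
def split_into_patches(spectrum, patch_size):
    """Split a 1-D spectrum (list of intensities) into consecutive,
    non-overlapping patches of length patch_size. Trailing points that
    do not fill a whole patch are discarded (an assumption)."""
    if patch_size <= 0:
        raise ValueError("patch_size must be a positive integer")
    n_full = len(spectrum) // patch_size
    return [spectrum[i * patch_size:(i + 1) * patch_size] for i in range(n_full)]

# Made-up example: a 10-point spectrum cut into patches of 4 points
intensities = [0.0, 0.1, 0.4, 0.9, 0.5, 0.2, 0.1, 0.3, 0.8, 0.2]
for patch in split_into_patches(intensities, 4):
    print(patch)
```

A smaller patch size keeps narrow peaks within a single patch but yields more patches to relate to one another, which is one way to read the abstract's remark about choosing an appropriate patch size.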
Affiliation(s)
- Xin-Yu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Chen-Yue Wang
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China
- Hui Tang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yi-Fei Qin
- Xiamen Key Laboratory of Indoor Air and Health, Key Laboratory of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
- Li Cui
- Xiamen Key Laboratory of Indoor Air and Health, Key Laboratory of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
- Xiang Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, China
- Guo-Kun Liu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, China
11
Cao H, Shi H, Tang J, Xu Y, Ling Y, Lu X, Yang Y, Zhang X, Wang H. Ultrasensitive discrimination of volatile organic compounds using a microfluidic silicon SERS artificial intelligence chip. iScience 2023; 26:107821. [PMID: 37731613 PMCID: PMC10507157 DOI: 10.1016/j.isci.2023.107821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 07/06/2023] [Accepted: 08/31/2023] [Indexed: 09/22/2023] Open
Abstract
Current gaseous sensors hardly discriminate trace volatile organic compounds at the ppt level. Herein, we present an integrated platform for simultaneously enabling rapid preconcentration, reliable surface-enhanced Raman scattering (SERS) detection, and automatic identification of trace aldehydes at the ppt level. For rapid preconcentration, we demonstrate that the nozzle-like microfluidic concentrator allows the enrichment of rare gaseous analytes by five-fold in only 0.01 ms. The enriched gas is subsequently captured and detected by an integrated silicon-based SERS chip, which is made of zeolitic imidazolate framework-8 coated silver nanoparticles grown in situ on a silicon wafer. After SERS measurement, a fully connected deep neural network is built to extract faint features in the spectral dataset and discriminate volatile organic compound classes. We demonstrate that six kinds of gaseous aldehydes at 100 ppt could be detected and classified with an identification accuracy of ∼80.9% by using this platform.
Affiliation(s)
- Haiting Cao
- Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
| | - Huayi Shi
- Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
| | - Jie Tang
- Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
| | - Yanan Xu
- Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
| | - Yufan Ling
- State Key Laboratory of Radiation Medicine and Protection, School of Radiation Medicine and Protection, Collaborative Innovation Center of Radiological Medicine of Jiangsu Higher Education Institutions, Soochow University, 199 Renai Road, Suzhou 215123, China
| | - Xing Lu
- Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
| | - Yang Yang
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
| | - Xiaojie Zhang
- Department of Experimental Center, Medical College of Soochow University, Suzhou, Jiangsu 215123, China
| | - Houyu Wang
- Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
| |
|
12
|
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
| | - Hanyu Sun
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
| | - Minjian Yang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
| | - Xue Liu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
| | - Hai-Yu Hu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
- Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
| |
|
13
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
14
|
Wang T, Tan Y, Chen YZ, Tan C. Infrared Spectral Analysis for Prediction of Functional Groups Based on Feature-Aggregated Deep Learning. J Chem Inf Model 2023; 63:4615-4622. [PMID: 37531205 DOI: 10.1021/acs.jcim.3c00749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Infrared (IR) spectroscopy is a powerful and versatile tool for analyzing functional groups in organic compounds. A complex and time-consuming interpretation of massive unknown spectra usually requires knowledge of chemistry and spectroscopy. This paper presents a new deep learning method for transforming IR spectral features into intuitive imagelike feature maps and prediction of major functional groups. We obtained 8272 gas-phase IR spectra from the NIST Chemistry WebBook. Feature maps are constructed using the intrinsic correlation of spectral data, and prediction models are developed based on convolutional neural networks. Twenty-one major functional groups for each molecule are successfully identified using binary and multilabel models without expert guidance and feature selection. The multilabel classification model can produce all prediction results simultaneously for rapid characterization. Further analysis of the detailed substructures indicates that our model is capable of obtaining abundant structural information from IR spectra for a comprehensive investigation. The interpretation of our model reveals that the peaks of most interest are similar to those often considered by spectroscopists. In addition to demonstrating great potential for spectral identification, our method may contribute to the development of automated analyses in many fields.
Affiliation(s)
- Tianyi Wang
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
| | - Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
| | - Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen 518132, P.R. China
| | - Chunyan Tan
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
| |
|
15
|
Vermeyen T, Cunha A, Bultinck P, Herrebout W. Impact of conformation and intramolecular interactions on vibrational circular dichroism spectra identified with machine learning. Commun Chem 2023; 6:148. [PMID: 37438485 DOI: 10.1038/s42004-023-00944-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023] Open
Abstract
Vibrational Circular Dichroism (VCD) spectra often differ strongly from one conformer to another, even within the same absolute configuration of a molecule. Simulated molecular VCD spectra typically require expensive quantum chemical calculations for all conformers to generate a Boltzmann averaged total spectrum. This paper reports whether machine learning (ML) can partly replace these quantum chemical calculations by capturing the intricate connection between a conformer geometry and its VCD spectrum. Three hypotheses concerning the added value of ML are tested. First, it is shown that for a single stereoisomer, ML can predict the VCD spectrum of a conformer from solely the conformer geometry. Second, it is found that the ML approach results in important time savings. Third, the ML model produced is unfortunately hardly transferable from one stereoisomer to another.
Affiliation(s)
- Tom Vermeyen
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, Antwerpen, 2020, Belgium.
- Department of Chemistry, Ghent University, Krijgslaan 281, Gent, 9000, Belgium.
| | - Ana Cunha
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, Antwerpen, 2020, Belgium
| | - Patrick Bultinck
- Department of Chemistry, Ghent University, Krijgslaan 281, Gent, 9000, Belgium.
| | - Wouter Herrebout
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, Antwerpen, 2020, Belgium
| |
|
16
|
North N, Enders AA, Cable ML, Allen HC. Array-Based Machine Learning for Functional Group Detection in Electron Ionization Mass Spectrometry. ACS Omega 2023; 8:24341-24350. [PMID: 37457446 PMCID: PMC10339417 DOI: 10.1021/acsomega.3c01684] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Accepted: 05/22/2023] [Indexed: 07/18/2023]
Abstract
Mass spectrometry is a ubiquitous technique capable of complex chemical analysis. The fragmentation patterns that appear in mass spectrometry are an excellent target for artificial intelligence methods to automate and expedite the analysis of data to identify targets such as functional groups. To develop this approach, we trained models on electron ionization (a reproducible hard fragmentation technique) mass spectra so that not only the final model accuracies but also the reasoning behind model assignments could be evaluated. The convolutional neural network (CNN) models were trained on 2D images of the spectra using transfer learning of Inception V3, and the logistic regression models were trained using array-based data and the scikit-learn implementation in Python. Our training dataset consisted of 21,166 mass spectra from the United States' National Institute of Standards and Technology (NIST) WebBook. The data were used to train models to identify functional groups, both specific (e.g., amines, esters) and generalized classifications (aromatics, oxygen-containing functional groups, and nitrogen-containing functional groups). We found that the highest final accuracies on identifying new data were observed using logistic regression rather than transfer learning on CNN models. It was also determined that the mass range most beneficial for functional group analysis is 0-100 m/z. We also found success in correctly identifying functional groups of example molecules selected from both the NIST database and experimental data. Beyond functional group analysis, we have also developed a methodology to identify impactful fragments for the accurate detection of the models' targets. The results demonstrate a potential pathway for analyzing and screening substantial amounts of mass spectral data.
Affiliation(s)
- Nicole M. North
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| | - Abigail A. Enders
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| | - Morgan L. Cable
- NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91109, United States
| | - Heather C. Allen
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| |
|
17
|
Sun Y, Brockhauser S, Hegedűs P, Plückthun C, Gelisio L, Ferreira de Lima DE. Application of self-supervised approaches to the classification of X-ray diffraction spectra during phase transitions. Sci Rep 2023; 13:9370. [PMID: 37296300 DOI: 10.1038/s41598-023-36456-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 06/04/2023] [Indexed: 06/12/2023] Open
Abstract
Spectroscopy and X-ray diffraction techniques encode ample information on investigated samples. The ability to rapidly and accurately extract this information enhances the means to steer the experiment, as well as the understanding of the underlying processes governing the experiment. It improves the efficiency of the experiment and maximizes the scientific outcome. To address this, we introduce and validate three frameworks based on self-supervised learning which are capable of classifying 1D spectral curves using data transformations preserving the scientific content and only a small amount of data labeled by domain experts. In particular, in this work we focus on the identification of phase transitions in samples investigated by X-ray powder diffraction. We demonstrate that the three frameworks, based either on relational reasoning, contrastive learning, or a combination of the two, are capable of accurately identifying phase transitions. Furthermore, we discuss in detail the selection of data augmentation techniques, crucial to ensure that scientifically meaningful information is retained.
Affiliation(s)
- Yue Sun
- Software Engineering Department, Institute of Informatics, University of Szeged, Dugonics tér 13, Szeged, 6720, Hungary.
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany.
| | - Sandor Brockhauser
- Software Engineering Department, Institute of Informatics, University of Szeged, Dugonics tér 13, Szeged, 6720, Hungary
- Center for Materials Science Data, Humboldt-Universität zu Berlin, Zum Großen Windkanal 2, 12489, Berlin, Germany
| | - Péter Hegedűs
- Software Engineering Department, Institute of Informatics, University of Szeged, Dugonics tér 13, Szeged, 6720, Hungary.
| | - Christian Plückthun
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
- Deutsches Elektronen-Synchrotron (DESY), 22607, Hamburg, Germany
| | - Luca Gelisio
- European XFEL GmbH, Holzkoppel 4, 22869, Schenefeld, Germany
| | | |
|
18
|
Talaty NN, Johnson RW, Sawicki J, Nacham O, Djuric SW. Recent Developments in Mass Spectrometry to Support Next-Generation Synthesis and Screening. ACS Med Chem Lett 2023; 14:711-718. [PMID: 37312853 PMCID: PMC10258828 DOI: 10.1021/acsmedchemlett.3c00040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 05/10/2023] [Indexed: 06/15/2023] Open
Abstract
The complexity of new therapeutics continues to increase and the timeline for the discovery of these therapeutics continues to shrink. This creates demand for new analytical techniques to facilitate quicker discovery and development of novel drugs. Mass spectrometry is one of the most prolific analytical techniques that has been applied across the entire drug discovery pipeline. New mass spectrometers and the associated methods for sampling have been introduced at a rate that keeps pace with new chemistries, therapeutic types, and screening practices used by modern drug hunters. This microperspective covers application and implementation of new mass spectrometry workflows that enable current and future efforts in screening and synthesis for drug discovery.
Affiliation(s)
- Nari N. Talaty
- Discovery Platform Technologies, AbbVie Inc., North Chicago, Illinois 60064, United States
| | - Robert W. Johnson
- Discovery Platform Technologies, AbbVie Inc., North Chicago, Illinois 60064, United States
| | - James Sawicki
- Discovery Platform Technologies, AbbVie Inc., North Chicago, Illinois 60064, United States
| | - Omprakash Nacham
- Discovery Platform Technologies, AbbVie Inc., North Chicago, Illinois 60064, United States
| | - Stevan W. Djuric
- Discovery Chemistry and Technology Consulting LLC, New Bern, North Carolina 28562, United States
| |
|
19
|
Houthuijs KJ, Berden G, Engelke UFH, Gautam V, Wishart DS, Wevers RA, Martens J, Oomens J. An In Silico Infrared Spectral Library of Molecular Ions for Metabolite Identification. Anal Chem 2023. [PMID: 37262385 DOI: 10.1021/acs.analchem.3c01078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Infrared ion spectroscopy (IRIS) continues to see increasing use as an analytical tool for small-molecule identification in conjunction with mass spectrometry (MS). The IR spectrum of an m/z selected population of ions constitutes a unique fingerprint that is specific to the molecular structure. However, direct translation of an IR spectrum to a molecular structure remains challenging, as reference libraries of IR spectra of molecular ions largely do not exist. Quantum-chemically computed spectra can reliably be used as reference, but the challenge of selecting the candidate structures remains. Here, we introduce an in silico library of vibrational spectra of common MS adducts of over 4500 compounds found in the Human Metabolome Database. In total, the library currently contains more than 75,000 spectra computed at the DFT level that can be queried with an experimental IR spectrum. Moreover, we introduce a database of 189 experimental IRIS spectra, which is employed to validate the automated spectral matching routines. This demonstrates that 75% of the metabolites in the experimental data set are correctly identified, based solely on their exact m/z and IRIS spectrum. Additionally, we demonstrate an approach for specifically identifying substructures by performing a search without m/z constraints to find structural analogues. Such an unsupervised search paves the way toward the de novo identification of unknowns that are absent in spectral libraries. We apply the in silico spectral library to identify an unknown in a plasma sample as 3-hydroxyhexanoic acid, highlighting the potential of the method.
Affiliation(s)
- Kas J Houthuijs
- Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
| | - Giel Berden
- Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
| | - Udo F H Engelke
- Department of Genetics, Translational Metabolic Laboratory, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
| | - Vasuk Gautam
- Department of Biological Sciences, University of Alberta, Edmonton AB T6G 2E9, Canada
| | - David S Wishart
- Department of Biological Sciences, University of Alberta, Edmonton AB T6G 2E9, Canada
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
| | - Ron A Wevers
- Department of Genetics, Translational Metabolic Laboratory, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
| | - Jonathan Martens
- Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
| | - Jos Oomens
- Institute for Molecules and Materials, FELIX Laboratory, Radboud University, Nijmegen 6525 ED, The Netherlands
- van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
| |
|
20
|
Liu Y, Yao W, Qin F, Zhou L, Zheng Y. Spectral Classification of Large-Scale Blended (Micro)Plastics Using FT-IR Raw Spectra and Image-Based Machine Learning. Environmental Science & Technology 2023; 57:6656-6663. [PMID: 37052503 DOI: 10.1021/acs.est.2c08952] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 06/19/2023]
Abstract
Microplastics (MPs) are currently recognized as emerging pollutants; their identification and classification are therefore essential during their monitoring and management. In contrast to most studies based on small datasets and library searches, this study developed and compared four machine learning-based classifiers and two large-scale blended plastic datasets, where a 1D convolutional neural network (CNN), decision tree, and random forest (RF) were fed with raw spectral data from Fourier transform infrared spectroscopy, while a 2D CNN used the corresponding spectral images as the input. With an overall accuracy of 96.43% on a small dataset and 97.44% on a large dataset, the 1D CNN outperformed other models. The 1D CNN was the best at predicting environment samples, while the RF was the most robust with less spectral data. Overall, RF and 2D CNNs might be evaluated for plastic identification with fewer spectral data; however, 1D CNNs were thought to be the most effective with sufficient spectral data. Accordingly, an open-source MP spectroscopic analysis tool was developed to facilitate a quick and accurate analysis of existing MP samples.
Affiliation(s)
- Yanlong Liu
- Gansu Key Laboratory for Environmental Pollution Prediction and Control, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Wenli Yao
- Gansu Key Laboratory for Environmental Pollution Prediction and Control, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Fenghui Qin
- Gansu Key Laboratory for Environmental Pollution Prediction and Control, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Lei Zhou
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Yian Zheng
- Gansu Key Laboratory for Environmental Pollution Prediction and Control, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| |
|
21
|
Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows the interconversions between the conformers to be followed through graphs of transitions in time. Examples of gas-phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
| | - Marie Bricage
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Ylène Aboulfath
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Dominique Barth
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Marie-Pierre Gaigeot
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
| |
|
22
|
Fan X, Wang Y, Yu C, Lv Y, Zhang H, Yang Q, Wen M, Lu H, Zhang Z. A Universal and Accurate Method for Easily Identifying Components in Raman Spectroscopy Based on Deep Learning. Anal Chem 2023; 95:4863-4870. [PMID: 36908216 DOI: 10.1021/acs.analchem.2c03853] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 03/14/2023]
Abstract
Raman spectroscopy has been widely used to provide the structural fingerprint for molecular identification. Due to interference from coexisting components, noise, baseline, and systematic differences between spectrometers, component identification with Raman spectra is challenging, especially for mixtures. In this study, a method entitled DeepRaman has been proposed to solve those problems by combining the comparison ability of a pseudo-Siamese neural network (pSNN) and the input-shape flexibility of spatial pyramid pooling (SPP). DeepRaman was trained, validated, and tested with 41,564 augmented Raman spectra from two databases (pharmaceutical material and S.T. Japan). It can achieve 96.29% accuracy, 98.40% true positive rate (TPR), and 94.36% true negative rate (TNR) on the test set. Another six data sets measured on different instruments were used to evaluate the performance of the proposed method from different aspects. DeepRaman can provide accurate identification results and significantly outperform the hit quality index (HQI) method and other deep learning models. In addition, it performs well in cases of different spectral complexity and low-content components. Once the model is established, it can be used directly on different data sets without retraining or transfer learning. Furthermore, it also obtains promising results for the analysis of surface-enhanced Raman spectroscopy (SERS) data sets and Raman imaging data sets. In summary, it is an accurate, universal, and ready-to-use method for component identification in various application scenarios.
Affiliation(s)
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Chuanxiu Yu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yuanxia Lv
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| |
|
23
|
Fine J, Wijewardhane PR, Mohideen SDB, Smith K, Bothe JR, Krishnamachari Y, Andrews A, Liu Y, Chopra G. Learning Relationships Between Chemical and Physical Stability for Peptide Drug Development. Pharm Res 2023; 40:701-710. [PMID: 36797504 DOI: 10.1007/s11095-023-03475-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Accepted: 01/16/2023] [Indexed: 02/18/2023]
Abstract
PURPOSE OR OBJECTIVE Chemical and physical stabilities are two key features considered in pharmaceutical development. Chemical stability is typically reported as a combination of potency and degradation product. Moreover, the fluorescent reporter Thioflavin-T is commonly used to measure physical stability. Executing stability studies is a lengthy process and requires extensive resources. To reduce the resources and shorten the process for stability studies during the development of a drug product, we introduce a machine learning-based model for predicting the chemical stability over time using both formulation conditions and aggregation curves. METHODS In this work, we model the relationships between the formulation, stability timepoint, and the chemical stability measurements and evaluate the performance on a random test set. We have developed a multilayer perceptron (MLP) for total degradation (TD) prediction and a random forest (RF) model for potency. RESULTS A coefficient of determination (R2) of 0.945 and a mean absolute error (MAE) of 0.421 were achieved on the test set when using the MLP for total degradation. Similarly, we achieved an R2 of 0.908 and an MAE of 1.435 when predicting potency using the RF model. When physical stability measurements are included in the MLP model, the MAE of predicting TD decreases to 0.148. Using a similar strategy for potency prediction, the MAE decreases to 0.705 for the RF model. CONCLUSIONS We conclude two important points: first, chemical stability can be modeled using machine learning techniques, and second, there is a relationship between the physical stability of a peptide and its chemical stability.
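The two scores quoted in the results (R2 and MAE) can be sketched from their standard definitions as follows. This is an illustration only, not the authors' evaluation code; the function names and the toy values are assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant, so SS_tot is non-zero.
    """
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (made up) standing in for measured vs. predicted degradation.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mean_absolute_error(measured, predicted))
print(r_squared(measured, predicted))
```

MAE is expressed in the units of the measured quantity, so the MAE values for total degradation and for potency in the abstract are not directly comparable with each other.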
Affiliation(s)
- Jonathan Fine
- Department of Chemistry, Purdue University, West Lafayette, IN, USA
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ, USA
| | | | | | - Katelyn Smith
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ, USA
| | - Jameson R Bothe
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ, USA
| | - Yogita Krishnamachari
- Sterile and Specialty Products, Pharmaceutical Sciences, MRL, Merck & Co., Inc., Rahway, NJ, USA
| | - Alexandra Andrews
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ, USA
| | - Yong Liu
- Tango Therapeutics, Boston, MA, USA
| | - Gaurav Chopra
- Department of Chemistry, Purdue University, West Lafayette, IN, USA.
- Department of Computer Science (by courtesy), Purdue University, West Lafayette, IN, USA.
| |
|
24
|
Schmid N, Bruderer S, Paruzzo F, Fischetti G, Toscano G, Graf D, Fey M, Henrici A, Ziebart V, Heitmann B, Grabner H, Wegner JD, Sigel RKO, Wilhelm D. Deconvolution of 1D NMR spectra: A deep learning-based approach. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2023; 347:107357. [PMID: 36563418 DOI: 10.1016/j.jmr.2022.107357] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/21/2022] [Revised: 12/01/2022] [Accepted: 12/04/2022] [Indexed: 06/17/2023]
Abstract
The analysis of nuclear magnetic resonance (NMR) spectra to detect peaks and characterize their parameters, often referred to as deconvolution, is a crucial step in the quantification, elucidation, and verification of the structure of molecular systems. However, deconvolution of 1D NMR spectra is a challenge for both experts and machines. We propose a robust, expert-level quality deep learning-based deconvolution algorithm for 1D experimental NMR spectra. The algorithm is based on a neural network trained on synthetic spectra. Our customized pre-processing and labeling of the synthetic spectra enable the estimation of critical peak parameters. Furthermore, the neural network model transfers well to the experimental spectra and demonstrates low fitting errors and sparse peak lists in challenging scenarios such as crowded, high dynamic range, shoulder peak regions as well as broad peaks. We demonstrate in challenging spectra that the proposed algorithm is superior to expert results.
Affiliation(s)
- N Schmid
- Zurich University of Applied Sciences (ZHAW), Switzerland; University of Zurich (UZH), Switzerland.
| | | | | | | | | | - D Graf
- Bruker Switzerland AG, Switzerland
| | - M Fey
- Bruker Switzerland AG, Switzerland
| | - A Henrici
- Zurich University of Applied Sciences (ZHAW), Switzerland
| | - V Ziebart
- Zurich University of Applied Sciences (ZHAW), Switzerland
| | | | - H Grabner
- Zurich University of Applied Sciences (ZHAW), Switzerland
| | | | | | - D Wilhelm
- Zurich University of Applied Sciences (ZHAW), Switzerland
| |
|
25
|
Bougueroua S, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic graph theory for post-processing molecular dynamics trajectories. Mol Phys 2023. [DOI: 10.1080/00268976.2022.2162456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Affiliation(s)
- Sana Bougueroua
- Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
| | - Ylène Aboulfath
- Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
| | - Dominique Barth
- Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
| | - Marie-Pierre Gaigeot
- Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
| |
|
26
|
Pant A, Kaur T, Sharma T, Singh J, Suttee A, Barnwal RP, Kaur IP, Singh G, Singh B. A glass matrices-assisted quantum dots-based biosensor for selective capturing and detection of Escherichia coli. JOURNAL OF WATER AND HEALTH 2022; 20:1673-1687. [PMID: 36573672 DOI: 10.2166/wh.2022.293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
Bacterial contamination of water and food is a grave health concern rendering humans quite vulnerable to disease(s), and proving, at times, fatal too. Exploration of the novel diagnostic tools is, accordingly, highly called for to ensure rapid detection of the pathogenic bacteria, particularly Escherichia coli. The current manuscript, accordingly, reports the use of silane-functionalized glass matrices and antibody-conjugated cadmium telluride (CdTe) quantum dots (QDs) for efficient detection of E. coli. Synthesis of QDs (size: 5.4-6.8 nm) using mercaptopropionic acid (MPA) stabilizer yielded stable photoluminescence (∼62%), corroborating superior fluorescent characteristics. A test sample, when added to antibody-conjugated matrices, followed by antibody-conjugated CdTe-MPA QDs, formed a pathogen-antibody QDs complex. The latter, during confocal microscopy, demonstrated rapid detection of the selectively captured pathogenic bacteria (10 microorganism cells/10 μL) with enhanced sensitivity and specificity. The work, overall, encompasses establishment and design of an innovative detection platform in microbial diagnostics for rapid capturing of pathogens in water and food samples.
Affiliation(s)
- Anjali Pant
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh, India 160014
| | - Taranvir Kaur
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh, India 160014
| | - Teenu Sharma
- Chitkara College of Pharmacy, Chitkara University, Rajpura, Punjab, India 140401
| | - Joga Singh
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh, India 160014
| | - Ashish Suttee
- Department of Pharmacognosy, School of Pharmaceutical Sciences, Lovely Professional University, Jalandhar, Punjab, India
| | | | - Indu Pal Kaur
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh, India 160014
| | - Gurpal Singh
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh, India 160014
| | - Bhupinder Singh
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh, India 160014 ; Chitkara College of Pharmacy, Chitkara University, Rajpura, Punjab, India 140401
| |
|
27
|
TransG-net: transformer and graph neural network based multi-modal data fusion network for molecular properties prediction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04351-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
28
|
Zheng Y, Zhang L, He S, Xie Z, Zhang J, Ge C, Sun G, Huang J, Li H. Integrated Module of Multidimensional Omics for Peripheral Biomarkers (iMORE) in patients with major depressive disorder: rationale and design of a prospective multicentre cohort study. BMJ Open 2022; 12:e067447. [PMID: 36418119 PMCID: PMC9685190 DOI: 10.1136/bmjopen-2022-067447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
INTRODUCTION Major depressive disorder (MDD) represents a worldwide burden on healthcare and the response to antidepressants remains limited. Systems biology approaches have been used to explore precision therapy. However, no reliable biomarker currently exists in clinical practice for prognostic prediction. The objectives of the Integrated Module of Multidimensional Omics for Peripheral Biomarkers (iMORE) study are to predict the efficacy of antidepressants by integrating multidimensional omics and performing validation in a real-world setting. As secondary aims, a series of potential biomarkers are explored for biological subtypes. METHODS AND ANALYSIS iMORE is an observational cohort study in patients with MDD with a multistage design in China. The study is performed by three mental health centres and comprises an observation phase and a validation phase. A total of 200 patients with MDD and 100 healthy controls were enrolled. The protocol-specified antidepressants are selective serotonin reuptake inhibitors and serotonin-norepinephrine reuptake inhibitors. Clinical visits (baseline, 4 and 8 weeks) include psychiatric rating scales for symptom assessment and biospecimen collection for multiomics analysis. Participants are divided into responders and non-responders based on treatment response (>50% reduction in Montgomery-Asberg Depression Rating Scale). Antidepressant responses are predicted and biomarkers are explored using a supervised learning approach that integrates metabolites, cytokines, gut microbiomes and immunophenotypic cells. The accuracy of the prediction models constructed is verified in an independent validation phase. ETHICS AND DISSEMINATION The study was approved by the ethics committee of Shanghai Mental Health Center (approval number 2020-87). All participants need to sign a written consent for the study entry. Study findings will be published in peer-reviewed journals. TRIAL REGISTRATION NUMBER NCT04518592.
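The responder rule stated in the protocol (more than 50% reduction in the Montgomery-Asberg Depression Rating Scale score) can be sketched as a labelling function for the supervised learning step. This is an illustration under assumptions: the function name, the strict inequality, and the choice of which follow-up visit supplies the second score are not specified in the abstract.

```python
def is_responder(baseline_score, followup_score, threshold=0.5):
    """Label a participant as responder if the relative reduction in the
    rating-scale score exceeds `threshold` (0.5 = 50%).

    Hypothetical helper; which visit provides `followup_score` is an
    assumption, since the abstract does not say.
    """
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    reduction = (baseline_score - followup_score) / baseline_score
    return reduction > threshold


# Toy scores (made up): baseline 30, follow-up 12 is a 60% reduction.
print(is_responder(30, 12))
print(is_responder(30, 20))
```

The resulting boolean labels would serve as the target variable when training a classifier on the omics features.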
Affiliation(s)
- Yuzhen Zheng
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Linna Zhang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Shen He
- Department of Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zuoquan Xie
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Jing Zhang
- Shanghai Green Valley Pharmaceutical Co Ltd, Shanghai, China
| | - Changrong Ge
- Shanghai Green Valley Pharmaceutical Co Ltd, Shanghai, China
| | - Guangqiang Sun
- Shanghai Green Valley Pharmaceutical Co Ltd, Shanghai, China
| | - Jingjing Huang
- Department of Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Clinical Research Center for Mental Health, Shanghai Mental Health Center, Shanghai, China
| | - Huafang Li
- Department of Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Clinical Research Center for Mental Health, Shanghai Mental Health Center, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| |
|
29
|
Li C, Cong Y, Deng W. Identifying molecular functional groups of organic compounds by deep learning of NMR data. MAGNETIC RESONANCE IN CHEMISTRY : MRC 2022; 60:1061-1069. [PMID: 35674984 DOI: 10.1002/mrc.5292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Revised: 06/02/2022] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
We preprocess the raw nuclear magnetic resonance (NMR) spectrum and extract key features by using two different methodologies, called equidistant sampling and peak sampling for subsequent substructure pattern recognition. We also provide a strategy to address the imbalance issue frequently encountered in statistical modeling of NMR data set and establish two conventional support vector machine (SVM) and K-nearest neighbor (KNN) models to assess the capability of two feature selections, respectively. Our results in this study show that the models using the selected features of peak sampling outperform those using equidistant sampling. Then we build the recurrent neural network (RNN) model trained by data collected from peak sampling. Furthermore, we illustrate the easier optimization of hyperparameters and the better generalization ability of the RNN deep learning model by detailed comparison with traditional machine learning SVM and KNN models.
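The abstract names two feature-extraction schemes without defining them. The sketch below shows one plausible reading, purely as an assumption: equidistant sampling takes intensities at evenly spaced indices, while peak sampling keeps the strongest local maxima. The function names, the peak criterion, and the toy spectrum are hypothetical and are not taken from the paper.

```python
def equidistant_sampling(spectrum, n_features):
    """Take n_features intensities at evenly spaced indices (assumed reading)."""
    if not 2 <= n_features <= len(spectrum):
        raise ValueError("n_features must be between 2 and len(spectrum)")
    step = (len(spectrum) - 1) / (n_features - 1)
    return [spectrum[round(i * step)] for i in range(n_features)]


def peak_sampling(spectrum, n_features):
    """Keep the n_features strongest local maxima as (index, intensity)
    pairs, ordered by index (assumed reading)."""
    peaks = [
        (i, spectrum[i])
        for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]
    ]
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_features]
    return sorted(strongest)


# Toy spectrum (made up) with three peaks at indices 2, 6 and 10.
spectrum = [0, 1, 5, 1, 0, 2, 8, 2, 0, 1, 3, 1, 0]
print(equidistant_sampling(spectrum, 4))  # samples indices 0, 4, 8, 12
print(peak_sampling(spectrum, 2))
```

In this toy case the evenly spaced grid lands only on baseline points, which illustrates one way a peak-based scheme could retain more information from a sparse spectrum.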
Affiliation(s)
- Chongcan Li
- School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
| | - Yong Cong
- College of Chemistry and Chemical Engineering, State Key Laboratory of Applied Organic Chemistry, Key Laboratory of Nonferrous Metals Chemistry and Resources Utilization, Lanzhou University, Lanzhou, China
| | - Weihua Deng
- School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
| |
|
30
|
Boiko DA, Kozlov KS, Burykina JV, Ilyushenkova VV, Ananikov VP. Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning. J Am Chem Soc 2022; 144:14590-14606. [PMID: 35939718 DOI: 10.1021/jacs.2c03631] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Mass spectrometry (MS) is a convenient, highly sensitive, and reliable method for the analysis of complex mixtures, which is vital for materials science, life sciences fields such as metabolomics and proteomics, and mechanistic research in chemistry. Although it is one of the most powerful methods for individual compound detection, complete signal assignment in complex mixtures is still a great challenge. The unconstrained formula-generating algorithm, covering the entire spectra and revealing components, is a "dream tool" for researchers. We present the framework for efficient MS data interpretation, describing a novel approach for detailed analysis based on deisotoping performed by gradient-boosted decision trees and a neural network that generates molecular formulas from the fine isotopic structure, approaching the long-standing inverse spectral problem. The methods were successfully tested on three examples: fragment ion analysis in protein sequencing for proteomics, analysis of the natural samples for life sciences, and study of the cross-coupling catalytic system for chemistry.
Affiliation(s)
- Daniil A Boiko
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Konstantin S Kozlov
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Julia V Burykina
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Valentina V Ilyushenkova
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Valentine P Ananikov
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| |
|
31
|
Wang Q, Zhang Q, He H, Feng Z, Mao J, Hu X, Wei X, Bi S, Qin G, Wang X, Ge B, Yu D, Ren H, Huang F. Carbon Dot Blinking Fingerprint Uncovers Native Membrane Receptor Organizations via Deep Learning. Anal Chem 2022; 94:3914-3921. [PMID: 35188385 DOI: 10.1021/acs.analchem.1c04947] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
Oligomeric organization of G protein-coupled receptors is proposed to regulate receptor signaling and function, yet rapid and precise identification of the oligomeric status, especially for native receptors on a cell membrane, remains an outstanding challenge. By using blinking carbon dots (CDs), we now develop a deep learning (DL)-based blinking fingerprint recognition method, named deep-blinking fingerprint recognition (BFR), which allows automatic classification of CD-labeled receptor organizations on a cell membrane. This DL model integrates convolutional layers, long short-term memory, and fully connected layers to extract time-dependent blinking features of CDs and is trained to a high accuracy (∼95%) for identifying receptor organizations. Using deep blinking fingerprint recognition, we found that CXCR4 mainly exists as 87.3% monomers, 12.4% dimers, and <1% higher-order oligomers on a HeLa cell membrane. We further demonstrate that the heterogeneous organizations can be regulated by various stimuli to different degrees. The receptor-binding ligands, agonist SDF-1α and antagonist AMD3100, can induce the dimerization of CXCR4 to 33.1 and 20.3%, respectively. In addition, cytochalasin D, which inhibits actin polymerization, similarly prompts significant dimerization of CXCR4 to 30.9%. The multi-pathway organization regulation will provide an insight for understanding the oligomerization mechanism of CXCR4 as well as for elucidating their physiological functions.
Affiliation(s)
- Qian Wang
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Qian Zhang
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Hua He
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Zhenzhen Feng
- Technical Center of Qingdao Customs District, Qingdao 266500, China
| | - Jian Mao
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Xiang Hu
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Xiaoyun Wei
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Simin Bi
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Guangyong Qin
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Xiaojuan Wang
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Baosheng Ge
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Daoyong Yu
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Hao Ren
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| | - Fang Huang
- State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
| |
|
32
|
Han R, Ketkaew R, Luber S. A Concise Review on Recent Developments of Machine Learning for the Prediction of Vibrational Spectra. J Phys Chem A 2022; 126:801-812. [PMID: 35133168 DOI: 10.1021/acs.jpca.1c10417] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Machine learning has become more and more popular in computational chemistry, as well as in the important field of spectroscopy. In this concise review, we walk the reader through a short summary of machine learning algorithms and a comprehensive discussion on the connection between machine learning methods and vibrational spectroscopy, particularly for the case of infrared and Raman spectroscopy. We also briefly discuss state-of-the-art molecular representations which serve as meaningful inputs for machine learning to predict vibrational spectra. In addition, this review provides an overview of the transferability and best practices of machine learning in the prediction of vibrational spectra as well as possible future research directions.
Affiliation(s)
- Ruocheng Han
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| |
|
33
|
Sridharan B, Goel M, Priyakumar UD. Modern Machine Learning for Tackling Inverse Problems in Chemistry: Molecular Design to Realization. Chem Commun (Camb) 2022; 58:5316-5331. [DOI: 10.1039/d1cc07035e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
The discovery of new molecules and materials helps expand the horizons of novel and innovative real-life applications. In the pursuit of finding molecules with desired properties, chemists have traditionally relied...
|
34
|
Liu L, Bi M, Wang Y, Liu J, Jiang X, Xu Z, Zhang X. Artificial intelligence-powered microfluidics for nanomedicine and materials synthesis. NANOSCALE 2021; 13:19352-19366. [PMID: 34812823 DOI: 10.1039/d1nr06195j] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Artificial intelligence (AI) is an emerging technology with great potential, and its robust calculation and analysis capabilities are unmatched by traditional calculation tools. With the promotion of deep learning and open-source platforms, the threshold of AI has also become lower. Combining artificial intelligence with traditional fields to create new fields of high research and application value has become a trend. AI has been involved in many disciplines, such as medicine, materials, energy, and economics. The development of AI requires the support of many kinds of data, and microfluidic systems can often mine object data on a large scale to support AI. Due to the excellent synergy between the two technologies, excellent research results have emerged in many fields. In this review, we briefly review AI and microfluidics and introduce some applications of their combination, mainly in nanomedicine and material synthesis. Finally, we discuss the development trend of the combination of the two technologies.
Collapse
Affiliation(s)
- Linbo Liu
- John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
| | - Mingcheng Bi
- Institute of Process Equipment, College of Energy Engineering, Zhejiang University, Hangzhou 310027, P.R. China
| | - Yunhua Wang
- John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
| | - Junfeng Liu
- Institute of Process Equipment, College of Energy Engineering, Zhejiang University, Hangzhou 310027, P.R. China
| | - Xiwen Jiang
- College of Biological Science and Engineering, Fuzhou university, Fuzhou 350108, P.R. China
| | - Zhongbin Xu
- Institute of Process Equipment, College of Energy Engineering, Zhejiang University, Hangzhou 310027, P.R. China
| | - Xingcai Zhang
- John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
- School of Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
|
35
|
Huang Z, Chen MS, Woroch CP, Markland TE, Kanan MW. A framework for automated structure elucidation from routine NMR spectra. Chem Sci 2021; 12:15329-15338. [PMID: 34976353 PMCID: PMC8635205 DOI: 10.1039/d1sc04105c] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 11/08/2021] [Indexed: 12/25/2022] Open
Abstract
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional 1H and/or 13C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.
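The "highest-ranking" and "top-ten" percentages above are top-k accuracies over ranked candidate lists. A minimal sketch of that metric follows; the function name, the use of SMILES strings as structure identifiers, and the toy rankings are assumptions for illustration and do not come from the paper.

```python
def top_k_accuracy(ranked_candidates, true_structures, k):
    """Fraction of cases whose true structure appears among the first k
    ranked candidates. Structures are compared as plain strings here
    (an assumption; a real pipeline would compare canonical forms)."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_structures)
        if truth in ranked[:k]
    )
    return hits / len(true_structures)


# Toy rankings (made up) of constitutional isomers written as SMILES.
ranked = [
    ["CCO", "COC"],       # truth ranked first
    ["CC=O", "C1CO1"],    # truth ranked second
    ["CCCO", "CC(C)O"],   # truth missing from the list
]
truths = ["CCO", "C1CO1", "CCOC"]
print(top_k_accuracy(ranked, truths, 1))
print(top_k_accuracy(ranked, truths, 2))
```

Top-k accuracy is non-decreasing in k, which is why the top-ten figure in the abstract is higher than the top-one figure.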
Affiliation(s)
- Zhaorui Huang
- Department of Chemistry, Stanford University Stanford CA 94305 USA
| | - Michael S Chen
- Department of Chemistry, Stanford University Stanford CA 94305 USA
| | | | | | - Matthew W Kanan
- Department of Chemistry, Stanford University Stanford CA 94305 USA
| |
|
36
|
Beniddir MA, Kang KB, Genta-Jouve G, Huber F, Rogers S, van der Hooft JJJ. Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches. Nat Prod Rep 2021; 38:1967-1993. [PMID: 34821250 PMCID: PMC8597898 DOI: 10.1039/d1np00023c] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to the end of 2020. Recently introduced computational metabolome mining tools have started to positively impact the chemical and biological interpretation of untargeted metabolomics analyses. We believe that these current advances make it possible to start decomposing complex metabolite mixtures into substructure and chemical class information, thereby supporting pivotal tasks in metabolomics analysis including metabolite annotation, the comparison of metabolic profiles, and network analyses. In this review, we highlight and explain key tools and emerging strategies covering 2015 up to the end of 2020. The majority of these tools aim at processing and analyzing liquid chromatography coupled to mass spectrometry fragmentation data. We start with defining what substructures are, how they relate to molecular fingerprints, and how recognizing them helps to decompose complex mixtures. We continue with chemical classes that are based on the presence or absence of particular molecular scaffolds and/or functional groups and are thus intrinsically related to substructures. We discuss novel tools to mine substructures, annotate chemical compound classes, and create mass spectral networks from metabolomics data and demonstrate them using two case studies. We also review and speculate about the opportunities that NMR spectroscopy-based metabolome mining of complex metabolite mixtures offers to discover substructures and chemical classes. Finally, we describe the main benefits and limitations of the current tools and strategies that rely on them, and our vision on how this exciting field can develop toward repository-scale metabolomics analyses. Complementary sources of structural information from genomics analyses and well-curated taxonomic records are also discussed.
Many research fields such as natural products discovery, pharmacokinetic and drug metabolism studies, and environmental metabolomics increasingly rely on untargeted metabolomics to gain biochemical and biological insights. The technical advances described here will benefit all those metabolomics disciplines by transforming spectral data into knowledge that can answer biological questions.
Affiliation(s)
- Mehdi A Beniddir
- Université Paris-Saclay, CNRS, BioCIS, 5 rue J.-B Clément, 92290 Châtenay-Malabry, France
| | - Kyo Bin Kang
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women's University, Seoul 04310, Republic of Korea
| | - Grégory Genta-Jouve
- Laboratoire de Chimie-Toxicologie Analytique et Cellulaire (C-TAC), UMR CNRS 8038, CiTCoM, Université de Paris, 4, Avenue de l'Observatoire, 75006, Paris, France
- Laboratoire Ecologie, Evolution, Interactions des Systèmes Amazoniens (LEEISA), USR 3456, Université De Guyane, CNRS Guyane, 275 Route de Montabo, 97334 Cayenne, French Guiana, France
| | - Florian Huber
- Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands
| | - Simon Rogers
- School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK
| | | |
|
37
|
Hammer AS, Leonov AI, Bell NL, Cronin L. Chemputation and the Standardization of Chemical Informatics. JACS Au 2021; 1:1572-1587. [PMID: 34723260 PMCID: PMC8549037 DOI: 10.1021/jacsau.1c00303] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/06/2021] [Indexed: 05/11/2023]
Abstract
The explosion in the use of machine learning for automated chemical reaction optimization is gathering pace. However, the lack of a standard architecture that connects the concept of chemical transformations universally to software and hardware provides a barrier to using the results of these optimizations and could cause the loss of relevant data and prevent reactions from being reproducible or unexpected findings verifiable or explainable. In this Perspective, we describe how the development of the field of digital chemistry or chemputation, that is the universal code-enabled control of chemical reactions using a standard language and ontology, will remove these barriers allowing users to focus on the chemistry and plug in algorithms according to the problem space to be explored or unit function to be optimized. We describe a standard hardware (the chemical processing programming architecture-the ChemPU) to encompass all chemical synthesis, an approach which unifies all chemistry automation strategies, from solid-phase peptide synthesis, to HTE flow chemistry platforms, while at the same time establishing a publication standard so that researchers can exchange chemical code (χDL) to ensure reproducibility and interoperability. Not only can a vast range of different chemistries be plugged into the hardware, but the ever-expanding developments in software and algorithms can also be accommodated. These technologies, when combined will allow chemistry, or chemputation, to follow computation-that is the running of code across many different types of capable hardware to get the same result every time with a low error rate.
|
38
|
Ye S, Liang J, Zhu X. Catalyst deep neural networks (Cat-DNNs) in singlet fission property prediction. Phys Chem Chem Phys 2021; 23:20835-20840. [PMID: 34505584 DOI: 10.1039/d1cp03594k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Many current deep neural network (DNN) models only focus on straightforward optimization over the given database. However, most numerical fitting procedures depart from physical laws. By introducing the concept of "catalysis" from physical chemistry, we propose that the physical correlations among molecular properties could spontaneously act as a catalyst in the DNNs, which increases the accuracy, and more importantly, guides the DNNs in the right way. These Catalysis-DNNs (Cat-DNNs) could precisely predict both the ground and excited-state properties, especially the molecules' screening with singlet fission character. We show that traditional machine learning metrics are not suitable for evaluating model accuracy in physical-chemical tasks and issue new physical errors. We believe that the agile transfer of fundamental physics or chemistry domain knowledge, like the catalyst, could significantly benefit both the architecture and application of artificial intelligence technology in the future.
Affiliation(s)
- Shuqian Ye
- School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen(CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
| | - Jiechun Liang
- School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen(CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
| | - Xi Zhu
- School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen(CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
| |
|
39
|
Vermeyen T, Brence J, Van Echelpoel R, Aerts R, Acke G, Bultinck P, Herrebout W. Exploring machine learning methods for absolute configuration determination with vibrational circular dichroism. Phys Chem Chem Phys 2021; 23:19781-19789. [PMID: 34524304 DOI: 10.1039/d1cp02428k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The added value of supervised Machine Learning (ML) methods to determine the Absolute Configuration (AC) of compounds from their Vibrational Circular Dichroism (VCD) spectra was explored. Among all ML methods considered, Random Forest (RF) and Feedforward Neural Network (FNN) yield the best performance for identification of the AC. At its best, FNN allows near-perfect AC determination, with accuracy of prediction up to 0.995, while RF combines good predictive accuracy (up to 0.940) with the ability to identify the spectral areas important for the identification of the AC. No loss in performance of either model is observed as long as the spectral sampling interval used does not exceed the spectral bandwidth. Increasing the sampling interval proves to be the best method to lower the dimensionality of the input data, thereby decreasing the computational cost associated with the training of the models.
Affiliation(s)
- Tom Vermeyen
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium; Department of Chemistry, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium.
| | - Jure Brence
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000 Ljubljana, Slovenia
| | - Robin Van Echelpoel
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium.
| | - Roy Aerts
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium.
| | - Guillaume Acke
- Department of Chemistry, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium.
| | - Patrick Bultinck
- Department of Chemistry, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium.
| | - Wouter Herrebout
- Department of Chemistry, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium.
| |
|
40
|
Enders AA, North NM, Fensore CM, Velez-Alvarez J, Allen HC. Functional Group Identification for FTIR Spectra Using Image-Based Machine Learning Models. Anal Chem 2021; 93:9711-9718. [PMID: 34190551 DOI: 10.1021/acs.analchem.1c00867] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 01/13/2023]
Abstract
Fourier transform infrared spectroscopy (FTIR) is a ubiquitous spectroscopic technique. Spectral interpretation is a time-consuming process, but it yields important information about functional groups present in compounds and in complex substances. We develop a generalizable model via a machine learning (ML) algorithm using convolutional neural networks (CNNs) to identify the presence of functional groups in gas-phase FTIR spectra. The ML models reduce the amount of time required to analyze functional groups and facilitate interpretation of FTIR spectra. Through web scraping, we acquire intensity-frequency data from 8728 gas-phase organic molecules within the NIST spectral database and transform the data into spectral images. We successfully train models for 15 of the most common organic functional groups, which we then determine via identification from previously untrained spectra. These models serve to expand the application of FTIR measurements for facile analysis of organic samples. Our approach was done such that we have broad functional group models that infer in tandem to provide full interpretation of a spectrum. We present the first implementation of ML using image-based CNNs for predicting functional groups from a spectroscopic method.
Affiliation(s)
- Abigail A Enders
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| | - Nicole M North
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| | - Chase M Fensore
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| | - Juan Velez-Alvarez
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| | - Heather C Allen
- Department of Chemistry & Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
| |
|
41
|
Westermayr J, Gastegger M, Schütt KT, Maurer RJ. Perspective on integrating machine learning into computational chemistry and materials science. J Chem Phys 2021; 154:230903. [PMID: 34241249 DOI: 10.1063/5.0047760] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 01/12/2023] Open
Abstract
Machine learning (ML) methods are being used in almost every conceivable area of electronic structure theory and molecular simulation. In particular, ML has become firmly established in the construction of high-dimensional interatomic potentials. Not a day goes by without another proof of principle being published on how ML methods can represent and predict quantum mechanical properties-be they observable, such as molecular polarizabilities, or not, such as atomic charges. As ML is becoming pervasive in electronic structure theory and molecular simulation, we provide an overview of how atomistic computational modeling is being transformed by the incorporation of ML approaches. From the perspective of the practitioner in the field, we assess how common workflows to predict structure, dynamics, and spectroscopy are affected by ML. Finally, we discuss how a tighter and lasting integration of ML methods with computational chemistry and materials science can be achieved and what it will mean for research practice, software development, and postgraduate training.
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| |
|
42
|
Coic L, Sacré PY, Dispas A, De Bleye C, Fillet M, Ruckebusch C, Hubert P, Ziemons E. Pixel-based Raman hyperspectral identification of complex pharmaceutical formulations. Anal Chim Acta 2021; 1155:338361. [PMID: 33766319 DOI: 10.1016/j.aca.2021.338361] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/19/2021] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 12/16/2022]
Abstract
Hyperspectral imaging has been widely used for different kinds of applications and many chemometric tools have been developed to help identify chemical compounds. However, most of those tools rely on factorial decomposition techniques that can be challenging for large data sets and/or in the presence of minor compounds. The present study proposes a pixel-based identification (PBI) approach that allows spectral signatures in Raman hyperspectral imaging data to be readily identified. This strategy is based on the identification of essential spectral pixels (ESP), which can be found by convex hull calculation. As the corresponding set of spectra is largely reduced and encompasses the purest spectral signatures, direct database matching and identification can be reliably and rapidly performed. The efficiency of PBI was evaluated on both known and unknown samples, considering genuine and falsified pharmaceutical tablets. We showed that it is possible to analyze a wide variety of pharmaceutical formulations of increasing complexity (from 5 to 0.1% (w/w) of polymorphic impurity detection) for medium (150 x 150 pixels) and big (1000 x 1000 pixels) map sizes in less than 2 min. Moreover, in the case of falsified medicines, it is demonstrated that the proposed approach allows the identification of all compounds, found in very different proportions and, sometimes, in trace amounts. Furthermore, the relevant spectral signatures for which no match is found in the reference database can be identified at a later stage and the nature of the corresponding compounds further investigated. Overall, the provided results show that Raman hyperspectral imaging combined with PBI enables rapid and reliable spectral identification of complex pharmaceutical formulations.
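The idea that essential spectral pixels sit on a convex hull can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that spectra are projected onto their first two principal components and that the hull is computed in that 2D score space with a standard monotone-chain algorithm. The function names `convex_hull_2d` and `essential_spectral_pixels` and the synthetic data are hypothetical.

```python
import numpy as np


def convex_hull_2d(points):
    """Return indices of the convex hull vertices of 2D points (monotone chain)."""
    # Sort point indices by x, then by y.
    order = np.lexsort((points[:, 1], points[:, 0]))

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
        return ((points[a, 0] - points[o, 0]) * (points[b, 1] - points[o, 1])
                - (points[a, 1] - points[o, 1]) * (points[b, 0] - points[o, 0]))

    def half_hull(indices):
        chain = []
        for i in indices:
            # Drop points that would make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(int(i))
        return chain

    lower = half_hull(order)
    upper = half_hull(order[::-1])
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


def essential_spectral_pixels(spectra):
    """Indices of pixels on the convex hull of the first two PCA scores.

    Assumption: two principal components are enough, which holds only for
    simple mixtures; real data would need more components and a
    higher-dimensional hull.
    """
    x = np.asarray(spectra, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * s[:2]
    return sorted(convex_hull_2d(scores))


# Synthetic example: 3 pure spectra followed by 200 random mixtures of them.
rng = np.random.default_rng(0)
pure = rng.random((3, 50))
weights = rng.dirichlet(np.ones(3), size=200)
pixels = np.vstack([pure, weights @ pure])
esp = essential_spectral_pixels(pixels)
```

In this noise-free toy case every mixed pixel is a convex combination of the three pure spectra, so only the pure pixels can be hull vertices. The reduced set `esp` is what would then be matched against a reference database.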
Affiliation(s)
- Laureen Coic
- University of Liege (ULiege), CIRM, Vibra-Santé Hub, Laboratory of Pharmaceutical Analytical Chemistry, Avenue Hippocrate 15, 4000, Liege, Belgium.
| | - Pierre-Yves Sacré
- University of Liege (ULiege), CIRM, Vibra-Santé Hub, Laboratory of Pharmaceutical Analytical Chemistry, Avenue Hippocrate 15, 4000, Liege, Belgium
| | - Amandine Dispas
- University of Liege (ULiege), CIRM, Vibra-Santé Hub, Laboratory of Pharmaceutical Analytical Chemistry, Avenue Hippocrate 15, 4000, Liege, Belgium; University of Liege (ULiege), CIRM, MaS-Santé Hub, Laboratory for the Analysis of Medicines, Avenue Hippocrate 15, 4000, Liege, Belgium
| | - Charlotte De Bleye
- University of Liege (ULiege), CIRM, Vibra-Santé Hub, Laboratory of Pharmaceutical Analytical Chemistry, Avenue Hippocrate 15, 4000, Liege, Belgium
| | - Marianne Fillet
- University of Liege (ULiege), CIRM, MaS-Santé Hub, Laboratory for the Analysis of Medicines, Avenue Hippocrate 15, 4000, Liege, Belgium
| | - Cyril Ruckebusch
- University of Lille, CNRS, UMR 8516 LAboratoire de Spectroscopie pour les Interactions, la Réactivité et l'Environnement (LASIRE), F-59000, Lille, France
| | - Philippe Hubert
- University of Liege (ULiege), CIRM, Vibra-Santé Hub, Laboratory of Pharmaceutical Analytical Chemistry, Avenue Hippocrate 15, 4000, Liege, Belgium
| | - Eric Ziemons
- University of Liege (ULiege), CIRM, Vibra-Santé Hub, Laboratory of Pharmaceutical Analytical Chemistry, Avenue Hippocrate 15, 4000, Liege, Belgium
| |
|
43
|
Specht T, Münnemann K, Hasse H, Jirasek F. Automated Methods for Identification and Quantification of Structural Groups from Nuclear Magnetic Resonance Spectra Using Support Vector Classification. J Chem Inf Model 2021; 61:143-155. [PMID: 33405926 DOI: 10.1021/acs.jcim.0c01186] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a powerful tool for elucidating the structure of unknown components and the composition of liquid mixtures. However, these tasks are often tedious and challenging, especially if complex samples are considered. In this work, we introduce automated methods for the identification and quantification of structural groups in pure components and mixtures from NMR spectra using support vector classification. As input, a 1H NMR spectrum and a 13C NMR spectrum of the liquid sample (pure component or mixture) that is to be analyzed are needed. The first method, called the group-identification method, yields qualitative information on the structural groups in the sample. The second method, called the group-assignment method, provides the basis for a quantitative analysis of the sample by identifying the structural groups and assigning them to signals in the 13C NMR spectrum of the sample; quantitative information can then be obtained with readily available tools by simple integration. We demonstrate that both methods, after being trained on NMR spectra of nearly 1000 pure components, yield excellent predictions for pure components that were not part of the training set as well as mixtures. The structural group-specific information obtained with the presented methods can, e.g., be used in combination with thermodynamic group-contribution methods to predict fluid properties of unknown samples.
Affiliation(s)
- Thomas Specht
- Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
| | - Kerstin Münnemann
- Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
| | - Hans Hasse
- Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
| | - Fabian Jirasek
- Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
| |
|
44
|
|
45
|
Pomyen Y, Wanichthanarak K, Poungsombat P, Fahrmann J, Grapov D, Khoomrung S. Deep metabolome: Applications of deep learning in metabolomics. Comput Struct Biotechnol J 2020; 18:2818-2825. [PMID: 33133423 PMCID: PMC7575644 DOI: 10.1016/j.csbj.2020.09.033] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 07/24/2020] [Revised: 09/21/2020] [Accepted: 09/21/2020] [Indexed: 01/11/2023] Open
Abstract
In the past few years, deep learning has been successfully applied to various omics data. However, applications of deep learning in metabolomics are still relatively few compared to other omics fields. Currently, data pre-processing using convolutional neural network architecture appears to benefit the most from deep learning. Compound/structure identification and quantification using artificial neural network/deep learning performed relatively better than traditional machine learning techniques, whereas only marginally better results are observed in biological interpretations. Before deep learning can be effectively applied to metabolomics, several challenges should be addressed, including metabolome-specific deep learning architectures, dimensionality problems, and model evaluation regimes.
Key Words
- AI, Artificial Intelligence
- ANN, Artificial Neural Network
- AUC, Area Under the receiver-operating characteristic Curve
- Artificial neural network
- CCS value, Collision Cross Section value
- CFM-EI, Competitive Fragmentation Modeling-Electron Ionization
- CNN, Convolutional Neural Network
- DL, Deep Learning
- DNN, Deep Neural Network
- Deep learning
- ECFP, Extended Circular Fingerprint
- ER, Estrogen Receptor
- FID, Free Induction Decay
- FP score, Fingerprint correlation score
- FTIR, Fourier Transform Infrared
- GC–MS, Gas Chromatography-Mass Spectrometry
- HDLSS data, High Dimensional Low Sample Size data
- IST, Iterative Soft Thresholding
- LC-MS, Liquid Chromatography-Mass Spectrometry
- LSTM, Long Short-Term Memory
- ML, Machine Learning
- MLP, Multi-layered Perceptron
- MS, Mass Spectrometry
- Mass spectrometry
- Metabolomics
- NEIMS, Neural Electron-Ionization Mass Spectrometry
- NMR
- NMR, Nuclear Magnetic Resonance
- NUS, Non-Uniformly Sampling
- PARAFAC2, Parallel Factor Analysis 2
- RF, Random Forest
- RNN, Recurrent Neural Network
- ReLU, Rectified Linear Unit
- SMARTS, SMILES arbitrary target specification
- SMILE, Sparse Multidimensional Iterative Lineshape-enhanced
- SMILES, Simplified Molecular-Input Line-Entry System
- SRA, Sequence Read Archive
- VAE, Variational Autoencoder
- istHMS, Implementation of IST at Harvard Medical School
- m/z, mass/charge ratio
Affiliation(s)
- Yotsawat Pomyen
- Translational Research Unit, Chulabhorn Research Institute, Bangkok, Thailand
| | - Kwanjeera Wanichthanarak
- Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
| | - Patcha Poungsombat
- Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Center for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
| | - Johannes Fahrmann
- Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030, USA
| | - Dmitry Grapov
- CDS- Creative Data Solutions LLC, https://creative-data.solutions, USA
| | - Sakda Khoomrung
- Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Center for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
| |
|
46
|
Schrier J. Can One Hear the Shape of a Molecule (from its Coulomb Matrix Eigenvalues)? J Chem Inf Model 2020; 60:3804-3811. [PMID: 32668151 DOI: 10.1021/acs.jcim.0c00631] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
Coulomb matrix eigenvalues (CMEs) are global 3D representations of molecular structure, which have been previously used to predict atomization energies, prioritize geometry searches, and interpret rotational spectra. The properties of the CME representation and its relationship to molecular structure are established using the Gershgorin circle theorem. Numerical bounds are studied using a data set of 309 000 conformational samples of all constitutional isomers of acyclic alkanes, CnH2n+2, from methane (n = 1) to undecane (n = 11), to establish the extent to which the CME preserves chemical intuitions about isomer and conformer similarity and its ability to distinguish constitutional isomers. Neither supervised nor unsupervised machine-learning algorithms can perfectly distinguish constitutional isomers as the molecular size increases, but the misclassification rate can be kept below 1%.
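For readers unfamiliar with the representation, the following minimal sketch computes Coulomb matrix eigenvalues with NumPy. The abstract does not spell out the matrix definition, so the sketch assumes the commonly used form: diagonal entries 0.5·Z_i^2.4 and off-diagonal entries Z_i·Z_j / |R_i − R_j|. The function name, the example geometry, and the choice of distance units are illustrative assumptions.

```python
import numpy as np


def coulomb_matrix_eigenvalues(charges, coords):
    """Eigenvalues of the Coulomb matrix, sorted in descending order.

    charges: nuclear charges Z_i, shape (n,)
    coords:  Cartesian positions R_i, shape (n, 3); the units are an
             assumption of this sketch (atomic units are customary).
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Pairwise interatomic distances |R_i - R_j|.
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    # Placeholder on the diagonal to avoid dividing by zero; overwritten below.
    np.fill_diagonal(dist, 1.0)
    matrix = np.outer(z, z) / dist
    np.fill_diagonal(matrix, 0.5 * z ** 2.4)
    # The matrix is real and symmetric, so eigvalsh applies.
    return np.sort(np.linalg.eigvalsh(matrix))[::-1]


# Hypothetical water-like geometry (O, H, H) in arbitrary consistent units.
charges = np.array([8, 1, 1])
coords = np.array([[0.0, 0.0, 0.0],
                   [1.43, 1.11, 0.0],
                   [-1.43, 1.11, 0.0]])
eigenvalues = coulomb_matrix_eigenvalues(charges, coords)
```

Because only charges and interatomic distances enter the matrix, the sorted eigenvalues do not change when atoms are relabeled or the molecule is translated. That invariance is what makes them usable as a global descriptor, though the mapping is not guaranteed to be unique.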
Affiliation(s)
- Joshua Schrier
- Department of Chemistry, Fordham University, 441 East Fordham Road, The Bronx, New York 10458, United States
| |
|