1
Yang Q, Zhang H, Wang Y, Tan L, Xie T, Wang Y, Long J, Guo Z, Zhang Z, Lu H. MWFormer: Estimation of Molecular Weights from Electron Ionization Mass Spectra for Improved Library Searching. Anal Chem 2025; 97:212-219. [PMID: 39700345 DOI: 10.1021/acs.analchem.4c03781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Molecular weight (MW) is a crucial property for improving the accuracy of multidimensional compound identification. In this study, we developed MWFormer, a novel method based on a Transformer encoder that predicts MWs solely from electron ionization mass spectrometry (EI-MS) spectra. MWFormer achieves a mean absolute error (MAE) of 6.38 Da on the test set, only one-sixth of the MAE of the peak interpretation method (PIM). The accurate MWFormer-predicted MW can be used to eliminate false-positive molecules in multidimensional compound identification. The results show that the MW filter improves the recall@3 metric by nearly 4 percentage points compared with spectrum matching alone. Moreover, MWFormer can be combined with retention indices (RIs) to achieve GC-EI-MS 3D compound identification, improving the recall@3 metric by nearly 7 percentage points compared with spectrum matching alone. In addition, a user-friendly web service is provided to predict MWs in single or batch mode. All code, data, and models are available at https://github.com/zhanghailiangcsu/MWFormer.
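The MW filter and the recall@3 metric mentioned in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the candidate tuple layout, the function names `mw_filter` and `recall_at_k`, the toy scores, and the 20 Da tolerance are hypothetical and are not taken from the MWFormer code or paper.

```python
from typing import List, Tuple

# Hypothetical candidate record: (compound_id, spectrum_match_score, library_mw_in_da)
Candidate = Tuple[str, float, float]


def mw_filter(candidates: List[Candidate], predicted_mw: float,
              tolerance_da: float) -> List[Candidate]:
    """Drop candidates whose library MW differs from the predicted MW by more than tolerance_da."""
    return [c for c in candidates if abs(c[2] - predicted_mw) <= tolerance_da]


def recall_at_k(ranked_lists: List[List[Candidate]], true_ids: List[str], k: int = 3) -> float:
    """Fraction of queries whose true compound appears among the top-k ranked candidates."""
    hits = 0
    for ranked, true_id in zip(ranked_lists, true_ids):
        top_ids = [c[0] for c in ranked[:k]]
        if true_id in top_ids:
            hits += 1
    return hits / len(true_ids)


# Toy query: candidates are already sorted by spectrum match score (descending).
ranked = [("A", 0.95, 310.0), ("B", 0.93, 152.1), ("C", 0.92, 298.5), ("D", 0.90, 150.2)]
true_id = "D"
predicted_mw = 151.0  # stand-in for a model prediction
tolerance_da = 20.0   # assumed window, not a value from the paper

recall_before = recall_at_k([ranked], [true_id], k=3)
filtered = mw_filter(ranked, predicted_mw, tolerance_da)
recall_after = recall_at_k([filtered], [true_id], k=3)
print(recall_before, recall_after)
```

In this toy case the true compound ranks fourth by spectrum score alone, so it misses the top three; removing the two candidates whose MW is far from the prediction moves it into the top three.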
Affiliation(s)
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Lin Tan
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ting Xie
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yufei Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Jia Long
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zixuan Guo
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
2
Kavianpour B, Piadeh F, Gheibi M, Ardakanian A, Behzadian K, Campos LC. Applications of artificial intelligence for chemical analysis and monitoring of pharmaceutical and personal care products in water and wastewater: A review. Chemosphere 2024; 368:143692. [PMID: 39515544 DOI: 10.1016/j.chemosphere.2024.143692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 09/15/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
Specifying and interpreting the occurrence of emerging pollutants is essential for assessing treatment processes and plants, conducting wastewater-based epidemiology, and advancing environmental toxicology research. In recent years, artificial intelligence (AI) has been increasingly applied to enhance chemical analysis and monitoring of contaminants in environmental water and wastewater. However, their specific roles targeting pharmaceuticals and personal care products (PPCPs) have not been reviewed sufficiently. This review aims to narrow the gap by highlighting, scoping, and discussing the incorporation of AI during the detection and quantification of PPCPs when utilising chemical analysis equipment and interpreting their monitoring data for the first time. In the chemical analysis of PPCPs, AI-assisted prediction of chromatographic retention times and collision cross-sections (CCS) in suspect and non-target screenings using high-resolution mass spectrometry (HRMS) enhances detection confidence, reduces analysis time, and lowers costs. AI also aids in interpreting spectroscopic analysis results. However, this approach still cannot be applied in all matrices, as it offers lower sensitivity than liquid chromatography coupled with tandem or HRMS. For the interpretation of monitoring of PPCPs, unsupervised AI methods have recently presented the capacity to survey regional or national community health and socioeconomic factors. Nevertheless, as a challenge, long-term monitoring data sources are not given in the literature, and more comparative AI studies are needed for both chemical analysis and monitoring. Finally, AI assistance anticipates more frequent applications of CCS prediction to enhance detection confidence and the use of AI methods in data processing for wastewater-based epidemiology and community health surveillance.
Affiliation(s)
- Babak Kavianpour
  - School of Computing and Engineering, University of West London, St Mary's Rd, London W5 5RF, UK
- Farzad Piadeh
  - School of Computing and Engineering, University of West London, St Mary's Rd, London W5 5RF, UK; Centre for Engineering Research, School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, AL10 9AB, UK
- Mohammad Gheibi
  - Institute for Nanomaterials, Advanced Technologies and Innovation, Technical University of Liberec, 46117, Liberec, Czech Republic
- Atiyeh Ardakanian
  - School of Computing and Engineering, University of West London, St Mary's Rd, London W5 5RF, UK
- Kourosh Behzadian
  - School of Computing and Engineering, University of West London, St Mary's Rd, London W5 5RF, UK; Centre for Urban Sustainability and Resilience, Department of Civil, Environmental and Geomatic Engineering, University College London, London WC1E6BT, UK
- Luiza C Campos
  - Centre for Urban Sustainability and Resilience, Department of Civil, Environmental and Geomatic Engineering, University College London, London WC1E6BT, UK
3
Kumari P, Guilherme MSR, Choudhary P, Van Laethem T, Fillet M, Hubert P, Sacre PY, Hubert C. Transfer Learning Approach to Multitarget QSRR Modeling in RPLC. J Chem Inf Model 2024; 64:7447-7456. [PMID: 39284310 DOI: 10.1021/acs.jcim.4c00608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Quantitative structure-retention relationship (QSRR) modeling is a valuable technique for predicting the retention times of small molecules. It aims to bridge the gap between molecular structure and chromatographic behavior, offering invaluable insights for analytical chemistry. Given the challenge of simultaneous target prediction under variable experimental conditions and the scarcity of comprehensive data sets for such predictive modeling in chromatography, this study introduces a transfer learning-based multitarget QSRR approach to enhance retention time prediction. Through a comparative study of four models, both with and without the transfer learning approach, the performance of single and multitarget QSRR was evaluated based on mean squared error (MSE) and R² metrics. Individual models were also tested against benchmark studies in this field. The findings suggest that transfer learning-based multitarget models exhibit potential for enhanced accuracy in predicting the retention times of small molecules, presenting a promising avenue for QSRR modeling. These models will be highly beneficial for optimizing experimental conditions in method development through better retention time predictions in reversed-phase liquid chromatography (RPLC). The reliable and effective predictive capabilities of these models make them valuable tools for pharmaceutical research and development.
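For readers unfamiliar with the two evaluation metrics named in this abstract, the following is a minimal Python sketch of how MSE and R² are commonly computed for retention time predictions. The function names `mse` and `r_squared` and the retention time values are hypothetical; nothing here is taken from the study's data or code.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical retention times in minutes (measured vs. predicted).
rt_measured = [1.0, 2.0, 3.0, 4.0]
rt_predicted = [1.1, 1.9, 3.2, 3.8]

mse_value = mse(rt_measured, rt_predicted)
r2_value = r_squared(rt_measured, rt_predicted)
print(round(mse_value, 4), round(r2_value, 4))
```

A lower MSE and an R² closer to 1 both indicate better agreement; R² is scale-free, which makes it easier to compare across chromatographic systems with different gradient lengths.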
Affiliation(s)
- Priyanka Kumari
  - Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, CIRM, Liège, Belgium 4000
  - Laboratory for the Analysis of Medicines, CIRM, Liège, Belgium 4000
- Thomas Van Laethem
  - Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, CIRM, Liège, Belgium 4000
- Marianne Fillet
  - Laboratory for the Analysis of Medicines, CIRM, Liège, Belgium 4000
- Phillipe Hubert
  - Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, CIRM, Liège, Belgium 4000
- Pierre Yves Sacre
  - Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, CIRM, Liège, Belgium 4000
- Cedric Hubert
  - Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, CIRM, Liège, Belgium 4000
4
Liu Y, Yoshizawa AC, Ling Y, Okuda S. Insights into predicting small molecule retention times in liquid chromatography using deep learning. J Cheminform 2024; 16:113. [PMID: 39375739 PMCID: PMC11460055 DOI: 10.1186/s13321-024-00905-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 09/13/2024] [Indexed: 10/09/2024] Open
Abstract
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in silico tools for mass spectrometry peak alignment and compound prediction have been developed; however, the list of candidate compounds remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. Release of a large RT dataset has mitigated the bottlenecks limiting the application of deep learning models, thereby improving their application in RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses the issue of molecular representation inconsistencies in datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges. SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
Affiliation(s)
- Yuting Liu
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Akiyasu C Yoshizawa
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Yiwei Ling
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Shujiro Okuda
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
5
Wang C, Yuan C, Wang Y, Shi Y, Zhang T, Patti GJ. Predicting Collision Cross-Section Values for Small Molecules through Chemical Class-Based Multimodal Graph Attention Network. J Chem Inf Model 2024; 64:6305-6315. [PMID: 38959055 DOI: 10.1021/acs.jcim.3c01934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2024]
Abstract
Libraries of collision cross-section (CCS) values have the potential to facilitate compound identification in metabolomics. Although computational methods provide an opportunity to increase library size rapidly, accurate prediction of CCS values remains challenging due to the structural diversity of small molecules. Here, we developed a machine learning (ML) model that integrates graph attention networks and multimodal molecular representations to predict CCS values on the basis of chemical class. Our approach, referred to as MGAT-CCS, had superior performance in comparison to other ML models in CCS prediction. MGAT-CCS achieved a median relative error of 0.47%/1.14% (positive/negative mode) and 1.40%/1.63% (positive/negative mode) for lipids and metabolites, respectively. When MGAT-CCS was applied to real-world metabolomics data, it reduced the number of false metabolite candidates by roughly 25% across multiple sample types ranging from plasma and urine to cells. To facilitate its application, we developed a user-friendly stand-alone web server for MGAT-CCS that is freely available at https://mgat-ccs-web.onrender.com. This work represents a step forward in predicting CCS values and can potentially facilitate the identification of small molecules when using ion mobility spectrometry coupled with mass spectrometry.
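The median relative error quoted in this abstract can be illustrated with a short Python sketch. The function name `median_relative_error` and the CCS values (in Å²) are hypothetical and chosen only to show the arithmetic; they are not results from MGAT-CCS.

```python
from statistics import median
from typing import Sequence


def median_relative_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Median of |predicted - measured| / measured, expressed in percent."""
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return median(errors)


# Hypothetical CCS values in square angstroms.
ccs_measured = [200.0, 150.0, 100.0]
ccs_predicted = [202.0, 150.0, 99.0]

mre = median_relative_error(ccs_predicted, ccs_measured)
print(round(mre, 2))
```

The median is less sensitive than the mean to a few badly predicted molecules, which is one reason it is a common summary statistic for CCS prediction error.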
Affiliation(s)
- Cheng Wang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
  - Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Chuang Yuan
  - School of Life Sciences, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
  - Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Yahui Wang
  - Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, United States
  - Department of Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Yuying Shi
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
- Tao Zhang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
- Gary J Patti
  - Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, United States
  - Department of Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, Missouri 63130, United States
  - Center for Metabolomics and Isotope Tracing, Washington University in St. Louis, St. Louis, Missouri 63130, United States
6
Duan Y, Yang X, Zeng X, Wang W, Deng Y, Cao D. Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge. J Med Chem 2024; 67:9575-9586. [PMID: 38748846 DOI: 10.1021/acs.jmedchem.4c00692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]
Abstract
Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
Affiliation(s)
- Yanjing Duan
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Xixi Yang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Wenxuan Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Youchao Deng
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
7
Kwon Y, Kwon H, Han J, Kang M, Kim JY, Shin D, Choi YS, Kang S. Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network. Anal Chem 2023; 95:17273-17283. [PMID: 37955847 DOI: 10.1021/acs.analchem.3c03177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2023]
Abstract
Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.
Affiliation(s)
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Hyukju Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
  - Department of Chemistry, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Jongmin Han
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Myeonginn Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Ji-Yeong Kim
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Dongyeeb Shin
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Seokho Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
8
Wang Y, Wei W, Du W, Cai J, Liao Y, Lu H, Kong B, Zhang Z. Deep-Learning-Based Mixture Identification for Nuclear Magnetic Resonance Spectroscopy Applied to Plant Flavors. Molecules 2023; 28:7380. [PMID: 37959799 PMCID: PMC10648966 DOI: 10.3390/molecules28217380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
Nuclear magnetic resonance (NMR) is a crucial technique for analyzing mixtures consisting of small molecules, providing non-destructive, fast, reproducible, and unbiased benefits. However, it is challenging to perform mixture identification because of the offset of chemical shifts and peak overlaps that often exist in mixtures such as plant flavors. Here, we propose a deep-learning-based mixture identification method (DeepMID) that can be used to identify plant flavors (mixtures) in a formulated flavor (mixture consisting of several plant flavors) without the need to know the specific components in the plant flavors. A pseudo-Siamese convolutional neural network (pSCNN) and a spatial pyramid pooling (SPP) layer were used to solve the problems due to their high accuracy and robustness. The DeepMID model is trained, validated, and tested on an augmented data set containing 50,000 pairs of formulated and plant flavors. We demonstrate that DeepMID can achieve excellent prediction results in the augmented test set: ACC = 99.58%, TPR = 99.48%, FPR = 0.32%; and two experimentally obtained data sets: one shows ACC = 97.60%, TPR = 92.81%, FPR = 0.78% and the other shows ACC = 92.31%, TPR = 80.00%, FPR = 0.00%. In conclusion, DeepMID is a reliable method for identifying plant flavors in formulated flavors based on NMR spectroscopy, which can assist researchers in accelerating the design of flavor formulations.
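The three rates reported in this abstract (ACC, TPR, FPR) follow from the confusion-matrix counts of a binary classifier. The sketch below shows the standard definitions in Python; the function name `classification_rates` and the toy labels are hypothetical and unrelated to the DeepMID data sets.

```python
from typing import Dict, Sequence


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, true positive rate, and false positive rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "ACC": (tp + tn) / total,
        # Rates are undefined when a class is absent; report NaN in that case.
        "TPR": tp / (tp + fn) if (tp + fn) else float("nan"),
        "FPR": fp / (fp + tn) if (fp + tn) else float("nan"),
    }


# Toy labels: 1 = plant flavor present in the formulated flavor, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

rates = classification_rates(labels, preds)
print(rates)
```

Reporting TPR and FPR alongside accuracy matters when positives are rare, because a high accuracy can then coexist with a modest TPR, as in the second experimental data set described above.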
Affiliation(s)
- Yufei Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; (Y.W.); (Y.L.); (H.L.)
- Weiwei Wei
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China; (W.W.); (W.D.); (J.C.)
- Wen Du
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China; (W.W.); (W.D.); (J.C.)
- Jiaxiao Cai
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China; (W.W.); (W.D.); (J.C.)
- Yuxuan Liao
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; (Y.W.); (Y.L.); (H.L.)
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; (Y.W.); (Y.L.); (H.L.)
- Bo Kong
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China; (W.W.); (W.D.); (J.C.)
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; (Y.W.); (Y.L.); (H.L.)
9
Singh YR, Shah DB, Maheshwari DG, Shah JS, Shah S. Advances in AI-Driven Retention Prediction for Different Chromatographic Techniques: Unraveling the Complexity. Crit Rev Anal Chem 2023; 54:3559-3569. [PMID: 37672314 DOI: 10.1080/10408347.2023.2254379] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 09/07/2023]
Abstract
Retention prediction through artificial intelligence (AI)-based techniques has grown exponentially owing to their ability to process complex data sets and ease the crucial task of identifying and separating compounds in the most widely employed chromatographic techniques. Numerous approaches have been reported for retention prediction in different chromatographic techniques, and consistent results demonstrated that the accuracy and effectiveness of deep learning models outclassed those of linear machine learning (ML) models, mainly in liquid and gas chromatography, as ML algorithms use less complex data to train and predict information. Support vector machine-based neural networks were found to be the most utilized for predicting the retention factors of different compounds in thin-layer chromatography. Cheminformatics, chemometrics, and hybrid approaches were also employed for modeling and were more reliable in retention prediction than conventional models. Quantitative structure-retention relationship (QSRR) modeling was also a potential method for predicting retention in different chromatographic techniques and determining the separation method for analytes. These techniques demonstrated the benefit of incorporating QSRR with AI-driven techniques to obtain more precise retention predictions. This review surveys recent AI-driven approaches employed for retention prediction in different chromatographic techniques and, given the lack of summarized literature, aims to provide a comprehensive overview that will be useful to scientists exploring AI in analytical chemistry.
Affiliation(s)
- Yash Raj Singh
  - Department of Pharmaceutical Quality Assurance, L. J. Institute of Pharmacy, L J University, Ahmedabad, Gujarat, India
- Darshil B Shah
  - Department of Pharmaceutical Quality Assurance, L. J. Institute of Pharmacy, L J University, Ahmedabad, Gujarat, India
- Dilip G Maheshwari
  - Department of Pharmaceutical Quality Assurance, L. J. Institute of Pharmacy, L J University, Ahmedabad, Gujarat, India
- Jignesh S Shah
  - Department of Pharmaceutical Regulatory Affairs, L. J. Institute of Pharmacy, L J University, Ahmedabad, Gujarat, India
- Shreeraj Shah
  - Department of Pharmaceutical Technology, L. J. Institute of Pharmacy, L J University, Ahmedabad, Gujarat, India
10
Akhlaqi M, Wang WC, Möckel C, Kruve A. Complementary methods for structural assignment of isomeric candidate structures in non-target liquid chromatography ion mobility high-resolution mass spectrometric analysis. Anal Bioanal Chem 2023; 415:5247-5259. [PMID: 37452839 PMCID: PMC10404200 DOI: 10.1007/s00216-023-04852-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2023] [Revised: 07/03/2023] [Accepted: 07/06/2023] [Indexed: 07/18/2023]
Abstract
Non-target screening with LC/IMS/HRMS is increasingly employed for detecting and identifying the structure of potentially hazardous chemicals in the environment and food. Structural assignment relies on a combination of multidimensional instrumental methods and computational methods. The candidate structures are often isomeric, and unfortunately, assigning the correct structure among a number of isomeric candidate structures still is a key challenge both instrumentally and computationally. While practicing non-target screening, it is usually impossible to evaluate separately the limitations arising from (1) the inability of LC/IMS/HRMS to resolve the isomeric candidate structures and (2) the uncertainty of in silico methods in predicting the analytical information of isomeric candidate structures due to the lack of analytical standards for all candidate structures. Here we evaluate the feasibility of structural assignment of isomeric candidate structures based on in silico-predicted retention time and database collision cross-section (CCS) values as well as based on matching the empirical analytical properties of the detected feature with those of the analytical standards. For this, we investigated 14 candidate structures corresponding to five features detected with LC/HRMS in a spiked surface water sample. Considering the predicted retention times and database CCS values with the accompanying uncertainty, only one of the isomeric candidate structures could be deemed as unlikely; therefore, the annotation of the LC/IMS/HRMS features remained ambiguous. To further investigate if unequivocal annotation is possible via analytical standards, the reversed-phase LC retention times and low- and high-resolution ion mobility spectrometry separation, as well as high-resolution MS2 spectra of analytical standards were studied. 
Reversed-phase LC separated the highest number of candidate structures, while low-resolution ion mobility and high-resolution MS2 spectra provided little means for pinpointing the correct structure among the isomeric candidate structures even if analytical standards were available for comparison. Furthermore, the question arises of what prediction accuracy is required from the in silico methods to match the analytical separation. Based on the experimental data of the isomeric candidate structures studied here and previously published in the literature (516 retention time and 569 CCS values), we estimate that to reduce the candidate list by 95% of the structures, the confidence interval of the predicted retention times would need to decrease to below 0.05 min for a 15-min gradient, while that of CCS values would need to decrease to 0.15%. We thereby set a clear goal for in silico methods for retention time and CCS prediction.
Affiliation(s)
- Masoumeh Akhlaqi
  - Department of Materials and Environmental Chemistry, Svante Arrhenius väg 16C, 114 18, Stockholm, Sweden
- Wei-Chieh Wang
  - Department of Materials and Environmental Chemistry, Svante Arrhenius väg 16C, 114 18, Stockholm, Sweden
- Claudia Möckel
  - Department of Materials and Environmental Chemistry, Svante Arrhenius väg 16C, 114 18, Stockholm, Sweden
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Svante Arrhenius väg 16C, 114 18, Stockholm, Sweden
  - Department of Environmental Science, Svante Arrhenius väg 8, 114 18, Stockholm, Sweden
11
Liao Y, Tian M, Zhang H, Lu H, Jiang Y, Chen Y, Zhang Z. Highly automatic and universal approach for pure ion chromatogram construction from liquid chromatography-mass spectrometry data using deep learning. J Chromatogr A 2023; 1705:464172. [PMID: 37392637 DOI: 10.1016/j.chroma.2023.464172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 06/14/2023] [Accepted: 06/18/2023] [Indexed: 07/03/2023]
Abstract
Feature extraction is the most fundamental step when analyzing liquid chromatography-mass spectrometry (LC-MS) datasets. However, traditional methods require optimal parameter selections and re-optimization for different datasets, thus hindering efficient and objective large-scale data analysis. Pure ion chromatogram (PIC) is widely used because it avoids the peak splitting problem of the extracted ion chromatogram (EIC) and regions of interest (ROIs). Here, we developed a deep learning-based pure ion chromatogram method (DeepPIC) to find PICs using a customized U-Net from centroid mode data of LC-MS directly and automatically. A model was trained, validated, and tested on the Arabidopsis thaliana dataset with 200 input-label pairs. DeepPIC was integrated into KPIC2. The combination enables the entire processing pipeline from raw data to discriminant models for metabolomics datasets. The KPIC2 with DeepPIC was compared against other competing methods (XCMS, FeatureFinderMetabo, and peakonly) on the MM48, simulated MM48, and quantitative datasets. These comparisons showed that DeepPIC outperforms XCMS, FeatureFinderMetabo, and peakonly in recall rates and correlation with sample concentrations. Five datasets of different instruments and samples were used to evaluate the quality of PICs and the universal applicability of DeepPIC, and 95.12% of the found PICs could precisely match their manually labeled PICs. Therefore, KPIC2+DeepPIC is an automatic, practical, and off-the-shelf method to extract features from raw data directly, exceeding traditional methods with careful parameter tuning. It is publicly available at https://github.com/yuxuanliao/DeepPIC.
Affiliation(s)
- Yuxuan Liao
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Miao Tian
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yonglei Jiang
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan 650021, China
| | - Yi Chen
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan 650021, China.
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China.
| |
|
12
|
Luo M, Yin Y, Zhou Z, Zhang H, Chen X, Wang H, Zhu ZJ. A mass spectrum-oriented computational method for ion mobility-resolved untargeted metabolomics. Nat Commun 2023; 14:1813. [PMID: 37002244 PMCID: PMC10066191 DOI: 10.1038/s41467-023-37539-0] [Received: 10/30/2022] [Accepted: 03/17/2023] [Indexed: 04/03/2023] Open
Abstract
Ion mobility (IM) adds a new dimension to liquid chromatography-mass spectrometry-based untargeted metabolomics which significantly enhances coverage, sensitivity, and resolving power for analyzing the metabolome, particularly metabolite isomers. However, the high dimensionality of IM-resolved metabolomics data presents a great challenge to data processing, restricting its widespread applications. Here, we develop a mass spectrum-oriented bottom-up assembly algorithm for IM-resolved metabolomics that utilizes mass spectra to assemble four-dimensional peaks in a reverse order of multidimensional separation. We further develop the end-to-end computational framework Met4DX for peak detection, quantification and identification of metabolites in IM-resolved metabolomics. Benchmarking and validation of Met4DX demonstrates superior performance compared to existing tools with regard to coverage, sensitivity, peak fidelity and quantification precision. Importantly, Met4DX successfully detects and differentiates co-eluted metabolite isomers with small differences in the chromatographic and IM dimensions. Together, Met4DX advances metabolite discovery in biological organisms by deciphering the complex 4D metabolomics data.
Affiliation(s)
- Mingdu Luo
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
| | - Yandong Yin
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
| | - Zhiwei Zhou
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
| | - Haosong Zhang
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
| | - Xi Chen
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
| | - Hongmiao Wang
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
| | - Zheng-Jiang Zhu
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China.
- Shanghai Key Laboratory of Aging Studies, Shanghai, 201210, P. R. China.
| |
|
13
|
Fan Y, Yu C, Lu H, Chen Y, Hu B, Zhang X, Su J, Zhang Z. Deep learning-based method for automatic resolution of gas chromatography-mass spectrometry data from complex samples. J Chromatogr A 2023; 1690:463768. [PMID: 36641940 DOI: 10.1016/j.chroma.2022.463768] [Received: 08/04/2022] [Revised: 12/21/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022]
Abstract
Modern gas chromatography-mass spectrometry (GC-MS) is the workhorse for the high-throughput profiling of volatile compounds in complex samples. It can produce a considerable amount of two-dimensional data, and automatic methods are required to distill chemical information from raw GC-MS data efficiently. In this study, we proposed an Automatic Resolution method (AutoRes) based on pseudo-Siamese convolutional neural networks (pSCNN) to extract the meaningful features swamped by the noises, baseline drifts, retention time shifts, and overlapped peaks. Two pSCNN models were trained with 400,000 augmented spectral pairs, respectively. They can predict the selective region (pSCNN1) and elution region (pSCNN2) of compounds in an untargeted manner. The accuracies of the pSCNN1 model and the pSCNN2 model on their test sets are 99.9% and 92.6%, respectively. Then, the chromatographic profile of each component was automatically resolved by full rank resolution (FRR) based on the predicted regions by these models. The performance of AutoRes was evaluated on the simulated and plant essential oil datasets. Compared to AMDIS and MZmine, AutoRes resolves more reasonable mass spectra, chromatograms, and peak areas to identify and quantify compounds. The average match scores of AutoRes (925 and 936) outperformed AMDIS (909 and 925) and MZmine (888 and 916) when resolving mass spectra from overlapped peaks on the Set Ⅰ and Set Ⅱ of plant essential oil dataset and matching them against the NIST17 library. It extracted peak areas and mass spectra automatically from 10 GC-MS files of plant essential oils, and the entire process was completed in 8 min without any prior information or manual intervention. It is implemented in Python and is available as an open-source package at https://github.com/dyjfan/AutoRes.
Affiliation(s)
- Yingjie Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
| | - Chuanxiu Yu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
| | - Yi Chen
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China
| | - Binbin Hu
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China
| | - Xingren Zhang
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China; Baoshan City Branch of Yunnan Tobacco Company, Baoshan 678000, Yunnan, China
| | - Jiaen Su
- Dali Prefecture Branch of Yunnan Tobacco Company, Dali 671000, Yunnan, China.
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China.
| |
|
14
|
Zhang H, Xu Z, Fan X, Wang Y, Yang Q, Sun J, Wen M, Kang X, Zhang Z, Lu H. Fusion of Quality Evaluation Metrics and Convolutional Neural Network Representations for ROI Filtering in LC-MS. Anal Chem 2023; 95:612-620. [PMID: 36597722 DOI: 10.1021/acs.analchem.2c01398] [Indexed: 01/05/2023]
Abstract
Region of interest (ROI) extraction is a fundamental step in analyzing metabolomic datasets acquired by liquid chromatography-mass spectrometry (LC-MS). However, noise and background in LC-MS data often affect the quality of extracted ROIs. Therefore, developing effective ROI evaluation algorithms is necessary to eliminate false positives while keeping the false-negative rate as low as possible. In this study, a deep fused filter of ROIs (dffROI) was proposed to improve the accuracy of ROI extraction by combining handcrafted evaluation metrics with convolutional neural network (CNN)-learned representations. To evaluate its performance, dffROI was compared with peakonly (CNN-learned representation) and five handcrafted metrics on three LC-MS datasets and a gas chromatography-mass spectrometry (GC-MS) dataset. Results show that dffROI can achieve higher accuracy, a better true-positive rate, and a lower false-positive rate. Its accuracy, true-positive rate, and false-positive rate are 0.9841, 0.9869, and 0.0186 on the test set, respectively. The classification error rate of dffROI (1.59%) is significantly reduced compared with peakonly (2.73%). The model-agnostic feature importance demonstrates the necessity of fusing handcrafted evaluation metrics with the convolutional neural network representations. dffROI is an automatic, robust, and universal method for ROI filtering by virtue of information fusion and end-to-end learning. It is implemented in the Python programming language and open-sourced at https://github.com/zhanghailiangcsu/dffROI under the BSD License. Furthermore, it has been integrated into the KPIC2 framework previously proposed by our group to facilitate real metabolomic LC-MS dataset analysis.
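The metrics quoted in the abstract above follow from a binary confusion matrix, and the reported error rate of 1.59% is consistent with 1 - 0.9841. The sketch below shows the standard definitions; the function name and the counts are hypothetical and do not reproduce the dffROI test set.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, true-positive rate, false-positive rate, and error rate."""
    positives = tp + fn  # examples that are truly positive
    negatives = fp + tn  # examples that are truly negative
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    accuracy = (tp + tn) / (positives + negatives)
    return {
        "accuracy": accuracy,
        "tpr": tp / positives,
        "fpr": fp / negatives,
        "error_rate": 1.0 - accuracy,
    }


# Hypothetical counts for 100 true ROIs and 100 false ROIs.
m = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```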
Affiliation(s)
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhenbo Xu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- National International Collaborative Research Center for Medical Metabolomics, Central South University, Changsha 410083, China
| |
|
15
|
Cai Y, Zhou Z, Zhu ZJ. Advanced analytical and informatic strategies for metabolite annotation in untargeted metabolomics. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116903] [Indexed: 12/26/2022]
|
16
|
Sun J, Wen M, Wang H, Ruan Y, Yang Q, Kang X, Zhang H, Zhang Z, Lu H. Prediction of drug-likeness using graph convolutional attention network. Bioinformatics 2022; 38:5262-5269. [PMID: 36222555 DOI: 10.1093/bioinformatics/btac676] [Received: 05/27/2022] [Revised: 09/22/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Drug-likeness has been widely used as a criterion to distinguish drug-like molecules from non-drugs. Developing reliable computational methods to predict the drug-likeness of compounds is crucial to triage unpromising molecules and accelerate the drug discovery process. RESULTS: In this study, a deep learning method was developed to predict drug-likeness based on the graph convolutional attention network (D-GCAN) directly from molecular structures. Results showed that the D-GCAN model outperformed other state-of-the-art models for drug-likeness prediction. The combination of graph convolution and attention mechanism made an important contribution to the performance of the model. Specifically, the application of the attention mechanism improved accuracy by 4.0%, and the utilization of graph convolution improved accuracy by 6.1%. Results on the dataset beyond Lipinski's rule of five space and the non-US dataset showed that the model had good versatility. Then, the billion-scale GDB-13 database was used as a case study to screen SARS-CoV-2 3C-like protease inhibitors. Sixty-five drug candidates were screened out, most substructures of which are similar to those of existing oral drugs. Candidates screened from S-GDB13 have higher similarity to existing drugs and better molecular docking performance than those from the rest of GDB-13. The screening speed on S-GDB13 is significantly faster than screening directly on GDB-13. In general, D-GCAN is a promising tool to predict drug-likeness for selecting potential candidates and accelerating drug discovery by excluding unpromising candidates and avoiding unnecessary biological and clinical testing. AVAILABILITY AND IMPLEMENTATION: The source code, model and tutorials are available at https://github.com/JinYSun/D-GCAN. The S-GDB13 database is available at https://doi.org/10.5281/zenodo.7054367. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Huabei Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yuezhe Ruan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| |
|
17
|
Celma A, Bade R, Sancho JV, Hernandez F, Humphries M, Bijlsma L. Prediction of Retention Time and Collision Cross Section (CCS H+, CCS H-, and CCS Na+) of Emerging Contaminants Using Multiple Adaptive Regression Splines. J Chem Inf Model 2022; 62:5425-5434. [PMID: 36280383 PMCID: PMC9709913 DOI: 10.1021/acs.jcim.2c00847] [Indexed: 12/14/2022]
Abstract
Ultra-high performance liquid chromatography coupled to ion mobility separation and high-resolution mass spectrometry instruments have proven very valuable for screening of emerging contaminants in the aquatic environment. However, when applying suspect or nontarget approaches (i.e., when no reference standards are available), there is no information on retention time (RT) and collision cross-section (CCS) values to facilitate identification. In silico prediction tools of RT and CCS can therefore be of great utility to decrease the number of candidates to investigate. In this work, Multiple Adaptive Regression Splines (MARS) were evaluated for the prediction of both RT and CCS. MARS prediction models were developed and validated using a database of 477 protonated molecules, 169 deprotonated molecules, and 249 sodium adducts. Multivariate and univariate models were evaluated, with univariate models showing a better fit to the experimental data. The RT model (R2 = 0.855) showed a deviation between predicted and experimental data of ±2.32 min (95% confidence intervals). The deviation observed for CCS data of protonated molecules using the CCSH model (R2 = 0.966) was ±4.05% with 95% confidence intervals. The CCSH model was also tested for the prediction of deprotonated molecules, resulting in deviations below ±5.86% for 95% of the cases. Finally, a third model was developed for sodium adducts (CCSNa, R2 = 0.954) with deviation below ±5.25% for 95% of the cases. The developed models have been incorporated in an open-access and user-friendly online platform, which represents a great advantage for third-party research laboratories for predicting both RT and CCS data.
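A MARS model of the kind named in the abstract above expresses the response as an intercept plus a weighted sum of hinge basis functions of the form max(0, x - c) or max(0, c - x). The sketch below evaluates such a model for a single descriptor; the coefficients, knots, and function names are hypothetical and are not the fitted RT or CCS models from the paper.

```python
def hinge(x, knot, direction):
    """Hinge basis: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a one-descriptor MARS model.

    terms: iterable of (coefficient, knot, direction) tuples.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model with one knot at x = 3 and a different slope on each side.
terms = [(2.0, 3.0, +1), (-0.5, 3.0, -1)]
high = mars_predict(5.0, intercept=1.0, terms=terms)
low = mars_predict(1.0, intercept=1.0, terms=terms)
```

Each pair of mirrored hinges lets the fitted curve change slope at the knot, which is how MARS captures nonlinear trends with piecewise-linear pieces.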
Affiliation(s)
- Alberto Celma
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-750 07 Uppsala, Sweden
| | - Richard Bade
- University of South Australia, Adelaide, UniSA: Clinical and Health Sciences, Health and Biomedical Innovation, Adelaide SA-5000, South Australia, Australia
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, 20 Cornwall Street, Woolloongabba AUS-4102, Queensland, Australia
| | - Juan Vicente Sancho
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
| | - Félix Hernandez
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
| | - Melissa Humphries
- School of Mathematical Sciences, University of Adelaide, Ingkarni Wardli Building, North Terrace Campus, SA-5005 Adelaide, Australia
| | - Lubertus Bijlsma
- Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
| |
|
18
|
Retention Time Prediction with Message-Passing Neural Networks. SEPARATIONS 2022. [DOI: 10.3390/separations9100291] [Indexed: 03/01/2023] Open
Abstract
Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D-convolutional neural networks that are trained on the METLIN small molecule retention time dataset (SMRT). These approaches demonstrate accurate predictions comparable with the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to various systems. The accuracy of this step depends both on the method of mapping and on the accuracy of the general model trained on SMRT. Therefore, improvements to both parts of prediction workflows may lead to improved compound annotations. Here, we evaluate capabilities of message-passing neural networks (MPNN) that have demonstrated outstanding performance on many chemical tasks to accurately predict retention times. The model was initially trained on SMRT, providing mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was further fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and demonstrated up to 30% improvement of prediction accuracy for these sets compared with the state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention with the thresholds obtained from ROC curves eliminates up to 50% of false identities.
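The candidate-filtering step described at the end of the abstract above can be sketched as a simple tolerance check. The abstract does not say how the thresholds are derived from the ROC curves, so the threshold here is simply passed in; the function name, candidate names, and retention times are hypothetical.

```python
def filter_by_retention(candidates, observed_rt, threshold):
    """Keep candidates whose predicted retention time is close to the observed one.

    candidates:  dict mapping candidate name -> predicted retention time (s).
    observed_rt: measured retention time of the unknown (s).
    threshold:   maximum allowed absolute difference (s); hypothetical value.
    """
    return [
        name
        for name, predicted in candidates.items()
        if abs(predicted - observed_rt) <= threshold
    ]


# Hypothetical isomeric candidates with predicted retention times in seconds.
candidates = {
    "isomer_A": 412.0,
    "isomer_B": 455.0,
    "isomer_C": 530.0,
    "isomer_D": 398.0,
}
kept = filter_by_retention(candidates, observed_rt=420.0, threshold=40.0)
```

In this made-up case one of four isomers is rejected; a tighter threshold rejects more candidates at the risk of discarding the correct one.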
|
19
|
Fully automatic resolution of untargeted GC-MS data with deep learning assistance. Talanta 2022; 244:123415. [DOI: 10.1016/j.talanta.2022.123415] [Received: 01/05/2022] [Revised: 03/24/2022] [Accepted: 03/26/2022] [Indexed: 11/17/2022]
|
20
|
Zhou L, Wang H. Multihorizons transfer strategy for continuous online prediction of time‐series data in complex systems. INT J INTELL SYST 2022. [DOI: 10.1002/int.22900] [Indexed: 11/12/2022]
Affiliation(s)
- Liang Zhou
- College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| | - Huawei Wang
- College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| |
|
21
|
Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022] Open
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has become a recent hotspot in the research fields of life and environmental sciences. However, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed. These strategies can be broadly categorized into two types: one based on structure annotation of mass spectra and the other on retention time prediction. These strategies have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarized the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and further discussed the directions and perspectives to improve the power of the tools or strategies for structure elucidation.
|