1
Van Laethem T, Kumari P, Boulanger B, Hubert P, Fillet M, Sacré PY, Hubert C. Uncertainty management for In Silico screening of reversed-phase liquid chromatography methods for small compounds. J Pharm Biomed Anal 2024; 249:116373. [PMID: 39047465 DOI: 10.1016/j.jpba.2024.116373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 07/18/2024] [Accepted: 07/18/2024] [Indexed: 07/27/2024]
Abstract
The process of developing new reversed-phase liquid chromatography methods can be both time-consuming and challenging. To meet this challenge, statistics-based strategies have emerged as cost-effective, efficient and flexible solutions. In the present study, we use a Bayesian response surface methodology, which takes advantage of the knowledge of the pKa values of the compounds present in the analyzed sample to model their retention behavior. A multi-criteria decision analysis (MCDA) was then developed to exploit the uncertainty information inherent in the model distributions. This strategic approach is designed to integrate seamlessly with quantitative structure retention relationship (QSRR) models, forming an initial in-silico screening phase. Of the two methods presented for MCDA, one showed promising results. The method development process was carried out with the optimization phase, generating a design space that corroborates the results of the selection phase.
Affiliation(s)
- Thomas Van Laethem
  - Laboratory for the Analysis of Medicines, University of Liège (ULiège), CIRM, Liège 4000, Belgium
  - Laboratory of Pharmaceutical Analytical Chemistry, University of Liège (ULiège), CIRM, Liège 4000, Belgium
- Priyanka Kumari
  - Laboratory for the Analysis of Medicines, University of Liège (ULiège), CIRM, Liège 4000, Belgium
  - Laboratory of Pharmaceutical Analytical Chemistry, University of Liège (ULiège), CIRM, Liège 4000, Belgium
- Philippe Hubert
  - Laboratory of Pharmaceutical Analytical Chemistry, University of Liège (ULiège), CIRM, Liège 4000, Belgium
- Marianne Fillet
  - Laboratory for the Analysis of Medicines, University of Liège (ULiège), CIRM, Liège 4000, Belgium
- Pierre-Yves Sacré
  - Laboratory of Pharmaceutical Analytical Chemistry, University of Liège (ULiège), CIRM, Liège 4000, Belgium
- Cédric Hubert
  - Laboratory of Pharmaceutical Analytical Chemistry, University of Liège (ULiège), CIRM, Liège 4000, Belgium
2
Chen X, Wu W, Sun H, Chen L, Wang Y, Xia B, Zhou Y. Development and Application of a Comprehensive Nontargeted Screening Strategy for Aristolochic Acid Analogues. Anal Chem 2024; 96:1922-1931. [PMID: 38264982 DOI: 10.1021/acs.analchem.3c04064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Aristolochic acid analogs (AAAs) are naturally occurring carcinogenic and toxic compounds that pose a safety threat to pharmaceuticals and the environment. AAAs are challenging to screen because they lack characteristic mass spectral fragmentation and are structurally diverse. A comprehensive nontargeted screening strategy was proposed by taking into account diverse factors and incorporating various self-developed techniques, and a Python3-based toolkit called AAAs_finder was developed for its implementation. The main procedures consist of virtual structure and ultraviolet and visible (UV) spectra database creation, exact mass and UV spectra-based suspect data extraction, tandem mass spectra (MS/MS) anthropomorphic interpretation, and multicondition retention time (RT) prediction-based candidate structure ranking. To initially assess screening feasibility, eight hypothetical unknown samples were subjected to nontargeted screening using the AAAs_finder toolkit and two other advanced tools. The results showed that the former successfully identified all eight, while the latter two identified only two and three, respectively, indicating that our strategy was more feasible. The strategy was then carefully evaluated for false positives and false negatives, instrument dependence, reproducibility, and sensitivity. After the evaluation, the strategy was successfully applied to the screening of AAAs in real samples, such as herbal medicine, spiked soil, and water. Overall, this study proposed a nontargeted screening strategy and toolkit that is independent of characteristic mass spectral fragmentation and able to overcome the challenges posed by structural diversity in AAA screening, and that is also valuable for other classes of compounds.
Affiliation(s)
- Xiaoqi Chen
  - Chengdu Institute of Organic Chemistry, Chinese Academy of Sciences, Chengdu 610041, China
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenlin Wu
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Chengdu Institute of Food Inspection, Chengdu 611130, China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing 100029, China
- Hongbing Sun
  - Chengdu Institute of Organic Chemistry, Chinese Academy of Sciences, Chengdu 610041, China
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Sichuan Academy of Chinese Medicine Sciences, Chengdu 610041, China
- Lu Chen
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yu Wang
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Xia
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- Yan Zhou
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
3
Kensert A, Desmet G, Cabooter D. A perspective on the use of deep deterministic policy gradient reinforcement learning for retention time modeling in reversed-phase liquid chromatography. J Chromatogr A 2024; 1713:464570. [PMID: 38101304 DOI: 10.1016/j.chroma.2023.464570] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2023] [Revised: 12/04/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Artificial intelligence and machine learning techniques are increasingly used for different tasks related to method development in liquid chromatography. In this study, the possibilities of a reinforcement learning algorithm, more specifically a deep deterministic policy gradient algorithm, are evaluated for the selection of scouting runs for retention time modeling. As a theoretical exercise, it is investigated whether such an algorithm can be trained to select scouting runs for any compound of interest such that its correct retention parameters for the three-parameter Neue-Kuss retention model can be retrieved. It is observed that three scouting runs are generally sufficient to retrieve the retention parameters with an accuracy (mean relative percentage error, MRPE) of 1 % or less. Giving the agent the opportunity to select additional scouting runs does not lead to a significantly improved accuracy. It is also observed that the agent tends to give preference to isocratic scouting runs for retention time modeling, and is only motivated towards selecting gradient scouting runs when penalized (strongly) for large analysis/gradient times. This seems to reinforce the general power and usefulness of isocratic scouting runs for retention time modeling. Finally, the best results (lowest MRPE) are obtained when the agent manages to retrieve retention time data for % ACN at elution of the compound under consideration that span the entire relevant range of ACN (5 % ACN to 95 % ACN) as well as possible, i.e., resulting in retention data at a low, intermediate and high % ACN. Based on the obtained results, we believe reinforcement learning holds great potential to automate and rationalize method development in liquid chromatography in the future.
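For orientation, the three-parameter Neue-Kuss model named in this abstract is commonly written as below, with retention factor k, organic modifier fraction φ, and parameters k_0, S_1 and S_2. The MRPE expression is an assumed generic definition (N retrieved values \hat{p}_i compared with reference values p_i); the abstract does not state the exact formula the authors used.

```latex
\ln k(\varphi) = \ln k_0 + 2\ln\!\left(1 + S_2\,\varphi\right) - \frac{S_1\,\varphi}{1 + S_2\,\varphi}
\qquad
\mathrm{MRPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{\hat{p}_i - p_i}{p_i}\right|
```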
Affiliation(s)
- Alexander Kensert
  - University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
  - Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Gert Desmet
  - Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Deirdre Cabooter
  - University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
4
Kwon Y, Kwon H, Han J, Kang M, Kim JY, Shin D, Choi YS, Kang S. Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network. Anal Chem 2023; 95:17273-17283. [PMID: 37955847 DOI: 10.1021/acs.analchem.3c03177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2023]
Abstract
Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.
Affiliation(s)
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Hyukju Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
  - Department of Chemistry, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Jongmin Han
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Myeonginn Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Ji-Yeong Kim
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Dongyeeb Shin
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Seokho Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
5
Kang Q, Fang P, Zhang S, Qiu H, Lan Z. Deep graph convolutional network for small-molecule retention time prediction. J Chromatogr A 2023; 1711:464439. [PMID: 37865024 DOI: 10.1016/j.chroma.2023.464439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/23/2023]
Abstract
The retention time (RT) is a crucial source of data for liquid chromatography-mass spectrometry (LCMS). A model that can accurately predict the RT for each molecule would make it possible to filter out candidates with similar spectra but differing RT in LCMS-based molecule identification. Recent research shows that graph neural networks (GNNs) outperform traditional machine learning algorithms in RT prediction. However, all of these models use relatively shallow GNNs. This study investigates for the first time how depth affects GNNs' performance on RT prediction. The results demonstrate that a notable improvement can be achieved by pushing the depth of GNNs to 16 layers through the adoption of residual connections. We also find that the graph convolutional network (GCN) model benefits from edge information. The developed deep graph convolutional network, DeepGCN-RT, significantly outperforms the previous state-of-the-art method and achieves the lowest mean absolute percentage error (MAPE) of 3.3% and the lowest mean absolute error (MAE) of 26.55 s on the SMRT test set. We also fine-tune DeepGCN-RT on seven datasets with various chromatographic conditions. The mean MAE over the seven datasets decreases by 30% compared with the previous state-of-the-art method. On the RIKEN-PlaSMA dataset, we also test the effectiveness of DeepGCN-RT in assisting molecular structure identification. By reducing the number of potential structures by 30%, DeepGCN-RT is able to improve top-1 accuracy by about 11%.
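The two error metrics quoted in this abstract, MAE and MAPE, can be sketched as follows. This is a minimal illustration using their standard textbook definitions, not the authors' evaluation code; the function names and the three example retention times (in seconds) are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs (here: seconds)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical experimental and predicted retention times in seconds.
rt_true = [100.0, 200.0, 400.0]
rt_pred = [110.0, 190.0, 420.0]

print(f"MAE  = {mae(rt_true, rt_pred):.2f} s")
print(f"MAPE = {mape(rt_true, rt_pred):.2f} %")
```

MAE weights every compound by its absolute deviation, whereas MAPE normalizes by the true retention time, so early-eluting compounds contribute more to MAPE for the same absolute error.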
Affiliation(s)
- Qiyue Kang
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Pengfei Fang
  - School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
- Shuai Zhang
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Huachuan Qiu
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Zhenzhong Lan
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
6
Szucs R, Brown R, Brunelli C, Hradski J, Masár M. Impact of structural similarity on the accuracy of retention time prediction. J Chromatogr A 2023; 1707:464317. [PMID: 37634261 DOI: 10.1016/j.chroma.2023.464317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/07/2023] [Accepted: 08/17/2023] [Indexed: 08/29/2023]
Abstract
Quantitative Structure-Retention Relationships offer a valuable tool for de-risking chromatographic methods in relation to newly formed or hypothetical compounds arising from synthetic processes or formulation activities. They can also be used to identify optimal separation conditions, or in support of structural elucidation. In this contribution, we provide a systematic study of the relationship between the accuracy of the retention model, the size of the training set and its structural similarity to the predicted compound. We compare structural similarity expressed either on a fingerprint basis (e.g., Tanimoto index), or by Euclidean distance calculated from a subset of molecular descriptors. The results presented indicate that accurate and predictive models can be built from a small dataset containing as few as 25 compounds, provided that the training set is structurally similar to the test compound. When the training set contains compounds selected by minimizing the Euclidean distance calculated from the 3 descriptors most correlated with the retention time, a root mean square error of 0.48 min and a correlation coefficient of 0.9464 were observed for the test sets of 104 compounds. Moreover, these models meet the Tropsha predictivity criteria. These findings potentially bring the prediction of retention times within the practical reach of pharmaceutical analysts involved in chromatographic method development. We also present an optimisation approach to select algorithm settings in order to minimize the prediction error and ensure model predictivity.
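The two similarity measures compared in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' workflow: fingerprints are represented as sets of "on" bit positions, descriptors are assumed to be already scaled to comparable ranges, and the helper names, the toy library and the target values are all hypothetical.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def euclidean(desc_a, desc_b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(desc_a, desc_b)))


def nearest_training_set(target_desc, library, n):
    """Names of the n library compounds closest to the target in descriptor space."""
    ranked = sorted(library, key=lambda name: euclidean(target_desc, library[name]))
    return ranked[:n]


# Hypothetical library: compound name -> three (pre-scaled) descriptors.
library = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.0, 1.0, 1.0),
    "C": (0.1, 0.0, 0.2),
    "D": (3.0, 2.0, 1.0),
}
target = (0.0, 0.1, 0.1)

closest = nearest_training_set(target, library, 2)
print(closest)
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

A retention model would then be fitted only on the selected neighbours; the abstract reports that as few as 25 structurally similar compounds can be sufficient.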
Affiliation(s)
- Roman Szucs
  - Department of Analytical Chemistry, Faculty of Natural Sciences, Comenius University Bratislava, Ilkovičova 6, SK-84215 Bratislava, Slovakia
- Roland Brown
  - Pfizer R&D UK Limited, Ramsgate Road, Sandwich CT13 9NJ, UK
- Jasna Hradski
  - Department of Analytical Chemistry, Faculty of Natural Sciences, Comenius University Bratislava, Ilkovičova 6, SK-84215 Bratislava, Slovakia
- Marián Masár
  - Department of Analytical Chemistry, Faculty of Natural Sciences, Comenius University Bratislava, Ilkovičova 6, SK-84215 Bratislava, Slovakia
7
Gedefaw L, Liu CF, Ip RKL, Tse HF, Yeung MHY, Yip SP, Huang CL. Artificial Intelligence-Assisted Diagnostic Cytology and Genomic Testing for Hematologic Disorders. Cells 2023; 12:1755. [PMID: 37443789 PMCID: PMC10340428 DOI: 10.3390/cells12131755] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/11/2023] [Revised: 06/21/2023] [Accepted: 06/28/2023] [Indexed: 07/15/2023] [Open Access]
Abstract
Artificial intelligence (AI) is a rapidly evolving field of computer science that involves the development of computational programs that can mimic human intelligence. In particular, machine learning and deep learning models have enabled the identification and grouping of patterns within data, leading to the development of AI systems that have been applied in various areas of hematology, including digital pathology, alpha thalassemia patient screening, cytogenetics, immunophenotyping, and sequencing. These AI-assisted methods have shown promise in improving diagnostic accuracy and efficiency, identifying novel biomarkers, and predicting treatment outcomes. However, limitations such as limited databases, lack of validation and standardization, systematic errors, and bias prevent AI from completely replacing manual diagnosis in hematology. In addition, the processing of large amounts of patient data and personal information by AI poses potential data privacy issues, necessitating the development of regulations to evaluate AI systems and address ethical concerns in clinical AI systems. Nonetheless, with continued research and development, AI has the potential to revolutionize the field of hematology and improve patient outcomes. To fully realize this potential, however, the challenges facing AI in hematology must be addressed and overcome.
Affiliation(s)
- Lealem Gedefaw
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Chia-Fei Liu
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Rosalina Ka Ling Ip
  - Department of Pathology, Pamela Youde Nethersole Eastern Hospital, Hong Kong, China
- Hing-Fung Tse
  - Department of Pathology, Pamela Youde Nethersole Eastern Hospital, Hong Kong, China
- Martin Ho Yin Yeung
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Shea Ping Yip
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Chien-Ling Huang
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
8
Wang X, Zheng F, Sheng M, Xu G, Lin X. Retention time prediction for small samples based on integrating molecular representations and adaptive network. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1217:123624. [PMID: 36780745 DOI: 10.1016/j.jchromb.2023.123624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/27/2023] [Indexed: 02/07/2023]
Abstract
Retention time (RT) can provide information orthogonal to that of mass spectrometry and contribute to identifying compounds. Many machine learning methods have been developed and applied to RT prediction. In practice, the training data size is small for most chromatography systems. To enhance the performance of RT prediction, this study proposes an RT prediction method based on multi-data combinations and an adaptive neural network (MDC-ANN). MDC-ANN establishes the RT prediction model for the target chromatographic system through transfer learning and a base deep learning model trained on a big dataset. It selects the optimal molecular representation combination from the multiple input candidates and automatically determines the neural network structure according to the determined input combination. MDC-ANN was compared with two new efficient deep learning methods, three transfer methods and four popular machine learning methods on 14 small datasets and showed advantages in MAE, MedAE, MRE and R² in most cases. The experimental results illustrated that integrating multiple molecular representations can provide more information, improve the performance of RT prediction and contribute to compound annotation, and that different chromatographic systems may use different molecular representation combinations to obtain good RT prediction performance. Hence, MDC-ANN, which automatically determines the best combination of molecular representations for a specific system, is promising for predicting RTs accurately in real applications.
Affiliation(s)
- Xiaoxiao Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Fujian Zheng
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China
- Meizhen Sheng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Guowang Xu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
9
Šípka M, Erlebach A, Grajciar L. Constructing Collective Variables Using Invariant Learned Representations. J Chem Theory Comput 2023; 19:887-901. [PMID: 36696574 PMCID: PMC9940718 DOI: 10.1021/acs.jctc.2c00729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Indexed: 01/26/2023]
Abstract
On the time scales accessible to atomistic numerical modeling, chemical reactions are considered rare events. Therefore, atomistic simulations are commonly biased along a low-dimensional representation of a chemical reaction in an atomic structure space, i.e., along the collective variables. However, suitable collective variables are often difficult to guess a priori. We propose a novel method of collective variable discovery based on dimensionality reduction of the atomic representation vectors. These linear-scaling and invariant representations can be either fixed (untrained) or learned by supervised training of the end-to-end machine learning potential. The learned representations are expected to reflect not only the structural but also the energetic features of the system that are transferable to all of the reactive transformations covered by the machine learning potential. We demonstrate our approach on four high-barrier reactions ranging from a simple gas-phase hydrogen jump reaction to complex reactions in periodic models of industrially relevant heterogeneous catalysts. High data efficiency, automatized feature extraction, favorable scaling, and retention of inherent invariances are all properties that are expected to enable fast and largely automatic construction of suitable collective variables even in highly complex reactive scenarios such as reactive/catalytic transformations at solid-liquid interfaces.
Affiliation(s)
- Martin Šípka
  - Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 128 43 Prague 2, Czech Republic
  - Mathematical Institute, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Prague, Czech Republic
- Andreas Erlebach
  - Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 128 43 Prague 2, Czech Republic
- Lukáš Grajciar
  - Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 128 43 Prague 2, Czech Republic
10
Kensert A, Desmet G, Cabooter D. Graph Neural Networks for Improved Retention Time Predictions. LCGC EUROPE 2022. [DOI: 10.56530/lcgc.eu.qt5667e1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
In this extended special feature to celebrate the 35th anniversary edition of LCGC Europe, leading figures from the separation science community explore contemporary trends in separation science and identify possible future developments.
11
Cai Y, Zhou Z, Zhu ZJ. Advanced analytical and informatic strategies for metabolite annotation in untargeted metabolomics. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2022]
12
Celma A, Bade R, Sancho JV, Hernandez F, Humphries M, Bijlsma L. Prediction of Retention Time and Collision Cross Section (CCS H+, CCS H-, and CCS Na+) of Emerging Contaminants Using Multiple Adaptive Regression Splines. J Chem Inf Model 2022; 62:5425-5434. [PMID: 36280383 PMCID: PMC9709913 DOI: 10.1021/acs.jcim.2c00847] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/14/2022]
Abstract
Ultra-high performance liquid chromatography coupled to ion mobility separation and high-resolution mass spectrometry instruments have proven very valuable for screening of emerging contaminants in the aquatic environment. However, when applying suspect or nontarget approaches (i.e., when no reference standards are available), there is no information on retention time (RT) and collision cross-section (CCS) values to facilitate identification. In silico prediction tools for RT and CCS can therefore be of great utility to decrease the number of candidates to investigate. In this work, Multiple Adaptive Regression Splines (MARS) were evaluated for the prediction of both RT and CCS. MARS prediction models were developed and validated using a database of 477 protonated molecules, 169 deprotonated molecules, and 249 sodium adducts. Multivariate and univariate models were evaluated, with univariate models showing a better fit to the experimental data. The RT model (R² = 0.855) showed a deviation between predicted and experimental data of ±2.32 min (95% confidence intervals). The deviation observed for CCS data of protonated molecules using the CCSH model (R² = 0.966) was ±4.05% with 95% confidence intervals. The CCSH model was also tested for the prediction of deprotonated molecules, resulting in deviations below ±5.86% for 95% of the cases. Finally, a third model was developed for sodium adducts (CCSNa, R² = 0.954) with deviations below ±5.25% for 95% of the cases. The developed models have been incorporated in an open-access and user-friendly online platform, which represents a great advantage for third-party research laboratories for predicting both RT and CCS data.
Affiliation(s)
- Alberto Celma
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-750 07 Uppsala, Sweden
- Richard Bade
  - University of South Australia, Adelaide, UniSA: Clinical and Health Sciences, Health and Biomedical Innovation, Adelaide SA-5000, South Australia, Australia
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, 20 Cornwall Street, Woolloongabba AUS-4102, Queensland, Australia
- Juan Vicente Sancho
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
- Félix Hernandez
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
- Melissa Humphries
  - School of Mathematical Sciences, University of Adelaide, Ingkarni Wardli Building, North Terrace Campus, SA-5005 Adelaide, Australia
- Lubertus Bijlsma
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
13
Ma P, Zhang Z, Jia X, Peng X, Zhang Z, Tarwa K, Wei CI, Liu F, Wang Q. Neural network in food analytics. Crit Rev Food Sci Nutr 2022; 64:4059-4077. [PMID: 36322538 DOI: 10.1080/10408398.2022.2139217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Neural network (i.e. deep learning, NN)-based data analysis techniques have been listed as a pivotal opportunity to protect the integrity and safety of the global food supply chain, with a forecast of $11.2 billion in agriculture markets. As a general-purpose data analytic tool, NN has been applied in several areas of food science, such as food recognition, food supply chain security and omics analysis. Given the rapid emergence of NN applications in food safety, this review aims to provide, for the first time, a comprehensive overview of NN applications in food analysis, focusing on domain-specific applications by introducing fundamental methodology, reviewing recent and notable progress, and discussing challenges and potential pitfalls. NN has shown a bright future through effective collaboration between food specialists and the broader community in the food field, for example through its superiority in food recognition, sensory evaluation, and pattern recognition of spectroscopy and chromatography. However, major challenges impede the extension of NN, including the lack of food scientist-friendly interface software packages, incomprehensible model behavior, and multi-source heterogeneous data. Breakthroughs in other fields show that NN has the potential to offer a revolution in the immediate future.
Affiliation(s)
- Peihua Ma
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
| | - Zhikun Zhang
- CISPA Helmholtz Center for Information Security, Saarbrucken, Germany
| | - Xiaoxue Jia
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
| | - Xiaoke Peng
- College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, PR China
| | - Zhi Zhang
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
| | - Kevin Tarwa
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
| | - Cheng-I Wei
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
| | - Fuguo Liu
- College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, PR China
| | - Qin Wang
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
| |
|
14
|
Retention Time Prediction with Message-Passing Neural Networks. SEPARATIONS 2022. [DOI: 10.3390/separations9100291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023] Open
Abstract
Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D-convolutional neural networks that are trained on the METLIN small molecule retention time dataset (SMRT). These approaches demonstrate accurate predictions comparable with the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to various systems. The accuracy of this step depends both on the method of mapping and on the accuracy of the general model trained on SMRT. Therefore, improvements to both parts of prediction workflows may lead to improved compound annotations. Here, we evaluate capabilities of message-passing neural networks (MPNN) that have demonstrated outstanding performance on many chemical tasks to accurately predict retention times. The model was initially trained on SMRT, providing mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was further fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and demonstrated up to 30% improvement of prediction accuracy for these sets compared with the state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention with the thresholds obtained from ROC curves eliminates up to 50% of false identities.
|
15
|
Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022] Open
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has become a recent hotspot in the research fields of life and environmental sciences. However, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed. These strategies can be broadly categorized into two types: one based on structure annotation of mass spectra and the other on retention time prediction. They have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarize the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and discuss directions and perspectives for improving their power.
|