1
Liu Y, Yoshizawa AC, Ling Y, Okuda S. Insights into predicting small molecule retention times in liquid chromatography using deep learning. J Cheminform 2024; 16:113. [PMID: 39375739 PMCID: PMC11460055 DOI: 10.1186/s13321-024-00905-1] [Received: 03/05/2024] [Accepted: 09/13/2024] [Indexed: 10/09/2024] Open
Abstract
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in silico tools for mass spectrometry peak alignment and compound prediction have been developed; however, the list of candidate compounds remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. Release of a large RT dataset has mitigated the bottlenecks limiting the application of deep learning models, thereby improving their application in RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses the issue of molecular representation inconsistencies in datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges. SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
Affiliation(s)
- Yuting Liu
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Akiyasu C Yoshizawa
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Yiwei Ling
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Shujiro Okuda
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan.
2
Zhang Y, Liu F, Li XQ, Gao Y, Li KC, Zhang QH. Retention time dataset for heterogeneous molecules in reversed-phase liquid chromatography. Sci Data 2024; 11:946. [PMID: 39209861 PMCID: PMC11362277 DOI: 10.1038/s41597-024-03780-5] [Received: 02/01/2024] [Accepted: 08/14/2024] [Indexed: 09/04/2024] Open
Abstract
Quantitative structure-property relationships have been extensively studied in the field of predicting retention times in liquid chromatography (LC). However, making transferable predictions is inherently complex because retention times are influenced by both the structure of the molecule and the chromatographic method used. Despite decades of development and numerous published machine learning models, the practical application of predicting small molecule retention time remains limited. The resulting models are typically limited to specific chromatographic conditions and the molecules used in their training and evaluation. Here, we have developed a comprehensive dataset comprising over 10,000 experimental retention times. These times were derived from 30 different reversed-phase liquid chromatography methods and pertain to a collection of 343 small molecules representing a wide range of chemical structures. These chromatographic methods encompass common LC setups for studying the retention behavior of small molecules. They offer a wide range of examples for modeling retention time with different LC setups.
Affiliation(s)
- Yan Zhang
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Fei Liu
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China.
- Xiu Qin Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Yan Gao
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Kang Cong Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Qing He Zhang
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China.
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China.
3
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2024:10.1007/s00216-024-05471-x. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Affiliation(s)
- Henrik Hupatz
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
- Ida Rahu
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
- Wei-Chieh Wang
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Pilleriin Peets
  - Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
- Emma H Palm
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden.
  - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden.
4
Luo W, Chou L, Cui Q, Wei S, Zhang X, Guo J. High-efficiency effect-directed analysis (EDA) advancing toxicant identification in aquatic environments: Latest progress and application status. Environment International 2024; 190:108855. [PMID: 38945088 DOI: 10.1016/j.envint.2024.108855] [Received: 03/22/2024] [Revised: 05/21/2024] [Accepted: 06/26/2024] [Indexed: 07/02/2024]
Abstract
Facing the great threats to ecosystems and human health posed by the continuous release of chemicals into aquatic environments, effect-directed analysis (EDA) has emerged as a powerful tool for identifying causative toxicants. However, traditional EDA suffers from low coverage, high labor intensity and low efficiency. Currently, a number of high-efficiency techniques have been integrated into EDA to improve toxicant identification. In this review, the latest progress and current limitations of high-efficiency EDA, comprising high-coverage effect evaluation, high-resolution fractionation, high-coverage chemical analysis, high-automation causative peak extraction and high-efficiency structure elucidation, are summarized. Specifically, high-resolution fractionation, high-automation data processing algorithms and in silico structure elucidation techniques have been well developed to enhance EDA. High-coverage effect evaluation and chemical analysis, however, should be further emphasized, especially omics tools and data-independent mass acquisition. Regarding the application status in aquatic environments, high-efficiency EDA is widely applied in surface water and wastewater. Estrogenic, androgenic and aryl hydrocarbon receptor-mediated activities are of greatest concern, with causative toxicants showing the typical structural features of steroids and benzenoids. A better understanding of the latest progress and application status of EDA would help to further advance the field and greatly support aquatic environment monitoring.
Affiliation(s)
- Wenrui Luo
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Liben Chou
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Qinglan Cui
  - Bluestar Lehigh Engineering Institute Co., Ltd., Lianyungang 222004, China
- Si Wei
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Xiaowei Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Jing Guo
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China; Jiangsu Province Ecology and Environment Protection Key Laboratory of Chemical Safety and Health Risk, Nanjing 210023, China.
5
Samanipour S, Barron LP, van Herwerden D, Praetorius A, Thomas KV, O’Brien JW. Exploring the Chemical Space of the Exposome: How Far Have We Gone? JACS Au 2024; 4:2412-2425. [PMID: 39055136 PMCID: PMC11267556 DOI: 10.1021/jacsau.4c00220] [Received: 03/08/2024] [Revised: 05/29/2024] [Accepted: 05/31/2024] [Indexed: 07/27/2024]
Abstract
Around two-thirds of chronic human disease cannot be explained by genetics alone. The Lancet Commission on Pollution and Health estimates that 16% of global premature deaths are linked to pollution. Additionally, it is now thought that humankind has surpassed the safe planetary operating space for introducing human-made chemicals into the Earth System. Direct and indirect exposure to a myriad of chemicals, known and unknown, poses a significant threat to biodiversity and human health, from vaccine efficacy to the rise of antimicrobial resistance as well as autoimmune diseases and mental health disorders. The exposome chemical space remains largely uncharted due to the sheer number of possible chemical structures, estimated at over 10^60 unique forms. Conventional methods have cataloged only a fraction of the exposome, overlooking transformation products and often yielding uncertain results. In this Perspective, we have reviewed the latest efforts in mapping the exposome chemical space and its subspaces. We also provide our view on how the integration of data-driven approaches might be able to bridge the identified gaps.
Affiliation(s)
- Saer Samanipour
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - UvA Data Science Center, University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
- Leon Patrick Barron
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - MRC Centre for Environment and Health, Environmental Research Group, School of Public Health, Faculty of Medicine, Imperial College London, London W12 0BZ, United Kingdom
- Denice van Herwerden
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Antonia Praetorius
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Kevin V. Thomas
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
- Jake William O’Brien
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
6
Song D, Tang T, Wang R, Liu H, Xie D, Zhao B, Dang Z, Lu G. Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction. Environmental Pollution (Barking, Essex : 1987) 2024; 347:123763. [PMID: 38492749 DOI: 10.1016/j.envpol.2024.123763] [Received: 01/26/2024] [Revised: 02/26/2024] [Accepted: 03/09/2024] [Indexed: 03/18/2024]
Abstract
The retention time (RT) of contaminants of emerging concern (CECs) in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is crucial for database matching in non-targeted screening (NTS) analysis. In this study, we developed a machine learning (ML) model to predict RTs of CECs in NTS analysis. Using 1051 CEC standards, we evaluated Random Forest (RF), XGBoost, Support Vector Regression (SVR), and Artificial Neural Network (ANN) with molecular fingerprints and chemical descriptors to establish an optimal model. The SVR model utilizing chemical descriptors resulted in good predictive capacity with R²ext = 0.850 and r² = 0.925. The model was further validated through laboratory NTS compound characterization. When applied to examine CEC occurrence in a large wastewater treatment plant, we identified 40 level S1 CECs (confirmed structure by reference standard) and 234 level S2 compounds (probable structure by library spectrum match). The model predicted RTs for level S2 compounds, leading to the classification of 153 level S2 compounds with high confidence (ΔRT < 2 min). The model served as a robust filtering mechanism within the analytical framework. This study emphasizes the importance of predicted RTs in NTS analysis and highlights the potential of prediction models. Our research introduces a workflow that enhances NTS analysis by utilizing RT prediction models to determine compound confidence levels.
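The ΔRT criterion mentioned in the abstract above can be illustrated with a short sketch. This is only an assumed reading of the filter (a candidate is kept as high confidence when the absolute difference between observed and predicted RT is below 2 min); the `Candidate` class, the function name, and all compound names and RT values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A tentatively annotated compound (hypothetical structure for illustration)."""
    name: str
    observed_rt_min: float   # RT measured in the sample, in minutes
    predicted_rt_min: float  # RT returned by some prediction model, in minutes


def split_by_rt_agreement(candidates, max_delta_min=2.0):
    """Split candidates into (high_confidence, flagged) using |observed - predicted|.

    The 2 min default mirrors the threshold quoted in the abstract; the strict
    '<' comparison is an assumption about how the threshold is applied.
    """
    high_confidence, flagged = [], []
    for cand in candidates:
        delta = abs(cand.observed_rt_min - cand.predicted_rt_min)
        if delta < max_delta_min:
            high_confidence.append(cand)
        else:
            flagged.append(cand)
    return high_confidence, flagged


# Made-up example data, not results from the paper.
candidates = [
    Candidate("compound_A", 5.10, 5.90),
    Candidate("compound_B", 12.40, 9.70),
    Candidate("compound_C", 3.00, 4.95),
]
high, flagged = split_by_rt_agreement(candidates)
print([c.name for c in high])
print([c.name for c in flagged])
```

In a real workflow the threshold would presumably be tuned to the error distribution of the prediction model rather than fixed in advance.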
Affiliation(s)
- Dehao Song
  - School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China
- Ting Tang
  - School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, 510006, China.
- Rui Wang
  - South China Institute of Environmental Sciences, Ministry of Ecology and Environment, Guangzhou, 510655, China; Guangxi Key Laboratory of Emerging Contaminants Monitoring, Early Warning and Environmental Health Risk Assessment, Nanning, 530000, China
- He Liu
  - South China Institute of Environmental Sciences, Ministry of Ecology and Environment, Guangzhou, 510655, China; Guangxi Key Laboratory of Emerging Contaminants Monitoring, Early Warning and Environmental Health Risk Assessment, Nanning, 530000, China
- Danping Xie
  - South China Institute of Environmental Sciences, Ministry of Ecology and Environment, Guangzhou, 510655, China; Guangxi Key Laboratory of Emerging Contaminants Monitoring, Early Warning and Environmental Health Risk Assessment, Nanning, 530000, China
- Bo Zhao
  - South China Institute of Environmental Sciences, Ministry of Ecology and Environment, Guangzhou, 510655, China; Guangxi Key Laboratory of Emerging Contaminants Monitoring, Early Warning and Environmental Health Risk Assessment, Nanning, 530000, China
- Zhi Dang
  - School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China; Guangdong Provincial Key Laboratory of Solid Wastes Pollution Control and Recycling, South China University of Technology, Guangzhou, 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, 510006, China
- Guining Lu
  - School of Environment and Energy, South China University of Technology, Guangzhou, 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, 510006, China
7
Mazraedoost S, Žuvela P, Ulenberg S, Bączek T, Liu JJ. Cross-column density functional theory-based quantitative structure-retention relationship model development powered by machine learning. Anal Bioanal Chem 2024:10.1007/s00216-024-05243-7. [PMID: 38507043 DOI: 10.1007/s00216-024-05243-7] [Received: 12/25/2023] [Revised: 03/03/2024] [Accepted: 03/06/2024] [Indexed: 03/22/2024]
Abstract
Quantitative structure-retention relationship (QSRR) modeling has emerged as an efficient alternative to predict analyte retention times using molecular descriptors. However, most reported QSRR models are column-specific, requiring separate models for each high-performance liquid chromatography (HPLC) system. This study evaluates the potential of machine learning (ML) algorithms and quantum mechanical (QM) descriptors to develop QSRR models that can predict retention times across three different reversed-phase HPLC columns under varying conditions. Four machine learning methods, namely partial least squares (PLS) regression, ridge regression (RR), random forest (RF), and gradient boosting (GB), were compared on a dataset of 360 retention times for 15 aromatic analytes. Molecular descriptors were calculated using density functional theory (DFT). Column characteristics such as particle size and pore size and experimental conditions such as temperature and gradient time were additionally used as descriptors. Results showed that the GB-QSRR model demonstrated the best predictive performance, with Q² of 0.989 and root mean square error of prediction (RMSEP) of 0.749 min on the test set. Feature analysis revealed that solvation energy (SE), HOMO-LUMO energy gap (∆E HOMO-LUMO), total dipole moment (Mtot), and global hardness (η) are among the most influential predictors for retention time prediction, indicating the significance of electrostatic interactions and hydrophobicity. Our findings underscore the efficiency of ensemble methods, specifically GB and RF models employing non-linear learners, in capturing local variations in retention times across diverse experimental setups. This study emphasizes the potential of cross-column QSRR modeling and highlights the utility of ML models in optimizing chromatographic analysis.
Affiliation(s)
- Sargol Mazraedoost
  - Intelligent Systems Laboratory, Department of Chemical Engineering, Pukyong National University, Busan, 48513, Republic of Korea
- Petar Žuvela
  - Intelligent Systems Laboratory, Department of Chemical Engineering, Pukyong National University, Busan, 48513, Republic of Korea
- Szymon Ulenberg
  - Department of Pharmaceutical Chemistry, Medical University of Gdańsk, Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Tomasz Bączek
  - Department of Pharmaceutical Chemistry, Medical University of Gdańsk, Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- J Jay Liu
  - Intelligent Systems Laboratory, Department of Chemical Engineering, Pukyong National University, Busan, 48513, Republic of Korea.
  - Institute of Cleaner Production Technology, Pukyong National University, (48513) 45, Yongso-Ro, Nam-Gu, Busan, South Korea.
8
Zhang Y, Liu F, Li XQ, Gao Y, Li KC, Zhang QH. Generic and accurate prediction of retention times in liquid chromatography by post-projection calibration. Commun Chem 2024; 7:54. [PMID: 38459241 PMCID: PMC10923921 DOI: 10.1038/s42004-024-01135-0] [Received: 07/01/2023] [Accepted: 02/21/2024] [Indexed: 03/10/2024] Open
Abstract
Retention time predictions from molecule structures in liquid chromatography (LC) are increasingly used in MS-based targeted and untargeted analyses, providing supplementary evidence for molecule annotation and reducing experimental measurements. Nevertheless, different LC setups (e.g., differences in gradient, column, and/or mobile phase) give rise to many prediction models that can only accurately predict retention times for a specific chromatographic method (CM). Here, a generic and accurate method is presented to predict retention times across different CMs, by introducing the concept of post-projection calibration. This concept builds on the direct projections of retention times between different CMs and uses 35 external calibrants to eliminate the impact of LC setups on projection accuracy. Results showed that post-projection calibration consistently achieved a median projection error below 3.2% of the elution time. The ranking results of putative candidates reached similar levels among different CMs. This work opens up broad possibilities for coordinating retention times between different laboratories and developing extensive retention databases.
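The abstract above does not spell out how the calibrants are used, so the following is only a generic sketch of calibrant-based RT correction, not the authors' post-projection calibration algorithm. It assumes that each calibrant has both a projected RT and a measured RT on the target system, and that a piecewise-linear map through those pairs is an acceptable correction; the function name and all numbers are hypothetical.

```python
from bisect import bisect_right


def make_calibrant_correction(projected_rts, measured_rts):
    """Return a function mapping a projected RT to a calibrated RT.

    The map is piecewise linear through the (projected, measured) calibrant
    pairs; values outside the calibrant range are extrapolated along the
    nearest end segment. This is an illustrative assumption, not the
    published method.
    """
    pairs = sorted(zip(projected_rts, measured_rts))
    xs = [p for p, _ in pairs]
    ys = [m for _, m in pairs]
    if len(xs) < 2 or len(set(xs)) != len(xs):
        raise ValueError("need at least two calibrants with distinct projected RTs")

    def correct(rt):
        # Index of the segment whose left calibrant is <= rt, clamped to valid segments.
        i = bisect_right(xs, rt) - 1
        i = max(0, min(i, len(xs) - 2))
        x0, x1 = xs[i], xs[i + 1]
        y0, y1 = ys[i], ys[i + 1]
        return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)

    return correct


# Three made-up calibrants (the paper uses 35): projected vs. measured RT in minutes.
correct = make_calibrant_correction([1.0, 5.0, 10.0], [1.5, 5.0, 11.0])
print(correct(3.0))
print(correct(7.5))
```

A smoother fit (for example a monotonic spline) might be preferable with many calibrants; linear interpolation is used here only because it needs nothing beyond the standard library.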
Affiliation(s)
- Yan Zhang
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Fei Liu
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China.
- Xiu Qin Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Yan Gao
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Kang Cong Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Qing He Zhang
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China.
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China.
9
Xue J, Wang B, Ji H, Li W. RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification. Bioinformatics 2024; 40:btae084. [PMID: 38402516 PMCID: PMC10914443 DOI: 10.1093/bioinformatics/btae084] [Received: 09/22/2023] [Revised: 01/14/2024] [Accepted: 02/22/2024] [Indexed: 02/26/2024] Open
Abstract
MOTIVATION: Liquid chromatography retention time prediction can assist in metabolite identification, which is a critical task and challenge in nontargeted metabolomics. However, different chromatographic conditions may result in different retention times for the same metabolite. Current retention time prediction methods lack sufficient scalability to transfer from one specific chromatographic method to another. RESULTS: Therefore, we present RT-Transformer, a novel deep neural network model coupled with graph attention network and 1D-Transformer, which can predict retention times under any chromatographic method. First, we obtain a pre-trained model by training RT-Transformer on the large small molecule retention time dataset containing 80 038 molecules, and then transfer the resulting model to different chromatographic methods based on transfer learning. When tested on the small molecule retention time dataset, as other authors did, the average absolute error reached 27.30 after removing non-retained molecules, and 33.41 when no samples were removed. The pre-trained RT-Transformer was further transferred to 5 datasets corresponding to different chromatographic conditions and fine-tuned. According to the experimental results, RT-Transformer achieves competitive performance compared to state-of-the-art methods. In addition, RT-Transformer was applied to 41 external molecular retention time datasets. Extensive evaluations indicate that RT-Transformer has excellent scalability in predicting retention times for liquid chromatography and improves the accuracy of metabolite identification. AVAILABILITY AND IMPLEMENTATION: The source code for the model is available at https://github.com/01dadada/RT-Transformer. The web server is available at https://huggingface.co/spaces/Xue-Jun/RT-Transformer.
Affiliation(s)
- Jun Xue
  - School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Bingyi Wang
  - Yunnan Police College, Kunming, Yunnan 650223, China
  - Key Laboratory of Smart Drugs Control (Yunnan Police College), Ministry of Education, Kunming, Yunnan 650223, China
- Hongchao Ji
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- WeiHua Li
  - School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
10
Allwright M, Guennewig B, Hoffmann AE, Rohleder C, Jieu B, Chung LH, Jiang YC, Lemos Wimmer BF, Qi Y, Don AS, Leweke FM, Couttas TA. ReTimeML: a retention time predictor that supports the LC-MS/MS analysis of sphingolipids. Sci Rep 2024; 14:4375. [PMID: 38388524 PMCID: PMC10883992 DOI: 10.1038/s41598-024-53860-0] [Received: 11/20/2023] [Accepted: 02/06/2024] [Indexed: 02/24/2024] Open
Abstract
The analysis of ceramide (Cer) and sphingomyelin (SM) lipid species using liquid chromatography-tandem mass spectrometry (LC-MS/MS) continues to present challenges as their precursor mass and fragmentation can correspond to multiple molecular arrangements. To address this constraint, we developed ReTimeML, a freeware tool that automates the estimation of expected retention times (RTs) for Cer and SM lipid profiles from complex chromatograms. ReTimeML works on the principle that LC-MS/MS experiments have pre-determined RTs from internal standards, calibrators or quality controls used throughout the analysis. Using these as reference RTs, ReTimeML subsequently extrapolates the RTs of unknowns using its machine-learned regression library of mass-to-charge (m/z) versus RT profiles, which does not require model retraining for adaptability on different LC-MS/MS pipelines. We validated ReTimeML RT estimations for various Cer and SM structures across different biologicals, tissues and LC-MS/MS setups, exhibiting a mean variance between 0.23 and 2.43% compared to user annotations. ReTimeML also aided the disambiguation of SM identities from isobar distributions in paired serum-cerebrospinal fluid from healthy volunteers, allowing us to identify a series of non-canonical SMs associated between the two biofluids, comprising a polyunsaturated structure that confers increased stability against catabolic clearance.
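The idea of anchoring RT estimates to reference standards can be illustrated with a deliberately simplified sketch. It assumes a straight-line relationship between m/z and RT within one lipid class and fits it by ordinary least squares to a few reference standards; ReTimeML's actual regression library is not described in the abstract and is likely more elaborate. All names and numbers below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up reference standards: precursor m/z and their known RTs in minutes.
reference_mz = [500.0, 600.0, 700.0]
reference_rt = [4.0, 6.0, 8.0]

slope, intercept = fit_line(reference_mz, reference_rt)


def estimate_rt(mz):
    """Estimate the RT of an unknown from its m/z using the fitted line."""
    return slope * mz + intercept


print(estimate_rt(650.0))
```

Because the fit is redone from whatever reference RTs a given run provides, nothing has to be retrained when the LC setup changes, which is the property the abstract highlights.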
Affiliation(s)
- Michael Allwright
  - ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
- Boris Guennewig
  - ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
- Anna E Hoffmann
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Endosane Pharmaceuticals GmbH, Berlin, Germany
- Cathrin Rohleder
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Endosane Pharmaceuticals GmbH, Berlin, Germany
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Beverly Jieu
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Long H Chung
  - Centenary Institute, The University of Sydney, Sydney, Australia
- Yingxin C Jiang
  - Centenary Institute, The University of Sydney, Sydney, Australia
- Bruno F Lemos Wimmer
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Yanfei Qi
  - Centenary Institute, The University of Sydney, Sydney, Australia
- Anthony S Don
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- F Markus Leweke
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Endosane Pharmaceuticals GmbH, Berlin, Germany
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Timothy A Couttas
  - Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia.
11
Kretschmer F, Harrieder EM, Hoffmann MA, Böcker S, Witting M. RepoRT: a comprehensive repository for small molecule retention times. Nat Methods 2024; 21:153-155. [PMID: 38191934 DOI: 10.1038/s41592-023-02143-z] [Indexed: 01/10/2024]
Affiliation(s)
- Fleming Kretschmer
  - Chair for Bioinformatics, Institute for Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Eva-Maria Harrieder
  - Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany
- Martin A Hoffmann
  - Chair for Bioinformatics, Institute for Computer Science, Friedrich Schiller University Jena, Jena, Germany
  - Bright Giant GmbH, Jena, Germany
- Sebastian Böcker
  - Chair for Bioinformatics, Institute for Computer Science, Friedrich Schiller University Jena, Jena, Germany.
- Michael Witting
  - Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany.
  - Metabolomics and Proteomics Core, Helmholtz Zentrum München, Neuherberg, Germany.
  - Chair of Analytical Food Chemistry, TU München, Freising, Germany.
12
Sandström H, Rissanen M, Rousu J, Rinke P. Data-Driven Compound Identification in Atmospheric Mass Spectrometry. Adv Sci (Weinh) 2024; 11:e2306235. [PMID: 38095508 PMCID: PMC10885664 DOI: 10.1002/advs.202306235] [Received: 08/31/2023] [Revised: 11/04/2023] [Indexed: 02/24/2024]
Abstract
Aerosol particles found in the atmosphere affect the climate and worsen air quality. To mitigate these adverse impacts, aerosol particle formation and aerosol chemistry in the atmosphere need to be better mapped out and understood. Currently, mass spectrometry is the single most important analytical technique in atmospheric chemistry and is used to track and identify compounds and processes. Large amounts of data are collected in each measurement of current time-of-flight and orbitrap mass spectrometers using modern rapid data acquisition practices. However, compound identification remains a major bottleneck during data analysis due to lacking reference libraries and analysis tools. Data-driven compound identification approaches could alleviate the problem, yet remain rare to non-existent in atmospheric science. In this perspective, the authors review the current state of data-driven compound identification with mass spectrometry in atmospheric science and discuss current challenges and possible future steps toward a digital era for atmospheric mass spectrometry.
Affiliation(s)
- Hilda Sandström
  - Department of Applied Physics, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
- Matti Rissanen
  - Aerosol Physics Laboratory, Tampere University, FI-33720, Tampere, Finland
  - Department of Chemistry, University of Helsinki, P.O. Box 55, A.I. Virtasen aukio 1, FI-00560, Helsinki, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
- Patrick Rinke
  - Department of Applied Physics, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
13
Ma Y, Cao Y, Song X, Min C, Man Z, Li Z. BART: A transferable liquid chromatography retention time library for bile acids. J Chromatogr A 2024; 1715:464602. [PMID: 38159405 DOI: 10.1016/j.chroma.2023.464602] [Received: 08/28/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 01/03/2024]
Abstract
Identification of unknown bile acids, especially the distinguishment between isomers, requires retention times of a large number of reference standards, which are often not commercially available. Meanwhile, published retention information cannot be directly transferred across labs due to the differences between liquid chromatography (LC) systems, such as different extra column volume and dwell volume. To improve this situation, a transferrable retention time library for bile acids named BART was developed. BART was composed of isocratic retention models of 272 bile acids and a software tool to predict their gradient retention times on various LC systems. The isocratic retention times of bile acids were acquired on a Waters BEH C18 column with mobile phases of acidic ammonium acetate buffer and acetonitrile, and fit to the quadratic solvent strength model (QSSM). Segmented linear gradient retention times were calculated with holdup time (t0), dwell time (tD) and actual gradient profile corrected using 21 bile acid calibration standards. In addition to the reference system where the isocratic retention times were acquired, this approach has been validated on four other LC-MS systems in four labs with two gradient methods. Average root mean square errors (RMSE) between predicted and experimental retention times were 0.052 and 0.054 min for the two gradients tested, which were 9-fold more accurate than referring to a static retention time library. The library is freely available at https://bafinder.github.io/.
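As a rough illustration of the isocratic retention model mentioned in the abstract above, the sketch below uses one commonly cited form of the quadratic solvent strength model, ln k = ln k0 + S1·φ + S2·φ², together with the standard relation t_R = t0·(1 + k). The abstract does not give BART's exact parameterization, so the function names, coefficient names, and numeric values here are assumptions chosen for illustration only, and the gradient-time calculation with dwell-time correction is not reproduced.

```python
import math


def qssm_retention_factor(phi, ln_k0, s1, s2):
    """Retention factor k from ln k = ln_k0 + s1*phi + s2*phi**2.

    phi is the organic-modifier volume fraction (0 to 1).
    """
    return math.exp(ln_k0 + s1 * phi + s2 * phi ** 2)


def isocratic_retention_time(phi, t0, ln_k0, s1, s2):
    """Isocratic retention time t_R = t0 * (1 + k), with t0 the holdup time."""
    return t0 * (1.0 + qssm_retention_factor(phi, ln_k0, s1, s2))


# Hypothetical coefficients for one compound (not taken from BART).
COEFFS = {"ln_k0": 6.0, "s1": -14.0, "s2": 4.0}
T0_MIN = 0.5  # hypothetical holdup time in minutes

if __name__ == "__main__":
    for phi in (0.3, 0.4, 0.5):
        t_r = isocratic_retention_time(phi, T0_MIN, **COEFFS)
        print(f"phi={phi:.1f}  t_R={t_r:.3f} min")
```

With these illustrative coefficients the retention time falls as the organic fraction rises, which is the qualitative behaviour expected in reversed-phase separations.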
Affiliation(s)
- Yan Ma
  - National Institute of Biological Sciences, Beijing 102206, China; Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China.
- Yang Cao
  - National Institute of Biological Sciences, Beijing 102206, China
- Xiaocui Song
  - National Institute of Biological Sciences, Beijing 102206, China
- Chunyan Min
  - Suzhou Institute for Drug Control, Suzhou 215104, China
- Zhuo Man
  - SCIEX China, Beijing 100015, China
- Zhen Li
  - State Key Laboratory of Plant Environmental Resilience, College of Biological Sciences, China Agricultural University, Beijing 100193, China
14
Torigoe T, Takahashi M, Heravizadeh O, Ikeda K, Nakatani K, Bamba T, Izumi Y. Predicting Retention Time in Unified-Hydrophilic-Interaction/Anion-Exchange Liquid Chromatography High-Resolution Tandem Mass Spectrometry (Unified-HILIC/AEX/HRMS/MS) for Comprehensive Structural Annotation of Polar Metabolome. Anal Chem 2024; 96:1275-1283. [PMID: 38186224 DOI: 10.1021/acs.analchem.3c04618] [Indexed: 01/09/2024]
Abstract
The accuracy of the structural annotation of unidentified peaks obtained in metabolomic analysis using liquid chromatography/tandem mass spectrometry (LC/MS/MS) can be enhanced using retention time (RT) information as well as precursor and product ions. Unified-hydrophilic-interaction/anion-exchange liquid chromatography high-resolution tandem mass spectrometry (unified-HILIC/AEX/HRMS/MS) has been recently developed as an innovative method ideal for nontargeted polar metabolomics. However, the RT prediction for unified-HILIC/AEX has not been developed because of the complex separation mechanism characterized by the continuous transition of the separation modes from HILIC to AEX. In this study, we propose an RT prediction model of unified-HILIC/AEX/HRMS/MS, which enables the comprehensive structural annotation of polar metabolites. With training data for 203 polar metabolites, we ranked the feature importance using a random forest among 12,420 molecular descriptors (MDs) and constructed an RT prediction model with 26 selected MDs. The accuracy of the RT model was evaluated using test data for 51 polar metabolites, and 86.3% of the ΔRTs (difference between measured and predicted RTs) were within ±1.50 min, with a mean absolute error of 0.80 min, indicating high RT prediction accuracy. Nontargeted metabolomic data from the NIST SRM 1950-Metabolites in frozen human plasma were analyzed using the developed RT model and in silico MS/MS prediction, resulting in a successful structural estimation of 216 polar metabolites, in addition to the 62 identified based on standards. The proposed model can help accelerate the structural annotation of unknown hydrophilic metabolites, which is a key issue in metabolomic research.
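The two accuracy figures quoted in the abstract above (the share of ΔRTs inside a tolerance window and the mean absolute error) can be computed as in the following minimal sketch. The function name and the sample values are hypothetical and are not the authors' data; ΔRT is taken as measured minus predicted RT, as defined in the abstract.

```python
def rt_error_summary(measured, predicted, tolerance_min=1.5):
    """Return (MAE, fraction of |delta RT| <= tolerance_min).

    measured and predicted are equal-length sequences of RTs in minutes.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    deltas = [m - p for m, p in zip(measured, predicted)]
    mae = sum(abs(d) for d in deltas) / len(deltas)
    within = sum(1 for d in deltas if abs(d) <= tolerance_min) / len(deltas)
    return mae, within


# Hypothetical measured and predicted RTs (minutes), for illustration only.
MEASURED = [5.0, 10.0, 15.0, 20.0]
PREDICTED = [5.5, 9.0, 17.0, 20.0]

if __name__ == "__main__":
    mae, within = rt_error_summary(MEASURED, PREDICTED)
    print(f"MAE = {mae:.3f} min, within +/-1.5 min = {within:.1%}")
```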
Affiliation(s)
- Taihei Torigoe
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Masatomo Takahashi
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
  - Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Omidreza Heravizadeh
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Kazuki Ikeda
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Kohta Nakatani
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
  - Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Takeshi Bamba
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
  - Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Yoshihiro Izumi
  - Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
  - Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
15
Zhang J, Zhou Y, Lei J, Liu X, Zhang N, Wu L, Li Y. Retention time prediction and MRM validation reinforce the biomarker identification of LC-MS based phospholipidomics. Analyst 2024; 149:515-527. [PMID: 38078496 DOI: 10.1039/d3an01735d] [Indexed: 01/16/2024]
Abstract
Dysfunctional lipid metabolism plays a crucial role in the development and progression of various diseases. Accurate measurement of lipidomes can help uncover the complex interactions between genes, proteins, and lipids in health and diseases. The prediction of retention time (RT) has become increasingly important in both targeted and untargeted metabolomics. However, the potential impact of RT prediction on targeted LC-MS based lipidomics is still not fully understood. Herein, we propose a simplified workflow for predicting RT in phospholipidomics. Our approach involves utilizing the fatty acyl chain length or carbon-carbon double bond (DB) number in combination with multiple reaction monitoring (MRM) validation. We found that our model's predictive capacity for RT was comparable to that of a publicly accessible program (QSRR Automator). Additionally, MRM validation helped in further mitigating the interference in signal recognition. Using this developed workflow, we conducted phospholipidomics of sorafenib resistant hepatocellular carcinoma (HCC) cell lines, namely MHCC97H and Hep3B. Our findings revealed an abundance of monounsaturated fatty acyl (MUFA) or polyunsaturated fatty acyl (PUFA) phospholipids in these cell lines after developing drug resistance. In both cell lines, a total of 29 lipids were found to be co-upregulated and 5 lipids were co-downregulated. Further validation was conducted on seven of the upregulated lipids using an independent dataset, which demonstrates the potential for translation of the established workflow or the lipid biomarkers.
Affiliation(s)
- Jiangang Zhang
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
- Yu Zhou
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
- Juan Lei
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
- Xudong Liu
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
- Nan Zhang
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
- Lei Wu
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
- Yongsheng Li
  - Department of Medical Oncology, Chongqing University Cancer Hospital, Chongqing 400030, China.
16
Pérez-Victoria I. Natural Products Dereplication: Databases and Analytical Methods. Prog Chem Org Nat Prod 2024; 124:1-56. [PMID: 39101983 DOI: 10.1007/978-3-031-59567-7_1] [Indexed: 08/06/2024]
Abstract
The development of efficient methods for dereplication has been critical in the re-emergence of the research in natural products as a source of drug leads. Current dereplication workflows rapidly identify already known bioactive secondary metabolites in the early stages of any drug discovery screening campaign based on natural extracts or enriched fractions. Two main factors have driven the evolution of natural products dereplication over the last decades. First, the availability of both commercial and public large databases of natural products containing the key annotations against which the biological and chemical data derived from the studied sample are searched for. Second, the considerable improvement achieved in analytical technologies (including instrumentation and software tools) employed to obtain robust and precise chemical information (particularly spectroscopic signatures) on the compounds present in the bioactive natural product samples. This chapter describes the main methods of dereplication, which rely on the combined use of large natural product databases and spectral libraries, alongside the information obtained from chromatographic, UV-Vis, MS, and NMR spectroscopic analyses of the samples of interest.
Affiliation(s)
- Ignacio Pérez-Victoria
  - Fundación MEDINA, Centro de Excelencia en Investigación de Medicamentos Innovadores en Andalucía, Parque Tecnológico de Ciencias de La Salud, Avda. del Conocimiento 34, 18016, Armilla, Granada, Spain.
17
Chaker J, Gilles E, Monfort C, Chevrier C, Lennon S, David A. Scannotation: A Suspect Screening Tool for the Rapid Pre-Annotation of the Human LC-HRMS-Based Chemical Exposome. Environ Sci Technol 2023; 57:19253-19262. [PMID: 37968235 DOI: 10.1021/acs.est.3c04764] [Indexed: 11/17/2023]
Abstract
In an increasingly chemically polluted environment, rapidly characterizing the human chemical exposome (i.e., chemical mixtures accumulating in humans) at the population scale is critical to understand its impact on health. High-resolution mass spectrometry (HRMS) profiling of complex biological matrices can theoretically provide a comprehensive picture of chemical exposures. However, annotating the detected chemical features, particularly low-abundant ones, remains a significant obstacle to implementing such approaches at a large scale. We present Scannotation (https://github.com/scannotation/Scannotation_software), an automated and user-friendly suspect screening tool for the rapid pre-annotation of HRMS preprocessed data sets. This software tool combines several MS1 chemical predictors, i.e., m/z, experimental and predicted retention times, isotopic patterns, and neutral loss patterns, to score the proximity between features and suspects, thus efficiently prioritizing tentative annotations to verify. Scannotation and MS-DIAL4 were used to annotate blood serum samples of 75 Breton adolescents. Scannotation's combination of MS1-based chemical predictors allowed us to annotate 89 chemically diverse environmental compounds with high confidence (confirmed by MS2 when available). These compounds included 62% of emerging molecules, for which no toxicological or human biomonitoring data are reported in the literature. The complementarity observed with MS-DIAL4 results demonstrates the relevance of Scannotation for the efficient pre-annotation of large-scale exposomics data sets.
Affiliation(s)
- Jade Chaker
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
- Erwann Gilles
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
- Christine Monfort
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
- Cécile Chevrier
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
- Sarah Lennon
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
- Arthur David
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France
18
Witting M. (Re-)use and (re-)analysis of publicly available metabolomics data. Proteomics 2023; 23:e2300032. [PMID: 37670538 DOI: 10.1002/pmic.202300032] [Received: 06/23/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/07/2023]
Abstract
Metabolomics, the systematic measurement of small molecules (<1000 Da) in a given biological sample, is a fast-growing field with many different applications. In contrast to transcriptomics and proteomics, sharing of data is not as widespread in metabolomics, though more scientists are sharing their data nowadays. However, to improve data analysis tools and develop new data analytical approaches and to improve metabolite annotation and identification, sharing of reference data is crucial. Here, different possibilities to share (metabolomics) data are reviewed and some recent approaches and applications regarding the (re-)use and (re-)analysis are highlighted.
Affiliation(s)
- Michael Witting
  - Metabolomics and Proteomics Core, Helmholtz Zentrum München, Neuherberg, Germany
  - Chair of Analytical Food Chemistry, TUM School of Life Sciences, Freising-Weihenstephan, Germany
19
Kwon Y, Kwon H, Han J, Kang M, Kim JY, Shin D, Choi YS, Kang S. Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network. Anal Chem 2023; 95:17273-17283. [PMID: 37955847 DOI: 10.1021/acs.analchem.3c03177] [Indexed: 11/14/2023]
Abstract
Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.
Affiliation(s)
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Hyukju Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
  - Department of Chemistry, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Jongmin Han
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Myeonginn Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Ji-Yeong Kim
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Dongyeeb Shin
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Seokho Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
20
Kang Q, Fang P, Zhang S, Qiu H, Lan Z. Deep graph convolutional network for small-molecule retention time prediction. J Chromatogr A 2023; 1711:464439. [PMID: 37865024 DOI: 10.1016/j.chroma.2023.464439] [Received: 07/10/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/23/2023]
Abstract
The retention time (RT) is a crucial source of data for liquid chromatography-mass spectrometry (LCMS). A model that can accurately predict the RT for each molecule would make it possible to filter out candidates with similar spectra but differing RT in LCMS-based molecule identification. Recent research shows that graph neural networks (GNNs) outperform traditional machine learning algorithms in RT prediction. However, all of these models use relatively shallow GNNs. This study investigates, for the first time, how depth affects GNNs' performance on RT prediction. The results demonstrate that a notable improvement can be achieved by pushing the depth of GNNs to 16 layers through the adoption of residual connections. Additionally, we find that the graph convolutional network (GCN) model benefits from edge information. The developed deep graph convolutional network, DeepGCN-RT, significantly outperforms the previous state-of-the-art method and achieves the lowest mean absolute percentage error (MAPE) of 3.3% and the lowest mean absolute error (MAE) of 26.55 s on the SMRT test set. We also fine-tune DeepGCN-RT on seven datasets with various chromatographic conditions. The mean MAE across the seven datasets decreases by 30% compared to the previous state-of-the-art method. On the RIKEN-PlaSMA dataset, we also test the effectiveness of DeepGCN-RT in assisting molecular structure identification. By reducing the number of potential structures by 30%, DeepGCN-RT is able to improve top-1 accuracy by about 11%.
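The candidate-filtering idea described in the abstract above can be sketched as follows: candidates whose predicted RT lies too far from the observed RT are discarded before ranking. This is a minimal illustration under stated assumptions, not the DeepGCN-RT pipeline; the function name, the candidate names, the predicted values, and the 60 s window are all hypothetical. In practice the window might be tied to the predictor's error on a held-out set, which is itself an assumption here.

```python
def filter_candidates_by_rt(candidates, observed_rt_s, window_s):
    """Keep candidates whose predicted RT is within +/- window_s of the observed RT.

    candidates is a list of (name, predicted_rt_s) tuples; all times are in seconds.
    """
    return [
        (name, rt)
        for name, rt in candidates
        if abs(rt - observed_rt_s) <= window_s
    ]


# Hypothetical candidates sharing similar spectra but differing in predicted RT.
CANDIDATES = [("isomer_A", 412.0), ("isomer_B", 655.0), ("isomer_C", 430.5)]

if __name__ == "__main__":
    kept = filter_candidates_by_rt(CANDIDATES, observed_rt_s=420.0, window_s=60.0)
    for name, rt in kept:
        print(f"{name}: predicted RT {rt:.1f} s")
```

A wider window keeps more true candidates at the cost of weaker filtering, so the choice is a trade-off rather than a fixed constant.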
Affiliation(s)
- Qiyue Kang
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China.
- Pengfei Fang
  - School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
- Shuai Zhang
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Huachuan Qiu
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Zhenzhong Lan
  - School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China.
21
Clarke ED, Ferguson JJ, Stanford J, Collins CE. Dietary Assessment and Metabolomic Methodologies in Human Feeding Studies: A Scoping Review. Adv Nutr 2023; 14:1453-1465. [PMID: 37604308 PMCID: PMC10721540 DOI: 10.1016/j.advnut.2023.08.010] [Received: 11/24/2022] [Revised: 05/01/2023] [Accepted: 08/16/2023] [Indexed: 08/23/2023] Open
Abstract
Dietary metabolomics is a relatively objective approach to identifying new biomarkers of dietary intake and for use alongside traditional methods. However, methods used across dietary feeding studies vary, thus making it challenging to compare results. The objective of this study was to synthesize methodological components of controlled human feeding studies designed to quantify the diet-related metabolome in biospecimens, including plasma, serum, and urine after dietary interventions. Six electronic databases were searched. Included studies were as follows: 1) conducted in healthy adults; 2) intervention studies; 3) feeding studies focusing on dietary patterns; and 4) measured the dietary metabolome. From 12,425 texts, 50 met all inclusion criteria. Interventions were primarily crossover (n = 25) and parallel randomized controlled trials (n = 22), with between 8 and 395 participants. Seventeen different dietary patterns were tested, with the most common being the "High versus Low-Glycemic Index/Load" pattern (n = 11) and "Typical Country Intake" (n = 11); with 32 providing all or the majority (90%) of food, 16 providing some food, and 2 providing no food. Metabolites were identified in urine (n = 31) and plasma/serum (n = 30). Metabolites were quantified using liquid chromatography, mass spectroscopy (n = 31) and used untargeted metabolomics (n = 37). There was extensive variability in the methods used in controlled human feeding studies examining the metabolome, including dietary patterns tested, biospecimen sample collection, and metabolomic analysis techniques. To improve the comparability and reproducibility of controlled human feeding studies examining the metabolome, it is important to provide detailed information about the dietary interventions being tested, including information about included or restricted foods, food groups, and meal plans provided. 
Strategies to control for individual variability, such as a crossover study design, statistical adjustment methods, dietary-controlled run-in periods, or providing standardized meals or test foods throughout the study should also be considered. The protocol for this review has been registered at Open Science Framework (https://doi.org/10.17605/OSF.IO/DAHGS).
Affiliation(s)
- Erin D Clarke
  - School of Health Sciences, College of Health Medicine and Wellbeing, The University of Newcastle, Callaghan, NSW, Australia; Food and Nutrition Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia
- Jessica Ja Ferguson
  - School of Health Sciences, College of Health Medicine and Wellbeing, The University of Newcastle, Callaghan, NSW, Australia; Food and Nutrition Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia
- Jordan Stanford
  - School of Health Sciences, College of Health Medicine and Wellbeing, The University of Newcastle, Callaghan, NSW, Australia; Food and Nutrition Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia
- Clare E Collins
  - School of Health Sciences, College of Health Medicine and Wellbeing, The University of Newcastle, Callaghan, NSW, Australia; Food and Nutrition Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia.
22
Chen B, Wang C, Fu Z, Yu H, Liu E, Gao X, Li J, Han L. RT-Ensemble Pred: A tool for retention time prediction of metabolites on different LC-MS systems. J Chromatogr A 2023; 1707:464304. [PMID: 37611386 DOI: 10.1016/j.chroma.2023.464304] [Received: 08/03/2023] [Accepted: 08/15/2023] [Indexed: 08/25/2023]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) can provide a large amount of information to assist in metabolite identification. Different liquid chromatographic methods (CMs) can produce different retention times for the same metabolite. Using online datasets to predict retention times for a local dataset has become a trend, but datasets downloaded from different databases differ in size, and such imbalanced data can degrade model prediction. Thus, based on quantitative structure-retention relationships (QSRRs), an ensemble model named RT-Ensemble Pred was built in this study to predict retention times on different LC-MS systems. A total of 76,807 metabolites (76,909 retention times) were collected across 9 CMs, and 19 natural products and 1 antifungal drug (20 retention times) were collected to test the model's applicability. Ensemble sampling was applied during preprocessing to address the data imbalance. With ensemble sampling, RT-Ensemble Pred could make better use of online datasets for retention time prediction. RT-Ensemble Pred was built on the online datasets and tested on the local dataset. Its predictive accuracy was higher than that of models without any sampling method. The results showed that RT-Ensemble Pred could predict retention times for metabolites not included in the database and for metabolites from new CMs. It could also be used for compounds other than metabolites. Furthermore, RT-Ensemble Pred has been packaged as a tool that can be freely downloaded at https://gitlab.com/mikic93/rt-ensemble-pred, which is convenient for users who need to predict the retention times of metabolites.
Affiliation(s)
- Biying Chen
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
- Chenxi Wang
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
- Zhifei Fu
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
- Haiyang Yu
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
- Erwei Liu
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
- Xiumei Gao
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
- Jie Li
  - Tianjin Key Laboratory of Clinical Multi-omics, Airport Economy Zone, Tianjin, China.
- Lifeng Han
  - State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China.
|
23
|
Trostel L, Coll C, Fenner K, Hafner J. Combining predictive and analytical methods to elucidate pharmaceutical biotransformation in activated sludge. Environ Sci Process Impacts 2023; 25:1322-1336. [PMID: 37539453 DOI: 10.1039/d3em00161j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
While man-made chemicals in the environment are ubiquitous and a potential threat to human health and ecosystem integrity, the environmental fate of chemical contaminants such as pharmaceuticals is often poorly understood. Biodegradation processes driven by microbial communities convert chemicals into transformation products (TPs) that may themselves have adverse ecological effects. The detection of TPs formed during biodegradation has been continuously improved thanks to the development of TP prediction algorithms and analytical workflows. Here, we contribute to this advance by (i) reviewing past applications of TP identification workflows, (ii) applying an updated workflow for TP prediction to 42 pharmaceuticals in biodegradation experiments with activated sludge, and (iii) benchmarking 5 different pathway prediction models, comprising 4 prediction models trained on different datasets provided by enviPath, and the state-of-the-art EAWAG pathway prediction system. Using the updated workflow, we could tentatively identify 79 transformation products for 31 pharmaceutical compounds. Compared to previous works, we have further automatized several steps that were previously performed by hand. By benchmarking the enviPath prediction system on experimental data, we demonstrate the usefulness of the pathway prediction tool to generate suspect lists for screening, and we propose new avenues to improve their accuracy. Moreover, we provide a well-documented workflow that can be (i) readily applied to detect transformation products in activated sludge and (ii) potentially extended to other environmental studies.
Affiliation(s)
- Leo Trostel
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, 8600, Zürich, Switzerland.
- Claudia Coll
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, 8600, Zürich, Switzerland.
- Kathrin Fenner
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, 8600, Zürich, Switzerland.
  - Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
- Jasmin Hafner
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, 8600, Zürich, Switzerland.
  - Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
|
24
|
Karunaratne E, Hill DW, Dührkop K, Böcker S, Grant DF. Combining Experimental with Computational Infrared and Mass Spectra for High-Throughput Nontargeted Chemical Structure Identification. Anal Chem 2023; 95:11901-11907. [PMID: 37540774 DOI: 10.1021/acs.analchem.3c00937] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/06/2023]
Abstract
The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.
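The composite scoring described in this abstract can be illustrated with a minimal Python sketch. It assumes each scorer has already produced a rank position per candidate (1 = best) and that the composite is the plain arithmetic mean of the two positions. The candidate names, the `composite_ranking` function, and the alphabetical tie-break are hypothetical and are not taken from the paper.

```python
from statistics import mean


def composite_ranking(ms_ranks, ir_ranks):
    """Order candidates by the average of two rank positions (1 = best).

    ms_ranks and ir_ranks map candidate identifiers to rank positions from
    two independent scorers (assumed here: an MS/MS scorer and an IR scorer).
    """
    # Only candidates ranked by both scorers can receive a composite score.
    shared = ms_ranks.keys() & ir_ranks.keys()
    scores = {c: mean((ms_ranks[c], ir_ranks[c])) for c in shared}
    # Lower composite score is better; ties are broken by name (an assumption).
    return sorted(scores, key=lambda c: (scores[c], c))


# Hypothetical ranks for three candidate structures.
ms_ranks = {"cand_A": 1, "cand_B": 2, "cand_C": 3}
ir_ranks = {"cand_A": 3, "cand_B": 1, "cand_C": 2}
ranking = composite_ranking(ms_ranks, ir_ranks)
print(ranking)
```

In this toy case the candidate that neither scorer ranks worst on average moves to the top, which is the intended effect of combining orthogonal evidence.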
Affiliation(s)
- Erandika Karunaratne
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
- Dennis W Hill
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
- Kai Dührkop
  - Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena 07743, Germany
- Sebastian Böcker
  - Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena 07743, Germany
- David F Grant
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
|
25
|
Muhamadali H, Winder CL, Dunn WB, Goodacre R. Unlocking the secrets of the microbiome: exploring the dynamic microbial interplay with humans through metabolomics and their manipulation for synthetic biology applications. Biochem J 2023; 480:891-908. [PMID: 37378961 PMCID: PMC10317162 DOI: 10.1042/bcj20210534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 06/12/2023] [Accepted: 06/16/2023] [Indexed: 06/29/2023]
Abstract
Metabolomics is a powerful research discovery tool with the potential to measure hundreds to low thousands of metabolites. In this review, we discuss the application of GC-MS and LC-MS in discovery-based metabolomics research, define metabolomics workflows and highlight considerations that need to be addressed in order to generate robust and reproducible data. We stress that metabolomics is now routinely applied across the biological sciences to study microbiomes, from relatively simple microbial systems to their complex interactions within consortia in the host and the environment, and we highlight this in a range of biological species and mammalian systems including humans. However, challenges still exist that need to be overcome to maximise the potential for metabolomics to help us understand biological systems. To demonstrate the potential of the approach we discuss the application of metabolomics in two broad research areas: (1) synthetic biology to increase the production of high-value fine chemicals and reduce secondary by-products and (2) gut microbial interaction with the human host. While burgeoning in importance, the latter is still in its infancy and will benefit from the development of tools to detangle host-gut-microbial interactions and their impact on human health and diseases.
Affiliation(s)
- Howbeer Muhamadali
  - Centre for Metabolomics Research, Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, U.K
- Catherine L. Winder
  - Centre for Metabolomics Research, Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, U.K
- Warwick B. Dunn
  - Centre for Metabolomics Research, Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, U.K
- Royston Goodacre
  - Centre for Metabolomics Research, Department of Biochemistry, Cell and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, U.K
|
26
|
Wei Y, Sun Y, Jia S, Yan P, Xiong C, Qi M, Wang C, Du Z, Jiang H. Identification of endogenous carbonyl steroids in human serum by chemical derivatization, hydrogen/deuterium exchange mass spectrometry and the quantitative structure-retention relationship. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1226:123776. [PMID: 37311272 DOI: 10.1016/j.jchromb.2023.123776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 05/09/2023] [Accepted: 05/30/2023] [Indexed: 06/15/2023]
Abstract
Steroids are tetracyclic aliphatic compounds, and most of them contain carbonyl groups. The disordered homeostasis of steroids is closely related to the occurrence and progression of various diseases. Due to high structural similarity, low concentrations in vivo, poor ionization efficiency, and interference from endogenous substances, it is very challenging to comprehensively and unambiguously identify endogenous steroids in biological matrices. Herein, an integrated strategy was developed for the characterization of endogenous steroids in serum based on chemical derivatization, ultra-performance liquid chromatography quadrupole Exactive mass spectrometry (UPLC-Q-Exactive-MS/MS), hydrogen/deuterium (H/D) exchange, and a quantitative structure-retention relationship (QSRR) model. To enhance the mass spectrometry (MS) response of carbonyl steroids, the ketonic carbonyl group was derivatized by Girard T (GT). Firstly, the fragmentation rules of carbonyl steroid standards derivatized by GT were summarized. Then, carbonyl steroids in serum were derivatized by GT and identified based on the fragmentation rules or by comparing retention time and MS/MS spectra with those of standards. H/D exchange MS was utilized to distinguish derivatized steroid isomers for the first time. Finally, a QSRR model was constructed to predict the retention time of the unknown steroid derivatives. With this strategy, 93 carbonyl steroids were identified from human serum, and 30 of them were determined to be dicarbonyl steroids by the charge number of characteristic ions and the number of exchangeable hydrogens or by comparison with standards. The QSRR model built by the machine learning algorithms has an excellent regression correlation; thus the accurate structures of 14 carbonyl steroids were determined, among which three steroids were reported for the first time in human serum. This study provides a new analytical method for the comprehensive and reliable identification of carbonyl steroids in biological matrices.
Affiliation(s)
- Yinyu Wei
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China
- Yi Sun
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China
- Shuailong Jia
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, 430030 Wuhan, China
- Pan Yan
  - Department of Pharmacy, The Affiliated Changsha Central Hospital, Hengyang Medical School, University of South China, Changsha 410028, China
- Chaomei Xiong
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China
- Meiling Qi
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China
- Chenxi Wang
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China
- Zhifeng Du
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China.
- Hongliang Jiang
  - Tongji School of Pharmacy, Huazhong University of Science and Technology, Wuhan 430030, China.
|
27
|
Wang X, Zheng F, Sheng M, Xu G, Lin X. Retention time prediction for small samples based on integrating molecular representations and adaptive network. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1217:123624. [PMID: 36780745 DOI: 10.1016/j.jchromb.2023.123624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/27/2023] [Indexed: 02/07/2023]
Abstract
Retention time (RT) provides information orthogonal to that of mass spectrometry and contributes to compound identification. Many machine learning methods have been developed and applied to RT prediction. In practice, the training data available for most chromatography systems are small. To enhance the performance of RT prediction, this study proposes an RT prediction method based on multi-data combinations and an adaptive neural network (MDC-ANN). MDC-ANN establishes the RT prediction model for the target chromatographic system through transfer learning from a base deep learning model trained on a big dataset. It selects the optimal combination of molecular representations from multiple input candidates and automatically determines the neural network structure according to the selected input combination. MDC-ANN was compared with two recent efficient deep learning methods, three transfer methods and four popular machine learning methods on 14 small datasets and showed advantages in MAE, MedAE, MRE and R2 in most cases. The experimental results illustrated that integrating multiple molecular representations can provide more information, improve the performance of RT prediction and contribute to compound annotation, and that different chromatographic systems may need different combinations of molecular representations to obtain good RT prediction performance. Hence, MDC-ANN, which automatically determines the best combination of molecular representations for a specific system, is promising for predicting RTs accurately in real applications.
Affiliation(s)
- Xiaoxiao Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Fujian Zheng
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China.
- Meizhen Sheng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Guowang Xu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
|
28
|
Lenski M, Maallem S, Zarcone G, Garçon G, Lo-Guidice JM, Anthérieu S, Allorge D. Prediction of a Large-Scale Database of Collision Cross-Section and Retention Time Using Machine Learning to Reduce False Positive Annotations in Untargeted Metabolomics. Metabolites 2023; 13:metabo13020282. [PMID: 36837901 PMCID: PMC9962007 DOI: 10.3390/metabo13020282] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/09/2023] [Revised: 02/07/2023] [Accepted: 02/12/2023] [Indexed: 02/18/2023] Open
Abstract
Metabolite identification in untargeted metabolomics is complex, with the risk of false positive annotations. This work aims to use machine learning to successively predict the retention time (Rt) and the collision cross-section (CCS) of an open-access database to accelerate the interpretation of metabolomic results. Standards of metabolites were tested using liquid chromatography coupled with high-resolution mass spectrometry. In CCSBase and QSRR predictor machine learning models, experimental results were used to generate predicted CCS and Rt of the Human Metabolome Database. From 542 standards, 266 and 301 compounds were detected in positive and negative electrospray ionization mode, respectively, corresponding to 380 different metabolites. CCS and Rt were then predicted using machine learning tools for almost 114,000 metabolites. R2 score of the linear regression between predicted and measured data achieved 0.938 and 0.898 for CCS and Rt, respectively, demonstrating the models' reliability. A CCS and Rt index filter of mean error ± 2 standard deviations could remove most misidentifications. Its application to data generated from a toxicology study on tobacco cigarettes reduced hits by 76%. Regarding the volume of data produced by metabolomics, the practical workflow provided allows for the implementation of valuable large-scale databases to improve the biological interpretation of metabolomics data.
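The "mean error ± 2 standard deviations" filter mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: errors are taken as signed absolute differences (predicted minus measured), the window is estimated from a small set of reference standards, and the function names `error_window` and `passes_filter` as well as all numeric values are hypothetical.

```python
from statistics import mean, stdev


def error_window(measured, predicted, k=2.0):
    """Return (low, high) = mean error -/+ k sample standard deviations."""
    errors = [p - m for m, p in zip(measured, predicted)]
    mu, sd = mean(errors), stdev(errors)
    return mu - k * sd, mu + k * sd


def passes_filter(observed, predicted, window):
    """Keep a candidate only if its prediction error lies inside the window."""
    low, high = window
    return low <= predicted - observed <= high


# Hypothetical retention times (minutes) for four reference standards.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
window = error_window(measured, predicted)

# A feature observed at 5.0 min with two candidate annotations.
keep_close = passes_filter(5.0, 5.2, window)  # error of 0.2 min
keep_far = passes_filter(5.0, 6.5, window)    # error of 1.5 min
print(window, keep_close, keep_far)
```

The same window logic would apply to CCS values; whether relative or absolute errors are more appropriate there is left open in this sketch.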
Affiliation(s)
- Marie Lenski
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
  - CHU Lille, Unité Fonctionnelle de Toxicologie, F-59037 Lille, France
  - Correspondence:
- Saïd Maallem
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
- Gianni Zarcone
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
- Guillaume Garçon
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
- Jean-Marc Lo-Guidice
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
- Sébastien Anthérieu
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
- Delphine Allorge
  - ULR 4483, IMPECS - IMPact de l’Environnement Chimique sur la Santé humaine, CHU Lille, Institut Pasteur de Lille, Université de Lille, F-59000 Lille, France
  - CHU Lille, Unité Fonctionnelle de Toxicologie, F-59037 Lille, France
|
29
|
Hissong R, Evans KR, Evans CR. Compound Identification Strategies in Mass Spectrometry-Based Metabolomics and Pharmacometabolomics. Handb Exp Pharmacol 2023; 277:43-71. [PMID: 36409330 DOI: 10.1007/164_2022_617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
The metabolome is composed of a vast array of molecules, including endogenous metabolites and lipids, diet- and microbiome-derived substances, pharmaceuticals and supplements, and exposome chemicals. Correct identification of compounds from this diversity of classes is essential to derive biologically relevant insights from metabolomics data. In this chapter, we aim to provide a practical overview of compound identification strategies for mass spectrometry-based metabolomics, with a particular eye toward pharmacologically-relevant studies. First, we describe routine compound identification strategies applicable to targeted metabolomics. Next, we discuss both experimental (data acquisition-focused) and computational (software-focused) strategies used to identify unknown compounds in untargeted metabolomics data. We then discuss the importance of, and methods for, assessing and reporting the level of confidence of compound identifications. Throughout the chapter, we discuss how these steps can be implemented using today's technology, but also highlight research underway to further improve accuracy and certainty of compound identification. For readers interested in interpreting metabolomics data already collected, this chapter will supply important context regarding the origin of the metabolite names assigned to features in the data and help them assess the certainty of the identifications. For those planning new data acquisition, the chapter supplies guidance for designing experiments and selecting analysis methods to enable accurate compound identification, and it will point the reader toward best-practice data analysis and reporting strategies to allow sound biological and pharmacological interpretation.
|
30
|
Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00577-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade. Liquid chromatography–tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS2Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements. LC-MS2Struct jointly predicts the annotations for a set of mass spectrometry features in a sample, using a novel structured prediction model trained to optimally combine the output of state-of-the-art MS2 scorers and observed retention orders. We evaluate our method on a dataset covering all publicly available reversed-phase LC-MS2 data in the MassBank reference database, including 4,327 molecules measured using 18 different LC conditions from 16 contributors, greatly expanding the chemical analytical space covered in previous multi-MS2 scorer evaluations. LC-MS2Struct obtains significantly higher annotation accuracy than earlier methods and improves the annotation accuracy of state-of-the-art MS2 scorers by up to 106%. The use of stereochemistry-aware molecular fingerprints improves prediction performance, which highlights limitations in existing approaches and has strong implications for future computational LC-MS2 developments.
|
31
|
Mass Spectrometric Methods for Non-Targeted Screening of Metabolites: A Future Perspective for the Identification of Unknown Compounds in Plant Extracts. Separations 2022. [DOI: 10.3390/separations9120415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022] Open
Abstract
Phyto products are widely used in natural products, such as medicines, cosmetics or so-called “superfoods”. However, the exact metabolite composition of these products is still unknown, due to the time-consuming process of metabolite identification. Non-target screening by LC-HRMS/MS could overcome these problems through its capacity to identify compounds based on their retention time, accurate mass and fragmentation pattern. In particular, computational tools, such as deconvolution algorithms, retention time prediction, in silico fragmentation and sophisticated search algorithms for comparing spectral similarity against mass spectral databases, help researchers to profile metabolic contents more exhaustively. This review aims to provide an overview of various techniques and tools for non-target screening of phyto samples using LC-HRMS/MS.
|
32
|
Celma A, Bade R, Sancho JV, Hernandez F, Humphries M, Bijlsma L. Prediction of Retention Time and Collision Cross Section (CCS H+, CCS H-, and CCS Na+) of Emerging Contaminants Using Multiple Adaptive Regression Splines. J Chem Inf Model 2022; 62:5425-5434. [PMID: 36280383 PMCID: PMC9709913 DOI: 10.1021/acs.jcim.2c00847] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/14/2022]
Abstract
Ultra-high performance liquid chromatography coupled to ion mobility separation and high-resolution mass spectrometry instruments have proven very valuable for screening of emerging contaminants in the aquatic environment. However, when applying suspect or nontarget approaches (i.e., when no reference standards are available), there is no information on retention time (RT) and collision cross-section (CCS) values to facilitate identification. In silico prediction tools for RT and CCS can therefore be of great utility to decrease the number of candidates to investigate. In this work, Multiple Adaptive Regression Splines (MARS) were evaluated for the prediction of both RT and CCS. MARS prediction models were developed and validated using a database of 477 protonated molecules, 169 deprotonated molecules, and 249 sodium adducts. Multivariate and univariate models were evaluated, with univariate models showing a better fit to the experimental data. The RT model (R2 = 0.855) showed a deviation between predicted and experimental data of ±2.32 min (95% confidence intervals). The deviation observed for CCS data of protonated molecules using the CCSH model (R2 = 0.966) was ±4.05% with 95% confidence intervals. The CCSH model was also tested for the prediction of deprotonated molecules, resulting in deviations below ±5.86% for 95% of the cases. Finally, a third model was developed for sodium adducts (CCSNa, R2 = 0.954) with deviations below ±5.25% for 95% of the cases. The developed models have been incorporated in an open-access and user-friendly online platform, which represents a great advantage for third-party research laboratories for predicting both RT and CCS data.
Affiliation(s)
- Alberto Celma
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), SE-750 07 Uppsala, Sweden
- Richard Bade
  - University of South Australia, Adelaide, UniSA: Clinical and Health Sciences, Health and Biomedical Innovation, Adelaide SA-5000, South Australia, Australia
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, 20 Cornwall Street, Woolloongabba AUS-4102, Queensland, Australia
- Juan Vicente Sancho
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
- Félix Hernandez
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
- Melissa Humphries
  - School of Mathematical Sciences, University of Adelaide, Ingkarni Wardli Building, North Terrace Campus, SA-5005 Adelaide, Australia
- Lubertus Bijlsma
  - Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, E-12071 Castelló, Spain
|
33
|
Harrieder EM, Kretschmer F, Dunn W, Böcker S, Witting M. Critical assessment of chromatographic metadata in publicly available metabolomics data repositories. Metabolomics 2022; 18:97. [PMID: 36436113 PMCID: PMC9701651 DOI: 10.1007/s11306-022-01956-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 11/11/2022] [Indexed: 11/28/2022]
Abstract
INTRODUCTION: The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standard Initiative has developed a multilevel system to report confidence in metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or same fragmentation pattern (e.g. isomeric compounds) can be overcome by the additional orthogonal information of the retention time (RT), since it is a system property that is different for each chromatographic setup. OBJECTIVES: In contrast to MS data, sharing of RT data is not as widespread. The quality of data and its (re-)useability depend very much on the quality of the metadata. We aimed to evaluate the coverage and quality of this metadata from public metabolomics repositories. METHODS: We acquired an overview on the current reporting of chromatographic separation conditions. For this purpose, we defined the following information as important details that have to be provided: column name and dimension, flow rate, temperature, composition of eluents and gradient. RESULTS: We found that 70% of descriptions of the chromatographic setups are incomplete (according to our definition) and an additional 10% of the descriptions contained ambiguous and/or incorrect information. Accordingly, only about 20% of the descriptions allow further (re-)use of the data, e.g. for RT prediction. Therefore, we have started to develop a unified and standardized notation for chromatographic metadata with detailed and specific description of eluents, columns and gradients. CONCLUSION: Reporting of chromatographic metadata is currently not unified. Our recommended suggestions for metadata reporting will enable more standardization and automatization in future reporting.
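The completeness criterion listed in this abstract (column name and dimension, flow rate, temperature, eluent composition and gradient) lends itself to a simple automated check. The sketch below is only an illustration: the field names, the `missing_fields` helper and the example records are hypothetical and do not reproduce the notation proposed by the authors.

```python
# Hypothetical field names for the details the abstract lists as required.
REQUIRED_FIELDS = (
    "column_name",
    "column_dimension",
    "flow_rate",
    "temperature",
    "eluents",
    "gradient",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Two made-up chromatographic setup descriptions.
complete = {
    "column_name": "C18 example column",
    "column_dimension": "100 x 2.1 mm, 1.7 um",
    "flow_rate": "0.4 mL/min",
    "temperature": "40 C",
    "eluents": "A: water + 0.1% formic acid; B: acetonitrile + 0.1% formic acid",
    "gradient": "5-95% B in 10 min",
}
incomplete = {k: v for k, v in complete.items() if k not in ("temperature", "gradient")}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

A check of this kind only flags absent entries; ambiguous or incorrect values, which the abstract also reports, would still need manual or rule-based review.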
Affiliation(s)
- Eva-Maria Harrieder: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Fleming Kretschmer: Chair of Bioinformatics, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
- Warwick Dunn: Department of Biochemistry and Systems Biology, Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Sebastian Böcker: Chair of Bioinformatics, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
- Michael Witting: Metabolomics and Proteomics Core, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-Von-Imhof-Forum 2, 85354, Freising, Germany
|
34
|
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux: Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Mingxun Wang: Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
- Pieter C Dorrestein: Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
|
35
|
Aalizadeh R, Nikolopoulou V, Thomaidis NS. Development of Liquid Chromatographic Retention Index Based on Cocamide Diethanolamine Homologous Series (C(n)-DEA). Anal Chem 2022; 94:15987-15996. [DOI: 10.1021/acs.analchem.2c02893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Reza Aalizadeh: Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771, Athens, Greece
- Varvara Nikolopoulou: Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771, Athens, Greece
- Nikolaos S. Thomaidis: Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771, Athens, Greece
|
36
|
Retention Time Prediction with Message-Passing Neural Networks. Separations 2022. [DOI: 10.3390/separations9100291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D-convolutional neural networks that are trained on the METLIN small molecule retention time dataset (SMRT). These approaches demonstrate accurate predictions comparable with the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to various systems. The accuracy of this step depends both on the method of mapping and on the accuracy of the general model trained on SMRT. Therefore, improvements to both parts of prediction workflows may lead to improved compound annotations. Here, we evaluate capabilities of message-passing neural networks (MPNN) that have demonstrated outstanding performance on many chemical tasks to accurately predict retention times. The model was initially trained on SMRT, providing mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was further fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and demonstrated up to 30% improvement of prediction accuracy for these sets compared with the state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention with the thresholds obtained from ROC curves eliminates up to 50% of false identities.
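The two quantities this abstract relies on, the mean/median absolute error and a retention-time threshold for filtering isomeric candidates, can be sketched as follows. The function names, the example values, and the 40 s threshold are illustrative assumptions; the abstract derives its thresholds from ROC curves and does not state them.

```python
from statistics import mean, median


def absolute_errors(predicted, observed):
    """Per-compound absolute error between predicted and observed RT (seconds)."""
    return [abs(p - o) for p, o in zip(predicted, observed)]


def filter_candidates(candidates, observed_rt, threshold_s):
    """Keep candidates whose predicted RT lies within threshold_s of the observed RT.

    candidates: iterable of (name, predicted_rt) pairs.
    """
    return [name for name, pred in candidates if abs(pred - observed_rt) <= threshold_s]


# Made-up predictions and measurements, in seconds.
errors = absolute_errors([100.0, 250.0, 400.0], [110.0, 240.0, 460.0])
mae = mean(errors)      # mean absolute error, sensitive to the 60 s outlier
medae = median(errors)  # median absolute error, robust to it

# Two hypothetical isomeric candidates for a feature observed at 310 s.
kept = filter_candidates([("isomer_A", 300.0), ("isomer_B", 420.0)], 310.0, 40.0)
print(mae, medae, kept)
```

Reporting both the mean and the median, as the abstract does, shows how strongly a few poorly predicted compounds inflate the average error.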
|
37
|
García CA, Gil-de-la-Fuente A, Barbas C, Otero A. Probabilistic metabolite annotation using retention time prediction and meta-learned projections. J Cheminform 2022; 14:33. [PMID: 35672784 PMCID: PMC9172150 DOI: 10.1186/s13321-022-00613-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2021] [Accepted: 05/20/2022] [Indexed: 12/31/2022]
Abstract
Retention time information is used for metabolite annotation in metabolomic experiments. But its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of 39.2 ± 1.2 s and 17.2 ± 0.9 s, respectively. To the best of our knowledge, these are the most accurate predictions published to date over the SMRT dataset. To project retention times between chromatographic methods, a novel Bayesian meta-learning approach that can learn from just a few molecules is proposed. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations. To this end, it is enough that as few as 10 molecules of a given experiment have been identified (probably by using pure metabolite standards). The use of z-scores permits considering the uncertainty in the projection when ranking candidates, and not only the accuracy. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt.
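The z-score ranking described in this abstract can be illustrated with a minimal sketch. It assumes the projection yields a mean and a standard deviation of the expected retention time for each candidate; the function names and numbers are hypothetical and are not taken from the cmmrt package.

```python
def z_score(observed_rt, projected_mean, projected_std):
    """Standardized distance between the observed RT and a candidate's projected RT."""
    return (observed_rt - projected_mean) / projected_std


def rank_candidates(observed_rt, candidates):
    """Rank candidates by |z|, smallest first.

    candidates: iterable of (name, projected_mean, projected_std) tuples.
    """
    scored = [(name, z_score(observed_rt, m, s)) for name, m, s in candidates]
    return sorted(scored, key=lambda item: abs(item[1]))


# Made-up candidates sharing the same mass, for a feature observed at 300 s.
ranked = rank_candidates(
    300.0,
    [
        ("cand_A", 310.0, 20.0),   # close, moderate uncertainty
        ("cand_B", 280.0, 5.0),    # fairly close, but very confident projection
        ("cand_C", 400.0, 100.0),  # far away, but highly uncertain projection
    ],
)
for name, z in ranked:
    print(name, z)
```

In this toy example cand_C ranks above cand_B although its projected RT is further from the observation, because its projection is much less certain. This is the sense in which z-scores account for uncertainty and not only accuracy.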
|
38
|
Popov RS, Ivanchina NV, Dmitrenok PS. Application of MS-Based Metabolomic Approaches in Analysis of Starfish and Sea Cucumber Bioactive Compounds. Mar Drugs 2022; 20:320. [PMID: 35621972 PMCID: PMC9147407 DOI: 10.3390/md20050320] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 05/11/2022] [Accepted: 05/11/2022] [Indexed: 12/12/2022]
Abstract
Today, marine natural products are considered one of the main sources of compounds for drug development. Starfish and sea cucumbers are potential sources of natural products of pharmaceutical interest. Among their metabolites, polar steroids, triterpene glycosides, and polar lipids have attracted a great deal of attention; however, studying these compounds by conventional methods is challenging. The application of modern MS-based approaches can help to obtain valuable information about such compounds. This review provides an up-to-date overview of MS-based applications for the analysis of starfish and sea cucumber bioactive compounds. While describing the most characteristic features of MS-based approaches in the context of starfish and sea cucumber metabolites, including sample preparation and MS analysis steps, the present paper mainly focuses on the application of MS-based metabolic profiling of polar steroid compounds, triterpene glycosides, and lipids. The application of MS in metabolomics studies is also outlined.
Affiliation(s)
- Roman S. Popov: G.B. Elyakov Pacific Institute of Bioorganic Chemistry, Far Eastern Branch of Russian Academy of Sciences, 159 Prospect 100-let Vladivostoku, Vladivostok 690022, Russia
- Pavel S. Dmitrenok: G.B. Elyakov Pacific Institute of Bioorganic Chemistry, Far Eastern Branch of Russian Academy of Sciences, 159 Prospect 100-let Vladivostoku, Vladivostok 690022, Russia
|
39
|
Klingberg J, Keen B, Cawley A, Pasin D, Fu S. Developments in high-resolution mass spectrometric analyses of new psychoactive substances. Arch Toxicol 2022; 96:949-967. [PMID: 35141767 PMCID: PMC8921034 DOI: 10.1007/s00204-022-03224-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/30/2021] [Accepted: 01/12/2022] [Indexed: 11/17/2022]
Abstract
The proliferation of new psychoactive substances (NPS) has necessitated the development and improvement of current practices for the detection and identification of known NPS and newly emerging derivatives. High-resolution mass spectrometry (HRMS) is quickly becoming the industry standard for these analyses due to its ability to be operated in data-independent acquisition (DIA) modes, allowing for the collection of large amounts of data and enabling retrospective data interrogation as new information becomes available. The increasing popularity of HRMS has also prompted the exploration of new ways to screen for NPS, including broad-spectrum wastewater analysis to identify usage trends in the community and metabolomic-based approaches to examine the effects of drugs of abuse on endogenous compounds. In this paper, the novel applications of HRMS techniques to the analysis of NPS are reviewed. In particular, the development of innovative data analysis and interpretation approaches is discussed, including the application of machine learning and molecular networking to toxicological analyses.
Affiliation(s)
- Joshua Klingberg: Australian Racing Forensic Laboratory, Racing NSW, Sydney, NSW, 2000, Australia
- Bethany Keen: Centre for Forensic Science, University of Technology Sydney, Broadway, NSW, 2007, Australia
- Adam Cawley: Australian Racing Forensic Laboratory, Racing NSW, Sydney, NSW, 2000, Australia
- Daniel Pasin: Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
- Shanlin Fu: Centre for Forensic Science, University of Technology Sydney, Broadway, NSW, 2007, Australia
|
40
|
Sussman EM, Oktem B, Isayeva IS, Liu J, Wickramasekara S, Chandrasekar V, Nahan K, Shin HY, Zheng J. Chemical Characterization and Non-targeted Analysis of Medical Device Extracts: A Review of Current Approaches, Gaps, and Emerging Practices. ACS Biomater Sci Eng 2022; 8:939-963. [PMID: 35171560 DOI: 10.1021/acsbiomaterials.1c01119] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 12/11/2022]
Abstract
The developers of medical devices evaluate the biocompatibility of their device prior to FDA's review and subsequent introduction to the market. Chemical characterization, described in ISO 10993-18:2020, can generate information for toxicological risk assessment and is an alternative approach for addressing some biocompatibility end points (e.g., systemic toxicity, genotoxicity, carcinogenicity, reproductive/developmental toxicity) that can reduce the time and cost of testing and the need for animal testing. Additionally, chemical characterization can be used to determine whether modifications to the materials and manufacturing processes alter the chemistry of a patient-contacting device to an extent that could impact device safety. Extractables testing is one approach to chemical characterization that employs combinations of non-targeted analysis, non-targeted screening, and/or targeted analysis to establish the identities and quantities of the various chemical constituents that can be released from a device. Due to the difficulty in obtaining a priori information on all the constituents in finished devices, information generation strategies in the form of analytical chemistry testing are often used. Identified and quantified extractables are then assessed using toxicological risk assessment approaches to determine if reported quantities are sufficiently low to overcome the need for further chemical analysis, biological evaluation of select end points, or risk control. For extractables studies to be useful as a screening tool, comprehensive and reliable non-targeted methods are needed. Although non-targeted methods have been adopted by many laboratories, they are laboratory-specific and require expensive analytical instruments and advanced technical expertise to perform. 
In this Perspective, we describe the elements of extractables studies and provide an overview of the current practices, identified gaps, and emerging practices that may be adopted on a wider scale in the future. This Perspective is outlined according to the steps of an extractables study: information gathering, extraction, extract sample processing, system selection, qualification, quantification, and identification.
Affiliation(s)
- Eric M Sussman: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Berk Oktem: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Irada S Isayeva: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Jinrong Liu: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Samanthi Wickramasekara: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Vaishnavi Chandrasekar: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Keaton Nahan: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Hainsworth Y Shin: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
- Jiwen Zheng: Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
|
41
|
Souihi A, Mohai MP, Palm E, Malm L, Kruve A. MultiConditionRT: Predicting liquid chromatography retention time for emerging contaminants for a wide range of eluent compositions and stationary phases. J Chromatogr A 2022; 1666:462867. [DOI: 10.1016/j.chroma.2022.462867] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/22/2021] [Revised: 01/29/2022] [Accepted: 01/29/2022] [Indexed: 12/25/2022]
|
42
|
Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022]
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has become a recent hotspot in the research fields of life and environmental sciences. However, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed. These strategies can be broadly categorized into two types: one based on structure annotation of mass spectra and the other on retention time prediction. These strategies have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarize the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and further discuss the directions and perspectives for improving the power of these tools and strategies.
|
43
|
Deep learning for retention time prediction in reversed-phase liquid chromatography. J Chromatogr A 2021; 1664:462792. [PMID: 34999303 DOI: 10.1016/j.chroma.2021.462792] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/17/2021] [Revised: 12/23/2021] [Accepted: 12/28/2021] [Indexed: 01/16/2023]
Abstract
Retention time prediction in high-performance liquid chromatography (HPLC) is the subject of many studies since it can improve the identification of unknown molecules in untargeted profiling using HPLC coupled with high-resolution mass spectrometry. Many approaches have been developed for retention time prediction in liquid chromatography, using data sets of different sizes and considering various molecular properties and machine learning algorithms. The recently built large retention time data set of standard compounds from the Metabolite and Chemical Entity Database (METLIN) allows researchers to create a model that can be used for retention time prediction of small molecules with wide varieties of structures and physicochemical properties. The ability to predict retention times using the largest data set was studied for different architectures of deep learning models that were trained on molecular fingerprints and on SMILES (a string representation of a molecule) encoded as one-hot matrices. The best result was achieved with a one-dimensional convolutional neural network (1D CNN) that uses SMILES as an input. The proposed model reached a mean absolute error and a median absolute error of 34.7 and 18.7 s, respectively, which outperformed the results previously obtained for this data set. The 1D CNN pre-trained on the METLIN SMRT data set was transferred to five other data sets to evaluate its generalization ability.
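The one-hot SMILES input mentioned in this abstract can be illustrated with a minimal sketch. It assumes character-level tokenization, a small hand-picked alphabet, and zero-padding to a fixed length; the paper's actual vocabulary, padding length, and matrix orientation are not given in the abstract.

```python
def one_hot_smiles(smiles, alphabet, max_len):
    """Encode a SMILES string as a len(alphabet) x max_len matrix of 0/1 values.

    Each column is one string position; columns past the end of the string
    stay all-zero (padding). Characters outside the alphabet raise KeyError.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * max_len for _ in alphabet]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[index[ch]][pos] = 1
    return matrix


# Hypothetical toy alphabet; a real one would cover every token in the data set
# and treat two-character atoms such as Cl and Br as single tokens.
ALPHABET = "CNO()=1"
encoded = one_hot_smiles("CC(=O)O", ALPHABET, max_len=10)  # acetic acid

for symbol, row in zip(ALPHABET, encoded):
    print(symbol, row)
```

A 1D CNN slides its filters along the position axis of such a matrix, so each filter sees a short window of consecutive SMILES characters.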
|
44
|
Ju R, Liu X, Zheng F, Lu X, Xu G, Lin X. Deep Neural Network Pretrained by Weighted Autoencoders and Transfer Learning for Retention Time Prediction of Small Molecules. Anal Chem 2021; 93:15651-15658. [PMID: 34780148 DOI: 10.1021/acs.analchem.1c03250] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
Retention time (RT) prediction contributes to identification of small molecules measured by high-performance liquid chromatography coupled with high-resolution mass spectrometry. Deep learning algorithms based on big data can enhance the accuracy of RT prediction. But at different chromatographic conditions, RTs of compounds are different, and the number of compounds with known RTs is small in most cases. Therefore, the transfer of big data is necessary. In this work, a strategy using a deep neural network (DNN) pretrained by weighted autoencoders and transfer learning (DNNpwa-TL) was proposed to efficiently predict RTs of compounds. The loss function in the autoencoders was calculated with features weighted by mutual information. Then, a DNN pretrained by weighted autoencoders (DNNpwa) was produced. For other specific chromatographic methods, transfer learning models (DNNpwa-TLs) were built by fine-tuning the DNNpwa with the help of some compounds with known RTs to conduct the RT prediction. With the above strategy, a DNNpwa was first built with the METLIN small molecule retention time data set containing 80 038 small molecule compounds. A median relative error of 3.1% and a mean relative error of 4.9% were achieved. Then, 17 data sets from different chromatographic methods were studied, and the results showed that the performance of DNNpwa-TL was better than those of other deep learning models. Besides, DNNpwa-TL outperformed random forest, gradient boost, least absolute shrinkage and selection operator regression, and DNN for most of the 17 data sets. Therefore, DNNpwa-TL can provide an efficient method to perform RT prediction of small molecule compounds for different chromatographic methods and conditions.
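The idea of weighting the autoencoder loss by mutual information can be sketched as a weighted squared reconstruction error. The exact loss used by the authors is not given in the abstract, so the normalized form below, the function name, and the example weights are assumptions for illustration only.

```python
def weighted_reconstruction_loss(x, x_hat, weights):
    """Weighted mean squared reconstruction error for one sample.

    x: original feature vector; x_hat: autoencoder reconstruction;
    weights: one non-negative weight per feature, assumed here to be the
    mutual information between that feature and the retention time.
    """
    total_weight = sum(weights)
    weighted_sq = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, x_hat))
    return weighted_sq / total_weight


# Made-up three-feature example: the first feature is twice as informative.
loss = weighted_reconstruction_loss(
    x=[1.0, 0.0, 1.0],
    x_hat=[0.5, 0.0, 0.0],
    weights=[2.0, 1.0, 1.0],
)
print(loss)
```

Under this form, reconstruction errors on features that carry more information about RT are penalized more heavily, which pushes the pretrained encoder to preserve those features.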
Affiliation(s)
- Ran Ju: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Xinyu Liu: CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Fujian Zheng: CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Xin Lu: CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Guowang Xu: CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Xiaohui Lin: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
45
|
Harrieder EM, Kretschmer F, Böcker S, Witting M. Current state-of-the-art of separation methods used in LC-MS based metabolomics and lipidomics. J Chromatogr B Analyt Technol Biomed Life Sci 2021; 1188:123069. [PMID: 34879285 DOI: 10.1016/j.jchromb.2021.123069] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 08/05/2021] [Revised: 11/10/2021] [Accepted: 11/24/2021] [Indexed: 12/23/2022]
Abstract
Metabolomics deals with the large-scale analysis of metabolites, belonging to numerous compound classes and showing an extremely high chemical diversity and complexity. Lipidomics, being a subcategory of metabolomics, analyzes the cellular lipid species. Both require state-of-the-art analytical methods capable of accessing the underlying chemical complexity. One of the major techniques used for the analysis of metabolites and lipids is Liquid Chromatography-Mass Spectrometry (LC-MS), offering both different selectivities in LC separation and high sensitivity in MS detection. Chromatography can be divided into different modes, based on the properties of the employed separation system. The most popular ones are Reversed-Phase (RP) separation for non- to mid-polar molecules and Hydrophilic Interaction Liquid Chromatography (HILIC) for polar molecules. So far, no single analysis method exists that can cover the entire range of metabolites or lipids, due to the huge chemical diversity. Consequently, different separation methods have been used for different applications and research questions. In this review, we explore the current use of LC-MS in metabolomics and lipidomics. As a proxy, we examined the use of chromatographic methods in the public repositories EBI MetaboLights and NIH Metabolomics Workbench. We extracted 1484 method descriptions, collected separation metadata and generated an overview on the current use of columns, eluents, etc. Based on this overview, we reviewed current practices and identified potential future trends as well as required improvements that may allow us to increase metabolite coverage, throughput or both simultaneously.
Affiliation(s)
- Eva-Maria Harrieder: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Fleming Kretschmer: Chair of Bioinformatics, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
- Sebastian Böcker: Chair of Bioinformatics, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
- Michael Witting: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Metabolomics and Proteomics Core, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Chair of Analytical Food Chemistry, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
|
46
|
Suspect and non-target screening of chemicals in clothing textiles by reversed-phase liquid chromatography/hybrid quadrupole-Orbitrap mass spectrometry. Anal Bioanal Chem 2021; 414:1403-1413. [PMID: 34786606 PMCID: PMC8724091 DOI: 10.1007/s00216-021-03766-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/20/2021] [Revised: 10/26/2021] [Accepted: 11/02/2021] [Indexed: 11/27/2022]
Abstract
The global manufacturing of clothing is usually composed of multistep processes, which include a large number of chemicals. However, there is generally no information regarding the chemical content remaining in the finished clothes. Clothes in close and prolonged skin contact may thus be a significant source of daily human exposure to hazardous compounds depending on their ability to migrate from the textiles and be absorbed by the skin. In the present study, twenty-four imported garments on the Swedish market were investigated with respect to their content of organic compounds, using a screening workflow. Reversed-phase liquid chromatography coupled to electrospray ionization/high-resolution mass spectrometry was used for both suspect and non-target screening. The most frequently detected compound was benzothiazole followed by quinoline. Nitroanilines with suspected mutagenic and possible skin sensitization properties, and quinoline, a carcinogenic compound, were among the compounds occurring at the highest concentrations. In some garments, the level of quinoline was estimated to be close to or higher than 50,000 ng/g, the limit set by the REACH regulation. Other detected compounds were acridine, benzotriazoles, benzothiazoles, phthalates, nitrophenols, and organophosphates. Several of the identified compounds have logP and molecular weight values enabling skin uptake. This pilot study indicates which chemicals and compound classes should be prioritized for future quantitative surveys and control of the chemical content in clothing as well as research on skin transfer, skin absorption, and systemic exposure. The results also show that the current control and prevention from chemicals in imported garments on the Swedish market is insufficient.
|
47
|
David A, Chaker J, Price EJ, Bessonneau V, Chetwynd AJ, Vitale CM, Klánová J, Walker DI, Antignac JP, Barouki R, Miller GW. Towards a comprehensive characterisation of the human internal chemical exposome: Challenges and perspectives. Environment International 2021; 156:106630. [PMID: 34004450 DOI: 10.1016/j.envint.2021.106630] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/07/2021] [Revised: 04/15/2021] [Accepted: 05/03/2021] [Indexed: 05/18/2023]
Abstract
The holistic characterisation of the human internal chemical exposome using high-resolution mass spectrometry (HRMS) would be a step forward to investigate the environmental ætiology of chronic diseases with an unprecedented precision. HRMS-based methods are currently operational to reproducibly profile thousands of endogenous metabolites as well as externally-derived chemicals and their biotransformation products in a large number of biological samples from human cohorts. These approaches provide a solid ground for the discovery of unrecognised biomarkers of exposure and metabolic effects associated with many chronic diseases. Nevertheless, some limitations remain and have to be overcome so that chemical exposomics can provide unbiased detection of chemical exposures affecting disease susceptibility in epidemiological studies. Some of these limitations include (i) the lack of versatility of analytical techniques to capture the wide diversity of chemicals; (ii) the lack of analytical sensitivity that prevents the detection of exogenous (and endogenous) chemicals occurring at (ultra) trace levels from restricted sample amounts, and (iii) the lack of automation of the annotation/identification process. In this article, we discuss a number of technological and methodological limitations hindering applications of HRMS-based methods and propose initial steps to push towards a more comprehensive characterisation of the internal chemical exposome. We also discuss other challenges including the need for harmonisation and the difficulty inherent in assessing the dynamic nature of the internal chemical exposome, as well as the need for establishing a strong international collaboration, high level networking, and sustainable research infrastructure. 
A great amount of research, technological development and innovative bio-informatics tools are still needed to profile and characterise the "invisible" (not profiled), "hidden" (not detected) and "dark" (not annotated) components of the internal chemical exposome and concerted efforts across numerous research fields are paramount.
Affiliation(s)
- Arthur David: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France.
- Jade Chaker: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France.
- Elliott J Price: Faculty of Sports Studies, Masaryk University, Brno, Czech Republic; RECETOX Centre, Masaryk University, Brno, Czech Republic.
- Vincent Bessonneau: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35000 Rennes, France.
- Andrew J Chetwynd: School of Geography Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
- Jana Klánová: RECETOX Centre, Masaryk University, Brno, Czech Republic.
- Douglas I Walker: Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
- Robert Barouki: Unité UMR-S 1124 Inserm-Université Paris Descartes "Toxicologie Pharmacologie et Signalisation Cellulaire", Paris, France.
- Gary W Miller: Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY, USA.
48
Pasin D, Mollerup CB, Rasmussen BS, Linnet K, Dalsgaard PW. Development of a single retention time prediction model integrating multiple liquid chromatography systems: Application to new psychoactive substances. Anal Chim Acta 2021; 1184:339035. [PMID: 34625246 DOI: 10.1016/j.aca.2021.339035] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/15/2021] [Revised: 09/01/2021] [Accepted: 09/02/2021] [Indexed: 10/20/2022]
Abstract
Database-driven suspect screening has proven to be a useful tool to detect new psychoactive substances (NPS) outside the scope of targeted screening; however, the lack of retention times specific to a liquid chromatography (LC) system can result in a large number of false positives. A singular stream-lined, quantitative structure-retention relationship (QSRR)-based retention time prediction model integrating multiple LC systems with different elution conditions is presented using retention time data (n = 1281) from the online crowd-sourced database, HighResNPS. Modelling was performed using an artificial neural network (ANN), specifically a multi-layer perceptron (MLP), using four molecular descriptors and one-hot encoding of categorical labels. Evaluation of test set predictions (n = 193) yielded coefficient of determination (R²) and mean absolute error (MAE) values of 0.942 and 0.583 min, respectively. The model successfully differentiated between LC systems, predicting 54%, 81% and 97% of the test set within ±0.5, ±1 and ±2 min, respectively. Additionally, retention times for an analyte not previously observed by the model were predicted within ±1 min for each LC system. The developed model can be used to predict retention times for all analytes on HighResNPS for each participating laboratory's LC system to further support suspect screening.
Affiliation(s)
- Daniel Pasin: Section of Forensic Chemistry, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
- Christian Brinch Mollerup: Section of Forensic Chemistry, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
- Brian Schou Rasmussen: Section of Forensic Chemistry, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
- Kristian Linnet: Section of Forensic Chemistry, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
- Petur Weihe Dalsgaard: Section of Forensic Chemistry, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
49
Yang Q, Ji H, Fan X, Zhang Z, Lu H. Retention time prediction in hydrophilic interaction liquid chromatography with graph neural network and transfer learning. J Chromatogr A 2021; 1656:462536. [PMID: 34563892 DOI: 10.1016/j.chroma.2021.462536] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/01/2021] [Revised: 09/02/2021] [Accepted: 09/03/2021] [Indexed: 01/04/2023]
Abstract
The combination of retention time (RT), accurate mass and tandem mass spectra can improve the structural annotation in untargeted metabolomics. However, the incorporation of RT for metabolite identification has received less attention because of the limitation of available RT data, especially for hydrophilic interaction liquid chromatography (HILIC). Here, Graph Neural Network-based Transfer Learning (GNN-TL) is proposed to train a model for HILIC RT prediction. The graph neural network was pre-trained using an in silico HILIC RT dataset (pseudo-labeling dataset) with ∼306 K molecules. Then, the weights of dense layers in the pre-trained GNN (pre-GNN) model were fine-tuned by transfer learning using a small number of experimental HILIC RTs from the target chromatographic system. The GNN-TL outperformed the methods in Retip, including the Random Forest (RF), Bayesian-regularized neural network (BRNN), XGBoost, light gradient-boosting machine (LightGBM), and Keras. It achieved the lowest mean absolute error (MAE) of 38.6 s on the test set and 33.4 s on an additional test set. It has the best ability to generalize with a small performance difference between training, test, and additional test sets. Furthermore, the predicted RTs can filter out nearly 60% of false-positive candidates on average, which is valuable for the identification of compounds complementary to mass spectrometry.
Affiliation(s)
- Qiong Yang: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
- Hongchao Ji: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
- Xiaqiong Fan: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
- Zhimin Zhang: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
- Hongmei Lu: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
50
Celma A, Ahrens L, Gago-Ferrero P, Hernández F, López F, Lundqvist J, Pitarch E, Sancho JV, Wiberg K, Bijlsma L. The relevant role of ion mobility separation in LC-HRMS based screening strategies for contaminants of emerging concern in the aquatic environment. Chemosphere 2021; 280:130799. [PMID: 34162120 DOI: 10.1016/j.chemosphere.2021.130799] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/25/2021] [Revised: 04/29/2021] [Accepted: 05/01/2021] [Indexed: 05/24/2023]
Abstract
Ion mobility separation (IMS) coupled to high resolution mass spectrometry (IMS-HRMS) is a promising technique for (non-)target/suspect analysis of micropollutants in complex matrices. IMS separates ionized compounds based on their charge, shape and size, facilitating the removal of co-eluting isomeric/isobaric species. Additionally, IMS data can be translated into collision cross-section (CCS) values, which can be used to increase the identification reliability. However, IMS-HRMS for the screening of contaminants of emerging concern (CECs) has been scarcely explored. In this study, the role of IMS-HRMS for the identification of CECs in complex matrices is highlighted, with emphasis on when and for what purpose it is of use. The utilization of IMS can result in much cleaner mass spectra, which considerably facilitates data interpretation and the obtaining of reliable identifications. Furthermore, the robustness of IMS measurements across matrices permits the use of CCS as an additional relevant parameter during the identification step even when reference standards are not available. Moreover, an effect on the number of true and false identifications could be demonstrated by including IMS restrictions within the identification workflow. Data shown in this work is of special interest for environmental researchers dealing with the detection of CECs with state-of-the-art IMS-HRMS instruments.
Affiliation(s)
- Alberto Celma: Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Castelló, E-12071, Spain.
- Lutz Ahrens: Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), Box 7050, SE-750 07, Uppsala, Sweden.
- Pablo Gago-Ferrero: Institute of Environmental Assessment and Water Research (IDAEA) Severo Ochoa Excellence Center, Spanish Council for Scientific Research (CSIC), Jordi Girona 18-26, E-08034, Barcelona, Spain.
- Félix Hernández: Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Castelló, E-12071, Spain.
- Francisco López: Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Castelló, E-12071, Spain.
- Johan Lundqvist: Department of Biomedicine and Veterinary Public Health, Swedish University of Agricultural Sciences, Box 7028, SE-750 07, Uppsala, Sweden.
- Elena Pitarch: Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Castelló, E-12071, Spain.
- Juan Vicente Sancho: Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Castelló, E-12071, Spain.
- Karin Wiberg: Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences (SLU), Box 7050, SE-750 07, Uppsala, Sweden.
- Lubertus Bijlsma: Environmental and Public Health Analytical Chemistry, Research Institute for Pesticides and Water, University Jaume I, Castelló, E-12071, Spain.