1
Wang C, Yuan C, Wang Y, Shi Y, Zhang T, Patti GJ. Predicting Collision Cross-Section Values for Small Molecules through Chemical Class-Based Multimodal Graph Attention Network. J Chem Inf Model 2024. [PMID: 38959055 DOI: 10.1021/acs.jcim.3c01934] [Indexed: 07/04/2024]
Abstract
Libraries of collision cross-section (CCS) values have the potential to facilitate compound identification in metabolomics. Although computational methods provide an opportunity to increase library size rapidly, accurate prediction of CCS values remains challenging due to the structural diversity of small molecules. Here, we developed a machine learning (ML) model that integrates graph attention networks and multimodal molecular representations to predict CCS values on the basis of chemical class. Our approach, referred to as MGAT-CCS, had superior performance in comparison to other ML models in CCS prediction. MGAT-CCS achieved a median relative error of 0.47%/1.14% (positive/negative mode) and 1.40%/1.63% (positive/negative mode) for lipids and metabolites, respectively. When MGAT-CCS was applied to real-world metabolomics data, it reduced the number of false metabolite candidates by roughly 25% across multiple sample types ranging from plasma and urine to cells. To facilitate its application, we developed a user-friendly stand-alone web server for MGAT-CCS that is freely available at https://mgat-ccs-web.onrender.com. This work represents a step forward in predicting CCS values and can potentially facilitate the identification of small molecules when using ion mobility spectrometry coupled with mass spectrometry.
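The median relative error quoted above, and the way a predicted CCS value can prune candidate annotations, can be sketched as follows. This is a minimal illustration, not MGAT-CCS itself: the candidate names and CCS values are hypothetical, and the 2% tolerance window is an assumption rather than a value taken from the paper.

```python
from statistics import median


def median_relative_error_pct(predicted, measured):
    """Median of |predicted - measured| / measured, expressed in percent."""
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return median(errors)


def filter_by_ccs(measured_ccs, candidates, tolerance_pct=2.0):
    """Keep candidates whose predicted CCS lies within tolerance_pct of the measured CCS.

    candidates maps a (hypothetical) candidate name to its predicted CCS in square angstroms.
    The 2% default tolerance is an illustrative assumption, not a value from the paper.
    """
    return [name for name, ccs in candidates.items()
            if abs(ccs - measured_ccs) / measured_ccs * 100.0 <= tolerance_pct]


# Hypothetical values for illustration only.
mre = median_relative_error_pct([150.0, 202.0, 99.0], [150.0, 200.0, 100.0])
kept = filter_by_ccs(150.0, {"cand_A": 151.0, "cand_B": 160.0, "cand_C": 148.5})
print(f"median relative error: {mre:.2f}%")
print("candidates kept:", kept)
```

Candidates whose predicted CCS falls outside the window are discarded, which is the general mechanism by which CCS prediction can reduce false metabolite candidates.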
Affiliation(s)
- Cheng Wang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130 United States
- Chuang Yuan
- School of Life Sciences, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
- Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Yahui Wang
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130 United States
- Department of Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Yuying Shi
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
- Tao Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
- Gary J Patti
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130 United States
- Department of Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Center for Metabolomics and Isotope Tracing, Washington University in St. Louis, St. Louis, Missouri 63130, United States
2
Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS MEASUREMENT SCIENCE AU 2024; 4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]
Abstract
Statistical analysis and modeling of mass spectrometry (MS) data have a long and rich history, with several modern MS-based applications using statistical and chemometric methods. Recently, machine learning (ML) has experienced a renaissance due to advances in computational hardware and the development of new algorithms for artificial neural networks (ANN) and deep learning architectures. Moreover, recent successes of new ANN and deep learning architectures in several areas of science, engineering, and society have further strengthened the ML field. Importantly, modern ML methods and architectures have enabled new approaches for tasks related to MS that are now widely adopted in several popular MS-based subdisciplines, such as mass spectrometry imaging and proteomics. Herein, we aim to provide an introductory summary of the practical aspects of ML methodology relevant to MS. Additionally, we seek to provide an up-to-date review of the most recent developments in ML integration with MS-based techniques while also providing critical insights into the future direction of the field.
Affiliation(s)
- Armen G. Beck
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Matthew Muhoberac
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Caitlin E. Randolph
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Connor H. Beveridge
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Prageeth R. Wijewardhane
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Hilkka I. Kenttämaa
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Gaurav Chopra
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Department of Computer Science (by courtesy), Purdue University, West Lafayette, Indiana 47907, United States
- Purdue Institute for Drug Discovery, Purdue Institute for Cancer Research, Regenstrief Center for Healthcare Engineering, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, Indiana 47907, United States
3
Vik D, Pii D, Mudaliar C, Nørregaard-Madsen M, Kontijevskis A. Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns. Sci Rep 2024; 14:8733. [PMID: 38627535 PMCID: PMC11021461 DOI: 10.1038/s41598-024-59620-4] [Received: 10/16/2023] [Accepted: 04/12/2024] [Indexed: 04/19/2024]
Abstract
This study explores how machine learning can be used to predict chromatographic retention times (RT) for the analysis of small molecules, with the objective of identifying a machine learning framework robust enough to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in the context of pharmaceutical drug discovery projects. We tested machine learning models from the following frameworks: XGBoost, ChemProp, and DeepChem, using a dataset of 7552 small molecules. Our findings show that two specific models, AttentiveFP and ChemProp, predicted RT more accurately than XGBoost and a regular neural network. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp to the publicly available METLIN SMRT dataset, it performed impressively, with an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
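The time-based evaluation described above (training on earlier compounds and testing on later chemical series) can be sketched with a date-cutoff split. This is a hypothetical setup: the records, cutoff date and mean-RT baseline are placeholders standing in for a trained model such as ChemProp, which is not reproduced here.

```python
from datetime import date


def time_split(records, cutoff):
    """Split (registration_date, smiles, rt_seconds) records at a cutoff date.

    Compounds registered before the cutoff form the training set; later ones form
    the test set, mimicking new chemical series arriving over time.
    """
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical records; a real study would plug a trained RT model in here.
records = [
    (date(2022, 1, 10), "CCO", 60.0),
    (date(2022, 3, 5), "c1ccccc1", 120.0),
    (date(2023, 2, 1), "CC(=O)O", 90.0),
    (date(2023, 6, 9), "CCN", 70.0),
]
train, test = time_split(records, cutoff=date(2023, 1, 1))
# Placeholder baseline: predict the mean training RT for every test compound.
baseline = sum(r[2] for r in train) / len(train)
mae = mean_absolute_error([r[2] for r in test], [baseline] * len(test))
print(len(train), len(test), mae)
```

A random split would mix old and new series in both sets; splitting by date instead asks whether the model generalizes to chemistry it has not yet seen.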
Affiliation(s)
- Daniel Vik
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
- David Pii
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
- Chirag Mudaliar
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
4
Xue J, Wang B, Ji H, Li W. RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification. Bioinformatics 2024; 40:btae084. [PMID: 38402516 PMCID: PMC10914443 DOI: 10.1093/bioinformatics/btae084] [Received: 09/22/2023] [Revised: 01/14/2024] [Accepted: 02/22/2024] [Indexed: 02/26/2024]
Abstract
MOTIVATION Liquid chromatography retention time prediction can assist in metabolite identification, which is a critical task and challenge in nontargeted metabolomics. However, different chromatographic conditions may result in different retention times for the same metabolite. Current retention time prediction methods lack sufficient scalability to transfer from one specific chromatographic method to another. RESULTS Therefore, we present RT-Transformer, a novel deep neural network model that couples a graph attention network with a 1D-Transformer and can predict retention times under any chromatographic method. First, we obtain a pre-trained model by training RT-Transformer on the large small molecule retention time dataset containing 80 038 molecules, and then transfer the resulting model to different chromatographic methods using transfer learning. When tested on the small molecule retention time dataset, as other authors did, the average absolute error reached 27.30 after removing non-retained molecules, and 33.41 when no samples were removed. The pre-trained RT-Transformer was further transferred to 5 datasets corresponding to different chromatographic conditions and fine-tuned. According to the experimental results, RT-Transformer achieves competitive performance compared with state-of-the-art methods. In addition, RT-Transformer was applied to 41 external molecular retention time datasets. Extensive evaluations indicate that RT-Transformer has excellent scalability in predicting retention times for liquid chromatography and improves the accuracy of metabolite identification. AVAILABILITY AND IMPLEMENTATION The source code for the model is available at https://github.com/01dadada/RT-Transformer. The web server is available at https://huggingface.co/spaces/Xue-Jun/RT-Transformer.
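The two error figures above differ only in whether non-retained molecules are dropped before scoring. A minimal sketch, assuming that molecules with an experimental RT below a fixed cutoff count as non-retained; the 300 s cutoff used here is an assumption (the abstract does not state the threshold), and the RT values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_with_and_without_unretained(y_true, y_pred, cutoff_s=300.0):
    """Return (MAE on all molecules, MAE after dropping molecules eluting before cutoff_s).

    Molecules whose experimental RT is below cutoff_s are treated as non-retained.
    The 300 s default is an assumption, not a value taken from the paper.
    """
    all_mae = mean_absolute_error(y_true, y_pred)
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t >= cutoff_s]
    retained_mae = mean_absolute_error([t for t, _ in kept], [p for _, p in kept])
    return all_mae, retained_mae


# Hypothetical experimental and predicted RTs in seconds.
y_true = [30.0, 450.0, 600.0, 820.0]
y_pred = [250.0, 460.0, 580.0, 830.0]
print(mae_with_and_without_unretained(y_true, y_pred))
```

Molecules that elute near the void volume carry little structural information about retention, so a single badly predicted one (the first entry here) can dominate the MAE on the full set.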
Affiliation(s)
- Jun Xue
- School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Bingyi Wang
- Yunnan Police College, Kunming, Yunnan 650223, China
- Key Laboratory of Smart Drugs Control (Yunnan Police College), Ministry of Education, Kunming, Yunnan 650223, China
- Hongchao Ji
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- WeiHua Li
- School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
5
Allwright M, Guennewig B, Hoffmann AE, Rohleder C, Jieu B, Chung LH, Jiang YC, Lemos Wimmer BF, Qi Y, Don AS, Leweke FM, Couttas TA. ReTimeML: a retention time predictor that supports the LC-MS/MS analysis of sphingolipids. Sci Rep 2024; 14:4375. [PMID: 38388524 PMCID: PMC10883992 DOI: 10.1038/s41598-024-53860-0] [Received: 11/20/2023] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Abstract
The analysis of ceramide (Cer) and sphingomyelin (SM) lipid species using liquid chromatography-tandem mass spectrometry (LC-MS/MS) continues to present challenges, as their precursor mass and fragmentation can correspond to multiple molecular arrangements. To address this constraint, we developed ReTimeML, a freeware tool that automates the estimation of expected retention times (RTs) for Cer and SM lipid profiles from complex chromatograms. ReTimeML works on the principle that LC-MS/MS experiments have pre-determined RTs from internal standards, calibrators or quality controls used throughout the analysis. Using these as reference RTs, ReTimeML then extrapolates the RTs of unknowns using its machine-learned regression library of mass-to-charge (m/z) versus RT profiles, which does not require model retraining to adapt to different LC-MS/MS pipelines. We validated ReTimeML RT estimations for various Cer and SM structures across different biological samples, tissues and LC-MS/MS setups, observing a mean variance of between 0.23 and 2.43% relative to user annotations. ReTimeML also aided the disambiguation of SM identities from isobar distributions in paired serum-cerebrospinal fluid from healthy volunteers, allowing us to identify a series of non-canonical SMs associated across the two biofluids, comprising a polyunsaturated structure that confers increased stability against catabolic clearance.
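The idea of extrapolating unknown RTs from reference standards can be illustrated with a simple least-squares line of RT against m/z. This sketch assumes RT varies roughly linearly with m/z within one homologous lipid series, which is a simplification; it does not reproduce ReTimeML's machine-learned regression library, and the reference pairs are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_rts(reference, unknown_mz):
    """Estimate RTs for unknown m/z values from (m/z, RT) reference pairs in one lipid series."""
    slope, intercept = fit_line([mz for mz, _ in reference], [rt for _, rt in reference])
    return [slope * mz + intercept for mz in unknown_mz]


def percent_deviation(estimated, annotated):
    """Relative deviation of an estimated RT from a user-annotated RT, in percent."""
    return abs(estimated - annotated) / annotated * 100.0


# Hypothetical internal-standard (m/z, RT in minutes) pairs for illustration.
reference = [(500.0, 5.0), (600.0, 7.0), (700.0, 9.0)]
estimates = estimate_rts(reference, [650.0, 750.0])
print(estimates)
```

Because the references come from the same run, the fit adapts to each LC-MS/MS setup without retraining, which mirrors the principle the abstract describes.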
Affiliation(s)
- Michael Allwright
- ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
- Boris Guennewig
- ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
- Anna E Hoffmann
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Endosane Pharmaceuticals GmbH, Berlin, Germany
- Cathrin Rohleder
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Endosane Pharmaceuticals GmbH, Berlin, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Beverly Jieu
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Long H Chung
- Centenary Institute, The University of Sydney, Sydney, Australia
- Yingxin C Jiang
- Centenary Institute, The University of Sydney, Sydney, Australia
- Bruno F Lemos Wimmer
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Yanfei Qi
- Centenary Institute, The University of Sydney, Sydney, Australia
- Anthony S Don
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- F Markus Leweke
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Endosane Pharmaceuticals GmbH, Berlin, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Timothy A Couttas
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
6
Fine J, Mann AKP, Aggarwal P. Structure Based Machine Learning Prediction of Retention Times for LC Method Development of Pharmaceuticals. Pharm Res 2024; 41:365-374. [PMID: 38332389 DOI: 10.1007/s11095-023-03646-2] [Received: 10/18/2023] [Accepted: 12/15/2023] [Indexed: 02/10/2024]
Abstract
PURPOSE Significant resources are spent on developing robust liquid chromatography (LC) methods with optimum conditions for all projects in the pipeline. Although data-driven, computer-assisted modelling has been implemented to shorten method development timelines, these modelling approaches require project-specific screening data to model retention time (RT) as a function of method parameters. Sometimes method re-development is required, leading to additional investment and redundant laboratory work. Cheminformatics techniques have been successfully used to predict the RT of metabolites and other component mixtures for similar use cases. Here we show that these techniques can be used to model structurally diverse molecules, and that predictions from these models, trained on multiple LC conditions, can be used for downstream data-driven modelling. METHODS The Molecular Operating Environment (MOE) was used to calculate over 800 descriptors from the structures of the analytes. These descriptors were used to model the RT of the analytes under four chromatographic conditions. These models were then used to create data-driven models using LC-SIM. RESULTS A structure-based Random Forest (RF) model outperformed other techniques in cross-validation studies and predicted the RTs of a randomized test set with a median percentage error of less than 4% for all LC conditions. RTs predicted by this structure-based model were used to fit a data-driven model that identifies optimum LC conditions without any additional experimental work. CONCLUSIONS These results show that small training sets yield pharmaceutically relevant models when used in a combination of structure-based and data-driven models.
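How structure-based RT predictions could feed a downstream choice of LC condition can be sketched by scoring each condition on how well its predicted peaks are separated. The scoring rule here (largest minimum gap between adjacent predicted RTs) is an illustrative assumption and is not the LC-SIM model; the condition names and RTs are hypothetical.

```python
def min_peak_gap(rts):
    """Smallest spacing between consecutive predicted retention times."""
    ordered = sorted(rts)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def pick_condition(predicted_rts_by_condition):
    """Pick the LC condition whose closest pair of predicted peaks is farthest apart.

    The minimum adjacent gap is a crude separation score chosen for illustration;
    it is an assumption, not the criterion used by LC-SIM.
    """
    return max(predicted_rts_by_condition,
               key=lambda cond: min_peak_gap(predicted_rts_by_condition[cond]))


# Hypothetical structure-based RT predictions (minutes) under four LC conditions.
predictions = {
    "cond_1": [2.1, 2.3, 5.0, 7.4],
    "cond_2": [1.8, 3.9, 6.2, 8.5],
    "cond_3": [3.0, 3.1, 3.2, 9.0],
    "cond_4": [2.5, 4.0, 4.6, 7.0],
}
print(pick_condition(predictions))
```

The point of the sketch is the workflow: once RTs can be predicted from structure for each candidate condition, conditions can be ranked without running new screening experiments.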
Affiliation(s)
- Jonathan Fine
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ, 07065, USA
- Pankaj Aggarwal
- Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ, 07065, USA
7
Kwon Y, Kwon H, Han J, Kang M, Kim JY, Shin D, Choi YS, Kang S. Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network. Anal Chem 2023; 95:17273-17283. [PMID: 37955847 DOI: 10.1021/acs.analchem.3c03177] [Indexed: 11/14/2023]
Abstract
Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.
Affiliation(s)
- Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Hyukju Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Department of Chemistry, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Jongmin Han
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Myeonginn Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Ji-Yeong Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Dongyeeb Shin
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Seokho Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
8
Kang Q, Fang P, Zhang S, Qiu H, Lan Z. Deep graph convolutional network for small-molecule retention time prediction. J Chromatogr A 2023; 1711:464439. [PMID: 37865024 DOI: 10.1016/j.chroma.2023.464439] [Received: 07/10/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/23/2023]
Abstract
The retention time (RT) is a crucial source of data for liquid chromatography-mass spectrometry (LCMS). A model that can accurately predict the RT of each molecule would make it possible to filter out candidates with similar spectra but different RTs in LCMS-based molecule identification. Recent research shows that graph neural networks (GNNs) outperform traditional machine learning algorithms in RT prediction. However, all of these models use relatively shallow GNNs. This study is the first to investigate how depth affects GNN performance on RT prediction. The results demonstrate that a notable improvement can be achieved by increasing GNN depth to 16 layers through the adoption of residual connections. Additionally, we find that the graph convolutional network (GCN) model benefits from edge information. The developed deep graph convolutional network, DeepGCN-RT, significantly outperforms the previous state-of-the-art method and achieves the lowest mean absolute percentage error (MAPE) of 3.3% and the lowest mean absolute error (MAE) of 26.55 s on the SMRT test set. We also fine-tune DeepGCN-RT on seven datasets with various chromatographic conditions; the mean MAE across the seven datasets decreases by 30% compared with the previous state-of-the-art method. On the RIKEN-PlaSMA dataset, we also test the effectiveness of DeepGCN-RT in assisting molecular structure identification. By reducing the number of potential structures by 30%, DeepGCN-RT improves top-1 accuracy by about 11%.
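The residual connection credited above for making 16-layer GNNs trainable can be sketched in plain Python as the update h + f(h), where f is one graph convolution. This toy layer uses mean aggregation over each atom and its neighbors and ignores edge features, so it is not DeepGCN-RT; the 3-atom graph, feature vectors and weight matrices are hypothetical.

```python
def gcn_layer(h, adjacency, weight):
    """Mean-aggregation graph convolution (with self-loop), a linear map, then ReLU.

    h: per-node feature vectors; adjacency: neighbor index lists;
    weight: d x d matrix as nested lists (square so the residual sum is defined).
    """
    d = len(weight)
    out = []
    for i, neighbors in enumerate(adjacency):
        group = [i] + neighbors
        agg = [sum(h[j][k] for j in group) / len(group) for k in range(d)]
        lin = [sum(agg[m] * weight[m][k] for m in range(d)) for k in range(d)]
        out.append([max(0.0, x) for x in lin])
    return out


def residual_gcn(h, adjacency, weights):
    """Stack layers as h <- h + f(h), the residual form used to train deep GNNs."""
    for w in weights:
        update = gcn_layer(h, adjacency, w)
        h = [[a + b for a, b in zip(hi, ui)] for hi, ui in zip(h, update)]
    return h


# Toy 3-atom chain (0-1-2) with 2-dimensional atom features.
adjacency = [[1], [0, 2], [1]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
deep_zero = residual_gcn(h0, adjacency, [zero] * 16)
one_layer = residual_gcn(h0, adjacency, [identity])
print(deep_zero)
print(one_layer)
```

With all-zero weights, the 16-layer stack returns its input unchanged: the identity path survives any depth, so each added layer only has to learn a correction to the previous representation.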
Affiliation(s)
- Qiyue Kang
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Pengfei Fang
- School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
- Shuai Zhang
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Huachuan Qiu
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
- Zhenzhong Lan
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
9
Fan F, Wu G, Yang Y, Liu F, Qian Y, Yu Q, Ren H, Geng J. A Graph Neural Network Model with a Transparent Decision-Making Process Defines the Applicability Domain for Environmental Estrogen Screening. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:18236-18245. [PMID: 37749748 DOI: 10.1021/acs.est.3c04571] [Indexed: 09/27/2023]
Abstract
The application of deep learning (DL) models for screening environmental estrogens (EEs) for the sound management of chemicals has garnered significant attention. However, the currently available DL model for screening EEs lacks both a transparent decision-making process and effective applicability domain (AD) characterization, making the reliability of its prediction results uncertain and limiting its practical applications. To address this issue, a graph neural network (GNN) model was developed to screen EEs, achieving accuracy rates of 88.9% and 92.5% on the internal and external test sets, respectively. The decision-making process of the GNN model was explored through the network-like similarity graphs (NSGs) based on the model features (FT). We discovered that the accuracy of the predictions is dependent on the feature distribution of compounds in NSGs. An AD characterization method called ADFT was proposed, which excludes predictions falling outside of the model's prediction range, leading to a 15% improvement in the F1 score of the GNN model. The GNN model with the AD method may serve as an efficient tool for screening EEs, identifying 800 potential EEs in the Inventory of Existing Chemical Substances of China. Additionally, this study offers new insights into comprehending the decision-making process of DL models.
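An applicability-domain (AD) filter of the kind described above can be sketched with a generic nearest-neighbor distance rule: predictions for compounds farther from the training set than a chosen quantile of training-set neighbor distances are excluded before scoring. This is a swapped-in, commonly used distance-based AD, not the ADFT method; the 2-D feature vectors (standing in for GNN model features), labels and 95% quantile are hypothetical.

```python
import math


def nn_distance(x, pool):
    """Euclidean distance from x to its nearest neighbor in pool."""
    return min(math.dist(x, p) for p in pool)


def ad_threshold(train_features, quantile=0.95):
    """Quantile of leave-one-out nearest-neighbor distances within the training set."""
    dists = sorted(
        nn_distance(x, train_features[:i] + train_features[i + 1:])
        for i, x in enumerate(train_features)
    )
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[idx]


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical features, labels (1 = estrogenic) and model predictions.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
test_x = [(0.5, 0.5), (5.0, 5.0), (1.0, 1.8)]
y_true = [1, 0, 1]
y_pred = [1, 1, 1]
thr = ad_threshold(train)
inside = [nn_distance(x, train) <= thr for x in test_x]
f1_all = f1_score(y_true, y_pred)
f1_in = f1_score([t for t, ok in zip(y_true, inside) if ok],
                 [p for p, ok in zip(y_pred, inside) if ok])
print(inside, f1_all, f1_in)
```

Here the only false positive is the compound far from the training data, so excluding it raises F1 on the remaining predictions; this illustrates why an AD can improve reported F1 at the cost of coverage.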
Affiliation(s)
- Fan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Gang Wu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yining Yang
- School of Life Sciences, Tsinghua University, Beijing 100084, China
- Fu Liu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yuli Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Qingmiao Yu
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
- Hongqiang Ren
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Jinju Geng
- State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
10
Wang Y, Wei W, Du W, Cai J, Liao Y, Lu H, Kong B, Zhang Z. Deep-Learning-Based Mixture Identification for Nuclear Magnetic Resonance Spectroscopy Applied to Plant Flavors. Molecules 2023; 28:7380. [PMID: 37959799 PMCID: PMC10648966 DOI: 10.3390/molecules28217380] [Received: 09/27/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]
Abstract
Nuclear magnetic resonance (NMR) is a crucial technique for analyzing mixtures of small molecules, offering non-destructive, fast, reproducible, and unbiased analysis. However, mixture identification is challenging because of the chemical shift offsets and peak overlaps that often exist in mixtures such as plant flavors. Here, we propose a deep-learning-based mixture identification method (DeepMID) that can identify plant flavors (mixtures) in a formulated flavor (a mixture of several plant flavors) without needing to know the specific components of the plant flavors. A pseudo-Siamese convolutional neural network (pSCNN) and a spatial pyramid pooling (SPP) layer were used to address these problems, owing to their high accuracy and robustness. The DeepMID model was trained, validated, and tested on an augmented data set containing 50,000 pairs of formulated and plant flavors. We demonstrate that DeepMID achieves excellent prediction results on the augmented test set (ACC = 99.58%, TPR = 99.48%, FPR = 0.32%) and on two experimentally obtained data sets (ACC = 97.60%, TPR = 92.81%, FPR = 0.78% and ACC = 92.31%, TPR = 80.00%, FPR = 0.00%, respectively). In conclusion, DeepMID is a reliable method for identifying plant flavors in formulated flavors based on NMR spectroscopy, which can assist researchers in accelerating the design of flavor formulations.
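The ACC, TPR and FPR figures above are standard confusion-matrix rates. A minimal sketch, assuming each formulated-flavor/plant-flavor pair is scored as a binary present (1) or absent (0) call; the labels below are hypothetical and are not DeepMID outputs.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, true positive rate, false positive rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: present flavors found
    fpr = fp / (fp + tn) if fp + tn else 0.0  # absent flavors wrongly reported
    return acc, tpr, fpr


# Hypothetical labels: 1 = this plant flavor is present in the formulated flavor.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting TPR and FPR alongside ACC matters when present and absent pairs are imbalanced, since accuracy alone can hide a low detection rate.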
Affiliation(s)
- Yufei Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Weiwei Wei
- Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Wen Du
- Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Jiaxiao Cai
- Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Yuxuan Liao
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Bo Kong
- Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
11
Song Y, Chang S, Tian J, Pan W, Feng L, Ji H. A Comprehensive Comparative Analysis of Deep Learning Based Feature Representations for Molecular Taste Prediction. Foods 2023; 12:3386. [PMID: 37761095 PMCID: PMC10529232 DOI: 10.3390/foods12183386] [Received: 07/26/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023]
Abstract
Taste determination in small molecules is critical in food chemistry, but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
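A consensus model of the kind described above can be sketched as an average of class probabilities from models built on different molecular representations. This assumes simple (optionally weighted) averaging, which may differ from how the authors combined their models; the taste classes and probabilities are hypothetical.

```python
def consensus_proba(prob_lists, weights=None):
    """Average per-class probabilities from several models (equal weights by default)."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    n_samples, n_classes = len(prob_lists[0]), len(prob_lists[0][0])
    return [
        [sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical class probabilities (e.g. sweet, bitter, umami) for two molecules.
fingerprint_model = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
gnn_model = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
combined = consensus_proba([fingerprint_model, gnn_model])
print([argmax(row) for row in combined])
```

When the two models disagree (the first molecule here), the averaged probabilities settle the call; complementary errors across representations are what let a consensus outperform either model alone.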
Affiliation(s)
- Yu Song
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Sihao Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jing Tian
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Weihua Pan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Lu Feng
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China
- Hongchao Ji
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
12
Wang Y, Xiong J, Xiao F, Zhang W, Cheng K, Rao J, Niu B, Tong X, Qu N, Zhang R, Wang D, Chen K, Li X, Zheng M. LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP. J Cheminform 2023; 15:76. [PMID: 37670374 PMCID: PMC10478446 DOI: 10.1186/s13321-023-00754-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/25/2023] [Indexed: 09/07/2023]
Abstract
Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling poses a significant challenge to achieving satisfactory generalization capability. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. RTlogD incorporates pre-training on a chromatographic retention time (RT) dataset, since RT is influenced by lipophilicity. Additionally, microscopic pKa values are incorporated as atomic features, providing valuable insights into ionizable sites and ionization capacity. Furthermore, logP is integrated as an auxiliary task within a multitask learning framework. We conducted ablation studies and a detailed analysis demonstrating the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, our RTlogD model demonstrated superior performance compared to commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of our model, and it holds promise for real-world applications in drug discovery and design scenarios.
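Why pKa carries information about logD can be seen from the classical relation between logD and logP for a monoprotic compound, assuming only the neutral species partitions into n-octanol. The sketch below is that textbook Henderson-Hasselbalch correction, not the RTlogD model; `log_d_monoprotic` and the example values are hypothetical.

```python
import math


def log_d_monoprotic(log_p: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Classical logD of a monoprotic compound (only the neutral form partitions)."""
    # Henderson-Hasselbalch: log10([ionized]/[neutral]) = pH - pKa for an acid, pKa - pH for a base
    log_ratio = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** log_ratio)


# Hypothetical carboxylic acid: logP 2.0, pKa 4.4
print(round(log_d_monoprotic(2.0, 4.4), 2))
```

For this acid the ionized form outnumbers the neutral form about 1000-fold at pH 7.4, so logD7.4 sits roughly three log units below logP. Real molecules with several ionizable sites need the per-site (microscopic) pKa values the abstract mentions, which this single-site formula cannot capture.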
Affiliation(s)
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Fu Xiao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaiyang Cheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Buying Niu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
13
Choi E, Yoo WJ, Jang HY, Kim TY, Lee SK, Oh HB. Machine learning liquid chromatography retention time prediction model augments the dansylation strategy for metabolite analysis of urine samples. J Chromatogr A 2023; 1705:464167. [PMID: 37348224 DOI: 10.1016/j.chroma.2023.464167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 06/10/2023] [Accepted: 06/15/2023] [Indexed: 06/24/2023]
Abstract
Herein, a standalone software package with a graphical user interface (GUI) was developed to predict liquid chromatography-mass spectrometry (LC-MS) retention times (RTs) of dansylated metabolites. The dansylation metabolomics strategy developed by Li et al. narrows the vast chemical space of metabolites down to those containing amines and phenolic hydroxyls. Combined with differential isotope labeling, e.g., individual samples labeled with the 12C-reagent and spiked with a 13C-reagent-labeled reference or pooled sample, LC-MS analysis of the dansylated samples enables accurate relative quantification of all labeled metabolites. Here, the LC-RTs of dansylated metabolites are predicted using an artificial neural network (ANN) machine-learning model. For the ANN modeling, 315 dansylated urine metabolites obtained from the DnsID database were used. The ANN LC-RT prediction model was reliable, with a mean absolute deviation of 0.74 min over a 30 min LC run. A deviation of more than 2 min was observed for only 3.2% of the 315 metabolites, and a deviation of 1.5 min or more for 11%. The LC-RT prediction was also reliable for metabolites containing both amine and phenolic functional groups, which can be dansylated on either group and therefore form two isomeric products. The RT-prediction model is embedded in a user-friendly GUI and, together with accurate mass measurements, can be used to identify nontargeted dansylated metabolites with unknown RTs. Finally, the software is shown to help identify metabolites in a urine sample from an anonymous healthy pregnant woman.
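The three accuracy figures quoted above (mean absolute deviation, share of metabolites off by more than 2 min, share off by 1.5 min or more) can be computed as in this minimal sketch. It assumes "deviation" means |observed RT − predicted RT| per metabolite; `rt_error_summary` and the RT values are hypothetical, not the authors' data or code.

```python
from statistics import mean
from typing import Sequence, Tuple


def rt_error_summary(observed: Sequence[float], predicted: Sequence[float],
                     strict_cutoff: float = 2.0, loose_cutoff: float = 1.5) -> Tuple[float, float, float]:
    """Return (mean absolute deviation, fraction with |dev| > strict_cutoff,
    fraction with |dev| >= loose_cutoff), all deviations in minutes."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need non-empty sequences of equal length")
    deviations = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(deviations)
    return (
        mean(deviations),
        sum(d > strict_cutoff for d in deviations) / n,
        sum(d >= loose_cutoff for d in deviations) / n,
    )


# Hypothetical observed vs. predicted RTs (min) for four dansylated metabolites
obs = [10.0, 12.5, 20.0, 5.0]
pred = [10.5, 14.6, 19.0, 5.2]
print(rt_error_summary(obs, pred))
```

Note the asymmetric cutoffs (strictly greater than 2 min, but 1.5 min or more), matching the wording of the abstract.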
Affiliation(s)
- Eunwoo Choi
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
- Won Jun Yoo
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
- Hwa-Yong Jang
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
- Tae-Young Kim
- School of Earth Sciences and Environmental Engineering, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea
- Sung Ki Lee
- Department of Obstetrics and Gynecology, College of Medicine, Konyang University, Daejeon 35365, Republic of Korea
- Han Bin Oh
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
14
Liao Y, Tian M, Zhang H, Lu H, Jiang Y, Chen Y, Zhang Z. Highly automatic and universal approach for pure ion chromatogram construction from liquid chromatography-mass spectrometry data using deep learning. J Chromatogr A 2023; 1705:464172. [PMID: 37392637 DOI: 10.1016/j.chroma.2023.464172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 06/14/2023] [Accepted: 06/18/2023] [Indexed: 07/03/2023]
Abstract
Feature extraction is the most fundamental step when analyzing liquid chromatography-mass spectrometry (LC-MS) datasets. However, traditional methods require optimal parameter selection and re-optimization for different datasets, which hinders efficient and objective large-scale data analysis. The pure ion chromatogram (PIC) is widely used because it avoids the peak-splitting problem of extracted ion chromatograms (EICs) and regions of interest (ROIs). Here, we developed a deep learning-based pure ion chromatogram method (DeepPIC) that uses a customized U-Net to find PICs directly and automatically from centroid-mode LC-MS data. A model was trained, validated, and tested on the Arabidopsis thaliana dataset with 200 input-label pairs. DeepPIC was integrated into KPIC2, and the combination enables the entire processing pipeline from raw data to discriminant models for metabolomics datasets. KPIC2 with DeepPIC was compared against other competing methods (XCMS, FeatureFinderMetabo, and peakonly) on the MM48, simulated MM48, and quantitative datasets. These comparisons showed that DeepPIC outperforms XCMS, FeatureFinderMetabo, and peakonly in recall and in correlation with sample concentrations. Five datasets from different instruments and samples were used to evaluate the quality of PICs and the universal applicability of DeepPIC, and 95.12% of the PICs found precisely matched their manually labeled PICs. Therefore, KPIC2+DeepPIC is an automatic, practical, and off-the-shelf method for extracting features directly from raw data, outperforming traditional methods even when their parameters are carefully tuned. It is publicly available at https://github.com/yuxuanliao/DeepPIC.
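For context, the extracted ion chromatogram (EIC) that PICs are contrasted with can be sketched as a fixed m/z window applied scan by scan. This is a minimal baseline under assumed inputs (centroided scans as lists of `(m/z, intensity)` pairs, a hypothetical 10 ppm window), not DeepPIC or KPIC2; `extract_ion_chromatogram` is a hypothetical name.

```python
from typing import List, Tuple

Scan = List[Tuple[float, float]]  # centroided (m/z, intensity) pairs from one scan


def extract_ion_chromatogram(scans: List[Scan], target_mz: float, ppm: float = 10.0) -> List[float]:
    """Naive EIC: most intense centroid within +/- ppm of target_mz in each scan (0.0 if none)."""
    tolerance = target_mz * ppm * 1e-6
    trace = []
    for scan in scans:
        hits = [intensity for mz, intensity in scan if abs(mz - target_mz) <= tolerance]
        trace.append(max(hits, default=0.0))
    return trace


# Three hypothetical centroided scans
scans = [
    [(180.0634, 100.0), (200.0000, 5.0)],
    [(180.0640, 250.0)],
    [(181.0000, 30.0)],
]
print(extract_ion_chromatogram(scans, target_mz=180.0634))
```

Tightening `ppm` to 1.0 drops the second scan's point (a 0.0006 m/z drift) and splits the trace, which illustrates why a hand-chosen window is the kind of parameter the abstract says must be re-optimized for each dataset.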
Affiliation(s)
- Yuxuan Liao
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Miao Tian
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yonglei Jiang
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan 650021, China
- Yi Chen
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan 650021, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
15
Wang X, Li C, Li Z, Qi Y, Zhang X, Zhao X, Zhao C, Lin X, Lu X, Xu G. A Structure-Guided Molecular Network Strategy for Global Untargeted Metabolomics Data Annotation. Anal Chem 2023; 95:11603-11612. [PMID: 37493263 DOI: 10.1021/acs.analchem.3c00849] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/27/2023]
Abstract
Large-scale metabolite annotation is a bottleneck in untargeted metabolomics. Here, we present a structure-guided molecular network strategy (SGMNS) for deep annotation of untargeted ultra-performance liquid chromatography-high resolution mass spectrometry (MS) metabolomics data. Unlike current network-based metabolite annotation methods, SGMNS is based on a global connectivity molecular network (GCMN), constructed from the molecular fingerprint similarity of chemical structures in metabolome databases. Neighboring metabolites with similar structures in the GCMN are expected to produce similar spectra. Network annotation propagation in SGMNS uses known metabolites as seeds. The experimental MS/MS spectra of the seeds are assigned to their neighboring metabolites in the GCMN as "pseudo" spectra; propagation is performed by searching predicted retention times, MS1, and "pseudo" spectra against metabolite features in the untargeted metabolomics data. The newly annotated metabolite features are then used as seeds for another round of annotation propagation. Performance evaluation showed that SGMNS has unique advantages for metabolome annotation. The method was applied to annotate six typical biological samples: a total of 701, 1557, 1147, 1095, 1237, and 2041 metabolites were annotated from cells, feces, plasma (NIST SRM 1950), tissue, urine, and their pooled sample, respectively, with annotation accuracy >83% and RSD <2%. The results show that SGMNS fully exploits the chemical space of existing metabolomes for deep metabolite annotation and overcomes the limitation of insufficient reference MS/MS spectra.
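The network construction and iterative seed propagation described above can be sketched in a few lines, assuming fingerprints are given as sets of "on" bit indices, structures are linked when Tanimoto similarity meets a threshold, and a caller-supplied `matches_data` predicate stands in for the paper's search of predicted RT, MS1, and pseudo-spectra. All names, the threshold, and the toy fingerprints are hypothetical; this is not the SGMNS implementation.

```python
from typing import Callable, Dict, FrozenSet, Set


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_network(fps: Dict[str, FrozenSet[int]], threshold: float) -> Dict[str, Set[str]]:
    """Link every pair of structures whose fingerprint similarity meets the threshold."""
    names = list(fps)
    graph: Dict[str, Set[str]] = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if tanimoto(fps[u], fps[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def propagate(graph: Dict[str, Set[str]], seeds: Set[str],
              matches_data: Callable[[str, str], bool]) -> Set[str]:
    """Annotate neighbours of annotated nodes that pass matches_data; new ones seed the next round."""
    annotated = set(seeds)
    frontier = set(seeds)
    while frontier:
        new = {v for u in frontier for v in graph[u]
               if v not in annotated and matches_data(u, v)}
        annotated |= new
        frontier = new
    return annotated


# Hypothetical fingerprints for four database structures
fps = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({1, 2, 5, 6}),
    "D": frozenset({7, 8}),
}
graph = build_network(fps, threshold=0.5)
print(sorted(propagate(graph, {"A"}, lambda seed, neighbour: True)))
```

With the always-true predicate, annotation spreads from A to B in the first round and from B to C in the second, which mirrors the "annotated features become new seeds" loop; D has no similar neighbour and is never reached.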
Affiliation(s)
- Xinxin Wang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Chao Li
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Zaifang Li
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Yanpeng Qi
- School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, P.R. China
- Xiuqiong Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Chunxia Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Xiaohui Lin
- School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, P.R. China
- Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P.R. China
- University of Chinese Academy of Sciences, Beijing 100049, P.R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P.R. China
16
Guo R, Zhang Y, Liao Y, Yang Q, Xie T, Fan X, Lin Z, Chen Y, Lu H, Zhang Z. Highly accurate and large-scale collision cross sections prediction with graph neural networks. Commun Chem 2023; 6:139. [PMID: 37402835 DOI: 10.1038/s42004-023-00939-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/12/2022] [Accepted: 06/23/2023] [Indexed: 07/06/2023]
Abstract
The collision cross section (CCS) values derived from ion mobility spectrometry can be used to improve the accuracy of compound identification. Here, we have developed the Structure included graph merging with adduct method for CCS prediction (SigmaCCS) based on graph neural networks using 3D conformers as inputs. A model was trained, evaluated, and tested with >5,000 experimental CCS values. It achieved a coefficient of determination of 0.9945 and a median relative error of 1.1751% on the test set. The model-agnostic interpretation method and the visualization of the learned representations were used to investigate the chemical rationality of SigmaCCS. An in-silico database with 282 million CCS values was generated for three different adduct types of 94 million compounds. Its source code is publicly available at https://github.com/zmzhang/SigmaCCS . Altogether, SigmaCCS is an accurate, rational, and off-the-shelf method to directly predict CCS values from molecular structures.
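The two test-set metrics reported for SigmaCCS can be computed as in this minimal sketch, assuming the common definitions: relative error = |predicted − observed| / observed (in percent, summarized by the median) and R² = 1 − SS_res / SS_tot. The paper's exact formulas are not given in the abstract, and the CCS values below are hypothetical.

```python
from statistics import median
from typing import Sequence


def median_relative_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Median of |predicted - observed| / observed, in percent."""
    return 100.0 * median(abs(p - o) / o for o, p in zip(observed, predicted))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental vs. predicted CCS values (square angstroms)
ccs_obs = [150.0, 200.0, 250.0]
ccs_pred = [153.0, 198.0, 250.0]
print(median_relative_error(ccs_obs, ccs_pred), r_squared(ccs_obs, ccs_pred))
```

The median is less sensitive than the mean to a few badly predicted ions, which is one reason it is the usual headline number for CCS predictors.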
Affiliation(s)
- Renfeng Guo
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
- Youjia Zhang
- School of Computer Science and Technology, Huazhong University of Science and Technology, 430074, Wuhan, China
- Yuxuan Liao
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
- Ting Xie
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
- Zhonglong Lin
- Yunnan Academy of Tobacco Agricultural Sciences, 650021, Kunming, Yunnan, China
- Yi Chen
- Yunnan Academy of Tobacco Agricultural Sciences, 650021, Kunming, Yunnan, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, 410083, Changsha, China
17
Sholokhova AY, Matyushin DD, Grinevich OI, Borovikova SA, Buryak AK. Intelligent Workflow and Software for Non-Target Analysis of Complex Samples Using a Mixture of Toxic Transformation Products of Unsymmetrical Dimethylhydrazine as an Example. Molecules 2023; 28:molecules28083409. [PMID: 37110641 PMCID: PMC10143382 DOI: 10.3390/molecules28083409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 04/05/2023] [Accepted: 04/10/2023] [Indexed: 04/29/2023]
Abstract
Unsymmetrical dimethylhydrazine (UDMH) is a widely used rocket propellant. When released into the environment or stored under uncontrolled conditions, UDMH readily forms an enormous variety (at least many dozens) of transformation products. Environmental pollution by UDMH and its transformation products is a major problem in many countries and across the Arctic region. Unfortunately, previous studies often used only electron ionization mass spectrometry with a library search, or considered only the molecular formula, to propose the structures of new products. This is quite an unreliable approach. A newly proposed artificial intelligence-based workflow was shown to allow structures of UDMH transformation products to be proposed with a greater degree of certainty. The free and open-source software presented, with a convenient graphical user interface, facilitates the non-target analysis of industrial samples. It bundles machine learning models for the prediction of retention indices and mass spectra. A critical analysis was provided of whether combining several chromatographic and mass spectrometric methods allows the structure of an unknown UDMH transformation product to be elucidated. Using gas chromatographic retention indices on two stationary phases (polar and non-polar) was shown to reject false candidates in many cases where a single retention index is not enough. The structures of five previously unknown UDMH transformation products were proposed, and four previously proposed structures were refined.
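The two-phase rejection logic can be sketched as a simple filter, assuming each candidate structure has a predicted retention index (RI) on a non-polar and a polar phase and is kept only if both lie within a tolerance of the observed values. The `Candidate` class, the tolerances, and all RI values are hypothetical; the abstract does not state the acceptance criteria the authors used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    ri_nonpolar: float  # predicted retention index on a non-polar phase
    ri_polar: float     # predicted retention index on a polar phase


def filter_by_two_phases(candidates: List[Candidate], obs_nonpolar: float, obs_polar: float,
                         tol_nonpolar: float, tol_polar: float) -> List[Candidate]:
    """Keep only candidates whose predicted RIs agree with BOTH observed RIs."""
    return [
        c for c in candidates
        if abs(c.ri_nonpolar - obs_nonpolar) <= tol_nonpolar
        and abs(c.ri_polar - obs_polar) <= tol_polar
    ]


# Hypothetical candidate structures for one unknown peak
candidates = [
    Candidate("X", ri_nonpolar=845.0, ri_polar=1410.0),
    Candidate("Y", ri_nonpolar=860.0, ri_polar=1650.0),
    Candidate("Z", ri_nonpolar=990.0, ri_polar=1395.0),
]
kept = filter_by_two_phases(candidates, obs_nonpolar=850.0, obs_polar=1400.0,
                            tol_nonpolar=30.0, tol_polar=60.0)
print([c.name for c in kept])
```

Candidate Y passes the non-polar check and is rejected only by the polar phase, which is the situation the abstract describes: one retention index alone would leave a false candidate in the list.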
Affiliation(s)
- Anastasia Yu Sholokhova
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
- Dmitriy D Matyushin
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
- Oksana I Grinevich
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
- Svetlana A Borovikova
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
- Aleksey K Buryak
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
18
Luo M, Yin Y, Zhou Z, Zhang H, Chen X, Wang H, Zhu ZJ. A mass spectrum-oriented computational method for ion mobility-resolved untargeted metabolomics. Nat Commun 2023; 14:1813. [PMID: 37002244 PMCID: PMC10066191 DOI: 10.1038/s41467-023-37539-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/30/2022] [Accepted: 03/17/2023] [Indexed: 04/03/2023]
Abstract
Ion mobility (IM) adds a new dimension to liquid chromatography-mass spectrometry-based untargeted metabolomics, significantly enhancing coverage, sensitivity, and resolving power for analyzing the metabolome, particularly metabolite isomers. However, the high dimensionality of IM-resolved metabolomics data presents a great challenge to data processing, restricting its widespread application. Here, we develop a mass spectrum-oriented bottom-up assembly algorithm for IM-resolved metabolomics that uses mass spectra to assemble four-dimensional peaks in the reverse order of multidimensional separation. We further develop the end-to-end computational framework Met4DX for peak detection, quantification, and identification of metabolites in IM-resolved metabolomics. Benchmarking and validation of Met4DX demonstrate superior performance compared to existing tools with regard to coverage, sensitivity, peak fidelity, and quantification precision. Importantly, Met4DX successfully detects and differentiates co-eluted metabolite isomers with small differences in the chromatographic and IM dimensions. Together, Met4DX advances metabolite discovery in biological organisms by deciphering complex 4D metabolomics data.
Affiliation(s)
- Mingdu Luo
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
- Yandong Yin
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- Zhiwei Zhou
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- Haosong Zhang
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
- Xi Chen
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
- Hongmiao Wang
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
- Zheng-Jiang Zhu
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, P. R. China
- Shanghai Key Laboratory of Aging Studies, Shanghai, 201210, P. R. China
19
Fan X, Wang Y, Yu C, Lv Y, Zhang H, Yang Q, Wen M, Lu H, Zhang Z. A Universal and Accurate Method for Easily Identifying Components in Raman Spectroscopy Based on Deep Learning. Anal Chem 2023; 95:4863-4870. [PMID: 36908216 DOI: 10.1021/acs.analchem.2c03853] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 03/14/2023]
Abstract
Raman spectroscopy has been widely used to provide the structural fingerprint for molecular identification. Due to interference from coexisting components, noise, baseline, and systematic differences between spectrometers, component identification with Raman spectra is challenging, especially for mixtures. In this study, a method named DeepRaman is proposed to solve these problems by combining the comparison ability of a pseudo-Siamese neural network (pSNN) with the input-shape flexibility of spatial pyramid pooling (SPP). DeepRaman was trained, validated, and tested with 41,564 augmented Raman spectra from two databases (pharmaceutical material and S.T. Japan). It achieves 96.29% accuracy, a 98.40% true positive rate (TPR), and a 94.36% true negative rate (TNR) on the test set. Six additional data sets measured on different instruments were used to evaluate the proposed method from different perspectives. DeepRaman provides accurate identification results and significantly outperforms the hit quality index (HQI) method and other deep learning models. In addition, it performs well across different levels of spectral complexity and for low-content components. Once the model is established, it can be used directly on different data sets without retraining or transfer learning. Furthermore, it also obtains promising results for the analysis of surface-enhanced Raman spectroscopy (SERS) data sets and Raman imaging data sets. In summary, it is an accurate, universal, and ready-to-use method for component identification in various application scenarios.
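The "input-shape flexibility" that spatial pyramid pooling gives can be illustrated with a minimal 1D sketch: the signal is split into 1, 2, and 4 equal-ish bins, each bin is max-pooled, and the output length is always 1 + 2 + 4 = 7 regardless of input length. The pyramid levels and `spatial_pyramid_pool_1d` are assumptions for illustration; in DeepRaman, SPP acts on learned feature maps inside the network, not on raw spectra as here.

```python
from typing import List, Sequence


def spatial_pyramid_pool_1d(signal: Sequence[float], levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool a 1D signal into `bins` segments per pyramid level; output length is sum(levels)."""
    n = len(signal)
    pooled: List[float] = []
    for bins in levels:
        if bins > n:
            raise ValueError("signal is shorter than the number of bins")
        for k in range(bins):
            # Integer bin edges; each segment is non-empty because n >= bins
            start = (k * n) // bins
            end = ((k + 1) * n) // bins
            pooled.append(max(signal[start:end]))
    return pooled


# Two hypothetical "spectra" of different lengths map to vectors of the same length
print(spatial_pyramid_pool_1d([0, 1, 5, 2, 3, 9, 4, 6]))
print(spatial_pyramid_pool_1d([3, 1, 4, 1, 5]))
```

Because every input becomes a fixed-length vector, spectra recorded with different numbers of points can be compared by the same downstream layers without resampling.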
Affiliation(s)
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Chuanxiu Yu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yuanxia Lv
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
20
Wang X, Zheng F, Sheng M, Xu G, Lin X. Retention time prediction for small samples based on integrating molecular representations and adaptive network. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1217:123624. [PMID: 36780745 DOI: 10.1016/j.jchromb.2023.123624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/27/2023] [Indexed: 02/07/2023]
Abstract
Retention time (RT) provides information orthogonal to that of mass spectrometry and helps identify compounds. Many machine learning methods have been developed and applied to RT prediction. In practice, however, training data are usually scarce for most chromatographic systems. To enhance the performance of RT prediction, this study proposes an RT prediction method based on multi-data combinations and an adaptive neural network (MDC-ANN). MDC-ANN builds the RT prediction model for the target chromatographic system through transfer learning from a base deep learning model trained on a large dataset. It selects the optimal combination of molecular representations from multiple input candidates and automatically determines the neural network structure according to the selected input combination. MDC-ANN was compared with two recent efficient deep learning methods, three transfer learning methods, and four popular machine learning methods on 14 small datasets and showed advantages in MAE, MedAE, MRE, and R2 in most cases. The experimental results illustrate that integrating multiple molecular representations provides more information, improves RT prediction performance, and contributes to compound annotation, and that different chromatographic systems may need different combinations of molecular representations to achieve good RT prediction performance. Hence, MDC-ANN, which automatically determines the best combination of molecular representations for a specific system, is promising for accurate RT prediction in real applications.
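The abstract compares models by MAE, MedAE, MRE, and R2 without defining them. A minimal sketch of these regression metrics follows, assuming MRE means the mean relative absolute error (the paper may define it differently, for example as a median); the helper `rt_metrics` and the example values are hypothetical.

```python
from statistics import mean, median
from typing import Dict, Sequence


def rt_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MedAE, MRE and R2 for retention-time predictions.

    MRE is assumed here to be the mean of |error| / |true RT|.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total sum of squares
    return {
        "MAE": mean(errors),
        "MedAE": median(errors),
        "MRE": mean(e / abs(t) for e, t in zip(errors, y_true)),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Retention times in seconds (made-up numbers for illustration).
print(rt_metrics([100, 200, 300, 400], [110, 190, 310, 380]))
```

MedAE is less sensitive than MAE to a few badly predicted compounds, which is one reason RT papers often report both.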
Affiliation(s)
- Xiaoxiao Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Fujian Zheng
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China
- Meizhen Sheng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
21
Fan Y, Yu C, Lu H, Chen Y, Hu B, Zhang X, Su J, Zhang Z. Deep learning-based method for automatic resolution of gas chromatography-mass spectrometry data from complex samples. J Chromatogr A 2023; 1690:463768. [PMID: 36641940 DOI: 10.1016/j.chroma.2022.463768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 12/21/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022]
Abstract
Modern gas chromatography-mass spectrometry (GC-MS) is the workhorse for high-throughput profiling of volatile compounds in complex samples. It produces large amounts of two-dimensional data, so automatic methods are required to distill chemical information from raw GC-MS data efficiently. In this study, we proposed an automatic resolution method (AutoRes) based on pseudo-Siamese convolutional neural networks (pSCNN) to extract meaningful features obscured by noise, baseline drifts, retention time shifts, and overlapping peaks. Two pSCNN models were each trained with 400,000 augmented spectral pairs. They predict the selective region (pSCNN1) and the elution region (pSCNN2) of compounds in an untargeted manner. The accuracies of the pSCNN1 and pSCNN2 models on their test sets are 99.9% and 92.6%, respectively. The chromatographic profile of each component was then automatically resolved by full rank resolution (FRR) based on the regions predicted by these models. The performance of AutoRes was evaluated on simulated and plant essential oil datasets. Compared with AMDIS and MZmine, AutoRes resolves more plausible mass spectra, chromatograms, and peak areas for identifying and quantifying compounds. When mass spectra resolved from overlapping peaks in Set Ⅰ and Set Ⅱ of the plant essential oil dataset were matched against the NIST17 library, the average match scores of AutoRes (925 and 936) exceeded those of AMDIS (909 and 925) and MZmine (888 and 916). AutoRes extracted peak areas and mass spectra automatically from 10 GC-MS files of plant essential oils, and the entire process was completed in 8 min without any prior information or manual intervention. It is implemented in Python and is available as an open-source package at https://github.com/dyjfan/AutoRes.
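AutoRes is compared with AMDIS and MZmine through library match scores on a 0 to 999 scale. NIST's own match factor applies its own weighting of m/z and intensity, which is not reproduced here; the sketch below is a deliberately simplified stand-in that scores a resolved spectrum against a library spectrum with an unweighted cosine similarity scaled to 999. The function `match_score` and the example spectra are hypothetical.

```python
import math
from typing import Dict

Spectrum = Dict[int, float]  # nominal m/z -> intensity


def match_score(query: Spectrum, reference: Spectrum) -> int:
    """Unweighted cosine similarity of two centroided spectra, scaled to 0-999."""
    mzs = set(query) | set(reference)
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mzs)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0  # an empty spectrum matches nothing
    return round(999 * dot / (norm_q * norm_r))


# Hypothetical resolved and library spectra.
resolved = {41: 35.0, 69: 100.0, 93: 60.0}
library = {41: 30.0, 69: 100.0, 93: 55.0, 121: 5.0}
print(match_score(resolved, library))
```

A spectrum that still contains ions from a co-eluting compound picks up extra, unmatched peaks, which lowers this kind of score; that is why cleaner deconvolution tends to show up as higher average match scores.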
Affiliation(s)
- Yingjie Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
- Chuanxiu Yu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
- Yi Chen
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China
- Binbin Hu
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China
- Xingren Zhang
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China; Baoshan City Branch of Yunnan Tobacco Company, Baoshan 678000, Yunnan, China
- Jiaen Su
- Dali Prefecture Branch of Yunnan Tobacco Company, Dali 671000, Yunnan, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
22
Zhang H, Xu Z, Fan X, Wang Y, Yang Q, Sun J, Wen M, Kang X, Zhang Z, Lu H. Fusion of Quality Evaluation Metrics and Convolutional Neural Network Representations for ROI Filtering in LC-MS. Anal Chem 2023; 95:612-620. [PMID: 36597722 DOI: 10.1021/acs.analchem.2c01398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Region of interest (ROI) extraction is a fundamental step in analyzing metabolomic datasets acquired by liquid chromatography-mass spectrometry (LC-MS). However, noise and background signals in LC-MS data often degrade the quality of extracted ROIs. Effective ROI evaluation algorithms are therefore needed to eliminate false positives while keeping the false-negative rate as low as possible. In this study, a deep fused filter of ROIs (dffROI) was proposed to improve the accuracy of ROI extraction by combining handcrafted evaluation metrics with convolutional neural network (CNN)-learned representations. To evaluate its performance, dffROI was compared with peakonly (CNN-learned representation) and five handcrafted metrics on three LC-MS datasets and a gas chromatography-mass spectrometry (GC-MS) dataset. The results show that dffROI achieves higher accuracy, a higher true-positive rate, and a lower false-positive rate. Its accuracy, true-positive rate, and false-positive rate on the test set are 0.9841, 0.9869, and 0.0186, respectively. The classification error rate of dffROI (1.59%) is significantly lower than that of peakonly (2.73%). Model-agnostic feature importance analysis demonstrates the necessity of fusing handcrafted evaluation metrics with convolutional neural network representations. By virtue of information fusion and end-to-end learning, dffROI is an automatic, robust, and universal method for ROI filtering. It is implemented in Python and open-sourced at https://github.com/zhanghailiangcsu/dffROI under the BSD License. Furthermore, it has been integrated into the KPIC2 framework previously proposed by our group to facilitate the analysis of real metabolomic LC-MS datasets.
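The reported metrics follow directly from the confusion matrix of ROI classifications, and the error rate is simply one minus accuracy (1 - 0.9841 = 0.0159, i.e. the quoted 1.59%). A small sketch with a hypothetical helper `roi_filter_metrics` and made-up counts:

```python
from typing import Dict


def roi_filter_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Classification metrics for an ROI filter (positive = true chromatographic peak)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "TPR": tp / (tp + fn),            # share of real peaks that are kept
        "FPR": fp / (fp + tn),            # share of noise ROIs wrongly kept
        "error_rate": (fp + fn) / total,  # equals 1 - accuracy
    }


# Made-up counts for illustration only.
print(roi_filter_metrics(tp=90, fp=5, tn=95, fn=10))
# {'accuracy': 0.925, 'TPR': 0.9, 'FPR': 0.05, 'error_rate': 0.075}
```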
Affiliation(s)
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhenbo Xu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; National International Collaborative Research Center for Medical Metabolomics, Central South University, Changsha 410083, China
23
Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00577-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade. Liquid chromatography–tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS2Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements. LC-MS2Struct jointly predicts the annotations for a set of mass spectrometry features in a sample, using a novel structured prediction model trained to optimally combine the output of state-of-the-art MS2 scorers and observed retention orders. We evaluate our method on a dataset covering all publicly available reversed-phase LC-MS2 data in the MassBank reference database, including 4,327 molecules measured using 18 different LC conditions from 16 contributors, greatly expanding the chemical analytical space covered in previous multi-MS2 scorer evaluations. LC-MS2Struct obtains significantly higher annotation accuracy than earlier methods and improves the annotation accuracy of state-of-the-art MS2 scorers by up to 106%. The use of stereochemistry-aware molecular fingerprints improves prediction performance, which highlights limitations in existing approaches and has strong implications for future computational LC-MS2 developments.
24
Cai Y, Zhou Z, Zhu ZJ. Advanced analytical and informatic strategies for metabolite annotation in untargeted metabolomics. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2022]
25
Sun J, Wen M, Wang H, Ruan Y, Yang Q, Kang X, Zhang H, Zhang Z, Lu H. Prediction of drug-likeness using graph convolutional attention network. Bioinformatics 2022; 38:5262-5269. [PMID: 36222555 DOI: 10.1093/bioinformatics/btac676] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2022] [Revised: 09/22/2022] [Accepted: 10/08/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Drug-likeness has been widely used as a criterion to distinguish drug-like molecules from non-drugs. Developing reliable computational methods to predict the drug-likeness of compounds is crucial to triage unpromising molecules and accelerate the drug discovery process. RESULTS In this study, a deep learning method based on the graph convolutional attention network (D-GCAN) was developed to predict drug-likeness directly from molecular structures. The results showed that the D-GCAN model outperformed other state-of-the-art models for drug-likeness prediction. The combination of graph convolution and the attention mechanism made an important contribution to the performance of the model: the attention mechanism improved accuracy by 4.0%, and graph convolution improved accuracy by 6.1%. Results on a dataset beyond Lipinski's rule-of-five space and on a non-US dataset showed that the model generalizes well. The billion-scale GDB-13 database was then used as a case study to screen for SARS-CoV-2 3C-like protease inhibitors. Sixty-five drug candidates were screened out, most of whose substructures are similar to those of existing oral drugs. Candidates screened from S-GDB13 have higher similarity to existing drugs and better molecular docking performance than those from the rest of GDB-13. Screening on S-GDB13 is significantly faster than screening directly on GDB-13. In general, D-GCAN is a promising tool for predicting drug-likeness, selecting potential candidates, and accelerating drug discovery by excluding unpromising candidates and avoiding unnecessary biological and clinical testing. AVAILABILITY AND IMPLEMENTATION The source code, model and tutorials are available at https://github.com/JinYSun/D-GCAN. The S-GDB13 database is available at https://doi.org/10.5281/zenodo.7054367. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
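D-GCAN's exact architecture is not given in the abstract. As a generic illustration of the attention component, the sketch below implements one single-head graph attention layer in the style of the original GAT formulation: each atom scores itself and its bonded neighbors, normalizes the scores with a softmax, and aggregates the projected neighbor features with those weights. All names, the LeakyReLU slope of 0.2, and the toy inputs are assumptions, not D-GCAN's code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: Sequence[Sequence[float]], neighbors: Sequence[Sequence[int]],
              W: Sequence[Sequence[float]], a: Sequence[float]
              ) -> Tuple[List[Vector], List[Dict[int, float]]]:
    """One single-head GAT-style layer.

    h: atom feature vectors; neighbors[i]: atoms bonded to atom i;
    W: projection matrix (d_out x d_in); a: attention vector of length 2 * d_out.
    Returns updated atom features and per-atom attention weights.
    """
    z = [matvec(W, hi) for hi in h]  # projected features
    new_h: List[Vector] = []
    weights: List[Dict[int, float]] = []
    for i, nbrs in enumerate(neighbors):
        cand = [i] + [j for j in nbrs if j != i]  # self-loop plus bonded atoms
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in cand]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        new_h.append([sum(al * z[j][k] for al, j in zip(alpha, cand)) for k in range(len(z[i]))])
        weights.append(dict(zip(cand, alpha)))
    return new_h, weights


# Toy C-C-O chain with one-hot "element" features [is_C, is_O].
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.5, 0.5]
new_h, att = gat_layer(h, neighbors, W, a)
print(att[1])  # weights of the middle carbon over itself and its two neighbors
```

Unlike plain graph convolution, which weights neighbors by fixed graph structure, the attention weights here depend on the features of the atoms involved, which is the kind of flexibility the abstract credits for part of the accuracy gain.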
Affiliation(s)
- Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Huabei Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yuezhe Ruan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
26
Zeng X, Xiang H, Yu L, Wang J, Li K, Nussinov R, Cheng F. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00557-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
27
Fouad MA, Serag A, Tolba EH, El-Shal MA, El Kerdawy AM. QSRR modeling of the chromatographic retention behavior of some quinolone and sulfonamide antibacterial agents using firefly algorithm coupled to support vector machine. BMC Chem 2022; 16:85. [PMID: 36329493 PMCID: PMC9635186 DOI: 10.1186/s13065-022-00874-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022]
Abstract
Quinolones and sulfonamides are two classes of antibacterial agents with a rich history of medicinal chemistry features that contribute to their bacterial spectrum, efficacy, pharmacokinetics, and adverse-effect profiles. The urgent need for their use, combined with the escalating rate of resistance to them, necessitates suitable analytical methods that accelerate and facilitate their analysis. In this study, the advanced firefly algorithm (FFA) coupled with support vector regression (SVR) was used to select the most significant descriptors and to construct two quantitative structure-retention relationship (QSRR) models, based on 11 selected quinolone and 13 selected sulfonamide drugs, respectively, to predict their retention behavior in HPLC. Specifically, the effects of mobile-phase pH on the retention of the quinolones and of acetonitrile content on the retention of the sulfonamides were studied. The resulting QSRR models performed well in both internal and external validation, demonstrating their robustness and predictive ability. Y-randomization validation showed that the models did not arise by statistical chance. Moreover, the results shed light on the molecular features that influence the retention behavior of these two classes under the current chromatographic conditions.
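The study couples the firefly algorithm with SVR to select descriptors. The sketch below shows only the core update rule of a standard continuous firefly algorithm on a toy objective: each firefly moves toward every brighter (lower-cost) firefly with an attractiveness that decays with squared distance, plus a small random step. The parameter values, the step-damping factor of 0.97, the function name `firefly_minimize`, and the sphere test function are illustrative assumptions; a binary descriptor-selection variant and an SVR-based fitness function are not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(f: Callable[[Sequence[float]], float], dim: int,
                     n_fireflies: int = 15, n_iter: int = 100,
                     beta0: float = 1.0, gamma: float = 1.0, alpha: float = 0.2,
                     bounds: Tuple[float, float] = (-5.0, 5.0),
                     seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [f(x) for x in xs]  # lower cost = brighter firefly
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly i is attracted to the brighter j
                    r2 = sum((p - q) ** 2 for p, q in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    xs[i] = [min(hi, max(lo, p + beta * (q - p) + alpha * (rng.random() - 0.5)))
                             for p, q in zip(xs[i], xs[j])]
                    cost[i] = f(xs[i])
        alpha *= 0.97  # shrink the random step over time
    best = min(range(n_fireflies), key=cost.__getitem__)
    return xs[best], cost[best]


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2)
print(best_x, best_cost)
```

In a QSRR setting, the cost function would instead score a candidate descriptor subset, for example by the cross-validated error of an SVR model built on it.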
Affiliation(s)
- Marwa A. Fouad
- Pharmaceutical Chemistry Department, Faculty of Pharmacy, Cairo University, Kasr El-Aini St, P.O. Box 11562, Cairo, Egypt; Department of Pharmaceutical Chemistry, School of Pharmacy, Newgiza University (NGU), Newgiza, km 22 Cairo–Alexandria Desert Road, Cairo, Egypt
- Ahmed Serag
- Pharmaceutical Analytical Chemistry Department, Faculty of Pharmacy, Al-Azhar University, 11751 Cairo, Egypt
- Enas H. Tolba
- Egyptian Drug Authority (Former National Organization for Drug Control and Research), Cairo, Egypt
- Manal A. El-Shal
- Egyptian Drug Authority (Former National Organization for Drug Control and Research), Cairo, Egypt
- Ahmed M. El Kerdawy
- Pharmaceutical Chemistry Department, Faculty of Pharmacy, Cairo University, Kasr El-Aini St, P.O. Box 11562, Cairo, Egypt
28
Ma P, Zhang Z, Jia X, Peng X, Zhang Z, Tarwa K, Wei CI, Liu F, Wang Q. Neural network in food analytics. Crit Rev Food Sci Nutr 2022; 64:4059-4077. [PMID: 36322538 DOI: 10.1080/10408398.2022.2139217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Neural network (NN, i.e., deep learning)-based data analysis techniques have been identified as a pivotal opportunity to protect the integrity and safety of the global food supply chain and are forecast to account for $11.2 billion in agricultural markets. As a general-purpose data analytic tool, NN has been applied in several areas of food science, such as food recognition, food supply chain security, and omics analysis. Given the rapid emergence of NN applications in food safety, this review aims to provide, for the first time, a comprehensive overview of NN applications in food analysis, focusing on domain-specific applications by introducing fundamental methodology, reviewing recent and notable progress, and discussing challenges and potential pitfalls. Through effective collaboration between food specialists and the broader community, NN has shown a bright future in the food field, for example in its superior performance in food recognition, sensory evaluation, and pattern recognition of spectroscopic and chromatographic data. However, major challenges impede the wider adoption of NN, including the lack of software packages with interfaces friendly to food scientists, hard-to-interpret model behavior, and multi-source heterogeneous data. Breakthroughs in other fields show that NN has the potential to revolutionize the field in the near future.
Affiliation(s)
- Peihua Ma
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Zhikun Zhang
- CISPA Helmholtz Center for Information Security, Saarbrucken, Germany
- Xiaoxue Jia
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Xiaoke Peng
- College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, PR China
- Zhi Zhang
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Kevin Tarwa
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Cheng-I Wei
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Fuguo Liu
- College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, PR China
- Qin Wang
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
29
Retention Time Prediction with Message-Passing Neural Networks. SEPARATIONS 2022. [DOI: 10.3390/separations9100291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D convolutional neural networks trained on the METLIN small molecule retention time dataset (SMRT). These approaches achieve prediction accuracy comparable to the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to other chromatographic systems. The accuracy of this step depends both on the mapping method and on the accuracy of the general model trained on SMRT. Therefore, improving either part of the prediction workflow may improve compound annotation. Here, we evaluate the ability of message-passing neural networks (MPNN), which have demonstrated outstanding performance on many chemical tasks, to accurately predict retention times. The model was initially trained on SMRT, yielding mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was then fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and improved prediction accuracy on these sets by up to 30% compared with state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention, with thresholds obtained from ROC curves, eliminates up to 50% of false identities.
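The abstract filters isomeric candidates with retention-time thresholds taken from ROC curves but does not say how the operating point on the curve was chosen. One common choice is the cutoff that maximizes Youden's J (TPR minus FPR). The sketch below assumes that rule and uses the absolute RT error as the score; the helper names, candidate names, and all values are hypothetical.

```python
from typing import List, Sequence, Tuple


def youden_threshold(rt_errors: Sequence[float],
                     is_correct: Sequence[bool]) -> Tuple[float, float]:
    """Return the |RT error| cutoff maximizing Youden's J = TPR - FPR."""
    n_pos = sum(1 for y in is_correct if y)
    n_neg = len(is_correct) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both correct and incorrect candidates")
    best_cut, best_j = rt_errors[0], float("-inf")
    for cut in sorted(set(rt_errors)):  # every observed error is a candidate cutoff
        tp = sum(1 for e, y in zip(rt_errors, is_correct) if y and e <= cut)
        fp = sum(1 for e, y in zip(rt_errors, is_correct) if not y and e <= cut)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def filter_candidates(observed_rt: float, candidates: Sequence[Tuple[str, float]],
                      cutoff: float) -> List[str]:
    """Keep candidates whose predicted RT lies within `cutoff` of the observed RT."""
    return [name for name, pred_rt in candidates if abs(pred_rt - observed_rt) <= cutoff]


# Validation data: |predicted - observed| RT in seconds, and whether the candidate was correct.
cutoff, j = youden_threshold([5, 8, 15, 30, 45, 70], [True, True, True, False, False, False])
print(cutoff, j)  # 15 1.0
print(filter_candidates(300.0, [("isomer A", 310.0), ("isomer B", 360.0), ("isomer C", 290.0)], cutoff))
# ['isomer A', 'isomer C']
```

A wider cutoff keeps more true identities at the cost of rejecting fewer false ones, so the share of false identities removed depends directly on how the threshold is picked.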
30
Li Z, Zheng F, Xia Y, Zhang X, Wang X, Zhao C, Zhao X, Lu X, Xu G. A novel method for efficient screening and annotation of important pathway-associated metabolites based on the modified metabolome and probe molecules. Se Pu 2022; 40:788-796. [PMID: 36156625 PMCID: PMC9520374 DOI: 10.3724/sp.j.1123.2022.03025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
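Plant secondary metabolites play important roles in resistance to biotic and abiotic stresses, in interactions between organisms, and in signal transmission, and elucidating secondary metabolic pathways is of great significance for plant molecular breeding and natural product synthesis. Liquid chromatography-high-resolution tandem mass spectrometry (LC-HRMS/MS) provides a technical means for identifying secondary metabolites and characterizing their pathways. Untargeted LC-HRMS/MS methods yield abundant mass spectral signals, including first- and second-stage mass spectra (MS, MS/MS), but annotating secondary metabolites is very difficult because of the limited size of mass spectral databases and the complexity of secondary metabolites. Taking phenylpropanoid pathway metabolites in maize leaves as an example, this study developed a new method for the efficient screening and annotation of important pathway metabolites in untargeted metabolomics data. First, 61 types of modification reactions involved in the phenylpropanoid metabolic pathway were compiled from public metabolic pathway databases and the literature, and the modified metabolome was then screened from the untargeted experimental data. Second, phenylpropanoid compounds in open-source tandem mass spectral data were collected as probe molecules to build a probe-molecule mass spectral database. A molecular network was then built jointly from the probe molecules and the modified metabolome to pinpoint target pathway metabolites and annotate their structures. In positive and negative ion modes, the method screened 392 and 417 candidate phenylpropanoid pathway metabolites in maize leaves, respectively. After removing redundancy, 129 metabolites were annotated, covering the major branches of phenylpropanoid metabolism, including 8 flavonoids, 19 flavonoid O-glycosides, and 32 flavonoid C-glycosides from the flavonoid pathway, 31 metabolites from the hydroxycinnamic acid pathway, and 22 metabolites from the lignan pathway; 26 of these are not recorded in the PubChem or SciFinder databases. By combining probe molecules with the modified metabolome, the method rapidly pinpoints pathway metabolites and supports fast and accurate network-propagation annotation. It can markedly improve the efficiency of screening and annotating target pathway metabolites and provides an analytical tool for in-depth elucidation of plant secondary metabolic pathways.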
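The first step of the method screens a modified metabolome using 61 modification reaction types. The abstract neither lists them nor describes the matching rule, so the sketch below assumes one plausible approach: flag pairs of features whose m/z difference matches a known modification mass shift within a ppm tolerance. Only four example shifts (monoisotopic masses) are included, and the helper name `find_modification_pairs`, the 5 ppm tolerance, and the m/z values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Example monoisotopic mass shifts in Da (not the study's list of 61 reaction types).
MASS_SHIFTS: Dict[str, float] = {
    "methylation (+CH2)": 14.01565,
    "hydroxylation (+O)": 15.99491,
    "acetylation (+C2H2O)": 42.01056,
    "hexosylation (+C6H10O5)": 162.05282,
}


def find_modification_pairs(mzs: Sequence[float],
                            shifts: Dict[str, float] = MASS_SHIFTS,
                            ppm: float = 5.0) -> List[Tuple[int, int, str]]:
    """Return (lighter index, heavier index, modification) for matching feature pairs."""
    pairs = []
    for i, light in enumerate(mzs):
        for j, heavy in enumerate(mzs):
            if heavy <= light:
                continue
            diff = heavy - light
            tol = heavy * ppm * 1e-6  # tolerance scaled to the heavier ion's m/z
            for name, delta in shifts.items():
                if abs(diff - delta) <= tol:
                    pairs.append((i, j, name))
    return pairs


# Hypothetical [M-H]- feature m/z values from one sample.
features = [163.0401, 325.0929, 177.0557]
for i, j, name in find_modification_pairs(features):
    print(f"{features[j]:.4f} = {features[i]:.4f} + {name}")
```

Pairs found this way only suggest a structural relationship; in the described workflow, MS/MS similarity in the molecular network with the probe molecules is what narrows them down to pathway metabolites.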
31
Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022; 20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022]
Abstract
A large number of chemical compounds are available in databases such as PubChem and ZINC. However, currently known compounds, though numerous, represent only a fraction of all possible compounds, the space of which is known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery. To this end, a number of machine learning algorithms have been developed, and recent deep learning technologies can be used effectively to navigate chemical space, especially for unknown chemical compounds, in drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting the annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods aim to accomplish. We then survey deep learning technologies to show their modeling power and current applications in drug-related tasks. Next, we survey deep learning techniques that address the scarcity of annotated data for more effective navigation of chemical space. Because chemical compound information alone may not be powerful enough for drug-related tasks, we also survey what other kinds of information, such as assay and gene expression data, can be used to improve the predictive power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into computational analysis of chemical information.
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Sangseon Lee
- Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Yinhua Piao
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MinGyu Choi
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea
- AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
32
Fully automatic resolution of untargeted GC-MS data with deep learning assistance. Talanta 2022; 244:123415. [DOI: 10.1016/j.talanta.2022.123415] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/05/2022] [Revised: 03/24/2022] [Accepted: 03/26/2022] [Indexed: 11/17/2022]
33
Zheng F, You L, Qin W, Ouyang R, Lv W, Guo L, Lu X, Li E, Zhao X, Xu G. MetEx: A Targeted Extraction Strategy for Improving the Coverage and Accuracy of Metabolite Annotation in Liquid Chromatography-High-Resolution Mass Spectrometry Data. Anal Chem 2022; 94:8561-8569. [PMID: 35670335 DOI: 10.1021/acs.analchem.1c04783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is the most popular platform for untargeted metabolomics studies, but compound annotation remains a challenge. In this work, we developed MetEx, a new targeted extraction method for LC-HRMS data, for metabolite annotation. MetEx contains the retention time (tR), MS1, and MS2 information of 30,620 metabolites from freely available spectral databases, including MoNA and KEGG. The tR values of 95.4% of the compounds in our database were calculated by the GNN-RT model, and the MS2 spectra of 39.4% of the compounds were predicted using CFM-ID. MetEx was first evaluated on a mixture of 634 standards for chemical coverage and accuracy of metabolite assignment, and was then applied to human plasma (NIST SRM 1950), human urine, HepG2 cells, mouse liver tissue, and mouse feces. MetEx correctly assigned 252 of the 253 standards detected on our instruments. It also annotated 8.0-44.2% more compounds in the biological samples than XCMS, MS-DIAL, and MZmine 2. MetEx is implemented in R, including its visualization, and is freely available at http://www.metaboex.cn/MetEx.
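MetEx extracts features using the stored retention time and MS1 information of each metabolite. The exact extraction rules are not given in the abstract; the sketch below only shows the generic idea of matching an observed feature to database entries by a ppm mass tolerance and a retention-time window. The tolerances (10 ppm, 0.5 min), the `DbEntry` type, the helper names, and the database contents are assumptions.

```python
from typing import List, NamedTuple


class DbEntry(NamedTuple):
    name: str
    mz: float   # theoretical m/z of the expected ion
    rt: float   # stored or predicted retention time, minutes


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def match_feature(mz: float, rt: float, db: List[DbEntry],
                  ppm_tol: float = 10.0, rt_tol: float = 0.5) -> List[DbEntry]:
    """Database entries within both tolerances, closest mass first."""
    hits = [e for e in db
            if abs(ppm_error(mz, e.mz)) <= ppm_tol and abs(rt - e.rt) <= rt_tol]
    return sorted(hits, key=lambda e: abs(ppm_error(mz, e.mz)))


db = [
    DbEntry("compound A", 180.0634, 2.1),
    DbEntry("compound B", 180.0655, 5.0),  # inside the mass window, outside the RT window
    DbEntry("compound C", 181.0707, 2.2),  # inside the RT window, wrong mass
]
print([e.name for e in match_feature(180.0640, 2.0, db)])  # ['compound A']
```

The example shows why predicted retention times matter: compound B passes the 10 ppm mass filter and is rejected only by its retention time.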
Collapse
Affiliation(s)
- Fujian Zheng
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Lei You
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Wangshu Qin
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Runze Ouyang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Wangjie Lv
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Lei Guo
- Department of Anesthesiology, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Enyou Li
- Department of Anesthesiology, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China; University of Chinese Academy of Sciences, Beijing 100049, China; Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
34
García CA, Gil-de-la-Fuente A, Barbas C, Otero A. Probabilistic metabolite annotation using retention time prediction and meta-learned projections. J Cheminform 2022; 14:33. [PMID: 35672784 PMCID: PMC9172150 DOI: 10.1186/s13321-022-00613-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2021] [Accepted: 05/20/2022] [Indexed: 12/31/2022]
Abstract
Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is limited by the scarcity of experimental retention time data in metabolomic databases and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. A total of 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of 39.2 ± 1.2 s and 17.2 ± 0.9 s, respectively. To the best of our knowledge, these are the most accurate predictions published to date on the SMRT dataset. To project retention times between chromatographic methods, a novel Bayesian meta-learning approach that can learn from just a few molecules is proposed. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations. For this, it suffices that as few as 10 molecules of a given experiment have been identified (for example, by using pure metabolite standards). The use of z-scores makes it possible to consider the uncertainty in the projection when ranking candidates, and not only its accuracy. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt.
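The ranking step described above, where candidates that pass a mass filter are ordered by how well the observed retention time fits each candidate's projected retention time distribution, can be sketched as follows. This is a minimal illustration, not the cmmrt API: it assumes the projection returns a Gaussian mean and standard deviation per candidate, and the names `Candidate`, `rank_candidates`, `top_k_hit`, and the 10 ppm mass tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical candidate record; rt_mean/rt_sd stand in for the projected RT distribution."""
    name: str
    mass: float     # monoisotopic mass, Da
    rt_mean: float  # projected retention time mean, s (assumed Gaussian)
    rt_sd: float    # projected retention time standard deviation, s


def rank_candidates(candidates: List[Candidate], observed_mass: float,
                    observed_rt: float, ppm_tol: float = 10.0) -> List[Tuple[str, float]]:
    """Keep candidates within ppm_tol of the observed mass, then sort by |z| of the observed RT."""
    scored = []
    for c in candidates:
        ppm_error = abs(observed_mass - c.mass) / c.mass * 1e6
        if ppm_error > ppm_tol:
            continue  # fails the mass filter
        z = (observed_rt - c.rt_mean) / c.rt_sd
        scored.append((c.name, z))
    # A small |z| means the observed RT is plausible given both the predicted value
    # and its uncertainty, so a wide interval softens the penalty for a large deviation.
    return sorted(scored, key=lambda item: abs(item[1]))


def top_k_hit(ranked: List[Tuple[str, float]], true_name: str, k: int = 3) -> bool:
    """True if the correct identity is among the first k ranked candidates."""
    return any(name == true_name for name, _ in ranked[:k])
```

For example, with an observed RT of 310 s, a candidate projected at 330 ± 50 s (z = -0.4) outranks one projected at 300 ± 20 s (z = 0.5), even though its point prediction is further away; this is the sense in which z-scores account for uncertainty and not only accuracy.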
35
Machine Learning-Based Retention Time Prediction of Trimethylsilyl Derivatives of Metabolites. Biomedicines 2022; 10:biomedicines10040879. [PMID: 35453629 PMCID: PMC9024754 DOI: 10.3390/biomedicines10040879] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/17/2022] [Revised: 04/04/2022] [Accepted: 04/06/2022] [Indexed: 11/16/2022]
Abstract
In gas chromatography–mass spectrometry-based untargeted metabolomics, metabolites are identified by comparing mass spectra and chromatographic retention time with reference databases or standard materials. In that context, machine learning has been used to predict the retention time of metabolites lacking reference data. However, retention time prediction for trimethylsilyl derivatives of metabolites, which are typically analyzed in gas chromatography-based untargeted metabolomics, has been poorly explored. Here, we provide a rationalized framework for machine learning-based retention time prediction of trimethylsilyl derivatives of metabolites in gas chromatography. We compared different machine learning paradigms and explored the influence of the computational molecular structure representation used to train the prediction models: fingerprint class and fingerprint calculation software. Our study evaluated predicted retention time with chemical ionization and electron impact ionization sources in simulated and real cases, demonstrating good ranking of the correct identity by machine learning, despite a limited power to filter out false identities in cases where a spectrum or a monoisotopic mass matches multiple candidates. Specifically, machine learning prediction yielded median absolute and relative retention index (relative retention time) errors of 37.1 retention index units and 2%, respectively. In addition, fingerprint class and fingerprint calculation software, as well as the molecular structural similarity between the training and test or real case sets, proved to be critical modulators of prediction performance. Finally, we leveraged the structural similarity between the training and test or real case sets to determine the probability that the prediction error is below a specific threshold.
Overall, our study demonstrates that predicted retention time can provide insights into the true structure of unknown metabolites by ranking candidate identities from most to least plausible, and it sets guidelines for assessing confidence in metabolite identification using predicted retention time data.
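The abstract reports that structural similarity between training and query molecules modulates prediction accuracy and was used to estimate the probability that the error falls below a threshold. A minimal sketch of one way to do this follows, assuming fingerprints are represented as sets of on-bit indices, similarity is the maximum Tanimoto coefficient to the training set, and the probability is an empirical frequency over validation molecules above a similarity cutoff. The function names are hypothetical and the paper's exact estimator may differ.

```python
from typing import Iterable, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_similarity_to_training(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Nearest-neighbour similarity of a query molecule to the training set."""
    return max((tanimoto(query, t) for t in training), default=0.0)


def prob_error_below(records: List[Tuple[float, float]], sim_min: float,
                     threshold: float) -> float:
    """Empirical P(abs error < threshold | similarity >= sim_min).

    records holds (max training similarity, absolute RI error) pairs for
    validation molecules whose true retention index is known.
    """
    errors = [err for sim, err in records if sim >= sim_min]
    if not errors:
        return float("nan")  # no validation molecules in this similarity range
    return sum(err < threshold for err in errors) / len(errors)
```

Using 37.1 retention index units (the median absolute error reported above) as `threshold` would be one illustrative choice; a query molecule's `max_similarity_to_training` value then selects which similarity cutoff applies to it.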
36
White JB, Trim PJ, Salagaras T, Long A, Psaltis PJ, Verjans JW, Snel MF. Equivalent Carbon Number and Interclass Retention Time Conversion Enhance Lipid Identification in Untargeted Clinical Lipidomics. Anal Chem 2022; 94:3476-3484. [PMID: 35157429 DOI: 10.1021/acs.analchem.1c03770] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 02/06/2023]
Abstract
Chromatography is often used to reduce sample complexity prior to analysis by mass spectrometry, and retention time (RT) is increasingly used to add valuable supporting information to lipid identification. The RT of lipids with the same headgroup in reversed-phase separation can be predicted using the equivalent carbon number (ECN) model. This model describes the effects of acyl chain length and degree of saturation on lipid RT. For the first time, we have found a robust correlation in the chromatographic separation of lipids with different headgroups that share the same fatty acid motif. This relationship can be exploited to perform interclass RT conversion (IC-RTC) by building a model from RT measurements of lipid standards that allows the RT of one lipid subclass to be predicted from that of another. Here, we utilize ECN modeling and IC-RTC to build a glycerophospholipid RT library with 517 entries based on 136 tandem mass spectrometry-characterized lipid RTs from NIST SRM-1950 plasma and lipid standards. The library was tested on a patient cohort undergoing coronary artery bypass grafting surgery (n = 37). A total of 156 unique circulating glycerophospholipids were identified, of which 52 (1 LPG, 24 PE, 5 PG, 18 PI, and 9 PS) were detected with IC-RTC, thereby demonstrating the utility of this technique for the identification of lipid species not found in commercial standards.
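The two models in this abstract can be sketched with ordinary least squares. The sketch assumes the common textbook definition ECN = CN − 2·DB (total acyl carbons minus twice the number of double bonds) and a linear relationship between the RTs of two lipid classes sharing the same fatty acyl composition; the abstract does not state the exact functional forms, and the function names, lipid labels, and RT values are hypothetical.

```python
from typing import Dict, List, Tuple


def ecn(carbons: int, double_bonds: int) -> int:
    """Equivalent carbon number, assuming the common definition ECN = CN - 2*DB."""
    return carbons - 2 * double_bonds


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_ic_rtc(rt_source: Dict[str, float], rt_target: Dict[str, float]) -> Tuple[float, float]:
    """Fit RT_target = slope * RT_source + intercept on acyl compositions seen in both classes."""
    shared = sorted(set(rt_source) & set(rt_target))
    return fit_line([rt_source[k] for k in shared], [rt_target[k] for k in shared])


def convert_rt(rt_source: float, model: Tuple[float, float]) -> float:
    """Predict the target-class RT of a species measured only in the source class."""
    slope, intercept = model
    return slope * rt_source + intercept
```

Within a single class, `fit_line` can likewise regress RT on `ecn(...)` values, which corresponds to the ECN part of the workflow; `fit_ic_rtc` then carries RTs across headgroups for species that lack a standard in the target class.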
Affiliation(s)
- Jake B White
- Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide 5000, South Australia, Australia; Vascular Research Centre, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia; Proteomics, Metabolomics and MS-Imaging Core Facility, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia
- Paul J Trim
- Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide 5000, South Australia, Australia; Proteomics, Metabolomics and MS-Imaging Core Facility, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia
- Thalia Salagaras
- Vascular Research Centre, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia
- Aaron Long
- Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide 5000, South Australia, Australia
- Peter J Psaltis
- Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide 5000, South Australia, Australia; Vascular Research Centre, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia
- Johan W Verjans
- Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide 5000, South Australia, Australia; Vascular Research Centre, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia
- Marten F Snel
- Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide 5000, South Australia, Australia; Proteomics, Metabolomics and MS-Imaging Core Facility, South Australian Health and Medical Research Institute, Adelaide 5000, South Australia, Australia
37
Caldeweyher E, Bauer C, Tehrani AS. An open-source framework for fast-yet-accurate calculation of quantum mechanical features. Phys Chem Chem Phys 2022; 24:10599-10610. [DOI: 10.1039/d2cp01165d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
We present the open-source framework kallisto that enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities,...
38
Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022]
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has become a recent hotspot in the research fields of life and environmental sciences. However, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed. These strategies can be broadly categorized into two types: one based on structure annotation of mass spectra and the other on retention time prediction. These strategies have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarize the characteristics of current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and further discuss directions and perspectives for improving the power of these tools and strategies.
39
Kensert A, Bouwmeester R, Efthymiadis K, Van Broeck P, Desmet G, Cabooter D. Graph Convolutional Networks for Improved Prediction and Interpretability of Chromatographic Retention Data. Anal Chem 2021; 93:15633-15641. [PMID: 34780168 DOI: 10.1021/acs.analchem.1c02988] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/29/2022]
Abstract
Machine learning is a popular technique to predict the retention times of molecules based on descriptors. Descriptors and associated labels (e.g., retention times) of a set of molecules can be used to train a machine learning algorithm. However, descriptors are fixed molecular features which are not necessarily optimized for the given machine learning problem (e.g., to predict retention times). Recent advances in molecular machine learning make use of so-called graph convolutional networks (GCNs), which learn molecular representations from atoms and their bonds to adjacent atoms and thereby optimize the molecular representation for the given problem. In this study, two GCNs were implemented to predict the retention times of molecules for three different chromatographic data sets and compared to seven benchmarks (including two state-of-the-art machine learning models). Additionally, saliency maps were computed from trained GCNs to better interpret the importance of certain molecular substructures in the data sets. Overall, the GCNs matched or outperformed all benchmarks, either significantly outperforming them (5-25% lower mean absolute error) or performing similarly (<5% difference). Saliency maps revealed a significant difference in the molecular substructures that are important for predictions on different chromatographic data sets (reversed-phase liquid chromatography vs hydrophilic interaction liquid chromatography).
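The abstract does not give the architectures of its two GCNs, so the following is only a generic sketch of the idea it describes: atoms are nodes, bonds are edges, and each layer mixes an atom's features with those of its bonded neighbours through learned weights. It uses the widely cited Kipf and Welling propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) H W) with D the degree matrix of A + I, written in pure Python for readability, plus a sum-pooling readout (an assumed choice) that yields a molecule-level vector a regression head could map to a retention time. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n bond adjacency matrix of the heavy-atom graph
    h:   n x f atom feature matrix
    w:   f x f' weight matrix (learned in training; fixed here for illustration)
    """
    n = len(adj)
    # Self-loops let each atom keep its own features when aggregating neighbours.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # every degree is >= 1 because of the self-loops
    # Symmetric degree normalization keeps high-degree atoms from dominating.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def readout(h: Matrix) -> List[float]:
    """Sum-pool atom embeddings into one molecule-level vector."""
    return [sum(col) for col in zip(*h)]
```

Stacking several such layers widens each atom's receptive field by one bond per layer. The saliency maps mentioned in the abstract are a separate step that attributes a trained model's prediction back to atoms or substructures.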
Affiliation(s)
- Alexander Kensert
- Department for Pharmaceutical and Pharmacological Sciences, University of Leuven (KU Leuven), Pharmaceutical Analysis, Herestraat 49, Leuven 3000, Belgium; Department of Chemical Engineering, Vrije Universiteit Brussel, Pleinlaan 2, Brussel 1050, Belgium
- Robbin Bouwmeester
- VIB, VIB-UGent Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, Gent 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, Gent 9052, Belgium
- Kyriakos Efthymiadis
- Department for Pharmaceutical and Pharmacological Sciences, University of Leuven (KU Leuven), Pharmaceutical Analysis, Herestraat 49, Leuven 3000, Belgium; Department of Computer Science, Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 9, Brussel 1050, Belgium
- Peter Van Broeck
- Department of Pharmaceutical Development and Manufacturing Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, Beerse 2340, Belgium
- Gert Desmet
- Department of Chemical Engineering, Vrije Universiteit Brussel, Pleinlaan 2, Brussel 1050, Belgium
- Deirdre Cabooter
- Department for Pharmaceutical and Pharmacological Sciences, University of Leuven (KU Leuven), Pharmaceutical Analysis, Herestraat 49, Leuven 3000, Belgium
40
Ju R, Liu X, Zheng F, Lu X, Xu G, Lin X. Deep Neural Network Pretrained by Weighted Autoencoders and Transfer Learning for Retention Time Prediction of Small Molecules. Anal Chem 2021; 93:15651-15658. [PMID: 34780148 DOI: 10.1021/acs.analchem.1c03250] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
Retention time (RT) prediction contributes to the identification of small molecules measured by high-performance liquid chromatography coupled with high-resolution mass spectrometry. Deep learning algorithms based on big data can enhance the accuracy of RT prediction. However, the RTs of compounds differ across chromatographic conditions, and the number of compounds with known RTs is small in most cases. Therefore, transferring knowledge learned from big data is necessary. In this work, a strategy using a deep neural network (DNN) pretrained by weighted autoencoders and transfer learning (DNNpwa-TL) was proposed to efficiently predict RTs of compounds. The loss function in the autoencoders was calculated with features weighted by mutual information. Then, a DNN pretrained by weighted autoencoders (DNNpwa) was produced. For other specific chromatographic methods, transfer learning models (DNNpwa-TLs) were built by fine-tuning the DNNpwa with a set of compounds with known RTs to conduct the RT prediction. With this strategy, a DNNpwa was first built with the METLIN small molecule retention time data set containing 80,038 small molecule compounds. A median relative error of 3.1% and a mean relative error of 4.9% were achieved. Then, 17 data sets from different chromatographic methods were studied, and the results showed that DNNpwa-TL performed better than other deep learning models. In addition, DNNpwa-TL outperformed random forest, gradient boosting, least absolute shrinkage and selection operator regression, and DNN for most of the 17 data sets. Therefore, DNNpwa-TL can provide an efficient method to perform RT prediction of small molecule compounds for different chromatographic methods and conditions.
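The abstract says the autoencoder loss was computed with features weighted by mutual information but does not give the formula. A minimal sketch, assuming a squared-error reconstruction loss in which each input feature's term is scaled by its mutual information with RT, with the weights normalized to average 1 so that equal scores recover plain mean squared error. The MI scores are taken as given here; in practice they might come from an estimator such as scikit-learn's `mutual_info_regression`, which is an assumption and not necessarily what the authors used. The function names are hypothetical.

```python
from typing import List, Sequence


def normalize_weights(mi_scores: Sequence[float]) -> List[float]:
    """Scale non-negative MI scores so the weights average 1 across features."""
    total = sum(mi_scores)
    if total <= 0:
        return [1.0] * len(mi_scores)  # no information: fall back to an unweighted loss
    return [len(mi_scores) * m / total for m in mi_scores]


def weighted_reconstruction_loss(x: Sequence[Sequence[float]],
                                 x_hat: Sequence[Sequence[float]],
                                 weights: Sequence[float]) -> float:
    """Mean over samples and features of w_j * (x_ij - x_hat_ij) ** 2."""
    if not x or len(x) != len(x_hat):
        raise ValueError("x and x_hat must be non-empty and the same length")
    total = 0.0
    for row, row_hat in zip(x, x_hat):
        total += sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, row_hat))
    return total / (len(x) * len(weights))
```

Under this assumed form, minimizing the loss during pretraining pushes the autoencoder to reconstruct RT-informative features more faithfully than uninformative ones, which is presumably the motivation for weighting by mutual information.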
Affiliation(s)
- Ran Ju
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Xinyu Liu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Fujian Zheng
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
41
Yang Q, Ji H, Fan X, Zhang Z, Lu H. Retention time prediction in hydrophilic interaction liquid chromatography with graph neural network and transfer learning. J Chromatogr A 2021; 1656:462536. [PMID: 34563892 DOI: 10.1016/j.chroma.2021.462536] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/01/2021] [Revised: 09/02/2021] [Accepted: 09/03/2021] [Indexed: 01/04/2023]
Abstract
The combination of retention time (RT), accurate mass and tandem mass spectra can improve the structural annotation in untargeted metabolomics. However, the incorporation of RT for metabolite identification has received less attention because of the limited availability of RT data, especially for hydrophilic interaction liquid chromatography (HILIC). Here, Graph Neural Network-based Transfer Learning (GNN-TL) is proposed to train a model for HILIC RT prediction. The graph neural network was pre-trained using an in silico HILIC RT dataset (pseudo-labeling dataset) with ∼306 K molecules. Then, the weights of the dense layers in the pre-trained GNN (pre-GNN) model were fine-tuned by transfer learning using a small number of experimental HILIC RTs from the target chromatographic system. GNN-TL outperformed the methods in Retip, including Random Forest (RF), Bayesian-regularized neural network (BRNN), XGBoost, light gradient-boosting machine (LightGBM), and Keras. It achieved the lowest mean absolute error (MAE) of 38.6 s on the test set and 33.4 s on an additional test set. It also generalized best, with a small performance difference between the training, test, and additional test sets. Furthermore, the predicted RTs can filter out nearly 60% of false positive candidates on average, which is valuable for compound identification as a complement to mass spectrometry.
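The filtering claim at the end of this abstract can be illustrated with a simple retention time window: a candidate survives only if its predicted RT lies within a tolerance of the observed RT. This is a hedged sketch; the abstract does not specify the filtering rule, so the window itself, the choice of tolerance (for instance twice the reported 38.6 s test MAE, purely for illustration), and the function names are assumptions.

```python
from typing import Dict, List, Tuple


def rt_filter(observed_rt: float, predicted: Dict[str, float], tolerance: float) -> List[str]:
    """Keep candidate IDs whose predicted RT lies within +/- tolerance (s) of the observed RT."""
    return [cid for cid, rt in predicted.items() if abs(rt - observed_rt) <= tolerance]


def false_positive_removal_rate(predicted: Dict[str, float], observed_rt: float,
                                true_id: str, tolerance: float) -> Tuple[float, bool]:
    """Fraction of incorrect candidates removed, and whether the true identity survived."""
    kept = set(rt_filter(observed_rt, predicted, tolerance))
    false_ids = [cid for cid in predicted if cid != true_id]
    if not false_ids:
        return 0.0, true_id in kept
    removed = sum(1 for cid in false_ids if cid not in kept)
    return removed / len(false_ids), true_id in kept
```

Tightening `tolerance` removes more false candidates but raises the risk of discarding the true identity, so the second return value is worth tracking alongside the removal rate when choosing a window.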
Affiliation(s)
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Hongchao Ji
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China