1
|
von Gerichten J, Saunders K, Bailey MJ, Gethings LA, Onoja A, Geifman N, Spick M. Challenges in Lipidomics Biomarker Identification: Avoiding the Pitfalls and Improving Reproducibility. Metabolites 2024; 14:461. [PMID: 39195557 DOI: 10.3390/metabo14080461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2024] [Revised: 08/14/2024] [Accepted: 08/15/2024] [Indexed: 08/29/2024] Open
Abstract
Identification of features with high levels of confidence in liquid chromatography-mass spectrometry (LC-MS) lipidomics research is an essential part of biomarker discovery, but existing software platforms can give inconsistent results, even from identical spectral data. This poses a clear challenge for reproducibility in biomarker identification. In this work, we illustrate the reproducibility gap for two open-access lipidomics platforms, MS DIAL and Lipostar, finding just 14.0% identification agreement when analyzing identical LC-MS spectra using default settings. Whilst the software platforms performed more consistently using fragmentation data, agreement was still only 36.1% for MS2 spectra. This highlights the critical importance of validation across positive and negative LC-MS modes, as well as the manual curation of spectra and lipidomics software outputs, in order to reduce identification errors caused by closely related lipids and co-elution issues. This curation process can be supplemented by data-driven outlier detection in assessing spectral outputs, which is demonstrated here using a novel machine learning approach based on support vector machine regression combined with leave-one-out cross-validation. These steps are essential to reduce the frequency of false positive identifications and close the reproducibility gap, including between software platforms, which, for downstream users such as bioinformaticians and clinicians, can be an underappreciated source of biomarker identification errors.
Collapse
Affiliation(s)
- Johanna von Gerichten
- School of Chemistry and Chemical Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
| | - Kyle Saunders
- School of Chemistry and Chemical Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
| | - Melanie J Bailey
- School of Chemistry and Chemical Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
| | | | - Anthony Onoja
- School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
| | - Nophar Geifman
- School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
| | - Matt Spick
- School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
| |
Collapse
|
2
|
van Herwerden D, Nikolopoulos A, Barron LP, O'Brien JW, Pirok BWJ, Thomas KV, Samanipour S. Exploring the chemical subspace of RPLC: A data driven approach. Anal Chim Acta 2024; 1317:342869. [PMID: 39029998 DOI: 10.1016/j.aca.2024.342869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 06/11/2024] [Accepted: 06/11/2024] [Indexed: 07/21/2024]
Abstract
BACKGROUND The chemical space is comprised of a vast number of possible structures, of which an unknown portion comprises the human and environmental exposome. Such samples are frequently analyzed using non-targeted analysis via liquid chromatography (LC) coupled to high-resolution mass spectrometry often employing a reversed phase (RP) column. However, prior to analysis, the contents of these samples are unknown and could be comprised of thousands of known and unknown chemical constituents. Moreover, it is unknown which part of the chemical space is sufficiently retained and eluted using RPLC. RESULTS We present a generic framework that uses a data driven approach to predict whether molecules fall 'inside', 'maybe' inside, or 'outside' of the RPLC subspace. Firstly, three retention index random forest (RF) regression models were constructed that showed that molecular fingerprints are able to predict RPLC retention behavior. Secondly, these models were used to set up the dataset for building an RPLC RF classification model. The RPLC classification model was able to correctly predict whether a chemical belonged to the RPLC subspace with an accuracy of 92% for the testing set. Finally, applying this model to the 91 737 small molecules (i.e., ≤1 000 Da) in NORMAN SusDat showed that 19.1% fall 'outside' of the RPLC subspace. SIGNIFICANCE AND NOVELTY The RPLC chemical space model provides a major step towards mapping the chemical space and is able to assess whether chemicals can potentially be measured with an RPLC method (i.e., not every RPLC method) or if a different selectivity should be considered. Moreover, knowing which chemicals are outside of the RPLC subspace can assist in reducing potential candidates for library searching and avoid screening for chemicals that will not be present in RPLC data.
Collapse
Affiliation(s)
- Denice van Herwerden
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands.
| | - Alexandros Nikolopoulos
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands
| | - Leon P Barron
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands; MRC Centre for Environment and Health, Environmental Research Group, School of Public Health, Faculty of Medicine, Imperial College London, London, W12 0BZ, United Kingdom
| | - Jake W O'Brien
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands; Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane, QLD, 4102, Australia
| | - Bob W J Pirok
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands
| | - Kevin V Thomas
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane, QLD, 4102, Australia
| | - Saer Samanipour
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands; UvA Data Science Center, University of Amsterdam, Amsterdam, 1012 WP, the Netherlands.
| |
Collapse
|
3
|
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2024:10.1007/s00216-024-05471-x. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Collapse
Affiliation(s)
- Henrik Hupatz
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
| | - Ida Rahu
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
| | - Wei-Chieh Wang
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
| | - Pilleriin Peets
- Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
| | - Emma H Palm
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
| | - Anneli Kruve
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
- Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden.
- Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden.
| |
Collapse
|
4
|
Vik D, Pii D, Mudaliar C, Nørregaard-Madsen M, Kontijevskis A. Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns. Sci Rep 2024; 14:8733. [PMID: 38627535 PMCID: PMC11021461 DOI: 10.1038/s41598-024-59620-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 04/12/2024] [Indexed: 04/19/2024] Open
Abstract
This study explores how machine-learning can be used to predict chromatographic retention times (RT) for the analysis of small molecules, with the objective of identifying a machine-learning framework with the robustness required to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in context of pharmaceutical drug discovery projects. We tested machine-learning models from the following frameworks: XGBoost, ChemProp, and DeepChem, using a dataset of 7552 small molecules. Our findings show that two specific models, AttentiveFP and ChemProp, performed better than XGBoost and a regular neural network in predicting RT accurately. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp on the publicly available METLIN SMRT dataset, it performed impressively with an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
Collapse
Affiliation(s)
- Daniel Vik
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark.
| | - David Pii
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
| | - Chirag Mudaliar
- Amgen Research Copenhagen, Amgen Inc., 2100, Copenhagen, Denmark
| | | | | |
Collapse
|
5
|
Siraj A, Bouwmeester R, Declercq A, Welp L, Chernev A, Wulf A, Urlaub H, Martens L, Degroeve S, Kohlbacher O, Sachsenberg T. Intensity and retention time prediction improves the rescoring of protein-nucleic acid cross-links. Proteomics 2024; 24:e2300144. [PMID: 38629965 DOI: 10.1002/pmic.202300144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2023] [Revised: 12/29/2023] [Accepted: 01/05/2024] [Indexed: 04/19/2024]
Abstract
In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of cross-linked peptides with oligonucleotides of different length leads to a combinatorial increase in search space. We demonstrate that the peptide retention time prediction tasks can be transferred to the task of cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvement in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains, and merit further research.
Collapse
Affiliation(s)
- Arslan Siraj
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Robbin Bouwmeester
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
| | - Arthur Declercq
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
| | - Luisa Welp
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Bioanalytics, Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
| | - Aleksandar Chernev
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
| | - Alexander Wulf
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
| | - Henning Urlaub
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Bioanalytics, Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
| | - Lennart Martens
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
| | - Sven Degroeve
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
| | - Oliver Kohlbacher
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
| |
Collapse
|
6
|
Obradović D, Stavrianidi A, Fedorova E, Bogojević A, Shpigun O, Buryak A, Lazović S. A comparative study of the predictive performance of different descriptor calculation tools: Molecular-based elution order modeling and interpretation of retention mechanism for isomeric compounds from METLIN database. J Chromatogr A 2024; 1719:464731. [PMID: 38377661 DOI: 10.1016/j.chroma.2024.464731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Revised: 02/08/2024] [Accepted: 02/09/2024] [Indexed: 02/22/2024]
Abstract
In the pharmaceutical industry, the need for analytical standards is a bottleneck for comprehensive evaluation and quality control of intermediate and end products. These are complex mixtures containing structurally related molecules. In this regard, chromatographic peak annotation, especially for critical pairs of isomers and closest structural analogs, can be supported by using a Quantitative Structure Retention Relationship (QSRR) approach. In our study, we investigated the fundamental basis of the reversed-phase (RP) retention mechanism for 1141 isomeric compounds from the METLIN SMRT dataset. Nine different descriptor calculation tools combined with different feature selection methods (genetic algorithm (GA), stepwise, Boruta) and machine learning (ML) approaches (support vector machine (SVM), multiple linear regression (MLR), random forest (RF), XGBoost) were applied to provide a reliable molecular structure-based interpretation of RP retention behaviour of the isomeric compounds. Strict internal and external validation metrics were used to select models with the best predictive capabilities (rtest > 0.73, order of elution > 60 %). For the developed models, mean absolute errors were in the range of 60 to 110 s. Stepwise and GA showed the most suitable performance as descriptor selection methods, while SVM and XGBoost modeling gave satisfactory predictive characteristics in most cases. Validation performed on the published experimental data for structurally related pharmaceutical compounds confirmed the best accuracy of MLR modeling in combination with GA feature selection of general physico-chemical properties. The resulting models will be useful for the prediction of separation and identification of structurally related compounds in pharmaceutical analysis, providing a simultaneous understanding of the interaction mechanisms leading to their retention under RP conditions.
Collapse
Affiliation(s)
- Darija Obradović
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, Pregrevica 118, Belgrade 11080, Serbia
| | - Andrey Stavrianidi
- Chemistry Department, Lomonosov Moscow State University, 1/3 Leninskie Gory, GSP-1, Moscow 119991, Russia; A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, Moscow 119071, Russia.
| | - Elizaveta Fedorova
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, Moscow 119071, Russia
| | - Aleksandar Bogojević
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, Pregrevica 118, Belgrade 11080, Serbia
| | - Oleg Shpigun
- Chemistry Department, Lomonosov Moscow State University, 1/3 Leninskie Gory, GSP-1, Moscow 119991, Russia
| | - Aleksey Buryak
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, Moscow 119071, Russia
| | - Saša Lazović
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, Pregrevica 118, Belgrade 11080, Serbia
| |
Collapse
|
7
|
Xue J, Wang B, Ji H, Li W. RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification. Bioinformatics 2024; 40:btae084. [PMID: 38402516 PMCID: PMC10914443 DOI: 10.1093/bioinformatics/btae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Revised: 01/14/2024] [Accepted: 02/22/2024] [Indexed: 02/26/2024] Open
Abstract
MOTIVATION Liquid chromatography retention times prediction can assist in metabolite identification, which is a critical task and challenge in nontargeted metabolomics. However, different chromatographic conditions may result in different retention times for the same metabolite. Current retention time prediction methods lack sufficient scalability to transfer from one specific chromatographic method to another. RESULTS Therefore, we present RT-Transformer, a novel deep neural network model coupled with graph attention network and 1D-Transformer, which can predict retention times under any chromatographic methods. First, we obtain a pre-trained model by training RT-Transformer on the large small molecule retention time dataset containing 80 038 molecules, and then transfer the resulting model to different chromatographic methods based on transfer learning. When tested on the small molecule retention time dataset, as other authors did, the average absolute error reached 27.30 after removing not retained molecules. Still, it reached 33.41 when no samples were removed. The pre-trained RT-Transformer was further transferred to 5 datasets corresponding to different chromatographic conditions and fine-tuned. According to the experimental results, RT-Transformer achieves competitive performance compared to state-of-the-art methods. In addition, RT-Transformer was applied to 41 external molecular retention time datasets. Extensive evaluations indicate that RT-Transformer has excellent scalability in predicting retention times for liquid chromatography and improves the accuracy of metabolite identification. AVAILABILITY AND IMPLEMENTATION The source code for the model is available at https://github.com/01dadada/RT-Transformer. The web server is available at https://huggingface.co/spaces/Xue-Jun/RT-Transformer.
Collapse
Affiliation(s)
- Jun Xue
- School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Bingyi Wang
- Yunnan Police College, Kunming, Yunnan 650223, China
- Key Laboratory of Smart Drugs Control (Yunnan Police College), Ministry of Education, Kunming, Yunnan 650223, China
| | - Hongchao Ji
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - WeiHua Li
- School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
| |
Collapse
|
8
|
Allwright M, Guennewig B, Hoffmann AE, Rohleder C, Jieu B, Chung LH, Jiang YC, Lemos Wimmer BF, Qi Y, Don AS, Leweke FM, Couttas TA. ReTimeML: a retention time predictor that supports the LC-MS/MS analysis of sphingolipids. Sci Rep 2024; 14:4375. [PMID: 38388524 PMCID: PMC10883992 DOI: 10.1038/s41598-024-53860-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2023] [Accepted: 02/06/2024] [Indexed: 02/24/2024] Open
Abstract
The analysis of ceramide (Cer) and sphingomyelin (SM) lipid species using liquid chromatography-tandem mass spectrometry (LC-MS/MS) continues to present challenges as their precursor mass and fragmentation can correspond to multiple molecular arrangements. To address this constraint, we developed ReTimeML, a freeware that automates the expected retention times (RTs) for Cer and SM lipid profiles from complex chromatograms. ReTimeML works on the principle that LC-MS/MS experiments have pre-determined RTs from internal standards, calibrators or quality controls used throughout the analysis. Employed as reference RTs, ReTimeML subsequently extrapolates the RTs of unknowns using its machine-learned regression library of mass-to-charge (m/z) versus RT profiles, which does not require model retraining for adaptability on different LC-MS/MS pipelines. We validated ReTimeML RT estimations for various Cer and SM structures across different biologicals, tissues and LC-MS/MS setups, exhibiting a mean variance between 0.23 and 2.43% compared to user annotations. ReTimeML also aided the disambiguation of SM identities from isobar distributions in paired serum-cerebrospinal fluid from healthy volunteers, allowing us to identify a series of non-canonical SMs associated between the two biofluids comprised of a polyunsaturated structure that confers increased stability against catabolic clearance.
Collapse
Affiliation(s)
- Michael Allwright
- ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
| | - Boris Guennewig
- ForeFront, Brain and Mind Centre, The University of Sydney, Sydney, Australia
| | - Anna E Hoffmann
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Endosane Pharmaceuticals GmbH, Berlin, Germany
| | - Cathrin Rohleder
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Endosane Pharmaceuticals GmbH, Berlin, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Beverly Jieu
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Long H Chung
- Centenary Institute, The University of Sydney, Sydney, Australia
| | - Yingxin C Jiang
- Centenary Institute, The University of Sydney, Sydney, Australia
| | - Bruno F Lemos Wimmer
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Yanfei Qi
- Centenary Institute, The University of Sydney, Sydney, Australia
| | - Anthony S Don
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
| | - F Markus Leweke
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Endosane Pharmaceuticals GmbH, Berlin, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Timothy A Couttas
- Translational Research Collective, Brain and Mind Centre, The University of Sydney, Sydney, NSW, 2006, Australia.
| |
Collapse
|
9
|
Kehl N, Gessner A, Maas R, Fromm MF, Taudte RV. A supervised machine-learning approach for the efficient development of a multi method (LC-MS) for a large number of drugs and subsets thereof: focus on oral antitumor agents. Clin Chem Lab Med 2024; 62:293-302. [PMID: 37606251 DOI: 10.1515/cclm-2023-0468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Accepted: 07/31/2023] [Indexed: 08/23/2023]
Abstract
OBJECTIVES Accumulating evidence argues for a more widespread use of therapeutic drug monitoring (TDM) to support individualized medicine, especially for therapies where toxicity and efficacy are critical issues, such as in oncology. However, development of TDM assays struggles to keep pace with the rapid introduction of new drugs. Therefore, novel approaches for faster assay development are needed that also allow effortless inclusion of newly approved drugs as well as customization to smaller subsets if scientific or clinical situations require. METHODS We applied and evaluated two machine-learning approaches i.e., a regression-based approach and an artificial neural network (ANN) to retention time (RT) prediction for efficient development of a liquid chromatography mass spectrometry (LC-MS) method quantifying 73 oral antitumor drugs (OADs) and five active metabolites. Individual steps included training, evaluation, comparison, and application of the superior approach to RT prediction, followed by stipulation of the optimal gradient. RESULTS Both approaches showed excellent results for RT prediction (mean difference ± standard deviation: 2.08 % ± 9.44 % ANN; 1.78 % ± 1.93 % regression-based approach). Using the regression-based approach, the optimum gradient (4.91 % MeOH/min) was predicted with a total run time of 17.92 min. The associated method was fully validated following FDA and EMA guidelines. Exemplary modification and application of the regression-based approach to a subset of 14 uro-oncological agents resulted in a considerably shortened run time of 9.29 min. CONCLUSIONS Using a regression-based approach, a multi drug LC-MS assay for RT prediction was efficiently developed, which can be easily expanded to newly approved OADs and customized to smaller subsets if required.
Collapse
Affiliation(s)
- Niklas Kehl
- Institute of Experimental and Clinical Pharmacology and Toxicology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Arne Gessner
- Institute of Experimental and Clinical Pharmacology and Toxicology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Renke Maas
- Institute of Experimental and Clinical Pharmacology and Toxicology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- FAU NeW - Research Center for New Bioactive Compounds, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Martin F Fromm
- Institute of Experimental and Clinical Pharmacology and Toxicology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- FAU NeW - Research Center for New Bioactive Compounds, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - R Verena Taudte
- Institute of Experimental and Clinical Pharmacology and Toxicology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Core Facility for Metabolomics, Department of Medicine, Philipps-Universität Marburg, 35043 Marburg, Germany
| |
Collapse
|
10
|
Torigoe T, Takahashi M, Heravizadeh O, Ikeda K, Nakatani K, Bamba T, Izumi Y. Predicting Retention Time in Unified-Hydrophilic-Interaction/Anion-Exchange Liquid Chromatography High-Resolution Tandem Mass Spectrometry (Unified-HILIC/AEX/HRMS/MS) for Comprehensive Structural Annotation of Polar Metabolome. Anal Chem 2024; 96:1275-1283. [PMID: 38186224 DOI: 10.1021/acs.analchem.3c04618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2024]
Abstract
The accuracy of the structural annotation of unidentified peaks obtained in metabolomic analysis using liquid chromatography/tandem mass spectrometry (LC/MS/MS) can be enhanced using retention time (RT) information as well as precursor and product ions. Unified-hydrophilic-interaction/anion-exchange liquid chromatography high-resolution tandem mass spectrometry (unified-HILIC/AEX/HRMS/MS) has been recently developed as an innovative method ideal for nontargeted polar metabolomics. However, the RT prediction for unified-HILIC/AEX has not been developed because of the complex separation mechanism characterized by the continuous transition of the separation modes from HILIC to AEX. In this study, we propose an RT prediction model of unified-HILIC/AEX/HRMS/MS, which enables the comprehensive structural annotation of polar metabolites. With training data for 203 polar metabolites, we ranked the feature importance using a random forest among 12,420 molecular descriptors (MDs) and constructed an RT prediction model with 26 selected MDs. The accuracy of the RT model was evaluated using test data for 51 polar metabolites, and 86.3% of the ΔRTs (difference between measured and predicted RTs) were within ±1.50 min, with a mean absolute error of 0.80 min, indicating high RT prediction accuracy. Nontargeted metabolomic data from the NIST SRM 1950-Metabolites in frozen human plasma were analyzed using the developed RT model and in silico MS/MS prediction, resulting in a successful structural estimation of 216 polar metabolites, in addition to the 62 identified based on standards. The proposed model can help accelerate the structural annotation of unknown hydrophilic metabolites, which is a key issue in metabolomic research.
Collapse
Affiliation(s)
- Taihei Torigoe
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Masatomo Takahashi
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Omidreza Heravizadeh
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Kazuki Ikeda
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Kohta Nakatani
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Takeshi Bamba
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Yoshihiro Izumi
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| |
Collapse
|
11
|
Kwon Y, Kwon H, Han J, Kang M, Kim JY, Shin D, Choi YS, Kang S. Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network. Anal Chem 2023; 95:17273-17283. [PMID: 37955847 DOI: 10.1021/acs.analchem.3c03177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2023]
Abstract
Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.
Collapse
Affiliation(s)
- Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Hyukju Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Department of Chemistry, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
| | - Jongmin Han
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
| | - Myeonginn Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
| | - Ji-Yeong Kim
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Dongyeeb Shin
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Seokho Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
| |
Collapse
|
12
|
Kang Q, Fang P, Zhang S, Qiu H, Lan Z. Deep graph convolutional network for small-molecule retention time prediction. J Chromatogr A 2023; 1711:464439. [PMID: 37865024 DOI: 10.1016/j.chroma.2023.464439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/23/2023]
Abstract
The retention time (RT) is a crucial source of data for liquid chromatography-mass spectrometry (LCMS). A model that can accurately predict the RT for each molecule would empower filtering candidates with similar spectra but differing RT in LCMS-based molecule identification. Recent research shows that graph neural networks (GNNs) outperform traditional machine learning algorithms in RT prediction. However, all of these models use relatively shallow GNNs. This study for the first time investigates how depth affects GNNs' performance on RT prediction. The results demonstrate that a notable improvement can be achieved by pushing the depth of GNNs to 16 layers by the adoption of residual connection. Additionally, we also find that graph convolutional network (GCN) model benefits from the edge information. The developed deep graph convolutional network, DeepGCN-RT, significantly outperforms the previous state-of-the-art method and achieves the lowest mean absolute percentage error (MAPE) of 3.3% and the lowest mean absolute error (MAE) of 26.55 s on the SMRT test set. We also finetune DeepGCN-RT on seven datasets with various chromatographic conditions. The mean MAE of the seven datasets largely decreases 30% compared to previous state-of-the-art method. On the RIKEN-PlaSMA dataset, we also test the effectiveness of DeepGCN-RT in assisting molecular structure identification. By 30% lessening the number of potential structures, DeepGCN-RT is able to improve top-1 accuracy by about 11%.
Collapse
Affiliation(s)
- Qiyue Kang
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China.
| | - Pengfei Fang
- School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China
| | - Shuai Zhang
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
| | - Huachuan Qiu
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China
| | - Zhenzhong Lan
- School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China.
| |
Collapse
|
13
|
Chen B, Wang C, Fu Z, Yu H, Liu E, Gao X, Li J, Han L. RT-Ensemble Pred: A tool for retention time prediction of metabolites on different LC-MS systems. J Chromatogr A 2023; 1707:464304. [PMID: 37611386 DOI: 10.1016/j.chroma.2023.464304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2023] [Accepted: 08/15/2023] [Indexed: 08/25/2023]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) could provide a large amount of information to assist in metabolites identification. Different liquid chromatographic methods (CMs) could produce different retention times to the same metabolite. To predict the retention time of local dataset by online datasets has become a trend, but the datasets downloaded from different databases were differences in quantity levels. And the imbalanced data could produce bad influence in model prediction. Thus, based on quantitative structure-retention relationships (QSRRs), an ensemble model, named RT-Ensemble Pred, has been successfully built to predict retention time of different LC-MS systems in this study. A total of 76, 807 metabolites (76, 909 retention times) have been collected across 9 CMs, and 19 natural products and 1 antifungal drug (20 retention times) have been collected to test the model applicability. An ensemble sampling was applied for the preprocessing procedure to solve the problem of imbalanced data. Based on the ensemble sampling, RT-Ensemble Pred could better utilize online datasets for the prediction of retention time. RT-Ensemble Pred was built based on the online datasets and tested by local dataset. The predictive accuracy of RT-Ensemble Pred was higher than the models without any sampling methods. The results showed that RT-Ensemble Pred could predict the metabolites which was not included in the database and the metabolites which were from new CMs. It could also be used for the prediction of other compounds beside metabolites. Furthermore, a tool of RT-Ensemble Pred was packed and can be freely downloaded at https://gitlab.com/mikic93/rt-ensemble-pred. It provides convenience for the users who need to predict the retention time of metabolites.
Collapse
Affiliation(s)
- Biying Chen
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
| | - Chenxi Wang
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
| | - Zhifei Fu
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
| | - Haiyang Yu
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
| | - Erwei Liu
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
| | - Xiumei Gao
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China
| | - Jie Li
- Tianjin Key Laboratory of Clinical Multi-omics, Airport Economy Zone, Tianjin, China.
| | - Lifeng Han
- State Key Laboratory of Component-based Chinese Medicine, Haihe Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, 10 Poyanghu Road, Jinghai, Tianjin 301617, PR China.
| |
Collapse
|
14
|
Sholokhova AY, Matyushin DD, Grinevich OI, Borovikova SA, Buryak AK. Intelligent Workflow and Software for Non-Target Analysis of Complex Samples Using a Mixture of Toxic Transformation Products of Unsymmetrical Dimethylhydrazine as an Example. Molecules 2023; 28:molecules28083409. [PMID: 37110641 PMCID: PMC10143382 DOI: 10.3390/molecules28083409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Revised: 04/05/2023] [Accepted: 04/10/2023] [Indexed: 04/29/2023] Open
Abstract
Unsymmetrical dimethylhydrazine (UDMH) is a widely used rocket propellant. Entering the environment or being stored in uncontrolled conditions, UDMH easily forms an enormous variety (at least many dozens) of transformation products. Environmental pollution by UDMH and its transformation products is a major problem in many countries and across the Arctic region. Unfortunately, previous works often use only electron ionization mass spectrometry with a library search, or they consider only the molecular formula to propose the structures of new products. This is quite an unreliable approach. It was demonstrated that a newly proposed artificial intelligence-based workflow allows for the proposal of structures of UDMH transformation products with a greater degree of certainty. The presented free and open-source software with a convenient graphical user interface facilitates the non-target analysis of industrial samples. It has bundled machine learning models for the prediction of retention indices and mass spectra. A critical analysis of whether a combination of several methods of chromatography and mass spectrometry allows us to elucidate the structure of an unknown UDMH transformation product was provided. It was demonstrated that the use of gas chromatographic retention indices for two stationary phases (polar and non-polar) allows for the rejection of false candidates in many cases when only one retention index is not enough. The structures of five previously unknown UDMH transformation products were proposed, and four previously proposed structures were refined.
Collapse
Affiliation(s)
- Anastasia Yu Sholokhova
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
| | - Dmitriy D Matyushin
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
| | - Oksana I Grinevich
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
| | - Svetlana A Borovikova
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
| | - Aleksey K Buryak
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
| |
Collapse
|
15
|
Tupertsev B, Osipenko S, Kireev A, Nikolaev E, Kostyukevich Y. Simple In Vitro 18O Labeling for Improved Mass Spectrometry-Based Drug Metabolites Identification: Deep Drug Metabolism Study. Int J Mol Sci 2023; 24:ijms24054569. [PMID: 36902002 PMCID: PMC10002766 DOI: 10.3390/ijms24054569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 03/03/2023] Open
Abstract
The identification of drug metabolites formed with different in vitro systems by HPLC-MS is a standard step in preclinical research. In vitro systems allow modeling of real metabolic pathways of a drug candidate. Despite the emergence of various software and databases, identification of compounds is still a complex task. Measurement of the accurate mass, correlation of chromatographic retention times and fragmentation spectra are often insufficient for identification of compounds especially in the absence of reference materials. Metabolites can "slip under the nose", since it is often not possible to reliably confirm that a signal belongs to a metabolite and not to other compounds in complex systems. Isotope labeling has proved to be a tool that aids in small molecule identification. The introduction of heavy isotopes is done with isotope exchange reactions or with complicated synthetic schemes. Here, we present an approach based on the biocatalytic insertion of oxygen-18 isotope under the action of liver microsomes enzymes in the presence of 18O2. Using the local anesthetic bupivacaine as an example, more than 20 previously unknown metabolites were reliably discovered and annotated in the absence of the reference materials. In combination with high-resolution mass spectrometry and modern methods of mass spectrometric metabolism data processing, we demonstrated the ability of the proposed approach to increase the degree of confidence in interpretating metabolism data.
Collapse
Affiliation(s)
- Boris Tupertsev
- Center of Molecular and Cellular Biology (CMCB), Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205 Moscow, Russia
- Moscow Institute of Physics and Technology, Phystech School of Biological and Medical Physics, Institutskiy per., 9, Dolgoprudny, 141701 Moscow, Russia
| | - Sergey Osipenko
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205 Moscow, Russia
| | - Albert Kireev
- Center of Molecular and Cellular Biology (CMCB), Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205 Moscow, Russia
| | - Eugene Nikolaev
- Center of Molecular and Cellular Biology (CMCB), Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205 Moscow, Russia
| | - Yury Kostyukevich
- Center of Molecular and Cellular Biology (CMCB), Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205 Moscow, Russia
- Correspondence:
| |
Collapse
|
16
|
Wang X, Zheng F, Sheng M, Xu G, Lin X. Retention time prediction for small samples based on integrating molecular representations and adaptive network. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1217:123624. [PMID: 36780745 DOI: 10.1016/j.jchromb.2023.123624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/27/2023] [Indexed: 02/07/2023]
Abstract
Retention time (RT) can provide orthogonal information different from that of mass spectrometry and contribute to identifying compounds. Many machine learning methods have been developed and applied to RT prediction. In application, the training data size is usually small in most chromatography systems. To enhance the performance of RT prediction, this study proposes a RT prediction method based on multi-data combinations and adaptive neural network (MDC-ANN). MDC-ANN establishes the RT prediction model for the target chromatographic system through transfer learning and a base deep learning model trained on a big dataset. It selects the optimal molecular representation combination from the multiple input candidates and automatically determines the neural network structure according to the determined input combination. MDC-ANN was compared with two new efficient deep learning methods, three transferring methods and four popular machine learning methods on 14 small datasets and showed advantages in MAE, MedAE, MRE and R2 in most cases. The experiment results illustrated that integrating multiple molecular representations can provide more information, improve the performance of RT prediction and contribute to compound annotation, different chromatographic systems may use different molecular representation combinations to obtain good RT prediction performance. Hence, MDC-ANN which automatically determines the best combination of molecular representations for a specific system is promising for predicting RTs accurately in real applications.
Collapse
Affiliation(s)
- Xiaoxiao Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
| | - Fujian Zheng
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China.
| | - Meizhen Sheng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China
| | - Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
| |
Collapse
|
17
|
Retention Time Prediction with Message-Passing Neural Networks. SEPARATIONS 2022. [DOI: 10.3390/separations9100291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/01/2023] Open
Abstract
Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D-convolutional neural networks that are trained on the METLIN small molecule retention time dataset (SMRT). These approaches demonstrate accurate predictions comparable with the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to various systems. The accuracy of this step depends both on the method of mapping and on the accuracy of the general model trained on SMRT. Therefore, improvements to both parts of prediction workflows may lead to improved compound annotations. Here, we evaluate capabilities of message-passing neural networks (MPNN) that have demonstrated outstanding performance on many chemical tasks to accurately predict retention times. The model was initially trained on SMRT, providing mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was further fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and demonstrated up to 30% improvement of prediction accuracy for these sets compared with the state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention with the thresholds obtained from ROC curves eliminates up to 50% of false identities.
Collapse
|
18
|
Convolutional Neural Network-Based Compound Fingerprint Prediction for Metabolite Annotation. Metabolites 2022; 12:metabo12070605. [PMID: 35888729 PMCID: PMC9316655 DOI: 10.3390/metabo12070605] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 06/17/2022] [Accepted: 06/21/2022] [Indexed: 11/23/2022] Open
Abstract
Metabolite annotation has been a challenging issue especially in untargeted metabolomics studies by liquid chromatography coupled with mass spectrometry (LC-MS). This is in part due to the limitations of publicly available spectral libraries, which consist of tandem mass spectrometry (MS/MS) data acquired from just a fraction of known metabolites. Machine learning provides the opportunity to predict molecular fingerprints based on MS/MS data. The predicted molecular fingerprints can then be used to help rank putative metabolite IDs obtained by using either the precursor mass or the formula of the unknown metabolite. This method is particularly useful to help annotate metabolites whose corresponding MS/MS spectra are missing or cannot be matched with those in accessible spectral libraries. We investigated a convolutional neural network (CNN) for molecular fingerprint prediction based on data acquired by MS/MS. We used more than 680,000 MS/MS spectra obtained from the MoNA repository and NIST 20, representing about 36,000 compounds for training and testing our CNN model. The trained CNN model is implemented as a python package, MetFID. The package is available on GitHub for users to enter their MS/MS spectra and corresponding putative metabolite IDs to obtain ranked lists of metabolites. Better performance is achieved by MetFID in ranking putative metabolite IDs using the CASMI 2016 benchmark dataset compared to two other machine learning-based tools (CSI:FingerID and ChemDistiller).
Collapse
|
19
|
Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022] Open
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has become a recent hotspot in the research fields of life and environmental sciences. By contrast, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed, these strategies can be simply categorized into two types, one based on structure annotation of mass spectra and for the other on retention time prediction. These strategies have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarized the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and further discussed the directions and perspectives to improve the power of the tools or strategies for structure elucidation.
Collapse
|