1. Kneipp J, Seifert S, Gärber F. SERS microscopy as a tool for comprehensive biochemical characterization in complex samples. Chem Soc Rev 2024; 53:7641-7656. [PMID: 38934892 DOI: 10.1039/d4cs00460d] [Indexed: 06/28/2024]
Abstract
Surface-enhanced Raman scattering (SERS) spectra of biomaterials such as cells or tissues can be used to obtain biochemical information from nanoscopic volumes in these heterogeneous samples. This tutorial review discusses the factors that determine the outcome of a SERS experiment in complex bioorganic samples. They are related to the SERS process itself, the possibility to selectively probe certain regions or constituents of a sample, and the retrieval of the vibrational information in order to identify molecules and their interaction. After introducing basic aspects of SERS experiments in the context of biocompatible environments, spectroscopy in typical microscopic settings is exemplified, including the possibilities to combine SERS with other linear and non-linear microscopic tools, and to exploit approaches that improve lateral and temporal resolution. In particular, the great variation of data in a SERS experiment calls for robust data analysis tools. Approaches will be introduced that were originally developed in the field of bioinformatics for application to omics data and that show specific potential in the analysis of SERS data. They include the use of simulated data and machine learning tools that can yield chemical information beyond achieving spectral classification.
Affiliation(s)
- Janina Kneipp
  - Department of Chemistry, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany
- Stephan Seifert
  - Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Florian Gärber
  - Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
2. Lösel H, Arndt M, Wenck S, Hansen L, Oberpottkamp M, Seifert S, Fischer M. Exploring the potential of high-resolution LC-MS in combination with ion mobility separation and surrogate minimal depth for enhanced almond origin authentication. Talanta 2024; 271:125598. [PMID: 38224656 DOI: 10.1016/j.talanta.2023.125598] [Received: 11/02/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/17/2024]
Abstract
Almonds (Prunus dulcis Mill.) are consumed worldwide and their geographical origin plays a crucial role in determining their market value. In the present study, a total of 250 almond reference samples from six countries (Australia, Spain, Iran, Italy, Morocco, and the USA) were non-polar extracted and analyzed by UPLC-ESI-IM-qToF-MS. Four harvest periods and more than 30 different varieties, including both sweet and bitter almonds, were considered in the method development. Principal component analysis showed that there are three groups of samples with similarities: Australia/USA, Spain/Italy and Iran/Morocco. For origin determination, a random forest achieved an accuracy of 88.8%. Misclassifications occurred mainly between almonds from the USA and Australia, due to similar varieties and similar external influences such as climate conditions. Metabolites relevant for classification were selected using Surrogate Minimal Depth, with triacylglycerides containing oxidized, odd-chained or short-chained fatty acids and some phospholipids proven to be the most suitable marker substances. Our results show that focusing on the identified lipids (e.g., using a QqQ-MS instrument) is a promising approach to transfer the origin determination of almonds to routine analysis.
Affiliation(s)
- Henri Lösel
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Maike Arndt
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Soeren Wenck
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Lasse Hansen
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Marie Oberpottkamp
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Stephan Seifert
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Markus Fischer
  - Hamburg School of Food Science - Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
3. Kasapi M, Xu K, Ebbels TMD, O’Regan DP, Ware JS, Posma JM. LAVASET: Latent Variable Stochastic Ensemble of Trees. An ensemble method for correlated datasets with spatial, spectral, and temporal dependencies. Bioinformatics 2024; 40:btae101. [PMID: 38383048 PMCID: PMC11212485 DOI: 10.1093/bioinformatics/btae101] [Received: 09/12/2023] [Revised: 01/30/2024] [Accepted: 02/20/2024] [Indexed: 02/23/2024] Open
Abstract
Motivation: Random forests (RFs) can deal with a large number of variables, achieve reasonable prediction scores, and yield highly interpretable feature importance values. As such, RFs are appropriate models for feature selection and further dimension reduction. However, RFs are often not appropriate for correlated datasets due to their mode of selecting individual features for splitting. Addressing correlation relationships in high-dimensional datasets is imperative for reducing the number of variables that are assigned high importance, hence making the dimension reduction most efficient. Here, we propose the LAtent VAriable Stochastic Ensemble of Trees (LAVASET) method that derives latent variables based on the distance characteristics of each feature and aims to incorporate the correlation factor in the splitting step.
Results: Without compromising on performance in the majority of examples, LAVASET outperforms RF by accurately determining feature importance across all correlated variables and ensuring proper distribution of importance values. LAVASET yields mostly non-inferior prediction accuracies to traditional RFs when tested in simulated and real 1D datasets, as well as more complex and high-dimensional 3D datatypes. Unlike traditional RFs, LAVASET is unaffected by single 'important' noisy features (false positives), as it considers the local neighbourhood. LAVASET, therefore, highlights neighbourhoods of features, reflecting real signals that collectively impact the model's predictive ability.
Availability and implementation: LAVASET is freely available as a standalone package from https://github.com/melkasapi/LAVASET.
Affiliation(s)
- Melpomeni Kasapi
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom
  - Faculty of Medicine, National Heart & Lung Institute, Imperial College London, London W12 0NN, United Kingdom
  - MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, United Kingdom
- Kexin Xu
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Timothy M D Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Declan P O’Regan
  - Faculty of Medicine, National Heart & Lung Institute, Imperial College London, London W12 0NN, United Kingdom
  - MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, United Kingdom
- James S Ware
  - Faculty of Medicine, National Heart & Lung Institute, Imperial College London, London W12 0NN, United Kingdom
  - MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, United Kingdom
  - Royal Brompton & Harefield Hospitals, Guy’s and St. Thomas’ NHS Foundation Trust, London SW3 6NP, United Kingdom
  - Program in Medical & Population Genetics, Broad Institute of MIT & Harvard, Cambridge, MA 02142, United States
- Joram M Posma
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom
4. Wenck S, Mix T, Fischer M, Hackl T, Seifert S. Opening the Random Forest Black Box of 1H NMR Metabolomics Data by the Exploitation of Surrogate Variables. Metabolites 2023; 13:1075. [PMID: 37887402 PMCID: PMC10608983 DOI: 10.3390/metabo13101075] [Received: 09/18/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
The untargeted metabolomics analysis of biological samples with nuclear magnetic resonance (NMR) provides highly complex data containing various signals from different molecules. To use these data for classification, e.g., in the context of food authentication, machine learning methods are used. These methods are usually applied as a black box, which means that no information about the complex relationships between the variables and the outcome is obtained. In this study, we show that the random forest-based approach surrogate minimal depth (SMD) can be applied for a comprehensive analysis of class-specific differences by selecting relevant variables and analyzing their mutual impact on the classification model of different truffle species. SMD allows the assignment of variables from the same metabolites as well as the detection of interactions between different metabolites that can be attributed to known biological relationships.
Affiliation(s)
- Soeren Wenck
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Thorsten Mix
  - Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Markus Fischer
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Thomas Hackl
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
  - Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Stephan Seifert
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany