1. Chung J, Zhang J, Saimon AI, Liu Y, Johnson BN, Kong Z. Imbalanced spectral data analysis using data augmentation based on the generative adversarial network. Sci Rep 2024; 14:13230. [PMID: 38853181 PMCID: PMC11163007 DOI: 10.1038/s41598-024-63285-4] [Received: 01/18/2024] [Accepted: 05/27/2024] [Indexed: 06/11/2024]
Abstract
Spectroscopic techniques generate one-dimensional spectra with distinct peaks of specific widths in the frequency domain. These features act as unique signatures of material characteristics. Deep neural networks (DNNs) have recently been considered powerful tools for automatically categorizing experimental spectral data through supervised classification to evaluate material characteristics. However, most existing work assumes that the training data are balanced across classes, whereas the spectral data from actual experiments are usually imbalanced. Imbalanced training data degrade supervised classification performance, hindering understanding of the phase behavior of soft materials and glycomaterials, specifically their sol-gel transition (gelation). To address this issue, this paper applies a novel data augmentation method based on a generative adversarial network (GAN), proposed by the authors in their prior work. The method consists of three DNNs: a generator, a discriminator, and a classifier. It generates samples that are not only authentic but also emphasize the differences between material characteristics, providing balanced training data and improving the classification results. To demonstrate the effectiveness of the method, actual imbalanced spectral data from Pluronic F-127 and alpha-cyclodextrin hydrogels are used to classify the phases represented in the data. On average, our approach improves on existing data augmentation methods by 8.8%, 6.4%, and 6.2% in the classifier's F-score, precision, and recall, respectively. Based on these validated results, we expect the method to find broader application in addressing imbalanced measurement data across diverse domains of materials science and chemical engineering.
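The balancing idea can be sketched as topping up every minority class with synthetic spectra until it matches the majority class. This is a minimal sketch, not the authors' GAN: the hypothetical `jitter_augment` helper (small Gaussian noise added to a randomly chosen real spectrum) stands in for a trained generator, and the four-point "sol"/"gel" spectra are toy data.

```python
import random
from collections import Counter


def jitter_augment(spectrum, rng, noise=0.01):
    """Hypothetical stand-in for a trained GAN generator: add small Gaussian noise."""
    return [x + rng.gauss(0.0, noise) for x in spectrum]


def balance_by_augmentation(spectra, labels, augment=jitter_augment, seed=0):
    """Top up every class with synthetic spectra until it matches the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(spectra), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(spectra, labels) if y == cls]
        for _ in range(target - n):
            # Derive one synthetic sample from a randomly chosen real sample of this class.
            out_x.append(augment(rng.choice(pool), rng))
            out_y.append(cls)
    return out_x, out_y


# Toy imbalanced training set: five "sol" spectra and two "gel" spectra.
spectra = [[0.1, 0.5, 0.2, 0.0]] * 5 + [[0.0, 0.2, 0.6, 0.1]] * 2
labels = ["sol"] * 5 + ["gel"] * 2
X, y = balance_by_augmentation(spectra, labels)
print(sorted(Counter(y).items()))  # [('gel', 5), ('sol', 5)]
```

In the paper's setting, the `augment` argument would be replaced by samples from the trained generator, which is additionally steered by the classifier to emphasize class differences; simple jitter does neither.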
Affiliation(s)
- Jihoon Chung
  - Department of Industrial Engineering, Pusan National University, Busan, South Korea
- Junru Zhang
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA
- Amirul Islam Saimon
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA
- Yang Liu
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA
- Blake N Johnson
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA
- Zhenyu Kong
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA

2. Perez de Souza L, Fernie AR. Computational methods for processing and interpreting mass spectrometry-based metabolomics. Essays Biochem 2024; 68:5-13. [PMID: 37999335 PMCID: PMC11065554 DOI: 10.1042/ebc20230019] [Received: 09/15/2023] [Revised: 11/10/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023]
Abstract
Metabolomics has emerged as an indispensable tool for exploring complex biological questions, providing the ability to investigate a substantial portion of the metabolome. However, the vast complexity and structural diversity intrinsic to metabolites pose a great challenge for data analysis and interpretation. Liquid chromatography-mass spectrometry (LC-MS) stands out as a versatile technique offering extensive metabolite coverage. In this mini-review, we address some of the hurdles posed by the complex nature of LC-MS data and provide a brief overview of computational tools designed to help tackle these challenges. We focus on two major steps that are essential to most metabolomics investigations: the translation of raw data into quantifiable features, and the extraction of structural insights from mass spectra to facilitate metabolite identification. By exploring current computational solutions, we aim to provide a critical overview of the capabilities and constraints of mass spectrometry-based metabolomics, while introducing some of the most recent trends in data processing and analysis within the field.
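A common first step in translating raw LC-MS data into quantifiable features is the extracted ion chromatogram (EIC): for a target m/z, the signal within a mass tolerance is summed in every scan. A minimal sketch, assuming centroided scans stored as plain Python tuples and a 10 ppm tolerance (both illustrative choices, not taken from the review):

```python
def ppm_window(target_mz, ppm):
    """Return the (low, high) m/z bounds for a tolerance given in parts per million."""
    delta = target_mz * ppm * 1e-6
    return target_mz - delta, target_mz + delta


def extract_ion_chromatogram(scans, target_mz, ppm=10.0):
    """Sum the intensities that fall inside the ppm window, scan by scan.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples (centroided data).
    Returns a list of (retention_time, summed_intensity) pairs.
    """
    low, high = ppm_window(target_mz, ppm)
    return [(rt, sum((inten for mz, inten in peaks if low <= mz <= high), 0.0))
            for rt, peaks in scans]


# Toy centroided scans (hypothetical values).
scans = [
    (1.00, [(180.0634, 100.0), (181.0700, 20.0)]),
    (1.05, [(180.0635, 900.0), (200.1000, 50.0)]),
    (1.10, [(180.0633, 300.0)]),
    (1.15, [(180.0700, 80.0)]),  # about 37 ppm away, so outside a 10 ppm window
]
print(extract_ion_chromatogram(scans, 180.0634))
# [(1.0, 100.0), (1.05, 900.0), (1.1, 300.0), (1.15, 0.0)]
```

A relative (ppm) tolerance is used rather than a fixed Da window because mass accuracy on high-resolution instruments is usually specified in ppm, so the absolute window widens with m/z.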
Affiliation(s)
- Leonardo Perez de Souza
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Alisdair R Fernie
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
  - Center for Plant Systems Biology and Biotechnology, 4000 Plovdiv, Bulgaria

3. Zeng J, Li Y, Wang C, Fu S, He M. Combination of in silico prediction and convolutional neural network framework for targeted screening of metabolites from LC-HRMS fingerprints: A case study of "Pericarpium Citri Reticulatae - Fructus Aurantii". Talanta 2024; 269:125514. [PMID: 38071769 DOI: 10.1016/j.talanta.2023.125514] [Received: 07/26/2023] [Revised: 11/26/2023] [Accepted: 12/01/2023] [Indexed: 01/05/2024]
Abstract
In this study, a novel approach is introduced that merges in silico prediction with a convolutional neural network (CNN) framework for the targeted screening of in vivo metabolites in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) fingerprints. First, three prediction tools, supplemented by the literature, identify potential metabolites of target prototypes derived from traditional Chinese medicines (TCMs) or functional foods. Next, a CNN is developed to minimize false positives from peak detection based on the continuous wavelet transform (CWT). The resulting extracted ion chromatogram (EIC) peaks are then annotated using MS-FINDER at three levels of confidence. The methodology is applied to the metabolic fingerprints of rats administered "Pericarpium Citri Reticulatae - Fructus Aurantii" (PCR-FA). As a result, 384 peaks in positive mode and 282 in negative mode were identified as true peaks of probable metabolites. By comparing these with "blank serum" data, EIC peaks of adequate intensity were selected for MS/MS fragment analysis. Ultimately, 14 prototypes (including flavonoids and lactones) and 40 metabolites were precisely linked to their corresponding EIC peaks, providing deeper insight into the pharmacological mechanism. This strategy markedly enhances chemical coverage in the targeted screening of LC-HRMS metabolic fingerprints.
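The false positives that the CNN is trained to remove originate in CWT-based peak pickers. A minimal single-scale sketch of CWT peak detection with a Ricker (Mexican hat) wavelet follows; the width `a`, the threshold `min_coeff`, and the synthetic two-peak chromatogram are illustrative assumptions, and practical peak pickers scan many scales and track ridge lines instead of using a single scale.

```python
import math


def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width `a`, sampled at `points` positions."""
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    center = (points - 1) / 2.0
    return [norm * (1 - ((i - center) / a) ** 2) * math.exp(-(((i - center) / a) ** 2) / 2)
            for i in range(points)]


def cwt_single_scale(signal, a):
    """Correlate the signal with a centred Ricker wavelet (zero padding, same length)."""
    w = ricker(2 * int(4 * a) + 1, a)  # odd length, so the wavelet has a centre sample
    half = len(w) // 2
    coeffs = []
    for i in range(len(signal)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += wk * signal[j]
        coeffs.append(acc)
    return coeffs


def detect_peaks(signal, a=3.0, min_coeff=1.0):
    """Indices where the wavelet coefficient is a local maximum above `min_coeff`."""
    c = cwt_single_scale(signal, a)
    return [i for i in range(1, len(c) - 1)
            if c[i] > c[i - 1] and c[i] >= c[i + 1] and c[i] > min_coeff]


# Synthetic chromatogram: Gaussian peaks at scans 30 and 70 on a zero baseline.
signal = [100 * math.exp(-((i - 30) ** 2) / 18) + 40 * math.exp(-((i - 70) ** 2) / 18)
          for i in range(100)]
print(detect_peaks(signal))  # [30, 70]
```

On real, noisy EICs with a non-zero baseline, the same local-maximum rule also fires on noise spikes and edge effects; rejecting such spurious maxima is the role the abstract assigns to the CNN.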
Affiliation(s)
- Jun Zeng
  - Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan 411105, China
- Yaping Li
  - Department of Quality Control, Xiangtan Central Hospital, Xiangtan 411100, China
- Chuanlin Wang
  - Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan 411105, China
- Sheng Fu
  - Hunan Prevention and Treatment Institute for Occupational Disease, Changsha 410007, China
- Min He
  - Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan 411105, China

4. Genva M, Fougère L, Bahammou D, Mongrand S, Boutté Y, Fouillen L. A global LC-MS2-based methodology to identify and quantify anionic phospholipids in plant samples. Plant J 2024; 117:956-971. [PMID: 37937773 DOI: 10.1111/tpj.16525] [Received: 05/30/2023] [Revised: 10/10/2023] [Accepted: 10/21/2023] [Indexed: 11/09/2023]
Abstract
Anionic phospholipids (PS, PA, PI, PIPs) are low-abundance phospholipids with important functions in cell signaling, membrane trafficking, and cell differentiation. They can be rapidly metabolized and can transiently accumulate at defined sites within a cell or an organ in response to physiological or environmental stimuli. Because even a small change in their composition profile will have a significant effect on biological processes, it is crucial to develop a sensitive and optimized analytical method to detect and quantify them accurately. Although methods combining thin-layer chromatography (TLC) separation with gas chromatography (GC) detection already exist, they do not allow precise, sensitive, and accurate quantification of all anionic phospholipid species. Here, we developed a method based on high-performance liquid chromatography (HPLC) coupled with tandem mass spectrometry (MS2) in multiple reaction monitoring (MRM) mode to detect and quantify all molecular species and classes of anionic phospholipids in a single run. The method relies on a methylation derivatization step that greatly enhances ionization, peak separation, and peak resolution, and improves the limits of detection and quantification for each individual molecular species, particularly for PA and PS. The method works across a variety of plant samples. Remarkably, we found that PS is enriched in very-long-chain fatty acids in the roots, but not in the aerial organs, of Arabidopsis thaliana. Our work thus paves the way for new studies on how the composition of anionic lipids is finely tuned during plant development and environmental responses.
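Because methylation replaces a hydrogen atom with a methyl group, each added methyl shifts a lipid's monoisotopic mass by exactly one CH2 unit (about 14.01565 Da), which moves the precursor ions that MRM transitions must target. A minimal sketch of that arithmetic, assuming protonated [M+zH]z+ precursors; the neutral mass and the numbers of methyl groups below are hypothetical inputs, not values from the study.

```python
# Monoisotopic masses in daltons.
C12 = 12.0
H1 = 1.00782503207
PROTON = 1.00727646688
CH2 = C12 + 2 * H1  # net mass gained per methylation (an H is replaced by CH3)


def methylated_precursor_mz(neutral_mass, n_methyl, charge=1):
    """m/z of the [M + n_methyl*CH2 + zH]z+ ion after `n_methyl` methylations."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + n_methyl * CH2 + charge * PROTON) / charge


neutral_mass = 700.0  # hypothetical neutral monoisotopic mass, for illustration only
for n in range(3):
    print(n, round(methylated_precursor_mz(neutral_mass, n), 5))
```

How many methyl groups each lipid class actually gains, and which ionization mode and adducts are monitored, would have to be taken from the paper itself.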
Affiliation(s)
- Manon Genva
  - University of Bordeaux, CNRS, Laboratoire de Biogenèse Membranaire (LBM), UMR 5200, F-33140, Villenave d'Ornon, France
  - Laboratory of Chemistry of Natural Molecules, Gembloux Agro-Bio Tech, University of Liège, Passage des Déportés 2, 5030, Gembloux, Belgium
- Louise Fougère
  - University of Bordeaux, CNRS, Laboratoire de Biogenèse Membranaire (LBM), UMR 5200, F-33140, Villenave d'Ornon, France
- Delphine Bahammou
  - University of Bordeaux, CNRS, Laboratoire de Biogenèse Membranaire (LBM), UMR 5200, F-33140, Villenave d'Ornon, France
- Sébastien Mongrand
  - University of Bordeaux, CNRS, Laboratoire de Biogenèse Membranaire (LBM), UMR 5200, F-33140, Villenave d'Ornon, France
- Yohann Boutté
  - University of Bordeaux, CNRS, Laboratoire de Biogenèse Membranaire (LBM), UMR 5200, F-33140, Villenave d'Ornon, France
- Laetitia Fouillen
  - University of Bordeaux, CNRS, Laboratoire de Biogenèse Membranaire (LBM), UMR 5200, F-33140, Villenave d'Ornon, France

5. Gong Y, Ding W, Wang P, Wu Q, Yao X, Yang Q. Evaluating Machine Learning Methods of Analyzing Multiclass Metabolomics. J Chem Inf Model 2023; 63:7628-7641. [PMID: 38079572 DOI: 10.1021/acs.jcim.3c01525] [Indexed: 12/26/2023]
Abstract
Multiclass metabolomic studies have become popular for revealing the differences in multiple stages of complex diseases, various lifestyles, or the effects of specific treatments. In multiclass metabolomics, there are multiple data manipulation steps for analyzing raw data, which consist of data filtering, the imputation of missing values, data normalization, marker identification, sample separation, classification, and so on. In each step, several to dozens of machine learning methods can be chosen for the given data set, with potentially hundreds or thousands of method combinations in the whole data processing chain. Therefore, a clear understanding of these machine learning methods is helpful for selecting an appropriate method combination for obtaining stable and reliable analytical results of specific data. However, there has rarely been an overall introduction or evaluation of these methods based on multiclass metabolomic data. Herein, detailed descriptions of these machine learning methods in multiple data manipulation steps are reviewed. Moreover, an assessment of these methods was performed using a benchmark data set for multiclass metabolomics. First, 12 imputation methods for imputing missing values were evaluated based on the PSS (Procrustes statistical shape analysis) and NRMSE (normalized root-mean-square error) values. Second, 17 normalization methods for processing multiclass metabolomic data were evaluated by applying the PMAD (pooled median absolute deviation) value. Third, different methods of identifying markers of multiclass metabolomics were evaluated based on the CWrel (relative weighted consistency) value. Fourth, nine classification methods for constructing multiclass models were assessed using the AUC (area under the curve) value. Performance evaluations of machine learning methods are highly recommended to select the most appropriate method combination before performing the final analysis of the given data. 
Overall, detailed descriptions and evaluation of various machine learning methods are expected to improve analyses of multiclass metabolomic data.
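The imputation benchmark can be illustrated by hiding known values, imputing them, and scoring each method with the normalized root-mean-square error (NRMSE). A minimal sketch, assuming NRMSE is computed over the masked entries and normalized by their variance (one common variant; the study's exact normalization may differ), with two simple baseline imputers standing in for the 12 methods evaluated:

```python
import math


def mean_impute(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mu = sum(observed) / len(observed)
    return [mu if v is None else v for v in column]


def half_min_impute(column):
    """Replace missing values with half of the smallest observed value."""
    fill = min(v for v in column if v is not None) / 2
    return [fill if v is None else v for v in column]


def nrmse(imputed, truth, masked):
    """sqrt(mean squared error / variance of the true values), over masked positions only."""
    est = [imputed[i] for i in masked]
    ref = [truth[i] for i in masked]
    mse = sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)
    mean_ref = sum(ref) / len(ref)
    var_ref = sum((r - mean_ref) ** 2 for r in ref) / len(ref)
    return math.sqrt(mse / var_ref)


# Toy benchmark: hide two known values, impute them, and score each method.
truth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
masked = [1, 4]
with_gaps = [None if i in masked else v for i, v in enumerate(truth)]
methods = {"mean": mean_impute, "half-minimum": half_min_impute}
for name, impute in methods.items():
    print(name, round(nrmse(impute(with_gaps), truth, masked), 3))
# mean 1.0
# half-minimum 2.236
```

Lower NRMSE means the imputed values track the hidden ones more closely; a real benchmark would repeat the masking many times over many metabolites and average the scores.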
Affiliation(s)
- Yaguo Gong
  - State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Wei Ding
  - State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Panpan Wang
  - College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
- Qibiao Wu
  - State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Xiaojun Yao
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

6. Ai J, Zhao W, Yu Q, Qian X, Zhou J, Huo X, Tang F. SR-Unet: A Super-Resolution Algorithm for Ion Trap Mass Spectrometers Based on the Deep Neural Network. Anal Chem 2023; 95:17407-17415. [PMID: 37963290 DOI: 10.1021/acs.analchem.3c04172] [Indexed: 11/16/2023]
Abstract
The mass spectrometer is an important tool for modern chemical analysis and detection, and the emergence of miniature mass spectrometers in particular has provided new tools for field analysis and detection. The resolution of a mass spectrometer reflects the instrument's ability to discriminate between ions of adjacent mass-to-charge ratios: the higher the resolution, the better complex mixtures can be discriminated. Quadrupole ion traps are generally considered a low-resolution mass spectrometry technique, but they have attracted wide attention and development in recent years because they are well suited to miniaturization and have strong qualitative capability. In an ion trap mass spectrometer, mass sensitivity and resolution constrain each other and must be balanced by choosing an appropriate scan speed. In this study, a super-resolution U-net algorithm (SR-Unet) is proposed for ion trap mass spectrometry. It estimates the possible ions underlying overlapping peaks in low-resolution spectra and improves the equivalent resolution while preserving sufficient sensitivity and analysis speed. Using mass spectra acquired on a linear ion trap mass spectrometer (LTQ XL) in Turbo and Normal scan modes, the unit mass resolution obtained at a scan speed of 16,667 Da/s was reproduced at 125,000 Da/s. The experiments also demonstrated that the algorithm can be transferred across mass-to-charge ratios and between instruments. SR-Unet can be migrated to a miniature mass spectrometer for mobile (cruise) monitoring of volatile organic compounds (VOCs); under the same monitoring and analysis speed requirements, the number of identified Photochemical Assessment Monitoring Stations (PAMS) VOC species increased from 31 to 50.
Furthermore, with the help of SR-Unet, better-than-unit mass resolution was achieved for peptide detection on a miniature mass spectrometer: the full width at half-maximum (FWHM) of the doubly charged bradykinin ion (m/z 531) was reduced from 0.35 to 0.15 Da at a scan speed of 375 Da/s, improving the equivalent resolution to 3540. The proposed method offers a new way to enhance the field mixture detection capability of miniature ion trap mass spectrometers.
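The resolution quoted above follows from R = m/Δm with Δm taken as the FWHM: 531/0.15 = 3540, whereas the original 0.35 Da width corresponds to roughly 531/0.35 ≈ 1517 (and 125,000 versus 16,667 Da/s is about a 7.5-fold faster scan). A minimal sketch of measuring the FWHM of a sampled peak by linear interpolation and converting it to resolution; the Gaussian test peak is synthetic, and this is only the figure of merit, not part of SR-Unet itself.

```python
import math


def fwhm(xs, ys):
    """Full width at half maximum of a single, fully sampled peak (linear interpolation)."""
    i_max = max(range(len(ys)), key=ys.__getitem__)
    half = ys[i_max] / 2.0
    i = i_max
    while i > 0 and ys[i] >= half:  # walk left to the first point below half height
        i -= 1
    left = xs[i] + (half - ys[i]) * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])
    j = i_max
    while j < len(ys) - 1 and ys[j] >= half:  # walk right likewise
        j += 1
    right = xs[j - 1] + (half - ys[j - 1]) * (xs[j] - xs[j - 1]) / (ys[j] - ys[j - 1])
    return right - left


def resolution(mz, width):
    """Resolving power R = m / delta-m, taking delta-m as the FWHM."""
    return mz / width


# Synthetic Gaussian peak at m/z 531 whose true FWHM is 0.15 Da.
sigma = 0.15 / (2 * math.sqrt(2 * math.log(2)))
xs = [530.5 + k * 0.001 for k in range(1001)]
ys = [math.exp(-((x - 531.0) ** 2) / (2 * sigma ** 2)) for x in xs]
width = fwhm(xs, ys)
print(round(width, 3), round(resolution(531.0, width)))
```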
Affiliation(s)
- Jiawen Ai
  - Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
  - State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Weize Zhao
  - Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
  - State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- Quan Yu
  - Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
- Xiang Qian
  - Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
- Jianhua Zhou
  - School of Biomedical Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen 518107, China
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Xinming Huo
  - School of Biomedical Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen 518107, China
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Fei Tang
  - State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China

7. Kumler W, Hazelton BJ, Ingalls AE. Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics. BMC Bioinformatics 2023; 24:404. [PMID: 37891484 PMCID: PMC10612323 DOI: 10.1186/s12859-023-05533-4] [Received: 07/28/2023] [Accepted: 10/16/2023] [Indexed: 10/29/2023]
Abstract
BACKGROUND Chromatographic peakpicking remains a significant bottleneck in automated LC-MS workflows. Uncontrolled false discovery rates and the lack of manually calibrated quality metrics force researchers to evaluate individual peaks visually, which takes large amounts of time and undermines replicability. This problem is exacerbated in noisy environmental datasets and for novel separation methods in metabolomics, such as hydrophilic interaction liquid chromatography (HILIC) columns, creating a demand for a simple, intuitive, and robust metric of peak quality. RESULTS Here, we manually labeled four HILIC datasets of oceanographic particulate metabolites to assess the performance of individual peak quality metrics. We used these datasets to construct a predictive model calibrated to the likelihood that an MS expert, on visual inspection, would include a given mass feature in the downstream analysis. We implemented two novel peak quality metrics, a custom signal-to-noise metric and a test of similarity to a bell curve, both calculated from the raw data in the extracted ion chromatogram, and found that they outperformed existing measures of peak quality. A simple logistic regression model built on these two metrics reduced the fraction of false positives in the analysis from 70-80% to 1-5% and showed minimal overfitting when applied to novel datasets. We then explored how this quality thresholding affects the conclusions of the downstream analysis and found that, whereas depth explained only 10% of the variance in the default output of the peakpicker, it explained approximately 40% of the variance when the analysis was restricted to high-quality peaks. CONCLUSIONS We conclude that the poor performance of peakpicking algorithms significantly reduces the power of both univariate and multivariate statistical analyses to detect environmental differences.
We demonstrate that simple models built on intuitive metrics and derived from the raw data are more robust and can outperform more complex models when applied to new data. Finally, we show that in properly curated datasets, depth is a major driver of variability in the marine microbial metabolome and identify several interesting metabolite trends for future investigation.
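The two-metric idea can be sketched with simplified stand-ins: a bell-curve similarity (Pearson correlation between the EIC window and an ideal Gaussian centred on its apex) and a signal-to-noise ratio (apex height over the spread of the window edges), combined by a logistic function. These definitions are assumptions for illustration and differ from the paper's exact metrics; the logistic coefficients are placeholders, not fitted values, and would in practice be fit to manually labeled peaks.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def bell_similarity(intensities, sigma_scans=3.0):
    """Correlation between an EIC window and an ideal Gaussian centred on its apex."""
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    template = [math.exp(-((i - apex) ** 2) / (2 * sigma_scans ** 2))
                for i in range(len(intensities))]
    return pearson(intensities, template)


def simple_snr(intensities, edge=5):
    """Apex height above the edge baseline, divided by the edge standard deviation."""
    edges = intensities[:edge] + intensities[-edge:]
    mu = sum(edges) / len(edges)
    sd = math.sqrt(sum((v - mu) ** 2 for v in edges) / len(edges))
    return (max(intensities) - mu) / sd if sd > 0 else float("inf")


def quality_probability(cor, snr, b0=-6.0, b_cor=6.0, b_snr=1.0):
    """Logistic score; the coefficients are illustrative placeholders, not fitted values."""
    z = b0 + b_cor * cor + b_snr * math.log10(max(snr, 1e-12))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
good = [1000 * math.exp(-((i - 20) ** 2) / 18) + rng.gauss(0, 5) for i in range(41)]
noise = [rng.gauss(100, 5) for _ in range(41)]
for name, eic in (("good", good), ("noise", noise)):
    cor, snr = bell_similarity(eic), simple_snr(eic)
    print(name, round(cor, 3), round(snr, 1), round(quality_probability(cor, snr), 3))
```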
Affiliation(s)
- William Kumler
  - School of Oceanography, University of Washington, Seattle, WA, 98195, USA
- Bryna J Hazelton
  - eScience Institute, University of Washington, Seattle, WA, 98195, USA
  - Department of Physics, University of Washington, Seattle, WA, 98195, USA
- Anitra E Ingalls
  - School of Oceanography, University of Washington, Seattle, WA, 98195, USA

8. Cao H, Shi H, Tang J, Xu Y, Ling Y, Lu X, Yang Y, Zhang X, Wang H. Ultrasensitive discrimination of volatile organic compounds using a microfluidic silicon SERS artificial intelligence chip. iScience 2023; 26:107821. [PMID: 37731613 PMCID: PMC10507157 DOI: 10.1016/j.isci.2023.107821] [Received: 04/13/2023] [Revised: 07/06/2023] [Accepted: 08/31/2023] [Indexed: 09/22/2023]
Abstract
Current gas sensors can hardly discriminate trace volatile organic compounds at the ppt level. Herein, we present an integrated platform that simultaneously enables rapid preconcentration, reliable surface-enhanced Raman scattering (SERS) detection, and automatic identification of trace aldehydes at the ppt level. For rapid preconcentration, we demonstrate that a nozzle-like microfluidic concentrator enriches rare gaseous analytes five-fold in only 0.01 ms. The enriched gas is then captured and detected by an integrated silicon-based SERS chip made of zeolitic imidazolate framework-8-coated silver nanoparticles grown in situ on a silicon wafer. After SERS measurement, a fully connected deep neural network is built to extract faint features from the spectral dataset and discriminate between volatile organic compound classes. We demonstrate that six kinds of gaseous aldehydes at 100 ppt can be detected and classified with an identification accuracy of ∼80.9% using this platform.
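A fully connected network of the kind described maps each measured spectrum, through stacked dense layers, to a probability for each of the six aldehyde classes. A minimal NumPy sketch of the forward pass only, assuming hypothetical sizes (1000-point spectra and hidden layers of 128 and 64 units) and untrained random weights; this is not the architecture reported in the paper, and training with a cross-entropy loss is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class FullyConnectedClassifier:
    """Tiny multilayer perceptron: spectrum -> ReLU hidden layers -> class probabilities."""

    def __init__(self, layer_sizes, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialisation; the weights stay untrained in this sketch.
        self.weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
                        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.biases = [np.zeros(n) for n in layer_sizes[1:]]

    def predict_proba(self, spectra):
        h = np.asarray(spectra, dtype=float)
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ w + b)
        return softmax(h @ self.weights[-1] + self.biases[-1])


# Hypothetical sizes: 1000-point SERS spectra, two hidden layers, six aldehyde classes.
model = FullyConnectedClassifier([1000, 128, 64, 6])
spectra = np.random.default_rng(1).random((4, 1000))
probs = model.predict_proba(spectra)
print(probs.shape, probs.argmax(axis=1))
```

Each row of `probs` sums to 1, and the predicted class is the arg-max; with untrained weights the predictions are of course meaningless.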
Affiliation(s)
- Haiting Cao
  - Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Huayi Shi
  - Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Jie Tang
  - Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Yanan Xu
  - Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Yufan Ling
  - State Key Laboratory of Radiation Medicine and Protection, School of Radiation Medicine and Protection, Collaborative Innovation Center of Radiological Medicine of Jiangsu Higher Education Institutions, Soochow University, 199 Renai Road, Suzhou 215123, China
- Xing Lu
  - Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China
- Yang Yang
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Xiaojie Zhang
  - Department of Experimental Center, Medical College of Soochow University, Suzhou, Jiangsu 215123, China
- Houyu Wang
  - Suzhou Key Laboratory of Nanotechnology and Biomedicine, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China

9. Liao Y, Tian M, Zhang H, Lu H, Jiang Y, Chen Y, Zhang Z. Highly automatic and universal approach for pure ion chromatogram construction from liquid chromatography-mass spectrometry data using deep learning. J Chromatogr A 2023; 1705:464172. [PMID: 37392637 DOI: 10.1016/j.chroma.2023.464172] [Received: 10/28/2022] [Revised: 06/14/2023] [Accepted: 06/18/2023] [Indexed: 07/03/2023]
Abstract
Feature extraction is the most fundamental step in analyzing liquid chromatography-mass spectrometry (LC-MS) datasets. However, traditional methods require careful parameter selection and re-optimization for each dataset, which hinders efficient and objective large-scale data analysis. The pure ion chromatogram (PIC) is widely used because it avoids the peak-splitting problems of extracted ion chromatograms (EICs) and regions of interest (ROIs). Here, we developed a deep learning-based pure ion chromatogram method (DeepPIC) that uses a customized U-Net to find PICs directly and automatically from centroid-mode LC-MS data. The model was trained, validated, and tested on an Arabidopsis thaliana dataset with 200 input-label pairs. DeepPIC was integrated into KPIC2, and the combination enables the entire processing pipeline, from raw data to discriminant models, for metabolomics datasets. KPIC2 with DeepPIC was compared with competing methods (XCMS, FeatureFinderMetabo, and peakonly) on the MM48, simulated MM48, and quantitative datasets. These comparisons showed that DeepPIC outperforms XCMS, FeatureFinderMetabo, and peakonly in recall and in correlation with sample concentrations. Five datasets from different instruments and samples were used to evaluate the quality of the PICs and the universal applicability of DeepPIC; 95.12% of the PICs found precisely matched their manually labeled counterparts. Therefore, KPIC2+DeepPIC is an automatic, practical, off-the-shelf method for extracting features directly from raw data, and it exceeds traditional methods that rely on careful parameter tuning. It is publicly available at https://github.com/yuxuanliao/DeepPIC.
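For context on what DeepPIC automates, a pure ion chromatogram can be built in the traditional way: seed on an intense centroid and follow the closest m/z, within a tolerance, into neighbouring scans. A minimal sketch of that classic tracing step (not DeepPIC's U-Net), assuming a 10 ppm tolerance, at most one missing scan bridged per gap, and toy centroided scans; these tolerances are exactly the kind of parameters the paper aims to avoid tuning.

```python
def within_ppm(mz, ref, ppm):
    return abs(mz - ref) / ref * 1e6 <= ppm


def closest_peak(peaks, ref, ppm):
    """Closest (mz, intensity) centroid to `ref` within the tolerance, or None."""
    hits = [p for p in peaks if within_ppm(p[0], ref, ppm)]
    return min(hits, key=lambda p: abs(p[0] - ref)) if hits else None


def trace_ion(scans, seed_scan, seed_mz, ppm=10.0, max_gap=1):
    """Grow an ion trace from a seed by following the closest m/z into neighbouring scans.

    scans: one list of (mz, intensity) centroids per scan, in retention-time order.
    Returns (scan_index, mz, intensity) tuples sorted by scan index.
    """
    seed = closest_peak(scans[seed_scan], seed_mz, ppm)
    if seed is None:
        return []
    trace = {seed_scan: seed}
    for step in (-1, 1):  # extend backwards, then forwards
        ref, gap, i = seed[0], 0, seed_scan + step
        while 0 <= i < len(scans) and gap <= max_gap:
            hit = closest_peak(scans[i], ref, ppm)
            if hit is None:
                gap += 1  # tolerate dropouts of up to `max_gap` consecutive scans
            else:
                trace[i], ref, gap = hit, hit[0], 0  # follow slow m/z drift
            i += step
    return [(i, mz, inten) for i, (mz, inten) in sorted(trace.items())]


scans = [
    [(300.1000, 10.0)],
    [(300.1010, 50.0), (250.0000, 5.0)],
    [(300.1020, 200.0)],  # seed scan
    [],                   # single dropout, bridged
    [(300.1030, 80.0)],
    [(305.0000, 9.0)],
    [],                   # second consecutive miss ends the trace
    [(300.1040, 3.0)],
]
print(trace_ion(scans, seed_scan=2, seed_mz=300.102))
```

Updating the reference m/z at every hit is what lets the trace follow gradual mass drift instead of splitting into several fixed-window fragments.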
Affiliation(s)
- Yuxuan Liao
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Miao Tian
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yonglei Jiang
  - Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan 650021, China
- Yi Chen
  - Yunnan Academy of Tobacco Agricultural Sciences, Kunming, Yunnan 650021, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China

10. Perez de Souza L, Bitocchi E, Papa R, Tohge T, Fernie AR. Decreased metabolic diversity in common beans associated with domestication revealed by untargeted metabolomics, information theory, and molecular networking. Plant J 2023; 115:1021-1036. [PMID: 37272491 DOI: 10.1111/tpj.16277] [Received: 11/12/2021] [Revised: 04/28/2023] [Accepted: 05/03/2023] [Indexed: 06/06/2023]
Abstract
The process of crop domestication leads to a dramatic reduction in the gene expression associated with metabolic diversity. Genes involved in specialized metabolism appear to be particularly affected. Although there is ample evidence of these effects at the genetic level, a reduction in diversity at the metabolite level has been taken for granted despite never having been adequately assessed and quantified. Here, we leveraged the high coverage of ultra-high-performance liquid chromatography-high-resolution mass spectrometry-based metabolomics to investigate metabolic diversity in the common bean (Phaseolus vulgaris). An information-theoretic analysis highlights a shift towards lower metabolic diversity and specialization when wild and domesticated bean accessions are compared. Moreover, molecular networking enabled broader metabolite annotation than has been achieved to date, and its integration with gene expression data uncovers a metabolic shift from specialized metabolism towards central metabolism upon domestication of this crop.
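The information-theoretic comparison can be illustrated with indices originally proposed for transcriptomes and later adapted to metabolomics: sample diversity H_j = -Σ_i P_ij log2 P_ij, metabolite specificity S_i = (1/t) Σ_j (P_ij/P̄_i) log2(P_ij/P̄_i), and sample specialization δ_j = Σ_i P_ij S_i, where P_ij is the relative frequency of metabolite i in sample j and P̄_i its mean over the t samples. A minimal sketch, assuming these are the definitions used (the abstract does not state them) and using toy abundances for one wild and one domesticated accession:

```python
import math


def frequencies(abundance):
    """Normalise each sample (column) so the metabolite frequencies P_ij sum to 1."""
    n_samples = len(abundance[0])
    totals = [sum(row[j] for row in abundance) for j in range(n_samples)]
    return [[row[j] / totals[j] for j in range(n_samples)] for row in abundance]


def diversity(p, j):
    """Shannon diversity H_j of sample j, in bits (0 log 0 is taken as 0)."""
    return -sum(row[j] * math.log2(row[j]) for row in p if row[j] > 0)


def specificity(p):
    """Specificity S_i of each metabolite across the t samples."""
    t = len(p[0])
    scores = []
    for row in p:
        mean = sum(row) / t
        scores.append(sum((x / mean) * math.log2(x / mean) for x in row if x > 0) / t
                      if mean > 0 else 0.0)
    return scores


def specialization(p, j, s):
    """Specialization delta_j: frequency-weighted mean specificity of sample j."""
    return sum(row[j] * s_i for row, s_i in zip(p, s))


# Toy abundances: rows are metabolites; columns are [wild, domesticated] accessions.
abundance = [[25, 70], [25, 10], [25, 10], [25, 10]]
p = frequencies(abundance)
s = specificity(p)
for j, name in enumerate(["wild", "domesticated"]):
    print(name, round(diversity(p, j), 3), round(specialization(p, j, s), 3))
```

A perfectly even profile over four metabolites has the maximal diversity of log2(4) = 2 bits; concentrating abundance in fewer metabolites lowers H_j.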
Affiliation(s)
- Leonardo Perez de Souza
  - Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm, 14476, Germany
- Elena Bitocchi
  - Department of Agricultural, Food, and Environmental Sciences, Università Politecnica delle Marche, 60131, Ancona, Italy
- Roberto Papa
  - Department of Agricultural, Food, and Environmental Sciences, Università Politecnica delle Marche, 60131, Ancona, Italy
- Takayuki Tohge
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5, Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Alisdair R Fernie
  - Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm, 14476, Germany

11. Tsai JJ, Chang CC, Huang DY, Lin TS, Chen YC. Analysis and classification of coffee beans using single coffee bean mass spectrometry with machine learning strategy. Food Chem 2023; 426:136610. [PMID: 37331144 DOI: 10.1016/j.foodchem.2023.136610] [Received: 03/10/2023] [Revised: 04/18/2023] [Accepted: 06/10/2023] [Indexed: 06/20/2023]
Abstract
Coffee is a daily essential for many people, and its price varies with taste, aroma, and chemical composition. However, distinguishing between different coffee beans is challenging because sample pretreatment is time-consuming and destructive. This study presents a novel approach for directly analyzing single coffee beans by mass spectrometry (MS) without sample pretreatment. By depositing a solvent droplet containing methanol and deionized water on a single coffee bean, we generated an electrospray that extracts the main species for MS analysis. Mass spectra of single coffee beans were obtained in just a few seconds. To showcase the effectiveness of the method, we used palm civet coffee beans (kopi luwak), one of the most expensive types of coffee, as model samples. Our approach distinguished palm civet coffee beans from regular ones with high accuracy, sensitivity, and selectivity. Moreover, we employed a machine learning strategy to rapidly classify coffee beans based on their mass spectra, achieving 99.58% accuracy, 98.75% sensitivity, and 100% selectivity in cross-validation. Our study highlights the potential of combining the single-bean MS method with machine learning for the rapid and non-destructive classification of coffee beans. This approach can help detect low-priced coffee beans mixed with high-priced ones, benefiting both consumers and the coffee industry.
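The reported figures are confusion-matrix ratios computed on the predictions pooled over the cross-validation folds. A minimal sketch, assuming "selectivity" refers to the true-negative rate (specificity) and treating palm civet beans as the positive class; the ten predictions below are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical pooled cross-validation predictions for ten beans.
y_true = ["civet"] * 4 + ["regular"] * 6
y_pred = ["civet", "civet", "civet", "regular"] + ["regular"] * 5 + ["civet"]
print(binary_metrics(y_true, y_pred, positive="civet"))
```

Here 3 of 4 civet beans are caught (sensitivity 0.75) and 5 of 6 regular beans are correctly rejected (specificity about 0.833), for 8 of 10 correct overall.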
Affiliation(s)
- Jia-Jen Tsai
- Department of Applied Chemistry, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Che-Chia Chang
- Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- De-Yi Huang
- Department of Applied Chemistry, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Te-Sheng Lin
- Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; National Center for Theoretical Sciences, National Taiwan University, Taipei 10617, Taiwan
- Yu-Chie Chen
- Department of Applied Chemistry, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; International College of Semiconductor Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
12
Pan Q, Hu W, He D, He C, Zhang L, Shi Q. Machine-learning assisted molecular formula assignment to high-resolution mass spectrometry data of dissolved organic matter. Talanta 2023; 259:124484. [PMID: 37001397 DOI: 10.1016/j.talanta.2023.124484] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/22/2023] [Indexed: 03/29/2023]
Abstract
High-resolution mass spectrometry (HRMS) provides molecular compositional information on dissolved organic matter (DOM) through isotopic assignment from the molecular mass. However, owing to the inevitable deviation in molecular mass measurement and the limited resolving power, a given molecular mass frequently has multiple possible solutions. Lowering the mass deviation threshold and adding assignment restriction rules are often used to exclude incorrect solutions, which generally involves time-consuming manual post-processing of mass data. To improve the accuracy of the results in an automated manner, we developed a molecular formula assignment algorithm based on machine learning. The method integrated a logistic regression model trained on manually corrected isotopic compositions and the peak features of HRMS data (m/z, signal-to-noise ratio, isotope type and number, etc.). The resulting model evaluates the correctness of a candidate formula for a given mass peak based on its peak features. The method was verified on FT-ICR MS data (direct-infusion, negative-mode electrospray) from various DOM samples, achieving ∼90% accuracy (compared with the traditional approach) for formula assignment. Applied to a series of natural organic matter (NOM) samples, the method showed a significant improvement in formula assignment compared with the mass matching method.
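The core idea, scoring each candidate formula with a logistic regression over peak-level features, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' model: the two features (absolute mass error in ppm and a flag for whether a matching ¹³C isotope peak was observed), the toy labelled examples, and the helper names are all hypothetical, and a real model would use the richer feature set the abstract lists.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score_candidate(features: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability that a candidate formula is a correct assignment."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)

# Hypothetical training data: [abs mass error (ppm), 13C isotope peak found (0/1)],
# labelled 1 for manually confirmed formulas and 0 for rejected ones.
X_train = [[0.2, 1], [0.5, 1], [0.3, 1], [0.8, 1],
           [2.5, 0], [3.0, 0], [1.8, 0], [2.2, 1]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X_train, y_train)

# Two hypothetical candidate formulas for the same mass peak.
p_good = score_candidate([0.4, 1], w, b)
p_bad = score_candidate([2.6, 0], w, b)
```

Keeping the highest-probability candidate per peak (or rejecting all candidates below a threshold) is what would replace the manual post-processing step; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead of hand-written gradient descent.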
13
Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles (Findability, Accessibility, Interoperability, and Reusability) were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS). AIM OF REVIEW This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software. KEY SCIENTIFIC CONCEPTS OF REVIEW We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 of them for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature and from internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfilled per software tool were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that covered multiple FAIR4RS categories but had low fulfillment: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software tools were registered on Zenodo and received DOIs; (3) only 14.5% of selected software tools had official software containerization or a virtual machine; (4) only 16.7% of evaluated software tools had fully documented functions in code. Based on these results, we discuss improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Farhad Dastmalchi
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Hao Ye
- Health Science Center Libraries, University of Florida, Florida, USA
- Timothy J Garrett
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
- Matthew A Diller
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mei Liu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- William R Hogan
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mathias Brochhausen
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
- Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, USA
- Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, FL, USA
14
Zhang H, Xu Z, Fan X, Wang Y, Yang Q, Sun J, Wen M, Kang X, Zhang Z, Lu H. Fusion of Quality Evaluation Metrics and Convolutional Neural Network Representations for ROI Filtering in LC-MS. Anal Chem 2023; 95:612-620. [PMID: 36597722 DOI: 10.1021/acs.analchem.2c01398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Region of interest (ROI) extraction is a fundamental step in analyzing metabolomic datasets acquired by liquid chromatography-mass spectrometry (LC-MS). However, noise and background signals in LC-MS data often degrade the quality of extracted ROIs. Effective ROI evaluation algorithms are therefore needed to eliminate false positives while keeping the false-negative rate as low as possible. In this study, a deep fused filter of ROIs (dffROI) was proposed to improve the accuracy of ROI extraction by combining handcrafted evaluation metrics with convolutional neural network (CNN)-learned representations. To evaluate its performance, dffROI was compared with peakonly (CNN-learned representation) and five handcrafted metrics on three LC-MS datasets and one gas chromatography-mass spectrometry (GC-MS) dataset. Results show that dffROI achieves higher accuracy, a higher true-positive rate, and a lower false-positive rate: 0.9841, 0.9869, and 0.0186 on the test set, respectively. The classification error rate of dffROI (1.59%) is significantly lower than that of peakonly (2.73%). Model-agnostic feature importance demonstrates the necessity of fusing handcrafted evaluation metrics with CNN representations. By virtue of information fusion and end-to-end learning, dffROI is an automatic, robust, and universal method for ROI filtering. It is implemented in Python and open-sourced at https://github.com/zhanghailiangcsu/dffROI under the BSD License. Furthermore, it has been integrated into the KPIC2 framework previously proposed by our group to facilitate the analysis of real metabolomic LC-MS datasets.
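The reported figures follow the standard confusion-matrix definitions, which a short sketch makes explicit: accuracy is the fraction of ROIs classified correctly, the true-positive rate is the fraction of real peaks kept, and the false-positive rate is the fraction of noise ROIs wrongly kept. The counts below are hypothetical and the function is not part of dffROI; it only shows why the error rate is the complement of accuracy (1 - 0.9841 = 0.0159, i.e. the 1.59% quoted in the abstract).

```python
from dataclasses import dataclass

@dataclass
class ConfusionCounts:
    """Counts of ROI filtering decisions against manually labelled ROIs."""
    tp: int  # real peaks kept
    fp: int  # noise/background ROIs wrongly kept
    tn: int  # noise/background ROIs correctly removed
    fn: int  # real peaks wrongly removed

def roi_filter_metrics(c: ConfusionCounts) -> dict:
    """Accuracy, true-positive rate, false-positive rate and error rate."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "tpr": c.tp / (c.tp + c.fn),          # sensitivity / recall
        "fpr": c.fp / (c.fp + c.tn),          # 1 - specificity
        "error_rate": (c.fp + c.fn) / total,  # equals 1 - accuracy
    }

# Hypothetical test-set counts, for illustration only.
metrics = roi_filter_metrics(ConfusionCounts(tp=90, fp=5, tn=95, fn=10))
```

Reporting TPR and FPR alongside accuracy matters here because the goal stated in the abstract is asymmetric: remove false positives without losing real peaks.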
Affiliation(s)
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhenbo Xu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Jinyu Sun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Xiao Kang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; National International Collaborative Research Center for Medical Metabolomics, Central South University, Changsha 410083, China
15
He X, Liu X, Zuo F, Shi H, Jing J. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Semin Cancer Biol 2023; 88:187-200. [PMID: 36596352 DOI: 10.1016/j.semcancer.2022.12.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 11/17/2022] [Revised: 12/16/2022] [Accepted: 12/29/2022] [Indexed: 01/02/2023]
Abstract
With biotechnological advancements, innovative omics technologies are constantly emerging that enable researchers to access multi-layer information from the genome, epigenome, transcriptome, proteome, metabolome, and more. A wealth of omics technologies, including bulk and single-cell approaches, have enabled researchers to characterize different molecular layers at unprecedented scale and resolution, providing a holistic view of tumor behavior. Multi-omics analysis allows systematic interrogation of the molecular information at each biological layer, but it poses difficult challenges in extracting valuable insights from the exponentially increasing amount of multi-omics data. Efficient algorithms are therefore needed to reduce the dimensionality of the data while dissecting the complex biological processes of cancer. Artificial intelligence has demonstrated the ability to analyze complementary multi-modal data streams within oncology. The coincident development of multi-omics technologies and artificial intelligence algorithms has fuelled the development of cancer precision medicine. Here, we present state-of-the-art omics technologies and outline a roadmap for multi-omics integration analysis using artificial intelligence. We describe the advances made with artificial intelligence-based multi-omics approaches, especially in early cancer screening, diagnosis, response assessment, and prognosis prediction. Finally, we discuss the challenges faced in multi-omics analysis, along with tentative future trends in this field. With the increasing application of artificial intelligence in multi-omics analysis, we anticipate a paradigm shift toward precision medicine driven by artificial intelligence-based multi-omics technologies.
Affiliation(s)
- Xiujing He
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Xiaowei Liu
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Fengli Zuo
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Hubing Shi
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
- Jing Jing
- Laboratory of Integrative Medicine, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan, PR China
16
Iravani S, Conrad TOF. An Interpretable Deep Learning Approach for Biomarker Detection in LC-MS Proteomics Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:151-161. [PMID: 35007196 DOI: 10.1109/tcbb.2022.3141656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Analyzing mass spectrometry-based proteomics data with deep learning (DL) approaches poses several challenges due to high dimensionality, low sample size, and high noise levels. Additionally, the integration of DL-based workflows into medical settings is often hindered by a lack of interpretable explanations. We present DLearnMS, a DL biomarker detection framework, to address these challenges for proteomics instances of liquid chromatography-mass spectrometry (LC-MS), a well-established tool for quantifying complex protein mixtures. DLearnMS learns the clinical state of LC-MS data instances using convolutional neural networks. Based on the trained networks, we show how biomarkers can be identified using layer-wise relevance propagation. This enables the detection of discriminating regions of the data and the design of more robust networks. One of the main advantages over other established methods is that DLearnMS requires no explicit preprocessing step. Our evaluation shows that DLearnMS outperforms conventional LC-MS biomarker detection approaches by identifying fewer false-positive peaks while maintaining a comparable number of true-positive peaks. Code availability: the code is available from the following Git repository: https://github.com/SaharIravani/DlearnMS.
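Layer-wise relevance propagation redistributes a network's output score backwards, layer by layer, in proportion to each input's contribution. A minimal sketch of the common LRP-ε rule for a single fully connected layer without bias is shown below. It is not the DLearnMS code, which applies LRP to convolutional networks (where the same rule is applied over each receptive field), and the toy activations and weights are hypothetical.

```python
from typing import List, Sequence

def lrp_epsilon_dense(a: Sequence[float],
                      W: Sequence[Sequence[float]],
                      relevance_out: Sequence[float],
                      eps: float = 1e-6) -> List[float]:
    """Propagate relevance from a dense layer's outputs to its inputs (LRP-epsilon).

    a[i]           activation of input neuron i
    W[i][j]        weight from input i to output j (bias assumed zero)
    relevance_out  relevance R_j assigned to each output neuron j
    """
    n_in, n_out = len(a), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij
    z = [sum(a[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # eps pushes the denominator away from zero while keeping its sign
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        share = relevance_out[j] / denom
        for i in range(n_in):
            # Input i receives relevance in proportion to its contribution a_i * w_ij
            relevance_in[i] += a[i] * W[i][j] * share
    return relevance_in

# Hypothetical toy layer: 3 inputs (e.g. three m/z-RT regions), 2 outputs.
a = [1.0, 2.0, 0.5]
W = [[0.5, -0.5],
     [0.25, 0.5],
     [1.0, 0.0]]
R_out = [1.5, 0.5]
R_in = lrp_epsilon_dense(a, W, R_out)
```

With zero bias and a small ε the input relevances sum approximately to the output relevance, so here the three inputs receive roughly 0, 1.5, and 0.5; in a biomarker setting, input regions with large relevance are the candidates to inspect.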
17
Alotaibi M, Shao J, Pauciulo MW, Nichols WC, Hemnes AR, Malhotra A, Kim NH, Yuan JXJ, Fernandes T, Kerr KM, Alshawabkeh L, Desai AA, Bujor AM, Lafyatis R, Watrous JD, Long T, Cheng S, Chan SY, Jain M. Metabolomic Profiles Differentiate Scleroderma-PAH From Idiopathic PAH and Correspond With Worsened Functional Capacity. Chest 2023; 163:204-215. [PMID: 36087794 PMCID: PMC9899641 DOI: 10.1016/j.chest.2022.08.2230] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/05/2022] [Revised: 07/12/2022] [Accepted: 08/19/2022] [Indexed: 01/13/2023]
Abstract
BACKGROUND The prognosis and therapeutic responses are worse for pulmonary arterial hypertension associated with systemic sclerosis (SSc-PAH) than for idiopathic pulmonary arterial hypertension (IPAH). This discrepancy could be driven by divergence in the underlying metabolic determinants of disease. RESEARCH QUESTION Are circulating bioactive metabolites differentially altered in SSc-PAH vs IPAH, and can this alteration explain the clinical disparity between these PAH subgroups? STUDY DESIGN AND METHODS Plasma biosamples from 400 patients with SSc-PAH and 1,082 patients with IPAH were included in the study. An additional cohort of 100 patients with scleroderma without PH and 44 patients with scleroderma and PH was included for external validation. More than 700 bioactive lipid metabolites, representing a range of vasoactive and immune-inflammatory pathways, were assayed in plasma samples from independent discovery and validation cohorts using liquid chromatography/high-resolution mass spectrometry-based approaches. Regression analyses were used to identify metabolites that exhibited differential levels between SSc-PAH and IPAH and were associated with disease severity. RESULTS From hundreds of circulating bioactive lipid molecules, five metabolites were found to distinguish between SSc-PAH and IPAH and to be associated with markers of disease severity. Relative to IPAH, patients with SSc-PAH carried increased levels of fatty acid metabolites, including lignoceric acid and nervonic acid, as well as eicosanoids/oxylipins and sex hormone metabolites. INTERPRETATION Patients with SSc-PAH are characterized by an unfavorable bioactive metabolic profile that may explain their poor and limited response to therapy. These data provide important metabolic insights into the molecular heterogeneity underlying differences between subgroups of PAH.
Affiliation(s)
- Mona Alotaibi
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, La Jolla, CA; Department of Medicine, University of California San Diego, La Jolla, CA
- Junzhe Shao
- School of Life Sciences, Peking University, Beijing, China
- Michael W Pauciulo
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- William C Nichols
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Anna R Hemnes
- Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN
- Atul Malhotra
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, La Jolla, CA; Department of Medicine, University of California San Diego, La Jolla, CA
- Nick H Kim
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, La Jolla, CA; Department of Medicine, University of California San Diego, La Jolla, CA
- Jason X-J Yuan
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, La Jolla, CA; Department of Medicine, University of California San Diego, La Jolla, CA
- Timothy Fernandes
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, La Jolla, CA; Department of Medicine, University of California San Diego, La Jolla, CA
- Kim M Kerr
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, La Jolla, CA; Department of Medicine, University of California San Diego, La Jolla, CA
- Laith Alshawabkeh
- Division of Cardiovascular Medicine, Sulpizio Cardiovascular Institute, University of California San Diego, La Jolla, CA
- Ankit A Desai
- Department of Medicine, Indiana University, Indianapolis, IN
- Andreea M Bujor
- Division of Rheumatology, Boston University Medical Center, Boston, MA
- Robert Lafyatis
- Division of Rheumatology and Clinical Immunology, University of Pittsburgh Medical Center, Pittsburgh, PA
- Jeramie D Watrous
- Department of Medicine, University of California San Diego, La Jolla, CA
- Tao Long
- Department of Medicine, University of California San Diego, La Jolla, CA
- Susan Cheng
- Barbra Streisand Women's Heart Center, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- Stephen Y Chan
- Center for Pulmonary Vascular Biology and Medicine, Pittsburgh Heart, Lung, Blood Vascular Medicine Institute, Division of Cardiology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Mohit Jain
- Department of Medicine, University of California San Diego, La Jolla, CA
18
Nichani K, Uhlig S, Colson B, Hettwer K, Simon K, Bönick J, Uhlig C, Kemmlein S, Stoyke M, Gowik P, Huschek G, Rawel HM. Development of Non-Targeted Mass Spectrometry Method for Distinguishing Spelt and Wheat. Foods 2022; 12:141. [PMID: 36613357 PMCID: PMC9818861 DOI: 10.3390/foods12010141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/21/2022] [Revised: 12/13/2022] [Accepted: 12/21/2022] [Indexed: 12/29/2022]
Abstract
Food fraud, even when not in the news, is ubiquitous and demands innovative strategies to combat it. A new non-targeted method (NTM) for distinguishing spelt and wheat is described, which aids in food fraud detection and authenticity testing. A highly resolved fingerprint in the form of spectra is obtained for several cultivars of spelt and wheat using liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS). Convolutional neural network (CNN) models are built using a nested cross-validation (NCV) approach, training them on a calibration set comprising duplicate measurements of eleven cultivars each of wheat and spelt. The results reveal that the CNNs automatically learn patterns and representations that best discriminate the tested samples into spelt or wheat. This is further investigated using an external validation set comprising artificially mixed spectra, samples of processed goods (spelt bread and flour), eleven atypical spelt cultivars, and six old wheat cultivars. These cultivars were not part of model building. We introduce a metric called the D score to quantitatively evaluate and compare the classification decisions. Our results demonstrate that NTMs based on NCV and CNNs trained on appropriately chosen spectral data can be reliable enough for use on a wider range of cultivars and their mixes.
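Nested cross-validation separates two loops: an outer loop that holds out data only for estimating performance, and an inner loop, run on the remaining data, for choosing hyperparameters. The sketch below generates such splits by cultivar, under the assumption (ours, not stated in the abstract) that both duplicate measurements of a cultivar should always land on the same side of a split to avoid leakage. The round-robin fold assignment and the function names are hypothetical, and CNN training itself is omitted.

```python
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)

def round_robin_folds(items: Sequence[str], k: int) -> List[List[str]]:
    """Assign items to k folds in round-robin order."""
    return [list(items[i::k]) for i in range(k)]

def nested_cv_splits(sample_groups: Sequence[str], k_outer: int = 3,
                     k_inner: int = 2) -> List[Tuple[List[int], List[Split]]]:
    """Build grouped nested CV splits.

    sample_groups[s] is the group (e.g. cultivar) of sample s. Returns one
    (outer_test_indices, inner_splits) pair per outer fold; the inner splits
    only use samples outside that outer test fold.
    """
    groups = sorted(set(sample_groups))
    result = []
    for test_groups in round_robin_folds(groups, k_outer):
        test_set = set(test_groups)
        outer_test = [s for s, g in enumerate(sample_groups) if g in test_set]
        remaining = [g for g in groups if g not in test_set]
        inner: List[Split] = []
        for val_groups in round_robin_folds(remaining, k_inner):
            val_set = set(val_groups)
            train = [s for s, g in enumerate(sample_groups)
                     if g not in test_set and g not in val_set]
            val = [s for s, g in enumerate(sample_groups) if g in val_set]
            inner.append((train, val))
        result.append((outer_test, inner))
    return result

# Hypothetical: six cultivars (A-F), each measured in duplicate.
groups = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"]
splits = nested_cv_splits(groups, k_outer=3, k_inner=2)
```

Within each outer fold, the hyperparameter setting with the best mean validation score over the inner splits would be retrained on all non-test samples and evaluated once on the outer test fold, so the outer scores stay unbiased by model selection.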
Affiliation(s)
- Kapil Nichani
- QuoData GmbH, Prellerstr. 14, D-01309 Dresden, Germany
- Institute of Nutritional Science, University of Potsdam, Arthur-Scheunert-Allee 114-116, D-14558 Nuthetal, Germany
- Steffen Uhlig
- QuoData GmbH, Fabeckstr. 43, D-14195 Berlin, Germany
- Kirsten Simon
- QuoData GmbH, Prellerstr. 14, D-01309 Dresden, Germany
- Josephine Bönick
- Bundesinstitut für Risikobewertung, Max-Dohrn-Str. 8-10, D-10589 Berlin, Germany
- Carsten Uhlig
- Akees GmbH, Ansbacher Str. 11, D-10787 Berlin, Germany
- Sabine Kemmlein
- Bundesamt für Verbraucherschutz und Lebensmittelsicherheit, Diedersdorfer Weg. 1, D-12277 Berlin, Germany
- Manfred Stoyke
- Bundesamt für Verbraucherschutz und Lebensmittelsicherheit, Diedersdorfer Weg. 1, D-12277 Berlin, Germany
- Petra Gowik
- Bundesamt für Verbraucherschutz und Lebensmittelsicherheit, Diedersdorfer Weg. 1, D-12277 Berlin, Germany
- Gerd Huschek
- IGV-Institut für Getreideverarbeitung GmbH, Arthur-Scheunert-Allee 40/41, D-14558 Nuthetal, Germany
- Harshadrai M. Rawel
- Institute of Nutritional Science, University of Potsdam, Arthur-Scheunert-Allee 114-116, D-14558 Nuthetal, Germany
19
Sun B, Smialowski P, Aftab W, Schmidt A, Forne I, Straub T, Imhof A. Improving SWATH-MS analysis by deep-learning. Proteomics 2022; 23:e2200179. [PMID: 36571325 DOI: 10.1002/pmic.202200179] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/05/2022] [Revised: 11/22/2022] [Accepted: 12/21/2022] [Indexed: 12/27/2022]
Abstract
Data-independent acquisition (DIA) of tandem mass spectrometry spectra has emerged as a promising technology to improve coverage and quantification of proteins in complex mixtures. The success of DIA experiments depends on the quality of the spectral libraries used for database searching. Frequently, these libraries must be generated by labor- and time-intensive data-dependent acquisition (DDA) experiments. Recently, several algorithms have been published that allow the generation of theoretical libraries by efficiently predicting the retention time and intensity of fragment ions. Sequential windowed acquisition of all theoretical fragment ion spectra mass spectrometry (SWATH-MS) is a DIA method that can be applied at unprecedented speed, but its fragmentation spectra are of lower quality than data acquired on Orbitrap instruments. To reliably generate theoretical libraries that can be used in SWATH experiments, we developed deep-learning for SWATH analysis (dpSWATH) to improve the sensitivity and specificity of data generated by Q-TOF mass spectrometers. The theoretical library built by dpSWATH increased the protein identification rate compared with traditional or library-free methods. Based on our analysis, we conclude that dpSWATH is a better prediction framework for SWATH-MS measurements than other algorithms based on Orbitrap data.
Affiliation(s)
- Bo Sun
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Pawel Smialowski
- Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Germany; Faculty of Medicine, Biomedical Center, Computational Biology Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Wasim Aftab
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Andreas Schmidt
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Ignasi Forne
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Tobias Straub
- Faculty of Medicine, Biomedical Center, Computational Biology Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Axel Imhof
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
20
Ma P, Zhang Z, Jia X, Peng X, Zhang Z, Tarwa K, Wei CI, Liu F, Wang Q. Neural network in food analytics. Crit Rev Food Sci Nutr 2022; 64:4059-4077. [PMID: 36322538 DOI: 10.1080/10408398.2022.2139217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Neural network (NN, i.e., deep learning)-based data analysis techniques have been identified as a pivotal opportunity to protect the integrity and safety of the global food supply chain, with a forecast $11.2 billion in agriculture markets. As a general-purpose data analytic tool, NN has been applied in several areas of food science, such as food recognition, food supply chain security, and omics analysis. Given the rapid emergence of NN applications in food safety, this review aims to provide, for the first time, a comprehensive overview of NN applications in food analysis, focusing on domain-specific applications by introducing fundamental methodology, reviewing recent and notable progress, and discussing challenges and potential pitfalls. NN has shown a bright future through effective collaboration between food specialists and the broader community in the food field, for example, through its superiority in food recognition, sensory evaluation, and pattern recognition in spectroscopy and chromatography. However, major challenges impede the wider adoption of NN, including the lack of food-scientist-friendly software packages, incomprehensible model behavior, and multi-source heterogeneous data. Breakthroughs in other fields show that NN has the potential to revolutionize food analysis in the near future.
Affiliation(s)
- Peihua Ma
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Zhikun Zhang
- CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
- Xiaoxue Jia
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Xiaoke Peng
- College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, PR China
- Zhi Zhang
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Kevin Tarwa
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Cheng-I Wei
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
- Fuguo Liu
- College of Food Science and Engineering, Northwest A&F University, Yangling, Shaanxi, PR China
- Qin Wang
- Department of Nutrition and Food Science, College of Agriculture and Natural Resources, University of Maryland, College Park, Maryland, USA
21
Guo J, Yu H, Xing S, Huan T. Addressing big data challenges in mass spectrometry-based metabolomics. Chem Commun (Camb) 2022; 58:9979-9990. [PMID: 35997016 DOI: 10.1039/d2cc03598g] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/21/2022]
Abstract
Advancements in computer science and software engineering have greatly facilitated mass spectrometry (MS)-based untargeted metabolomics. Nowadays, gigabytes of metabolomics data are routinely generated from MS platforms, containing condensed structural and quantitative information on thousands of metabolites. Manual data processing is almost impossible at this data size. Therefore, in the "omics" era, we face new big data challenges: how to accurately and efficiently process the raw data, extract the biological information, and visualize the results from the enormous amount of collected data. Addressing these important challenges requires broad interdisciplinary knowledge, which can be demanding for many metabolomics practitioners. Our laboratory in the Department of Chemistry at the University of British Columbia is committed to combining analytical chemistry, computer science, and statistics to develop bioinformatics tools that address these big data challenges. In this Feature Article, we elaborate on the major big data challenges in metabolomics, including data acquisition, feature extraction, quantitative measurements, statistical analysis, and metabolite annotation. We also introduce our recently developed bioinformatics solutions for these challenges. Notably, all of the bioinformatics tools and source code are freely available on GitHub (https://www.github.com/HuanLab), along with revised and regularly updated content.
Collapse
Affiliation(s)
- Jian Guo
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
- Huaxu Yu
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
- Shipei Xing
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
- Tao Huan
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
22
Zeng J, Wu H, He M. Image classification combined with faster R–CNN for the peak detection of complex components and their metabolites in untargeted LC-HRMS data. Anal Chim Acta 2022; 1238:340189. [DOI: 10.1016/j.aca.2022.340189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 07/18/2022] [Indexed: 11/01/2022]
23
Fakouri Baygi S, Kumar Y, Barupal DK. IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets. J Proteome Res 2022; 21:1485-1494. [PMID: 35579321 DOI: 10.1021/acs.jproteome.2c00120] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/26/2022]
Abstract
Generating comprehensive and high-fidelity metabolomics data matrices from LC/HRMS data remains extremely challenging for large population-scale studies (n > 200). Here, we present a new data processing pipeline, the Intrinsic Peak Analysis (IDSL.IPA) R package (https://ipa.idsl.me), to generate such data matrices specifically for organic compounds. The IDSL.IPA pipeline incorporates (1) identifying potential 12C and 13C ion pairs in individual mass spectra; (2) detecting and characterizing chromatographic peaks using a new sensitive and versatile approach to perform mass correction, peak smoothing, baseline development for local noise measurement, and peak quality determination; (3) correcting retention time and cross-referencing peaks from multiple samples by a dynamic retention index marker approach; (4) annotating peaks using a reference database of m/z and retention time; and (5) accelerating data processing using parallel computation of the peak detection and alignment steps for larger studies. This pipeline has been successfully evaluated for studies ranging from 200 to 1600 samples. By specifically isolating high-quality and reliable signals pertaining to carbon-containing compounds in untargeted LC/HRMS data sets from larger studies, IDSL.IPA opens new opportunities for discovering biological insights in population-scale metabolomics and exposomics projects. The package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.IPA.
Affiliation(s)
- Sadjad Fakouri Baygi
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
- Yashwant Kumar
- Non-communicable Diseases Division, Translational Health Science and Technology Institute, Faridabad, Haryana 121001, India
- Dinesh Kumar Barupal
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
24
Nikolopoulou V, Aalizadeh R, Nika MC, Thomaidis NS. TrendProbe: Time profile analysis of emerging contaminants by LC-HRMS non-target screening and deep learning convolutional neural network. J Hazard Mater 2022; 428:128194. [PMID: 35033918 DOI: 10.1016/j.jhazmat.2021.128194] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 10/05/2021] [Revised: 12/08/2021] [Accepted: 12/29/2021] [Indexed: 06/14/2023]
Abstract
Peak prioritization is one of the key steps in non-target screening of environmental samples, directing identification efforts to relevant and important features. The occurrence of chemicals is sometimes a function of time, and their presence over consecutive days (trend) reveals important aspects such as discharges from agricultural, industrial, or domestic activities. This study presents a validated computational framework based on a deep learning convolutional neural network to classify trends of chemicals over 30 consecutive days of sampling at two sampling sites (upstream and downstream of a river). From trend analysis and factor analysis, the chemicals could be classified into periodic, spill, increasing, decreasing, and false trends. The developed method was validated with a list of 42 reference standards (target screening) and applied to samples. Twenty-five compounds were selected by the deep learning model and identified via non-target screening. Three classes of surfactants were identified for the first time in river water, and two of them had never been reported in the literature. Overall, 21 new homologous series of the newly identified surfactants were tentatively identified. The aquatic toxicity of the identified compounds was estimated by in silico tools, and a few compounds, along with their homologous series, showed potential risk to the aquatic environment.
Affiliation(s)
- Varvara Nikolopoulou
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
- Reza Aalizadeh
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
- Maria-Christina Nika
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
- Nikolaos S Thomaidis
- Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
25
Yun D, Kang D, Jang J, Angeles AT, Pyo J, Jeon J, Baek SS, Cho KH. A novel method for micropollutant quantification using deep learning and multi-objective optimization. Water Res 2022; 212:118080. [PMID: 35114526 DOI: 10.1016/j.watres.2022.118080] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/13/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
Micropollutants (MPs) released into aquatic ecosystems have adverse effects on public health. Hence, monitoring and managing MPs in aquatic systems are imperative. MPs can be quantified by high-resolution mass spectrometry (HRMS) with stable isotope-labeled (SIL) standards. However, the high cost of SIL solutions is a significant issue. This study aims to develop a rapid and cost-effective analytical approach to estimate MP concentrations in aquatic systems based on deep learning (DL) and multi-objective optimization. We hypothesized that internal standards could be used to quantify the concentrations of MPs other than their own target substances. Our approach considered the precision of intra-/inter-day repeatability and natural organic matter information to reduce instrumental error and matrix effects. We selected standard solutions to estimate the concentrations of 18 MPs. Among the optimal DL models, DarkNet-53 using nine standard solutions yielded the highest performance, while ResNet-50 yielded the lowest. Overall, this study demonstrated the capability of DL models for estimating MP concentrations.
Affiliation(s)
- Daeun Yun
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, South Korea
- Daeho Kang
- Department of Environmental Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, South Korea
- Jiyi Jang
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, South Korea
- Anne Therese Angeles
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, South Korea
- JongCheol Pyo
- Center for Environmental Data Strategy, Korea Environment Institute, Sejong 30147, South Korea
- Junho Jeon
- Department of Environmental Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, South Korea; School of Smart and Green Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, South Korea
- Sang-Soo Baek
- Department of Environmental Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan-Si, Gyeongbuk 38541, South Korea
- Kyung Hwa Cho
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Ulsan 44919, South Korea
26
Jiang Q, Seth S, Scharl T, Schroeder T, Jungbauer A, Dimartino S. Prediction of the performance of pre-packed purification columns through machine learning. J Sep Sci 2022; 45:1445-1457. [PMID: 35262290 PMCID: PMC9310636 DOI: 10.1002/jssc.202100864] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/25/2021] [Revised: 01/31/2022] [Accepted: 03/01/2022] [Indexed: 11/11/2022]
Abstract
Pre-packed columns have been increasingly used in process development and biomanufacturing thanks to their ease of use and consistency. Traditionally, packing quality is predicted through rate models, which require extensive calibration efforts through independent experiments to determine relevant mass transfer and kinetic rate constants. Here we propose machine learning as a complementary predictive tool for column performance. A machine learning algorithm, extreme gradient boosting, was applied to a large data set of packing quality (plate height and asymmetry) for pre-packed columns as a function of quantitative parameters (column length, column diameter, and particle size) and qualitative attributes (backbone and functional mode). The machine learning model offered excellent predictive capabilities for the plate height and the asymmetry (90 and 93%, respectively), with packing quality strongly influenced by backbone (∼70% relative importance) and functional mode (∼15% relative importance), well above all other quantitative column parameters. The results highlight the ability of machine learning to provide reliable predictions of column performance from simple, generic parameters, including strategic qualitative parameters such as backbone and functionality, usually excluded from quantitative considerations. Our results will guide further efforts in column optimization, for example, by focusing on improvements of backbone and functional mode to obtain optimized packings.
Affiliation(s)
- Qihao Jiang
- Institute of Bioengineering, School of Engineering, The University of Edinburgh, Edinburgh, UK
- Sohan Seth
- School of Informatics, The University of Edinburgh, Edinburgh, UK
- Theresa Scharl
- Austrian Centre of Industrial Biotechnology, Vienna, Austria
- Institute of Statistics, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Alois Jungbauer
- Austrian Centre of Industrial Biotechnology, Vienna, Austria
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
- Simone Dimartino
- Institute of Bioengineering, School of Engineering, The University of Edinburgh, Edinburgh, UK
27
A ‘shape-orientated’ algorithm employing an adapted Marr wavelet and shape matching index improves the performance of continuous wavelet transform for chromatographic peak detection and quantification. J Chromatogr A 2022; 1673:463086. [DOI: 10.1016/j.chroma.2022.463086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 04/09/2022] [Accepted: 04/20/2022] [Indexed: 11/24/2022]
28
Zhong P, Wei X, Li X, Wei X, Wu S, Huang W, Koidis A, Xu Z, Lei H. Untargeted metabolomics by liquid chromatography‐mass spectrometry for food authentication: A review. Compr Rev Food Sci Food Saf 2022; 21:2455-2488. [DOI: 10.1111/1541-4337.12938] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/19/2021] [Revised: 02/20/2022] [Accepted: 02/21/2022] [Indexed: 12/17/2022]
Affiliation(s)
- Peng Zhong
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Xiaoqun Wei
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Xiangmei Li
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Xiaoyi Wei
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Shaozong Wu
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Weijuan Huang
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Anastasios Koidis
- Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Zhenlin Xu
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Hongtao Lei
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China
29
Abstract
Available automated methods for peak detection in untargeted metabolomics suffer from poor precision. We present NeatMS, which uses machine learning based on a convolutional neural network to reduce the number and fraction of false peaks. NeatMS comes with a pre-trained model representing expert knowledge in the differentiation of true chemical signal from noise. Furthermore, it provides all necessary functions to easily train new models or improve existing ones by transfer learning. Thus, the tool improves peak curation and contributes to the robust and scalable analysis of large-scale experiments. We show how to integrate it into different liquid chromatography–mass spectrometry (LC-MS) analysis workflows, quantify its performance, and compare it to various other approaches. NeatMS software is available as open source on GitHub under the permissive MIT license and is also provided as easy-to-install PyPI and Bioconda packages.
Affiliation(s)
- Yoann Gloaguen
- Berlin Institute of Health at Charité, Metabolomics Platform, 10178 Berlin, Germany; Berlin Institute of Health at Charité, Core Unit Bioinformatics, 10178 Berlin, Germany; Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany
- Jennifer A Kirwan
- Berlin Institute of Health at Charité, Metabolomics Platform, 10178 Berlin, Germany; Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany
- Dieter Beule
- Berlin Institute of Health at Charité, Core Unit Bioinformatics, 10178 Berlin, Germany; Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany
30
Pirttilä K, Balgoma D, Rainer J, Pettersson C, Hedeland M, Brunius C. Comprehensive Peak Characterization (CPC) in Untargeted LC-MS Analysis. Metabolites 2022; 12:137. [PMID: 35208212 PMCID: PMC8878835 DOI: 10.3390/metabo12020137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/07/2021] [Revised: 01/21/2022] [Accepted: 01/29/2022] [Indexed: 02/05/2023]
Abstract
LC-MS-based untargeted metabolomics is heavily dependent on algorithms for automated peak detection and data preprocessing due to the complexity and size of the raw data generated. These algorithms are generally designed to be as inclusive as possible in order to minimize the number of missed peaks. This is known to result in an abundance of false positive peaks that further complicate downstream data processing and analysis. As a consequence, considerable effort is spent identifying features of interest that might represent peak detection artifacts. Here, we present the CPC algorithm, which allows automated characterization of detected peaks with subsequent filtering of low quality peaks using quality criteria familiar to analytical chemists. We provide a thorough description of the methods in addition to applying the algorithms to authentic metabolomics data. In the example presented, the algorithm removed about 35% of the peaks detected by XCMS, a majority of which exhibited a low signal-to-noise ratio. The algorithm is made available as an R-package and can be fully integrated into a standard XCMS workflow.
Affiliation(s)
- Kristian Pirttilä
- Department of Medicinal Chemistry, Uppsala University, SE-75123 Uppsala, Sweden
- David Balgoma
- Department of Medicinal Chemistry, Uppsala University, SE-75123 Uppsala, Sweden
- Johannes Rainer
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, 39100 Bolzano, Italy
- Curt Pettersson
- Department of Medicinal Chemistry, Uppsala University, SE-75123 Uppsala, Sweden
- Mikael Hedeland
- Department of Medicinal Chemistry, Uppsala University, SE-75123 Uppsala, Sweden
- Carl Brunius
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Chalmers Mass Spectrometry Infrastructure, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
31
Sensitivity and generalized analytical sensitivity expressions for quantitative analysis using convolutional neural networks. Anal Chim Acta 2022; 1192:338697. [DOI: 10.1016/j.aca.2021.338697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2021] [Revised: 05/21/2021] [Accepted: 05/23/2021] [Indexed: 11/17/2022]
32
Du X, Aristizabal-Henao JJ, Garrett TJ, Brochhausen M, Hogan WR, Lemas DJ. A Checklist for Reproducible Computational Analysis in Clinical Metabolomics Research. Metabolites 2022; 12:87. [PMID: 35050209 PMCID: PMC8779534 DOI: 10.3390/metabo12010087] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/10/2021] [Revised: 12/25/2021] [Accepted: 01/10/2022] [Indexed: 12/15/2022]
Abstract
Clinical metabolomics has emerged as a novel approach for biomarker discovery with the translational potential to guide next-generation therapeutics and precision health interventions. However, reproducibility in clinical research employing metabolomics data is challenging. Checklists are a helpful tool for promoting reproducible research. Existing checklists that promote reproducible metabolomics research focus primarily on metadata and may not be sufficient to ensure reproducible metabolomics data processing. This paper provides a checklist of actions that researchers need to take to make the computational steps of clinical metabolomics studies reproducible. We developed an eight-item checklist that includes criteria related to reusable data sharing and reproducible computational workflow development. We also provide recommended tools and resources to complete each item, as well as a GitHub project template to guide the process. The checklist is concise and easy to follow. Studies that follow this checklist and use the recommended resources may help other researchers reproduce metabolomics results easily and efficiently.
Affiliation(s)
- Xinsong Du
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Timothy J. Garrett
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Mathias Brochhausen
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- William R. Hogan
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Dominick J. Lemas
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
33
Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022; 23:40-55. [PMID: 34518686 DOI: 10.1038/s41580-021-00407-0] [Citation(s) in RCA: 579] [Impact Index Per Article: 289.5] [Accepted: 07/23/2021] [Indexed: 02/08/2023]
Abstract
The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.
Affiliation(s)
- Joe G Greener
- Department of Computer Science, University College London, London, UK
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK
- Lewis Moffat
- Department of Computer Science, University College London, London, UK
- David T Jones
- Department of Computer Science, University College London, London, UK
34
Defining Blood Plasma and Serum Metabolome by GC-MS. Metabolites 2021; 12:metabo12010015. [PMID: 35050137 PMCID: PMC8779220 DOI: 10.3390/metabo12010015] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/24/2021] [Revised: 12/18/2021] [Accepted: 12/21/2021] [Indexed: 01/04/2023]
Abstract
Metabolomics uses advanced analytical chemistry methods to analyze metabolites in biological samples. The most intensively studied samples are blood and its liquid components: plasma and serum. Armed with advanced equipment and progressive software solutions, the scientific community has shown that small molecules’ roles in living systems are not limited to traditional “building blocks” or “just fuel” for cellular energy. As a result, the conclusions based on studying the metabolome are finding practical reflection in molecular medicine and a better understanding of fundamental biochemical processes in living systems. This review is not a detailed protocol of metabolomic analysis. However, it should support the reader with information about the achievements in the whole process of metabolic exploration of human plasma and serum using mass spectrometry combined with gas chromatography.
35
36
Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and perspectives. Brief Bioinform 2021; 23:6425809. [PMID: 34791021 DOI: 10.1093/bib/bbab460] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 08/23/2021] [Revised: 09/29/2021] [Accepted: 10/07/2021] [Indexed: 02/07/2023]
Abstract
The innovation of biotechnologies has allowed the accumulation of omics data at an alarming rate, thus introducing the era of 'big data'. Extracting inherent valuable knowledge from various omics data remains a daunting problem in bioinformatics. Better solutions often require more innovative methods for efficient handling and effective results. Recent advancements in integrated analysis and computational modeling of multi-omics data have helped address such needs in an increasingly harmonious manner. The development and application of machine learning have largely advanced our insights into biology and biomedicine and greatly promoted the development of therapeutic strategies, especially for precision medicine. Here, we present a comprehensive survey and discussion of what happened, is happening, and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advancements at the interface between machine learning and the ever-widening range of omics, including genomics, transcriptomics, proteomics, metabolomics, and radiomics, as well as those at single-cell resolution. We also discuss and provide a synthesis of ideas, new insights, current challenges, and perspectives of machine learning in omics.
Affiliation(s)
- Rufeng Li
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Lixin Li
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Yungang Xu
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710129, China
- Juan Yang
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China; Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education of China, Xi'an 710061, P. R. China
37
Zamora Obando HR, Duarte GHB, Simionato AVC. Metabolomics Data Treatment: Basic Directions of the Full Process. Adv Exp Med Biol 2021; 1336:243-264. [PMID: 34628635 DOI: 10.1007/978-3-030-77252-9_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
The present chapter describes basic aspects of the main steps of data processing on mass spectrometry-based metabolomics platforms, focusing on the main objectives and important considerations of each step. Initially, an overview of metabolomics and the pivotal techniques applied in the field is presented. Important features of data acquisition and preprocessing, such as data compression, noise filtering, and baseline correction, are reviewed with a focus on practical aspects. Peak detection, deconvolution, and alignment, as well as missing values, are also discussed. Special attention is given to chemical and mathematical normalization approaches and the role of quality control (QC) samples. Methods for uni- and multivariate statistical analysis, and the data pretreatment that could affect them, are reviewed, emphasizing the most widely used multivariate methods, i.e., principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), and hierarchical cluster analysis (HCA). Criteria for model validation and the software used in data processing are also addressed. The chapter ends with some concerns about the minimal requirements for reporting metadata in metabolomics.
Affiliation(s)
- Hans Rolando Zamora Obando
- Department of Analytical Chemistry, Institute of Chemistry, University of Campinas, Campinas, SP, Brazil
38
Wang CY, Ko TS, Hsu CC. Interpreting convolutional neural network for real-time volatile organic compounds detection and classification using optical emission spectroscopy of plasma. Anal Chim Acta 2021; 1179:338822. [PMID: 34535253 DOI: 10.1016/j.aca.2021.338822] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/21/2021] [Revised: 06/28/2021] [Accepted: 06/30/2021] [Indexed: 01/02/2023]
Abstract
This study presents the investigation of optical emission spectroscopy of plasma using interpretable convolutional neural network (CNN) for real-time volatile organic compounds (VOCs) classification. A microplasma-generation platform was developed to efficiently collect 64 k spectra from various types of VOCs at different concentrations, as training and testing sets for machine learning. A CNN model was trained to classify VOCs with accuracy of 99.9%. To interpret the CNN model and its predictions, the spectral processing mechanism of the CNN was visualized by feature maps and the critical spectral features were identified by gradient-weighted class activation mapping. Such approaches brought insights on how CNN analyzes the spectra and enables the CNN operation to be explainable. Finally, the CNN model was incorporated with the microplasma platform to demonstrate the application of real-time VOC monitoring. The type of VOCs can be identified and reported via messages within 10 s once the microplasma is ignited. We believe that using CNN brings a novel route for plasma spectroscopy analysis for VOC classification and impacts the fields of plasma, spectroscopy, and environmental monitoring.
Affiliation(s)
- Ching-Yu Wang: Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan
- Tsung-Shun Ko: Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan
- Cheng-Che Hsu: Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan
39. Guo J, Shen S, Xing S, Chen Y, Chen F, Porter EM, Yu H, Huan T. EVA: Evaluation of Metabolic Feature Fidelity Using a Deep Learning Model Trained With Over 25000 Extracted Ion Chromatograms. Anal Chem 2021; 93:12181-12186. [PMID: 34455775; DOI: 10.1021/acs.analchem.1c01309]
Abstract
Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data relies on recognizing extracted ion chromatogram (EIC) peak shapes with peak picking algorithms. Unfortunately, all peak picking algorithms share a significant drawback: they generate a problematic number of false positives. In this work, we take advantage of deep learning to develop a convolutional neural network (CNN)-based program that automatically recognizes metabolic features with poor EIC shapes, which have low feature fidelity and are more likely to be false. Our CNN model was trained using 25,095 EIC plots collected from 22 LC-MS-based metabolomics projects spanning various sample types and LC and MS conditions. Notably, we manually inspected all the EIC plots and labeled each as good or poor quality to ensure accurate model training. The trained CNN model is embedded in a C#-based program named EVA (short for evaluation). The EVA Windows application is a versatile platform that can process metabolic features generated by LC-MS systems from various vendors and processed with various data processing software. Our comprehensive evaluation indicates that EVA achieves over 90% classification accuracy. EVA can be readily used in LC-MS-based metabolomics projects and is freely available on the Microsoft Store by searching "EVA Metabolomics".
Affiliation(s)
- Jian Guo: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Sam Shen: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Shipei Xing: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Ying Chen: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Frank Chen: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Elizabeth M Porter: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Huaxu Yu: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
- Tao Huan: Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, V6T 1Z1, British Columbia, Canada
40. Ma A, Qi X. Mining plant metabolomes: Methods, applications, and perspectives. Plant Communications 2021; 2:100238. [PMID: 34746766; PMCID: PMC8554038; DOI: 10.1016/j.xplc.2021.100238] [Received: 03/13/2021; Revised: 07/31/2021; Accepted: 09/02/2021]
Abstract
Plants produce a variety of metabolites that are essential for plant growth and human health. To fully understand the diversity of metabolites in a given plant, many methods have been developed for metabolite detection and data processing. In data processing, effectively reducing false-positive peaks, analyzing large-scale metabolic data, and annotating plant metabolites remain challenging. In this review, we introduce and discuss prominent methods that could be exploited to solve these problems, including a five-step filtering method for reducing false-positive signals in LC-MS analysis, QPMASS for analyzing ultra-large GC-MS datasets, and MetDNA for annotating metabolites. The main applications of plant metabolomics in species discrimination, metabolic pathway dissection, population genetic studies, and other areas are also highlighted. To further promote the development of plant metabolomics, more effective and integrated methods and platforms for metabolite detection, together with comprehensive databases for metabolite identification, are needed. With the improvement of these technologies and the development of genomics and transcriptomics, plant metabolomics will be widely used in many fields.
Affiliation(s)
- Aimin Ma: Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100049, China
- Xiaoquan Qi: Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100049, China
41. Britt HM, Cragnolini T, Thalassinos K. Integration of Mass Spectrometry Data for Structural Biology. Chem Rev 2021; 122:7952-7986. [PMID: 34506113; DOI: 10.1021/acs.chemrev.1c00356]
Abstract
Mass spectrometry (MS) is increasingly being used to probe the structure and dynamics of proteins and the complexes they form with other macromolecules. There are now several specialized MS methods, each with its own sample preparation, data acquisition, and data processing protocols. Collectively, these methods are referred to as structural MS and include cross-linking, hydrogen-deuterium exchange, hydroxyl radical footprinting, native, ion mobility, and top-down MS. Each provides a unique type of structural information, ranging from composition and stoichiometry through to residue-level proximity and solvent accessibility. Structural MS has proved particularly beneficial for protein classes that are challenging to analyze with classic structural biology techniques, such as glycosylated or intrinsically disordered proteins. To capture the structural details of a particular system, especially larger multiprotein complexes, more than one structural MS method, combined with other structural and biophysical techniques, is often required. Key to integrating these diverse data are computational strategies and software solutions that facilitate this process. We provide a background to the structural MS methods and briefly summarize other structural methods and how these are combined with MS. We then describe current state-of-the-art approaches for integrating structural MS data for structural biology. We quantify how often these methods are used together and provide examples where such combinations have been fruitful. To illustrate the power of integrative approaches, we discuss progress in solving the structures of the proteasome and the nuclear pore complex. We also discuss how information from structural MS, particularly pertaining to protein dynamics, is not currently utilized in integrative workflows and how such information can provide a more accurate picture of the systems studied. We conclude by discussing new developments in the MS and computational fields that will further enable in-cell structural studies.
Affiliation(s)
- Hannah M Britt: Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, United Kingdom
- Tristan Cragnolini: Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, United Kingdom; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, United Kingdom
- Konstantinos Thalassinos: Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, United Kingdom; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, United Kingdom
42. Bacong JRC, Juanico DEO. Predictive Chromatography of Leaf Extracts Through Encoded Environmental Forcing on Phytochemical Synthesis. Frontiers in Plant Science 2021; 12:613507. [PMID: 34512676; PMCID: PMC8424046; DOI: 10.3389/fpls.2021.613507] [Received: 10/02/2020; Accepted: 07/26/2021]
Abstract
Environmental fluctuations can influence a plant's phytochemical profile via phenotypic plasticity. This adaptive response ensures a plant's survival under fluctuating growth conditions. However, the resulting plant extract composition becomes unpredictable, which is a problem for highly standardized medicinal applications. Here we demonstrate, for the first time, the feasibility of tracking changes in the phytochemical profile from real-time measurements of a few environmental and extract-preparation variables. Specifically, we predicted the chromatograms of Blumea balsamifera extracts with an imputation-augmented convolutional neural network that takes image-transformed temporal measurements of these variables as input. We developed a sensor network that collected data in a greenhouse, and a training algorithm that concurrently generated a data representation of the implicit plant-environment interactions underlying the changing chromatograms of leaf extracts. We anticipate that the method is generically applicable to any plant and recognize its potential for addressing standardization problems in plant therapeutics.
43.
Abstract
Mass-spectrometry-based proteomics enables quantitative analysis of thousands of human proteins. However, experimental and computational challenges restrict progress in the field. This review summarizes the recent flurry of machine-learning strategies using artificial deep neural networks (or "deep learning") that have started to break barriers and accelerate progress in the field of shotgun proteomics. Deep learning now accurately predicts physicochemical properties of peptides from their sequence, including tandem mass spectra and retention time. Furthermore, deep learning methods exist for nearly every aspect of the modern proteomics workflow, enabling improved feature selection, peptide identification, and protein inference.
Affiliation(s)
- Jesse G. Meyer: Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
44. Pérez-Cova M, Jaumot J, Tauler R. Untangling comprehensive two-dimensional liquid chromatography data sets using regions of interest and multivariate curve resolution approaches. Trends Analyt Chem 2021. [DOI: 10.1016/j.trac.2021.116207]
45. Kensert A, Collaerts G, Efthymiadis K, Van Broeck P, Desmet G, Cabooter D. Deep convolutional autoencoder for the simultaneous removal of baseline noise and baseline drift in chromatograms. J Chromatogr A 2021; 1646:462093. [PMID: 33853038; DOI: 10.1016/j.chroma.2021.462093] [Received: 01/19/2021; Revised: 03/15/2021; Accepted: 03/19/2021]
Abstract
Enhancement of chromatograms, such as the reduction of baseline noise and baseline drift, is often essential to accurately detect and quantify analytes in a mixture. Current methods have been well studied and adopted for decades and have helped researchers obtain reliable results. However, these methods rely on relatively simple statistics of the data (chromatograms), which in some cases results in significant information loss and inaccuracies. In this study, a deep one-dimensional convolutional autoencoder was developed that simultaneously removes baseline noise and baseline drift with minimal information loss, for a large number and wide variety of chromatograms. To enable the autoencoder to render a chromatogram almost or completely noise-free, it was trained on data from a purpose-built chromatogram simulator that generated 190,000 representative simulated chromatograms. The trained autoencoder was then tested and compared with some of the most widely used and well-established denoising methods on test datasets of tens of thousands of simulated chromatograms, and further verified on real chromatograms. The results show that the autoencoder successfully removes baseline noise and baseline drift simultaneously with minimal information loss, outperforming Savitzky-Golay smoothing, Gaussian smoothing, and wavelet smoothing for baseline noise reduction (root mean squared error of 1.094 mAU compared with 2.074 mAU, 2.394 mAU, and 2.199 mAU, respectively) and Savitzky-Golay smoothing combined with asymmetric least-squares or polynomial fitting for combined baseline noise and drift reduction (root mean absolute error of 1.171 mAU compared with 3.397 mAU and 4.923 mAU). Evidence is presented that autoencoders can be used to enhance and correct chromatograms and thereby improve downstream data analysis, with the drawback that a carefully implemented simulator generating realistic chromatograms is needed to train the autoencoder.
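For readers unfamiliar with the baseline methods named in this comparison, the sketch below implements the simplest Savitzky-Golay smoother (5-point window, quadratic fit, using the standard convolution coefficients (-3, 12, 17, 12, -3)/35) and the root mean squared error used to score denoising against a noise-free reference. It does not reproduce the study's autoencoder, simulator, or evaluation protocol; the toy signal is hypothetical, and edge points are simply copied unsmoothed, which is only one of several possible edge conventions.

```python
import math

# Standard 5-point, 2nd-order Savitzky-Golay smoothing coefficients (sum = 35).
SG5_QUADRATIC = (-3, 12, 17, 12, -3)
SG5_NORM = 35

def savgol5(signal):
    """5-point quadratic Savitzky-Golay smoothing; the 2 points at each edge are copied unchanged."""
    out = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * y for c, y in zip(SG5_QUADRATIC, window)) / SG5_NORM
    return out

def rmse(estimate, reference):
    """Root mean squared error between two equal-length signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference))

# A noise-free quadratic "peak top" and a copy with alternating +/-0.5 noise (toy data).
clean = [10.0 - 0.1 * (t - 10) ** 2 for t in range(21)]
noisy = [y + (0.5 if t % 2 == 0 else -0.5) for t, y in enumerate(clean)]
smoothed = savgol5(noisy)
```

Because a quadratic Savitzky-Golay filter reproduces polynomials up to degree two exactly, it leaves the clean peak top untouched while attenuating the alternating noise. It does nothing about baseline drift, which is why the study paired it with asymmetric least-squares or polynomial baseline fitting for the drift-correction comparison.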
Affiliation(s)
- Alexander Kensert: University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium; Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Gilles Collaerts: University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
- Kyriakos Efthymiadis: University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium; Vrije Universiteit Brussel, Department of Computer Science, Artificial Intelligence Laboratory, Pleinlaan 9, 1050 Brussel, Belgium
- Peter Van Broeck: Janssen Pharmaceutica, Department of Pharmaceutical Development and Manufacturing Sciences, Turnhoutseweg 30, Beerse, Belgium
- Gert Desmet: Vrije Universiteit Brussel, Department of Chemical Engineering, Pleinlaan 2, 1050 Brussel, Belgium
- Deirdre Cabooter: University of Leuven (KU Leuven), Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, Herestraat 49, 3000 Leuven, Belgium
46. Wang H, Pujos-Guillot E, Comte B, de Miranda JL, Spiwok V, Chorbev I, Castiglione F, Tieri P, Watterson S, McAllister R, de Melo Malaquias T, Zanin M, Rai TS, Zheng H. Deep learning in systems medicine. Brief Bioinform 2021; 22:1543-1559. [PMID: 33197934; PMCID: PMC8382976; DOI: 10.1093/bib/bbaa237] [Received: 05/13/2020; Revised: 08/25/2020; Accepted: 08/26/2020]
Abstract
Systems medicine (SM) has emerged as a powerful tool for studying the human body at the systems level, with the aim of improving our understanding, prevention, and treatment of complex diseases. Because it can automatically extract the features relevant to a given task from high-dimensional, heterogeneous data, deep learning (DL) holds great promise in this endeavour. This review addresses the main developments in DL algorithms and a set of general topics within the SM landscape where DL is decisive. It discusses how DL can be applied to SM, with an emphasis on predictive, preventive, and precision medicine. Several key challenges are highlighted, including delivering clinical impact and improving interpretability. Prototypical examples illustrate the relevance and significance of adopting DL in SM; one of them involves creating a personalized model of Parkinson's disease. The review offers valuable insights to inform research in both DL and SM.
Affiliation(s)
- Estelle Pujos-Guillot: metabolomic platform dedicated to metabolism studies in nutrition and health in the French National Research Institute for Agriculture, Food and Environment
- Blandine Comte: French National Research Institute for Agriculture, Food and Environment
- Joao Luis de Miranda: (ESTG/IPP) and a Researcher (CERENA/IST) in optimization methods and process systems engineering
- Vojtech Spiwok: Molecular Modelling Researcher applying machine learning to accelerate molecular simulations
- Ivan Chorbev: Faculty for Computer Science and Engineering, University Ss Cyril and Methodius in Skopje, North Macedonia, working in the area of eHealth and assistive technologies
- Paolo Tieri: National Research Council of Italy (CNR) and a lecturer at Sapienza University in Rome, working in the field of network medicine and computational biology
- Roisin McAllister: Research Associate working in CTRIC, University of Ulster, Derry, who has worked in clinical and academic roles in the fields of molecular diagnostics and biomarker discovery
- Massimiliano Zanin: Researcher working in the Institute for Cross-Disciplinary Physics and Complex Systems, Spain, with an interest in data analysis and integration using statistical physics techniques
- Taranjit Singh Rai: Lecturer in cellular ageing at the Centre for Stratified Medicine. Dr Rai’s research interests are in cellular senescence, which is thought to promote cellular and tissue ageing in disease, and the development of senolytic compounds to restrict this process
- Huiru Zheng: Professor of computer sciences at Ulster University
47. Data processing strategies for non-targeted analysis of foods using liquid chromatography/high-resolution mass spectrometry. Trends Analyt Chem 2021. [DOI: 10.1016/j.trac.2021.116188]
48. Cifarelli V, Beeman SC, Smith GI, Yoshino J, Morozov D, Beals JW, Kayser BD, Watrous JD, Jain M, Patterson BW, Klein S. Decreased adipose tissue oxygenation associates with insulin resistance in individuals with obesity. J Clin Invest 2021; 130:6688-6699. [PMID: 33164985; DOI: 10.1172/jci141828] [Received: 06/30/2020; Accepted: 08/26/2020]
Abstract
BACKGROUND: Data from studies conducted in rodent models have shown that decreased adipose tissue (AT) oxygenation is involved in the pathogenesis of obesity-induced insulin resistance. Here, we evaluated the potential influence of AT oxygenation on AT biology and insulin sensitivity in people.
METHODS: We evaluated subcutaneous AT oxygen partial pressure (pO2); liver and whole-body insulin sensitivity; AT expression of genes and pathways involved in inflammation, fibrosis, and branched-chain amino acid (BCAA) catabolism; systemic markers of inflammation; and plasma BCAA concentrations in 3 groups of participants rigorously stratified by adiposity and insulin sensitivity: metabolically healthy lean (MHL; n = 11), metabolically healthy obese (MHO; n = 15), and metabolically unhealthy obese (MUO; n = 20).
RESULTS: AT pO2 progressively declined from the MHL to the MHO to the MUO group and was positively associated with hepatic and whole-body insulin sensitivity. AT pO2 was positively associated with the expression of genes involved in BCAA catabolism, together with an inverse relationship between AT pO2 and plasma BCAA concentrations. AT pO2 was negatively associated with AT gene expression of markers of inflammation and fibrosis. Plasma PAI-1 increased from the MHL to the MHO to the MUO group and was negatively correlated with AT pO2, whereas the plasma concentrations of other cytokines and chemokines did not differ between the MHL and MUO groups.
CONCLUSION: These results support the notion that reduced AT oxygenation in individuals with obesity contributes to insulin resistance by increasing plasma PAI-1 concentrations and by decreasing AT BCAA catabolism, thereby increasing plasma BCAA concentrations.
TRIAL REGISTRATION: ClinicalTrials.gov NCT02706262.
FUNDING: This study was supported by NIH grants K01DK109119, T32HL130357, K01DK116917, R01ES027595, P42ES010337, DK56341 (Nutrition Obesity Research Center), DK20579 (Diabetes Research Center), DK052574 (Digestive Disease Research Center), and UL1TR002345 (Clinical and Translational Science Award); NIH Shared Instrumentation Grants S10RR0227552, S10OD020025, and S10OD026929; and the Foundation for Barnes-Jewish Hospital.
Affiliation(s)
- Vincenza Cifarelli: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Scott C Beeman: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, and Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, USA
- Gordon I Smith: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Jun Yoshino: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Darya Morozov: Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, USA
- Joseph W Beals: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Brandon D Kayser: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Jeramie D Watrous: Departments of Medicine and Pharmacology, University of California, San Diego, La Jolla, California, USA
- Mohit Jain: Departments of Medicine and Pharmacology, University of California, San Diego, La Jolla, California, USA
- Bruce W Patterson: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Samuel Klein: Center for Human Nutrition and Atkins Center of Excellence in Obesity Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
49. Eyke NS, Koscher BA, Jensen KF. Toward Machine Learning-Enhanced High-Throughput Experimentation. Trends in Chemistry 2021. [DOI: 10.1016/j.trechm.2020.12.001]
50. Baek SS, Choi Y, Jeon J, Pyo J, Park J, Cho KH. Replacing the internal standard to estimate micropollutants using deep and machine learning. Water Research 2021; 188:116535. [PMID: 33147564; DOI: 10.1016/j.watres.2020.116535] [Received: 04/07/2020; Revised: 09/29/2020; Accepted: 10/18/2020]
Abstract
Alongside the worldwide spread of urbanization, micropollutants have entered aquatic and ecological environmental systems. These pollutants can wreak havoc on human health and ecosystems; hence, it is important to monitor micropollutants in the environment continuously. Micropollutants are commonly quantified via target analysis using high-resolution mass spectrometry and stable isotope-labeled (SIL) internal standards. However, the high cost of these standards is a major obstacle to measuring micropollutants. This study addressed this problem by developing data-driven models, including deep learning (DL) and machine learning (ML), to estimate micropollutant concentrations without resorting to SIL standards. We hypothesized that natural organic matter (NOM) could replace internal standards if a specific mass spectrum (MS) subset containing NOM information correlated with an SIL standard peak. We therefore analyzed the MS to find specific MS subsets that could replace the SIL standard peak, and identified 35 alternative MS subsets as input data for the DL and ML models. We then trained four DL models (ResNet101, GoogLeNet, VGG16, and Inception v3) and three ML models (random forest (RF), support vector machine (SVM), and artificial neural network (ANN)). A total of 680 MS data samples were used for model training to estimate five micropollutants, including Sulpiride, Metformin, and Benzotriazole. Among the DL models, ResNet101 exhibited the highest performance, with average validation R² and MSE of 0.84 and 0.26 ng/L, respectively, while RF was the best ML model, with R² and MSE of 0.69 and 0.58 ng/L. The trained models showed accurate training and validation results for estimating the five micropollutant concentrations. This study therefore demonstrates that the proposed analysis has potential as a rapid and economical alternative for micropollutant measurement.
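The study scores its models with R² and MSE. For reference, the sketch below computes both from paired observed and predicted concentrations using the usual definitions (R² as one minus the ratio of the residual to the total sum of squares). The concentration values and function names are hypothetical and unrelated to the study's data. Note that MSE carries squared units ((ng/L)²); its square root, the RMSE, is expressed in ng/L.

```python
def mse(observed, predicted):
    """Mean squared error; its units are the square of the input units."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed vs. model-estimated concentrations (ng/L).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

mse_value = mse(observed, predicted)      # squared residuals of 0.25 each, averaged
r2_value = r_squared(observed, predicted)  # SS_res = 1.0, SS_tot = 20.0
```

R² is scale-free and lets models be compared across micropollutants with different concentration ranges, whereas MSE (or RMSE) reports the typical error magnitude in the measurement's own units; reporting both, as the study does, covers both views.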
Affiliation(s)
- Sang-Soo Baek: School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Younghun Choi: Graduate School of FEED of Eco-Friendly Offshore Structure, Changwon National University, Changwon, Gyeongsangnamdo, 51140, Republic of Korea
- Junho Jeon: School of Civil, Environmental and Chemical Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, Republic of Korea
- JongCheol Pyo: School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Jongkwan Park: School of Civil, Environmental and Chemical Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, Republic of Korea
- Kyung Hwa Cho: School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea