1
Azmy L, Al-Olayan E, Abdelhamid MAA, Zayed A, Gheda SF, Youssif KA, Abou-Zied HA, Abdelmohsen UR, Ibraheem IBM, Pack SP, Elsayed KNM. Antimicrobial Activity of Arthrospira platensis-Mediated Gold Nanoparticles against Streptococcus pneumoniae: A Metabolomic and Docking Study. Int J Mol Sci 2024; 25:10090. [PMID: 39337576 PMCID: PMC11432420 DOI: 10.3390/ijms251810090] [Received: 08/15/2024] [Revised: 09/06/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024]
Abstract
The emergence of antibiotic-resistant Streptococcus pneumoniae necessitates the discovery of novel therapeutic agents. This study investigated the antimicrobial potential of green-synthesized gold nanoparticles (AuNPs) fabricated using Arthrospira platensis extract. Fourier transform infrared spectroscopy revealed functional groups such as ketones, aldehydes, and carboxylic acids in the capping agents, suggesting their role in AuNP stabilization. Transmission electron microscopy showed the formation of rod-shaped AuNPs; dynamic light scattering gave a mean diameter of 134.8 nm, and the zeta potential of -27.2 mV indicated good colloidal stability. The synthesized AuNPs exhibited potent antibacterial activity against S. pneumoniae, with a minimum inhibitory concentration (MIC) of 12 μg/mL, surpassing the efficacy of the control antibiotic, tigecycline. To elucidate the underlying mechanisms of action, an untargeted metabolomic analysis of the A. platensis extract was performed, identifying 26 potential bioactive compounds belonging to diverse chemical classes. In silico molecular docking simulations revealed that compound 22 exhibited a strong binding affinity to S. pneumoniae topoisomerase IV, a critical enzyme for bacterial DNA replication, and molecular dynamics simulations further validated the stability of this protein-ligand complex. These findings collectively highlight the promising antimicrobial potential of A. platensis-derived AuNPs and their constituent compounds, warranting further investigation for the development of novel anti-pneumococcal therapeutics.
Affiliation(s)
- Lamya Azmy
- Department of Botany and Microbiology, Faculty of Science, Beni-Suef University, Beni-Suef 62521, Egypt
- Ebtesam Al-Olayan
- Department of Zoology, College of Science, King Saud University, Riyadh 11472, Saudi Arabia
- Mohamed A A Abdelhamid
- Biology Department, Faculty of Education and Arts, Sohar University, Sohar 311, Oman
- Department of Biotechnology and Bioinformatics, Korea University, Sejong-Ro 2511, Sejong 30019, Republic of Korea
- Ahmed Zayed
- Department of Pharmacognosy, College of Pharmacy, Tanta University, Elguish Street (Medical Campus), Tanta 31527, Egypt
- Saly F Gheda
- Department of Botany, Faculty of Science, Tanta University, Tanta 31527, Egypt
- Khayrya A Youssif
- Department of Pharmacognosy, Faculty of Pharmacy, El Saleheya El Gadida University, Sharkia 44813, Egypt
- Hesham A Abou-Zied
- Department of Medicinal Chemistry, Faculty of Pharmacy, Deraya University, New Minia 61111, Egypt
- Usama R Abdelmohsen
- Department of Pharmacognosy, Faculty of Pharmacy, Deraya University, New Minia 61111, Egypt
- Department of Pharmacognosy, Faculty of Pharmacy, Minia University, Minia 61519, Egypt
- Ibraheem B M Ibraheem
- Department of Botany and Microbiology, Faculty of Science, Beni-Suef University, Beni-Suef 62521, Egypt
- Seung Pil Pack
- Department of Biotechnology and Bioinformatics, Korea University, Sejong-Ro 2511, Sejong 30019, Republic of Korea
- Khaled N M Elsayed
- Department of Botany and Microbiology, Faculty of Science, Beni-Suef University, Beni-Suef 62521, Egypt
2
Fedorov II, Protasov SA, Tarasova IA, Gorshkov MV. Ultrafast Proteomics. Biochemistry. Biokhimiia 2024; 89:1349-1361. [PMID: 39245450 DOI: 10.1134/s0006297924080017] [Received: 05/23/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 09/10/2024]
Abstract
Proteomic research in biology, medicine, drug development, population screening, and personalized therapy now requires analyzing large sets of samples within a reasonable experimental time. Until recently, mass spectrometry-based proteomics was unmatched in its ability to identify and quantify cellular protein composition, but its throughput was low, requiring many hours to analyze a single sample. This was at odds with the speed at which biological systems change at the whole-proteome level in response to external and internal factors. The low speed of whole-proteome analysis has thus become the main factor limiting developments in functional proteomics, where intracellular processes must be annotated not only across a wide range of conditions but also over long periods of time. The enormous heterogeneity of tissue cells or tumors, even of the same type, also makes it necessary to analyze biological systems at the level of individual cells. Such studies involve obtaining molecular characteristics, including whole-proteome profiles, for tens if not hundreds of thousands of individual cells. The development of mass spectrometry technologies with high resolution and mass measurement accuracy, predictive chromatography, new methods of peptide separation by ion mobility, and artificial intelligence-based processing of proteomic data has opened the way to a significant, if not radical, increase in the throughput of whole-proteome analysis and led to the novel concept of ultrafast proteomics. Work done in just the last few years has demonstrated proteome-wide analysis throughput of several hundred samples per day at a depth of several thousand proteins, levels unimaginable three or four years ago. This review examines the background of these developments, as well as the modern methods and approaches that enable ultrafast analysis of the entire proteome.
Affiliation(s)
- Ivan I Fedorov
- Moscow Institute of Physics and Technology (National University), Dolgoprudny, Moscow Region, 141700, Russia
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Sergey A Protasov
- Moscow Institute of Physics and Technology (National University), Dolgoprudny, Moscow Region, 141700, Russia
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
3
Sadia M, Boudguiyer Y, Helmus R, Seijo M, Praetorius A, Samanipour S. A stochastic approach for parameter optimization of feature detection algorithms for non-target screening in mass spectrometry. Anal Bioanal Chem 2024:10.1007/s00216-024-05425-3. [PMID: 38995405 DOI: 10.1007/s00216-024-05425-3] [Received: 03/12/2024] [Revised: 06/05/2024] [Accepted: 06/18/2024] [Indexed: 07/13/2024]
Abstract
Feature detection plays a crucial role in non-target screening (NTS) and requires careful selection of algorithm parameters to minimize false positive (FP) features. In this study, a stochastic approach was employed to optimize the parameter settings of feature detection algorithms used in processing high-resolution mass spectrometry data. The approach was demonstrated using four open-source algorithms (OpenMS, SAFD, XCMS, and KPIC2) within the patRoon software platform, applied to extracts of drinking water samples spiked with 46 per- and polyfluoroalkyl substances (PFAS). The method is based on a stochastic strategy involving random sampling from the variable space and the use of Pearson correlation to assess the impact of each parameter on the number of detected suspect analytes. With our approach, the optimized parameters improved algorithm performance by increasing the number of suspect hits for SAFD and XCMS and by reducing the total number of detected features (i.e., minimizing FP) for OpenMS. These improvements were further validated on three different drinking water samples used as a test dataset. The optimized parameters resulted in a lower false discovery rate (FDR%) than the default parameters, effectively increasing the detection of true positive features. This work also highlights the necessity of optimizing algorithm parameters before starting NTS, to reduce the complexity of such datasets.
Affiliation(s)
- Mohammad Sadia
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Youssef Boudguiyer
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Rick Helmus
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Marianne Seijo
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Antonia Praetorius
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Saer Samanipour
- Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
4
Liu Y, Yang Y, Chen W, Shen F, Xie L, Zhang Y, Zhai Y, He F, Zhu Y, Chang C. DeepRTAlign: toward accurate retention time alignment for large cohort mass spectrometry data analysis. Nat Commun 2023; 14:8188. [PMID: 38081814 PMCID: PMC10713976 DOI: 10.1038/s41467-023-43909-5] [Received: 01/16/2022] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
Retention time (RT) alignment is a crucial step in liquid chromatography-mass spectrometry (LC-MS)-based proteomic and metabolomic experiments, especially in large cohort studies. The most popular alignment tools are based on either the warping function method or the direct matching method. However, existing tools struggle to handle monotonic and non-monotonic RT shifts simultaneously. Here, we develop DeepRTAlign, a deep learning-based RT alignment tool for large cohort LC-MS data analysis. Benchmarking against current state-of-the-art approaches on multiple real-world and simulated proteomic and metabolomic datasets demonstrated that DeepRTAlign improves performance. The results also show that DeepRTAlign can improve identification sensitivity without compromising quantitative accuracy. Furthermore, using the MS features aligned by DeepRTAlign, we trained and validated a robust classifier to predict the early recurrence of hepatocellular carcinoma. DeepRTAlign provides an advanced solution to RT alignment, which is currently a major bottleneck in large cohort LC-MS studies in proteomics and metabolomics.
Affiliation(s)
- Yi Liu
- Faculty of Environment and Life, Beijing University of Technology, Beijing, 100023, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Yun Yang
- International Academy of Phronesis Medicine (Guang Dong), No. 96 Xindao Ring South Road, Guangzhou International Bio Island, Guangzhou, 510000, China
- South China Institute of Biomedicine, No. 83 Ruihe Road, Guangzhou, 510535, China
- Wendong Chen
- International Academy of Phronesis Medicine (Guang Dong), No. 96 Xindao Ring South Road, Guangzhou International Bio Island, Guangzhou, 510000, China
- South China Institute of Biomedicine, No. 83 Ruihe Road, Guangzhou, 510535, China
- Feng Shen
- Department of Hepatic Surgery IV, the Eastern Hepatobiliary Surgery Hospital, Naval Medical University, Shanghai, 200433, China
- Linhai Xie
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- International Academy of Phronesis Medicine (Guang Dong), No. 96 Xindao Ring South Road, Guangzhou International Bio Island, Guangzhou, 510000, China
- South China Institute of Biomedicine, No. 83 Ruihe Road, Guangzhou, 510535, China
- Yingying Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Yuanjun Zhai
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- International Academy of Phronesis Medicine (Guang Dong), No. 96 Xindao Ring South Road, Guangzhou International Bio Island, Guangzhou, 510000, China
- Research Unit of Proteomics Driven Cancer Precision Medicine, Chinese Academy of Medical Sciences, Beijing, 102206, China
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Cheng Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Research Unit of Proteomics Driven Cancer Precision Medicine, Chinese Academy of Medical Sciences, Beijing, 102206, China
5
Deberneh HM, Sadygov RG. Retention Time Alignment for Protein Turnover Studies Using Heavy Water Metabolic Labeling. J Proteome Res 2023; 22:410-419. [PMID: 36692003 PMCID: PMC10233748 DOI: 10.1021/acs.jproteome.2c00592] [Indexed: 01/25/2023]
Abstract
Retention time (RT) alignment is important for robust protein identification and quantification in proteomics. In data-dependent acquisition mode, where precursor ions are semistochastically chosen for fragmentation in MS/MS, alignment is used in an approach termed match between runs (MBR). MBR transfers peptides that were fragmented and identified in one experiment to a replicate experiment in which they were not identified. Before the MBR transfer, the RTs of the experiments are aligned to reduce the chance of erroneous transfers. Despite its widespread use in other areas of quantitative proteomics, RT alignment has not been applied in data analyses of protein turnover that use an atom-based stable isotope-labeling agent, such as metabolic labeling with deuterium oxide (D2O). Deuterium incorporation changes the isotope profiles of intact peptides in full scans and of their fragment ions in tandem mass spectra, which reduces peptide identification rates with current database search engines. MBR therefore becomes more important. Here, we report an approach that combines RT alignment with peptide quantification in studies of proteome turnover using heavy water metabolic labeling and LC-MS. The RT alignment uses correlation-optimized time warping. The alignment, followed by MBR, improves labeling time point coverage, especially for long labeling durations.
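The transfer step that follows alignment in MBR can be sketched as below. This is only an illustration of the matching logic, not the authors' implementation: it assumes the alignment has already produced a mapping from run-A to run-B retention times (here a hypothetical linear fit, not correlation-optimized warping), and that an identification is transferred to an unidentified run-B feature when both the m/z error (in ppm) and the aligned RT error fall within tolerances. All names and tolerance values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feature:
    mz: float
    rt: float  # retention time in minutes
    peptide: Optional[str] = None  # None means "not identified in this run"


def ppm_error(mz_ref: float, mz_obs: float) -> float:
    """Absolute mass difference in parts per million."""
    return abs(mz_ref - mz_obs) / mz_ref * 1e6


def match_between_runs(
    identified_a: list[Feature],
    features_b: list[Feature],
    rt_map: Callable[[float], float],
    ppm_tol: float = 10.0,
    rt_tol: float = 0.5,
) -> list[Feature]:
    """Transfer identifications from run A to unidentified features in run B.

    rt_map converts a run-A retention time into the run-B time scale (the output of
    some prior alignment step). The candidate closest in aligned RT wins.
    """
    transferred = []
    for fb in features_b:
        if fb.peptide is not None:
            continue  # already identified directly in run B
        best, best_drt = None, rt_tol
        for fa in identified_a:
            drt = abs(rt_map(fa.rt) - fb.rt)
            if ppm_error(fa.mz, fb.mz) <= ppm_tol and drt <= best_drt:
                best, best_drt = fa, drt
        if best is not None:
            transferred.append(Feature(fb.mz, fb.rt, best.peptide))
    return transferred


def rt_map(rt_a: float) -> float:
    """Hypothetical alignment result: run B elutes 2% slower with a 0.8 min offset."""
    return 1.02 * rt_a + 0.8


run_a = [Feature(500.2500, 30.0, "PEPTIDEK"), Feature(612.3100, 45.0, "SAMPLER")]
run_b = [Feature(500.2520, 31.4), Feature(612.3100, 40.0)]
for f in match_between_runs(run_a, run_b, rt_map):
    print(f.peptide, f.mz, f.rt)
```

Here only PEPTIDEK is transferred (about 4 ppm and essentially zero aligned-RT error); SAMPLER is not, because its aligned RT misses the run-B feature by several minutes. With an identity `rt_map` (no alignment), the 1.4 min offset of PEPTIDEK would exceed the 0.5 min tolerance and that transfer would be lost, which is why alignment precedes MBR.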
Affiliation(s)
- Henock M. Deberneh
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd, Galveston, TX 77555
- Rovshan G. Sadygov
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd, Galveston, TX 77555
6
Shaver AO, Garcia BM, Gouveia GJ, Morse AM, Liu Z, Asef CK, Borges RM, Leach FE, Andersen EC, Amster IJ, Fernández FM, Edison AS, McIntyre LM. An anchored experimental design and meta-analysis approach to address batch effects in large-scale metabolomics. Front Mol Biosci 2022; 9:930204. [PMID: 36438654 PMCID: PMC9682135 DOI: 10.3389/fmolb.2022.930204] [Received: 04/27/2022] [Accepted: 10/10/2022] [Indexed: 11/27/2022]
Abstract
Untargeted metabolomics studies are unbiased, but identifying the same feature across studies is complicated by environmental variation, batch effects, and instrument variability. Ideally, several studies that assay the same set of metabolic features would be used to select recurring features to pursue for identification. Here, we developed an anchored experimental design. This generalizable approach enabled us to integrate three genetic studies consisting of 14 test strains of Caenorhabditis elegans prior to the compound identification process. An anchor strain, PD1074, was included in every sample collection, resulting in a large set of biological replicates of a genetically identical strain that anchored each study. This enabled us to estimate treatment effects within each batch and to apply straightforward meta-analytic approaches to combine treatment effects across batches, without the need to estimate batch effects or use complex normalization strategies. We collected 104 test samples for three genetic studies across six batches to produce five analytical datasets from two complementary technologies commonly used in untargeted metabolomics. Here, we use the model system C. elegans to demonstrate that an augmented design combined with experimental blocks and other metabolomic QC approaches can be used to anchor studies and enable comparisons of stable spectral features across time without the need for compound identification. This approach is generalizable to systems in which the same genotype can be assayed in multiple environments, and it provides biologically relevant features for downstream compound identification efforts. All methods are included in the newest release of the publicly available SECIMTools, which is based on the open-source Galaxy platform.
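The anchored design lets each batch contribute a within-batch treatment effect (test strain minus the anchor strain PD1074), which can then be pooled without modeling batch effects. The sketch below assumes a fixed-effect, inverse-variance weighted meta-analysis of mean differences in log intensity for a single feature. The abstract only says "straightforward meta-analytic approaches", so this particular estimator, and the numbers in the example, are illustrative assumptions rather than the SECIMTools implementation.

```python
import math
from statistics import mean, variance


def batch_effect(test: list[float], anchor: list[float]) -> tuple[float, float]:
    """Within-batch effect (test minus anchor) and its sampling variance."""
    diff = mean(test) - mean(anchor)
    var = variance(test) / len(test) + variance(anchor) / len(anchor)
    return diff, var


def fixed_effect_meta(effects: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


# Hypothetical log2 intensities of one feature. Batch 2 reads about 1 unit higher
# overall, but because each batch is compared with its own anchor samples, that
# batch-wide offset cancels out of the within-batch differences.
batches = [
    {"test": [10.1, 10.3, 10.2], "anchor": [9.1, 9.2, 9.3]},
    {"test": [11.2, 11.0, 11.1], "anchor": [10.0, 10.2, 10.1]},
]
effects = [batch_effect(b["test"], b["anchor"]) for b in batches]
pooled, se = fixed_effect_meta(effects)
print(f"pooled effect = {pooled:.2f} (SE {se:.3f})")
```

Both batches show the same one-unit test-versus-anchor difference despite their different absolute intensity levels, so the pooled estimate is 1.00. A random-effects model would be a natural alternative if effects were expected to vary between batches.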
Affiliation(s)
- Amanda O. Shaver
- Department of Genetics, University of Georgia, Athens, GA, United States
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States
- Brianna M. Garcia
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States
- Department of Chemistry, University of Georgia, Athens, GA, United States
- Goncalo J. Gouveia
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States
- Department of Biochemistry, University of Georgia, Athens, GA, United States
- Alison M. Morse
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, United States
- Zihao Liu
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, United States
- Carter K. Asef
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, United States
- Ricardo M. Borges
- Walter Mors Institute of Research on Natural Products, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Franklin E. Leach
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States
- Department of Environmental Health Science, University of Georgia, Athens, GA, United States
- Erik C. Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, United States
- I. Jonathan Amster
- Department of Chemistry, University of Georgia, Athens, GA, United States
- Facundo M. Fernández
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, United States
- Arthur S. Edison
- Department of Genetics, University of Georgia, Athens, GA, United States
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States
- Department of Biochemistry, University of Georgia, Athens, GA, United States
- Lauren M. McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, United States
- University of Florida Genetics Institute, University of Florida, Gainesville, FL, United States
7
Skoraczyński G, Gambin A, Miasojedow B. Alignstein: Optimal transport for improved LC-MS retention time alignment. Gigascience 2022; 11:giac101. [PMID: 36329619 PMCID: PMC9633278 DOI: 10.1093/gigascience/giac101] [Received: 06/10/2022] [Revised: 08/24/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND Reproducibility of liquid chromatography separation is limited by retention time drift. As a result, measured signals lack correspondence across replicates of liquid chromatography-mass spectrometry (LC-MS) experiments. Correcting these errors is called retention time alignment and must be performed before further quantitative analysis. Despite the availability of numerous alignment algorithms, their accuracy is limited (e.g., for retention time drift that swaps the elution order of analytes). RESULTS We present Alignstein, an algorithm for LC-MS retention time alignment that correctly finds correspondence even for swapped signals. To achieve this, we implemented a generalization of the Wasserstein distance to compare multidimensional features without any reduction of the information or dimensionality of the analyzed data. Moreover, by design Alignstein requires neither a reference sample nor prior signal identification. We validate the algorithm on publicly available benchmark datasets, obtaining competitive results. Finally, we show that it can detect information contained in tandem mass spectra from the spatial properties of chromatograms. CONCLUSIONS We show that the use of optimal transport effectively overcomes the limitations of existing algorithms for statistical analysis of mass spectrometry datasets. The algorithm's source code is available at https://github.com/grzsko/Alignstein.
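Alignstein compares multidimensional features (m/z and RT) with a generalized Wasserstein distance. As a much simpler, one-dimensional illustration of why optimal transport suits RT drift, the sketch below computes the 1-Wasserstein (earth mover's) distance between two elution profiles sampled on the same uniform RT grid, using the fact that in one dimension W1 equals the area between the two cumulative distributions. It is not the Alignstein algorithm, and the profiles and scan spacing are made up. For a peak that is merely shifted, W1 equals the shift, whereas a pointwise (L1) difference saturates once the peaks stop overlapping.

```python
from itertools import accumulate


def wasserstein_1d(u: list[float], v: list[float], dt: float) -> float:
    """1-Wasserstein distance between two intensity profiles on the same uniform grid.

    Each profile is normalized to unit total intensity and treated as point masses at
    the grid positions; W1 is then the area between the two cumulative distributions.
    """
    if len(u) != len(v):
        raise ValueError("profiles must share the same RT grid")
    su, sv = sum(u), sum(v)
    cdf_u = accumulate(x / su for x in u)
    cdf_v = accumulate(x / sv for x in v)
    return dt * sum(abs(a - b) for a, b in zip(cdf_u, cdf_v))


def l1_distance(u: list[float], v: list[float]) -> float:
    """Pointwise L1 distance between the normalized profiles, for comparison."""
    su, sv = sum(u), sum(v)
    return sum(abs(a / su - b / sv) for a, b in zip(u, v))


peak = [0, 1, 4, 1, 0, 0, 0, 0, 0]
dt = 0.5  # minutes between scans (hypothetical)
for shift in range(4):
    shifted = [0] * shift + peak[: len(peak) - shift]
    print(f"shift {shift * dt:.1f} min: W1 = {wasserstein_1d(peak, shifted, dt):.2f}, "
          f"L1 = {l1_distance(peak, shifted):.2f}")
```

As the shift grows from 0 to 1.5 min, W1 grows in step with it, while L1 stops at 2 once the two peaks no longer overlap. That proportionality to displacement is what makes transport-based distances informative for drift.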
Affiliation(s)
- Grzegorz Skoraczyński
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
- Anna Gambin
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
- Błażej Miasojedow
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
8
Wang M, Bai Z, Zhu H, Zheng T, Chen X, Li P, Zhang J, Ma F. A New Strategy Based on LC-Q TRAP-MS for Determining the Distribution of Polyphenols in Different Apple Varieties. Foods 2022; 11:3390. [PMID: 36360003 PMCID: PMC9657627 DOI: 10.3390/foods11213390] [Received: 09/26/2022] [Revised: 10/19/2022] [Accepted: 10/25/2022] [Indexed: 09/28/2023]
Abstract
Apples are a rich source of polyphenols in the human diet. However, the distribution of polyphenols across apple varieties and tissues is still largely unclear. In this study, a new liquid chromatography-tandem mass spectrometry (LC-MS/MS) strategy was developed to reveal the spatial distribution of polyphenols in different apple tissues and varieties. A method based on multiple reaction monitoring (MRM)-enhanced product ion (EPI) scanning was established in information-dependent acquisition (IDA) mode for pseudo-targeted screening of major apple polyphenols. In total, 39 apple polyphenolic metabolites were identified. Qualitative and quantitative results showed that both the variety and the content of polyphenols were higher in apple peels than in other tissues. In apple roots, stems, and leaves, polyphenol variety and content were highest in wild species, followed by cultivars and elite varieties. Dihydrochalcones, a major class of apple polyphenols, were more abundant in apple roots, stems, and leaves. Besides revealing the distribution of polyphenols in different apple tissues, this strategy can serve as a model for other agricultural products and provides a theoretical basis for the utilization of polyphenol resources and variety selection.
Affiliation(s)
- Minyan Wang
- State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Xianyang 712100, China
- Zhangzhen Bai
- College of Landscape Architecture and Arts, Northwest A&F University, Xianyang 712100, China
- Huili Zhu
- State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Xianyang 712100, China
- Tiantian Zheng
- College of Landscape Architecture and Arts, Northwest A&F University, Xianyang 712100, China
- Xiujiao Chen
- State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Xianyang 712100, China
- Pengmin Li
- State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Xianyang 712100, China
- Jing Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Xianyang 712100, China
- Fengwang Ma
- State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Xianyang 712100, China
9
Malinka F, Zareie A, Prochazka J, Sedlacek R, Novosadova V. Batch alignment via retention orders for preprocessing large-scale multi-batch LC-MS experiments. Bioinformatics 2022; 38:3759-3767. [PMID: 35748696 DOI: 10.1093/bioinformatics/btac407] [Received: 11/18/2020] [Revised: 05/20/2022] [Accepted: 06/20/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Meticulous selection of chromatographic peak detection parameters and algorithms is a crucial step in preprocessing LC-MS data. However, because mass-to-charge ratio (m/z) and retention time shifts are larger between batches than within batches, finding suitable parameters for all samples of a large-scale multi-batch experiment while minimizing information loss becomes challenging. Preprocessing batches individually can alleviate these problems but requires a method for aligning and combining them for downstream analysis. RESULTS We present two methods for aligning and combining individually preprocessed batches in multi-batch LC-MS experiments. The methods were tested on six simulated and six real datasets. Furthermore, by estimating the probabilities of peak insertion, deletion, and swap between batches in authentic datasets, we demonstrate that retention order swaps are not rare in untargeted LC-MS data. AVAILABILITY The kmersAlignment and rtcorrectedAlignment algorithms are available as an R package, together with raw data, at https://metabocombiner.img.cas.cz. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
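Retention order swaps matter because any strictly increasing RT correction preserves elution order, so it cannot bring swapped features back into correspondence. A small sketch of how such swaps might be counted: given the RTs of the same matched features in two batches, list the feature pairs whose elution order flips (discordant pairs). This is a generic pair count on hypothetical data, not the kmersAlignment or rtcorrectedAlignment algorithm from the paper.

```python
from itertools import combinations
from math import comb


def order_swaps(rt_batch1: dict[str, float], rt_batch2: dict[str, float]) -> list[tuple[str, str]]:
    """Pairs of shared features whose elution order differs between two batches."""
    shared = sorted(set(rt_batch1) & set(rt_batch2))
    swaps = []
    for a, b in combinations(shared, 2):
        d1 = rt_batch1[a] - rt_batch1[b]
        d2 = rt_batch2[a] - rt_batch2[b]
        if d1 * d2 < 0:  # opposite signs: the pair eluted in reverse order (ties ignored)
            swaps.append((a, b))
    return swaps


# Hypothetical RTs (minutes) of matched features; F5 was only detected in batch 2.
batch1 = {"F1": 2.10, "F2": 3.40, "F3": 3.55, "F4": 5.00}
batch2 = {"F1": 2.60, "F2": 4.05, "F3": 3.98, "F4": 5.70, "F5": 6.10}

swaps = order_swaps(batch1, batch2)
n_shared = len(set(batch1) & set(batch2))
print(swaps)  # [('F2', 'F3')]
print(f"swap rate: {len(swaps)}/{comb(n_shared, 2)} pairs")
```

The O(n^2) pair loop is fine for illustration; for thousands of features, the same count can be obtained in O(n log n) by counting inversions with a merge sort.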
Affiliation(s)
- František Malinka
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Průmyslova 595, 252 50, Vestec, Czech Republic
- Ashkan Zareie
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Průmyslova 595, 252 50, Vestec, Czech Republic
- Jan Prochazka
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Průmyslova 595, 252 50, Vestec, Czech Republic
- Radislav Sedlacek
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Průmyslova 595, 252 50, Vestec, Czech Republic
- Vendula Novosadova
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Průmyslova 595, 252 50, Vestec, Czech Republic
10
Helmus R, Ter Laak TL, van Wezel AP, de Voogt P, Schymanski EL. patRoon: open source software platform for environmental mass spectrometry based non-target screening. J Cheminform 2021; 13:1. [PMID: 33407901 PMCID: PMC7789171 DOI: 10.1186/s13321-020-00477-w] [Received: 06/19/2020] [Accepted: 11/23/2020] [Indexed: 12/22/2022]
Abstract
Mass spectrometry-based non-target analysis is increasingly adopted in environmental sciences to screen for and identify numerous chemicals simultaneously in highly complex samples. However, current data processing software either lacks functionality for environmental sciences, solves only part of the workflow, is not openly available, and/or is restricted in input data formats. In this paper we present patRoon, a new R-based open-source software platform that provides comprehensive, fully tailored, and straightforward non-target analysis workflows. The platform makes using, evaluating, and mixing well-tested algorithms seamless by harmonizing various common (primarily open) software tools under a consistent interface. In addition, patRoon offers functionality and strategies to simplify and automate the processing of complex (environmental) data. patRoon implements several optimization strategies that significantly reduce computational times. The ability of patRoon to perform time-efficient and automated non-target data annotation of environmental samples is demonstrated with a simple and reproducible workflow using open-access data of spiked samples from a drinking water treatment plant study. In addition, the ability to easily use, combine, and evaluate different algorithms was demonstrated for three commonly used feature finding algorithms. This article, combined with previously published work, demonstrates that patRoon helps make comprehensive (environmental) non-target analysis readily accessible to a wider community of researchers.
Affiliation(s)
- Rick Helmus
  - Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94240, 1090 GE, Amsterdam, The Netherlands
- Thomas L Ter Laak
  - Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94240, 1090 GE, Amsterdam, The Netherlands
  - KWR Water Research Institute, Chemical Water Quality and Health, P.O. Box 1072, 3430 BB, Nieuwegein, The Netherlands
- Annemarie P van Wezel
  - Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94240, 1090 GE, Amsterdam, The Netherlands
- Pim de Voogt
  - Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94240, 1090 GE, Amsterdam, The Netherlands
- Emma L Schymanski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, L-4367, Belvaux, Luxembourg

11
Buric F, Zrimec J, Zelezniak A. Parallel Factor Analysis Enables Quantification and Identification of Highly Convolved Data-Independent-Acquired Protein Spectra. Patterns (N Y) 2020; 1:100137. [PMID: 33336195 PMCID: PMC7733873 DOI: 10.1016/j.patter.2020.100137] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/25/2020] [Revised: 09/14/2020] [Accepted: 10/12/2020] [Indexed: 11/26/2022]
Abstract
High-throughput data-independent acquisition (DIA) is the method of choice for quantitative proteomics, combining the best practices of targeted and shotgun approaches. The resulting DIA spectra are, however, highly convolved and lack direct precursor-fragment correspondence, which complicates biological sample analysis. Here, we present CANDIA (canonical decomposition of data-independent-acquired spectra), a GPU-powered unsupervised multiway factor analysis framework that uses parallel factor analysis to deconvolve multispectral scans into individual analyte spectra, chromatographic profiles, and sample abundances. The deconvolved spectra can be annotated with traditional database search engines or used as high-quality input for de novo sequencing methods. We demonstrate that spectral libraries generated with CANDIA substantially reduce the false discovery rate underlying the validation of spectral quantification. CANDIA covers up to 33 times more total ion current than library-based approaches, which typically use less than 5% of total recorded ions, thus allowing quantification and identification of signals from unexplored DIA spectra.
Highlights:
- Conventional DIA spectral libraries cover less than 3% of a scan's total ion count
- CANDIA deconvolves peptide signals by leveraging all scan data
- CANDIA uses GPUs to enable tensor algebra on massive DIA mass spectrometry data
- CANDIA output enables high-confidence and precise quantitative proteomics
The latest high-throughput mass spectrometry-based technologies can record virtually all molecules from complex biological samples, providing a holistic picture of proteomes in cells and tissues and enabling an evaluation of the overall status of a person's health. However, current best practices still only scratch the surface of the wealth of information contained in these massive proteome datasets, and efficient novel data-driven strategies are needed. Powered by advances in GPU hardware and open-source machine-learning frameworks, we developed a data-driven approach, CANDIA, which disassembles highly complex proteomics data into the elementary molecular signatures of the proteins in biological samples. Our work provides a performant and adaptable solution that complements existing mass spectrometry techniques. As the central mathematical methods are generic, other scientific fields that deal with highly convolved datasets will benefit from this work.
Affiliation(s)
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg 412 96, Sweden
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg 412 96, Sweden
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg 412 96, Sweden
  - Science for Life Laboratory, Tomtebodavägen 23a, Stockholm 171 65, Sweden

12
Comparison of Three Untargeted Data Processing Workflows for Evaluating LC-HRMS Metabolomics Data. Metabolites 2020; 10:metabo10090378. [PMID: 32967365 PMCID: PMC7570355 DOI: 10.3390/metabo10090378] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/21/2020] [Revised: 09/17/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022]
Abstract
The evaluation of liquid chromatography high-resolution mass spectrometry (LC-HRMS) raw data is a crucial step in untargeted metabolomics studies to minimize false positive findings. A variety of commercial and open-source software solutions are available for such data processing. This study aims to compare three different data processing workflows (Compound Discoverer 3.1, XCMS Online combined with MetaboAnalyst 4.0, and a manually programmed tool using R) for investigating LC-HRMS data from an untargeted metabolomics study. Simple but highly standardized datasets for evaluation were prepared by incubating pHLM (pooled human liver microsomes) with the synthetic cannabinoid A-CHMINACA. LC-HRMS analysis was performed using normal- and reversed-phase chromatography followed by full scan MS in positive and negative mode. MS/MS spectra of significant features were subsequently recorded in a separate run. The outcome of each workflow was evaluated by its number of significant features, peak shape quality, and the results of the multivariate statistics. Compound Discoverer, as an all-in-one solution, is characterized by its ease of use and therefore seems suitable for simple and small metabolomics studies. The two open-source solutions allowed extensive customization but, particularly in the case of R, required advanced programming skills. Nevertheless, both provided high flexibility and may be suitable for more complex studies and questions.

13
Müller E, Huber CE, Brack W, Krauss M, Schulze T. Symbolic Aggregate Approximation Improves Gap Filling in High-Resolution Mass Spectrometry Data Processing. Anal Chem 2020; 92:10425-10432. [PMID: 32786516 DOI: 10.1021/acs.analchem.0c00899] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Nontargeted mass spectrometry (MS) is widely used in life sciences and environmental chemistry to investigate large sets of samples. A major problem for larger-scale MS studies is data gaps, or missing values, in aligned data sets. The main causes of these data gaps are the absence of the compound from the sample, issues related to chromatography or mass spectrometry (for example, broad peaks, early eluting peaks, ion suppression, low ionization efficiency), and issues related to software (mainly limitations of peak detection algorithms). Peak detection algorithms are necessarily heuristic and should be used with strict settings to minimize false positive and false negative peaks in a data set; gap filling can then reduce the missing data that remain in individual samples after peak detection. In this study, we present a new gap filling algorithm. The method is based on the symbolic aggregate approximation (SAX) algorithm, which was developed for the evaluation and classification of time series in data mining studies. We adapted SAX for liquid chromatography high-resolution MS nontarget screening to support the detection of missing peaks in aligned mass spectral data sets. The SAX-based algorithm considerably improves detection efficiency compared to existing gap filling methods, including the Peak Finder algorithm provided in MZmine.
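For readers unfamiliar with SAX, the generic transformation it rests on can be sketched briefly: z-normalize a series, reduce it with piecewise aggregate approximation (PAA), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. This is a minimal illustration of SAX itself under those textbook assumptions, not the authors' gap-filling implementation; the function name `sax` and the toy intensity trace are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def sax(series, n_segments, alphabet_size):
    """Return the SAX word for a 1-D series (e.g. an extracted ion chromatogram).

    Hypothetical helper; alphabet_size must be between 2 and 26.
    """
    x = np.asarray(series, dtype=float)
    std = x.std()
    # z-normalize; a flat trace has no shape, so map it to all zeros
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Piecewise aggregate approximation: mean of each (near-)equal-length segment
    paa = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    # Breakpoints dividing N(0, 1) into `alphabet_size` equiprobable regions
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    # Region index of each segment mean, then index -> letter
    symbols = np.digitize(paa, breakpoints)
    return "".join("abcdefghijklmnopqrstuvwxyz"[s] for s in symbols)


# A peak-shaped toy trace: baseline, apex, tailing, baseline
print(sax([0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0], n_segments=4, alphabet_size=3))  # acba
```

Because each letter is equally likely under the normal assumption, words from chromatograms with very different absolute intensities remain comparable, which is what makes a symbolic representation attractive for searching for peak-like patterns.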
Affiliation(s)
- Erik Müller
  - UFZ-Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany
  - RWTH Aachen University, Institute for Environmental Research, Worringerweg 1, 52074 Aachen, Germany
- Carolin Elisabeth Huber
  - UFZ-Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany
  - RWTH Aachen University, Institute for Environmental Research, Worringerweg 1, 52074 Aachen, Germany
- Werner Brack
  - UFZ-Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany
  - RWTH Aachen University, Institute for Environmental Research, Worringerweg 1, 52074 Aachen, Germany
- Martin Krauss
  - UFZ-Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany
- Tobias Schulze
  - UFZ-Helmholtz Centre for Environmental Research, Permoserstraße 15, 04318 Leipzig, Germany

14
Lebanov L, Chatterjee S, Tedone L, Chapman SC, Linford MR, Paull B. Comprehensive characterisation of ylang-ylang essential oils according to distillation time, origin, and chemical composition using a multivariate approach applied to average mass spectra and segmented average mass spectral data. J Chromatogr A 2020; 1618:460853. [DOI: 10.1016/j.chroma.2020.460853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/26/2019] [Revised: 12/12/2019] [Accepted: 01/03/2020] [Indexed: 12/20/2022]

15
Maniscalco M, Cutignano A, Paris D, Melck DJ, Molino A, Fuschillo S, Motta A. Metabolomics of Exhaled Breath Condensate by Nuclear Magnetic Resonance Spectroscopy and Mass Spectrometry: A Methodological Approach. Curr Med Chem 2020; 27:2381-2399. [DOI: 10.2174/0929867325666181008122749] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/17/2018] [Revised: 07/30/2018] [Accepted: 08/06/2018] [Indexed: 12/15/2022]
Abstract
Respiratory diseases have a very high prevalence in the general population, with an increase in morbidity, mortality and health-care expenses worldwide. They are complex and heterogeneous pathologies that may present different pathological facets in different subjects, often with an individual course of disease. Therefore, there is a need to identify patients with similar characteristics, prognosis or treatment, defining the so-called phenotype, but also to mark specific differences within each phenotype, defining the endotypes.
Biomarkers are very useful for studying respiratory phenotypes and endotypes. Metabolomics, one of the recently introduced “omics”, is becoming a leading technique for biomarker discovery. For the airways, metabolomics appears to be well suited because the respiratory tract offers a natural matrix, the exhaled breath condensate (EBC), in which several biomarkers can be measured. In this review, we discuss the main methodological issues related to the application of nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) to EBC metabolomics for investigating respiratory diseases.
Affiliation(s)
- Mauro Maniscalco
  - Pulmonary Rehabilitation Unit, ICS Maugeri SpA IRCCS, Via Bagni Vecchi 1, 82037 Telese Terme (Benevento), Italy
- Adele Cutignano
  - Institute of Biomolecular Chemistry, National Research Council, Via Campi Flegrei 34, Comprensorio Olivetti Edificio A, 80078 Pozzuoli (Naples), Italy
- Debora Paris
  - Institute of Biomolecular Chemistry, National Research Council, Via Campi Flegrei 34, Comprensorio Olivetti Edificio A, 80078 Pozzuoli (Naples), Italy
- Dominique J. Melck
  - Institute of Biomolecular Chemistry, National Research Council, Via Campi Flegrei 34, Comprensorio Olivetti Edificio A, 80078 Pozzuoli (Naples), Italy
- Antonio Molino
  - Department of Respiratory Medicine, University Federico II, 80131 Naples, Italy
- Salvatore Fuschillo
  - Pulmonary Rehabilitation Unit, ICS Maugeri SpA IRCCS, Via Bagni Vecchi 1, 82037 Telese Terme (Benevento), Italy
- Andrea Motta
  - Institute of Biomolecular Chemistry, National Research Council, Via Campi Flegrei 34, Comprensorio Olivetti Edificio A, 80078 Pozzuoli (Naples), Italy

16
A Data Set of 255,000 Randomly Selected and Manually Classified Extracted Ion Chromatograms for Evaluation of Peak Detection Methods. Metabolites 2020; 10:metabo10040162. [PMID: 32331455 PMCID: PMC7240950 DOI: 10.3390/metabo10040162] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/28/2020] [Revised: 04/18/2020] [Accepted: 04/20/2020] [Indexed: 11/25/2022]
Abstract
Non-targeted mass spectrometry (MS) has become an important method in recent years in the fields of metabolomics and environmental research. While more and more algorithms and workflows are becoming available to process large numbers of non-targeted data sets, few manually evaluated universal test data sets exist for refining and evaluating these methods. Peak detection and its refinement, the first step of non-targeted screening, is arguably the most important one. However, the absence of a model data set makes it harder for researchers to evaluate peak detection methods. In this Data Descriptor, we provide a manually checked data set consisting of 255,000 EICs (5000 peaks randomly sampled from across 51 samples) for evaluating peak detection and gap-filling algorithms. The data set was created from a previous real-world study, a subset of which was used to extract ion chromatograms that three mass spectrometry experts then classified manually. The data set consists of the converted mass spectrometry files, intermediate processing files and a central file containing a table with all important information on the classified peaks.

17
Chong J, Xia J. Using MetaboAnalyst 4.0 for Metabolomics Data Analysis, Interpretation, and Integration with Other Omics Data. Methods Mol Biol 2020; 2104:337-360. [PMID: 31953825 DOI: 10.1007/978-1-0716-0239-3_17] [Citation(s) in RCA: 106] [Impact Index Per Article: 26.5] [Indexed: 04/29/2023]
Abstract
MetaboAnalyst (www.metaboanalyst.ca) is an easy-to-use, comprehensive and freely available web-based tool for metabolomics data processing, statistical analysis and functional interpretation, as well as integration with other omics data. This chapter first provides an introductory overview of the current MetaboAnalyst (version 4.0) with regard to its underlying design concepts and user interface structure. Subsequent sections describe three common metabolomics data analysis workflows covering targeted metabolomics, untargeted metabolomics, and multi-omics data integration.
Affiliation(s)
- Jasmine Chong
  - Institute of Parasitology, McGill University, Montreal, QC, Canada
- Jianguo Xia
  - Institute of Parasitology, McGill University, Montreal, QC, Canada
  - Department of Animal Science, McGill University, Montreal, QC, Canada
  - Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada
  - Department of Human Genetics, McGill University, Montreal, QC, Canada

18
Välikangas T, Suomi T, Elo LL. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation. Brief Bioinform 2019; 19:1344-1355. [PMID: 28575146 PMCID: PMC6291797 DOI: 10.1093/bib/bbx054] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 03/03/2017] [Indexed: 01/15/2023]
Abstract
Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software packages, both open-source and commercial, exist to process raw MS data into quantified protein abundances. Each package includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the number of missing values produced by the different proteomics software packages and the ability of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing covered the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change, and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software packages decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of data filtering and local least squares imputation increased performance the most in the tested data sets.
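The local least squares (lls) imputation mentioned above can be sketched in a few lines. The version below follows the general LLS idea: for each row (protein) with missing values, pick the k most similar complete rows using only the observed columns, fit the observed part of the row as a least-squares linear combination of those neighbours, and apply the same coefficients to the neighbours' values in the missing columns. It is an illustrative assumption, not the implementation evaluated in the study; the function name `lls_impute`, the Euclidean distance and the default `k` are choices made here.

```python
import numpy as np


def lls_impute(X, k=3):
    """Fill NaNs row by row with local least squares on the k nearest complete rows.

    Hypothetical helper. X is a proteins x samples array; rows that are entirely
    missing are left as NaN because there is nothing to regress on.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # candidate neighbours: rows without NaN
    if complete.shape[0] == 0:
        raise ValueError("need at least one complete row")
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any() or miss.all():
            continue
        obs = ~miss
        # Euclidean distance to every complete row, on the observed columns only
        dist = np.linalg.norm(complete[:, obs] - row[obs], axis=1)
        nbrs = complete[np.argsort(dist)[:k]]
        A = nbrs[:, obs]   # k x n_observed
        B = nbrs[:, miss]  # k x n_missing
        # Coefficients that best reproduce the observed values from the neighbours
        coef, *_ = np.linalg.lstsq(A.T, row[obs], rcond=None)
        out[i, miss] = B.T @ coef
    return out


# Toy matrix: the last row equals 2*r1 + 3*r2 + 1*r3, with its last value missing
X = np.array([[1, 0, 0, 1],
              [0, 1, 0, 2],
              [0, 0, 1, 3],
              [2, 3, 1, np.nan]])
print(lls_impute(X, k=3)[3])  # last entry is filled with approximately 11
```

Fitting on the observed columns and transferring the coefficients is what distinguishes LLS from plain k-nearest-neighbour averaging: neighbours may be scaled or combined, not just averaged.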
Affiliation(s)
- Tommi Välikangas
  - Computational Biomedicine Group, Turku Centre for Biotechnology, Finland
- Tomi Suomi
  - Computational Biomedicine research group at the Turku Centre for Biotechnology, Finland
- Laura L Elo
  - Biomathematics, Research Director in Bioinformatics and Group Leader in Computational Biomedicine at Turku Centre for Biotechnology, University of Turku, Finland

19
Chen AT, Franks A, Slavov N. DART-ID increases single-cell proteome coverage. PLoS Comput Biol 2019; 15:e1007082. [PMID: 31260443 PMCID: PMC6625733 DOI: 10.1371/journal.pcbi.1007082] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 08/29/2018] [Revised: 07/12/2019] [Accepted: 05/06/2019] [Indexed: 01/09/2023]
Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because lower protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates to improve confidence estimates of peptide-spectrum matches. When applied to bulk or single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, thus decreasing missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type-specific proteins. The additional data points provided by DART-ID boost statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net.
Affiliation(s)
- Albert Tian Chen
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts, United States of America
  - Barnett Institute, Northeastern University, Boston, Massachusetts, United States of America
- Alexander Franks
  - Department of Statistics and Applied Probability, University of California Santa Barbara, California, United States of America
- Nikolai Slavov
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts, United States of America
  - Barnett Institute, Northeastern University, Boston, Massachusetts, United States of America
  - Department of Biology, Northeastern University, Boston, Massachusetts, United States of America

20
Cui J, Chen Q, Dong X, Shang K, Qi X, Cui H. A matching algorithm with isotope distribution pattern in LC-MS based on support vector machine (SVM) learning model. RSC Adv 2019; 9:27874-27882. [PMID: 35530479 PMCID: PMC9071103 DOI: 10.1039/c9ra03789f] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/20/2019] [Accepted: 08/17/2019] [Indexed: 01/25/2023]
Abstract
In proteomics, it is important to detect, analyze, and quantify complex peptide components and differences.
Affiliation(s)
- Jian Cui
  - Department of Information Technology Shengli College, China University of Petroleum Huadong, Dongying, P. R. China
- Qiang Chen
  - Department of Information Technology Shengli College, China University of Petroleum Huadong, Dongying, P. R. China
- Xiaorui Dong
  - Department of Information Technology Shengli College, China University of Petroleum Huadong, Dongying, P. R. China
- Kai Shang
  - Department of Information Technology Shengli College, China University of Petroleum Huadong, Dongying, P. R. China
- Xin Qi
  - Department of Computer Science in College of Computer and Communication Engineering, China University of Petroleum Huadong, Qingdao, P. R. China
- Hao Cui
  - Department of Computer Science in College of Computer and Communication Engineering, China University of Petroleum Huadong, Qingdao, P. R. China

21
Manier SK, Keller A, Meyer MR. Automated optimization of XCMS parameters for improved peak picking of liquid chromatography-mass spectrometry data using the coefficient of variation and parameter sweeping for untargeted metabolomics. Drug Test Anal 2018; 11:752-761. [PMID: 30479047 DOI: 10.1002/dta.2552] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/16/2018] [Revised: 11/15/2018] [Accepted: 11/22/2018] [Indexed: 01/25/2023]
Abstract
Accurate peak picking and further processing remain a challenge in the analysis of untargeted metabolomics data acquired by liquid chromatography-mass spectrometry (LC-MS). Optimizing these processes is crucial to obtaining proper results. This study investigated and optimized peak detection by XCMS, a widely used R package for peak picking and processing of high-resolution LC-MS metabolomics data, using the coefficient of variation obtained from neat standard solutions of drug-like compounds. The results were additionally verified using fortified pooled plasma samples. Mass spectrometer settings were optimized following recommendations in the literature to enable reliable detection of the investigated analytes. XCMS parameters were evaluated using a comprehensive parameter sweeping approach. The optimization steps were statistically evaluated and further visualized after principal component analysis (PCA). For the samples of the lower-concentration solution in methanol, optimizing both the mass spectrometer and the XCMS parameters improved the median coefficient of variation from 24% to 7%, the retention time fluctuation from 9.3 s to 0.54 s, and the fluctuation of the mass-to-charge ratio (m/z) from 0.00095 to 0.00028. The number of parent compounds and their related species annotated by CAMERA increased from 88 to 113, while the total number of features decreased from 3282 to 428. Optimized MS settings, such as increased resolution, led to higher peak picking specificity. PCA supported these findings, showing the best clustering of samples after optimization of both mass spectrometer and XCMS parameters. The results implied that peak picking needs to be individually adapted to the experimental setup. Reducing unwanted variation in the data set was most successful when high resolving power was combined with strict peak picking settings.
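The two ingredients described above, a coefficient-of-variation criterion and a parameter sweep, can be combined roughly as follows. The median CV is computed over features measured in replicate injections, and the sweep evaluates every combination in a parameter grid and keeps the one with the lowest score. This is a generic Python illustration under stated assumptions, not the authors' R workflow: `evaluate` is a hypothetical callback that, in practice, would run peak picking with the given settings and return the median CV, and the grid keys merely borrow the names of the XCMS centWave arguments `ppm` and `snthresh` as labels.

```python
import itertools

import numpy as np


def median_cv_percent(intensities):
    """Median coefficient of variation (%) across features.

    `intensities` is a features x replicates array of positive peak areas.
    """
    x = np.asarray(intensities, dtype=float)
    mean = x.mean(axis=1)
    sd = x.std(axis=1, ddof=1)  # sample standard deviation per feature
    return float(np.median(100.0 * sd / mean))


def sweep(grid, evaluate):
    """Try every combination in `grid`; return (best_params, best_score), lower is better."""
    keys = sorted(grid)
    best = None
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Three features measured in triplicate, with CVs of 0%, 10% and 20%
print(median_cv_percent([[10, 10, 10], [9, 10, 11], [8, 10, 12]]))  # 10.0


def toy_score(params):
    # Stand-in for "run peak picking with params, return the median CV"
    return abs(params["ppm"] - 10) + params["snthresh"] / 10


print(sweep({"ppm": [5, 10, 25], "snthresh": [3, 10]}, toy_score))
# ({'ppm': 10, 'snthresh': 3}, 0.3)
```

An exhaustive grid is simple and reproducible but grows multiplicatively with each added parameter, so in practice the grid is usually kept coarse or refined in stages.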
Affiliation(s)
- Sascha K Manier
  - Department of Experimental and Clinical Toxicology, Institute of Experimental and Clinical Pharmacology and Toxicology, Saarland University, Center for Molecular Signaling (PZMS), 66421, Homburg, Germany
- Andreas Keller
  - Chair of Clinical Bioinformatics, Saarland University, Saarbruecken, Germany
- Markus R Meyer
  - Department of Experimental and Clinical Toxicology, Institute of Experimental and Clinical Pharmacology and Toxicology, Saarland University, Center for Molecular Signaling (PZMS), 66421, Homburg, Germany

22
Clinical metabolomics of exhaled breath condensate in chronic respiratory diseases. Adv Clin Chem 2018; 88:121-149. [PMID: 30612604 DOI: 10.1016/bs.acc.2018.10.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 01/06/2023]
Abstract
Chronic respiratory diseases (CRDs) are complex multifactorial disorders involving the airways and other lung structures. Developing reliable markers for early and accurate diagnosis, including disease phenotype, and for predicting the response and/or adherence to prescribed treatment is essential for the correct management of CRDs. Besides the traditional techniques for detecting biomarkers, "omics" sciences have stimulated interest in the clinical field, as they could potentially improve the study of disease phenotypes. Perturbations in a variety of metabolic and signaling pathways could contribute to an understanding of CRD pathogenesis. In particular, metabolomics provides powerful tools to map biological perturbations and their relationship with disease pathogenesis. Exhaled breath condensate (EBC) is a natural matrix of the respiratory tract and is well suited for metabolomics studies. In this article, we review the current state of metabolomics methodology applied to EBC in the study of CRDs.

23
Sousa PFM, de Waard A, Åberg KM. Elucidation of chromatographic peak shifts in complex samples using a chemometrical approach. Anal Bioanal Chem 2018; 410:5229-5235. [PMID: 29947907 PMCID: PMC6061714 DOI: 10.1007/s00216-018-1173-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/07/2018] [Revised: 05/04/2018] [Accepted: 05/29/2018] [Indexed: 11/08/2022]
Abstract
Retention time peak shifts between consecutive chromatographic analyses are well known yet not fully understood. Algorithms have been developed to align peaks between runs, but no studies have specifically considered the causes of peak shifts. Here, designed experiments reveal chromatographic shift patterns for a complex peptide mixture that are attributable to the temperature and pH of the mobile phase. These results demonstrate that peak shifts are highly structured and largely explained by underlying differences in the physico-chemical parameters of the chromatographic system. They also provide experimental support for the alignment algorithm called the generalized fuzzy Hough transform, which exploits this fact. The development of alignment algorithms can be expected to enter a new phase, with increasingly accurate alignment achieved by considering the latent structure of the peak shifts.
Affiliation(s)
- Pedro F M Sousa
  - Unit for Analytical Chemistry, Department of Environmental and Analytical Chemistry, Stockholm University, SE-106 91, Stockholm, Sweden
- Angela de Waard
  - Unit for Analytical Chemistry, Department of Environmental and Analytical Chemistry, Stockholm University, SE-106 91, Stockholm, Sweden
- K Magnus Åberg
  - Unit for Analytical Chemistry, Department of Environmental and Analytical Chemistry, Stockholm University, SE-106 91, Stockholm, Sweden

24
Ottensmann M, Stoffel MA, Nichols HJ, Hoffman JI. GCalignR: An R package for aligning gas-chromatography data for ecological and evolutionary studies. PLoS One 2018; 13:e0198311. [PMID: 29879149 PMCID: PMC5991698 DOI: 10.1371/journal.pone.0198311] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 02/22/2017] [Accepted: 05/09/2018] [Indexed: 11/17/2022]
Abstract
Chemical cues are arguably the most fundamental means of animal communication and play an important role in mate choice and kin recognition. Consequently, there is growing interest in the use of gas chromatography (GC) to investigate the chemical basis of eco-evolutionary interactions. Both GC-MS (mass spectrometry) and FID (flame ionization detection) are commonly used to characterise the chemical composition of biological samples such as skin swabs. The resulting chromatograms comprise peaks that are separated according to their retention times and which represent different substances. Across chromatograms of different samples, homologous substances are expected to elute at similar retention times. However, random and often unavoidable experimental variation introduces noise, making the alignment of homologous peaks challenging, particularly with GC-FID data where mass spectral data are lacking. Here we present GCalignR, a user-friendly R package for aligning GC-FID data based on retention times. The package was developed specifically for ecological and evolutionary studies that seek to investigate similarity patterns across multiple and often highly variable biological samples, for example representing different sexes, age classes or reproductive stages. The package also implements dynamic visualisations to facilitate inspection and fine-tuning of the resulting alignments and can be integrated within a broader workflow in R to facilitate downstream multivariate analyses. We demonstrate an example workflow using empirical data from Antarctic fur seals and explore the impact of user-defined parameter values by calculating alignment error rates for multiple datasets. The resulting alignments had low error rates for most of the explored parameter space and we could also show that GCalignR performed equally well or better than other available software. 
We hope that GCalignR will help to simplify the processing of chemical datasets and improve the standardization and reproducibility of chemical analyses in studies of animal chemical communication and related fields.
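One simple building block of retention-time alignment is correcting a global offset between a sample and a reference. The sketch below shows that idea in isolation: constant shifts are tried on a fixed grid, and the one that brings the most sample peaks within a tolerance of some reference peak is kept. It is a simplified, assumed illustration rather than GCalignR's actual algorithm or R interface; `best_linear_shift`, the grid step and the tolerance are hypothetical.

```python
import numpy as np


def best_linear_shift(reference, sample, max_shift=0.5, step=0.05, tol=0.02):
    """Return (shift, n_matched) for the constant shift, in retention-time units,
    that puts the most sample peaks within `tol` of a reference peak.

    Hypothetical helper. On equal match counts, a smaller absolute shift wins.
    """
    ref = np.asarray(reference, dtype=float)
    smp = np.asarray(sample, dtype=float)
    shifts = np.arange(-max_shift, max_shift + step / 2, step)
    best_shift, best_matches = 0.0, -1
    for s in shifts:
        shifted = smp + s
        # Distance from every shifted sample peak to its nearest reference peak
        nearest = np.abs(shifted[:, None] - ref[None, :]).min(axis=1)
        matches = int((nearest <= tol).sum())
        if matches > best_matches or (matches == best_matches and abs(s) < abs(best_shift)):
            best_shift, best_matches = float(s), matches
    return best_shift, best_matches


reference = [1.0, 2.5, 4.0, 6.2]         # retention times (min) in a reference sample
sample = [rt + 0.3 for rt in reference]  # the same peaks eluting 0.3 min later
shift, matched = best_linear_shift(reference, sample)
print(round(shift, 2), matched)  # -0.3 4
```

A global shift cannot correct drift that varies along the run, so it is typically followed by a finer, peak-by-peak matching step.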
Affiliation(s)
- Meinolf Ottensmann: Department of Animal Behaviour, Bielefeld University, Bielefeld, Germany
- Martin A Stoffel: Department of Animal Behaviour, Bielefeld University, Bielefeld, Germany; School of Natural Sciences and Psychology, Faculty of Science, Liverpool John Moores University, Liverpool, United Kingdom
- Hazel J Nichols: School of Natural Sciences and Psychology, Faculty of Science, Liverpool John Moores University, Liverpool, United Kingdom
- Joseph I Hoffman: Department of Animal Behaviour, Bielefeld University, Bielefeld, Germany
25
Data Treatment for LC-MS Untargeted Analysis. Methods Mol Biol 2018. [PMID: 29654581 DOI: 10.1007/978-1-4939-7643-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) untargeted experiments require complex chemometrics strategies to extract information from the experimental data. Here we discuss "data preprocessing", the set of procedures performed on the raw data to produce a data matrix which will be the starting point for the subsequent statistical analysis. Data preprocessing is a crucial step on the path to knowledge extraction, which should be carefully controlled and optimized in order to maximize the output of any untargeted metabolomics investigation.
26
Extracting Knowledge from MS Clinical Metabolomic Data: Processing and Analysis Strategies. Methods Mol Biol 2018. [PMID: 29363089 DOI: 10.1007/978-1-4939-7592-1_28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Assessing potential alterations of metabolic pathways using large-scale approaches today plays a central role in clinical research. Because several thousands of mass features can be measured for each sample with separation techniques hyphenated to mass spectrometry (MS) detection, adapted strategies should be implemented to detect altered pathways and help to elucidate the mechanisms of pathologies. These procedures include peak detection, sample alignment, normalization, statistical analysis, and metabolite annotation. Interestingly, considerable advances have been made over the last years in terms of analytics, bioinformatics, and chemometrics to help massive and complex metabolomic data to be more adequately handled with automated processing and data analysis workflows. Recent developments and remaining challenges related to MS signal processing, metabolite annotation, and biomarker discovery based on statistical models are illustrated in this chapter considering their application to clinical research.
27
Tutorial: Correction of shifts in single-stage LC-MS(/MS) data. Anal Chim Acta 2018; 999:37-53. [DOI: 10.1016/j.aca.2017.09.039] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/10/2016] [Revised: 09/26/2017] [Accepted: 09/27/2017] [Indexed: 11/19/2022]
28
Dudzik D, Barbas-Bernardos C, García A, Barbas C. Quality assurance procedures for mass spectrometry untargeted metabolomics. a review. J Pharm Biomed Anal 2017; 147:149-173. [PMID: 28823764 DOI: 10.1016/j.jpba.2017.07.044] [Citation(s) in RCA: 206] [Impact Index Per Article: 29.4] [Received: 06/20/2017] [Revised: 07/28/2017] [Accepted: 07/29/2017] [Indexed: 12/16/2022]
Abstract
Untargeted metabolomics, as a global approach, has already proven its great potential and capabilities for the investigation of health and disease, as well as the wide applicability for other research areas. Although great progress has been made on the feasibility of metabolomics experiments, there are still some challenges that should be faced and that includes all sources of fluctuations and bias affecting every step involved in multiplatform untargeted metabolomics studies. The identification and reduction of the main sources of unwanted variation regarding the pre-analytical, analytical and post-analytical phase of metabolomics experiments is essential to ensure high data quality. Nowadays, there is still a lack of information regarding harmonized guidelines for quality assurance as those available for targeted analysis. In this review, sources of variations to be considered and minimized along with methodologies and strategies for monitoring and improvement the quality of the results are discussed. The given information is based on evidences from different groups among our own experiences and recommendations for each stage of the metabolomics workflow. The comprehensive overview with tools presented here might serve other researchers interested in monitoring, controlling and improving the reliability of their findings by implementation of good experimental quality practices in the untargeted metabolomics study.
Affiliation(s)
- Danuta Dudzik: Center for Metabolomics and Bioanalysis (CEMBIO), Faculty of Pharmacy, San Pablo CEU University, Boadilla del Monte, ES-28668, Madrid, Spain
- Cecilia Barbas-Bernardos: Center for Metabolomics and Bioanalysis (CEMBIO), Faculty of Pharmacy, San Pablo CEU University, Boadilla del Monte, ES-28668, Madrid, Spain
- Antonia García: Center for Metabolomics and Bioanalysis (CEMBIO), Faculty of Pharmacy, San Pablo CEU University, Boadilla del Monte, ES-28668, Madrid, Spain
- Coral Barbas: Center for Metabolomics and Bioanalysis (CEMBIO), Faculty of Pharmacy, San Pablo CEU University, Boadilla del Monte, ES-28668, Madrid, Spain
29
Perez de Souza L, Naake T, Tohge T, Fernie AR. From chromatogram to analyte to metabolite. How to pick horses for courses from the massive web resources for mass spectral plant metabolomics. Gigascience 2017; 6:1-20. [PMID: 28520864 PMCID: PMC5499862 DOI: 10.1093/gigascience/gix037] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 02/24/2017] [Revised: 05/08/2017] [Accepted: 05/12/2017] [Indexed: 01/19/2023]
Abstract
The grand challenge currently facing metabolomics is the expansion of the coverage of the metabolome from a minor percentage of the metabolic complement of the cell toward the level of coverage afforded by other post-genomic technologies such as transcriptomics and proteomics. In plants, this problem is exacerbated by the sheer diversity of chemicals that constitute the metabolome, with the number of metabolites in the plant kingdom generally considered to be in excess of 200 000. In this review, we focus on web resources that can be exploited in order to improve analyte and ultimately metabolite identification and quantification. There is a wide range of available software that not only aids in this but also in the related area of peak alignment; however, for the uninitiated, choosing which program to use is a daunting task. For this reason, we provide an overview of the pros and cons of the software as well as comments regarding the level of programing skills required to effectively exploit their basic functions. In addition, the torrent of available genome and transcriptome sequences that followed the advent of next-generation sequencing has opened up further valuable resources for metabolite identification. All things considered, we posit that only via a continued communal sharing of information such as that deposited in the databases described within the article are we likely to be able to make significant headway toward improving our coverage of the plant metabolome.
Affiliation(s)
- Leonardo Perez de Souza: Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Thomas Naake: Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Takayuki Tohge: Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Alisdair R Fernie: Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
30
Teleman J, Hauri S, Malmström J. Improvements in Mass Spectrometry Assay Library Generation for Targeted Proteomics. J Proteome Res 2017; 16:2384-2392. [PMID: 28516777 DOI: 10.1021/acs.jproteome.6b00928] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
In data-independent acquisition mass spectrometry (DIA-MS), targeted extraction of peptide signals in silico using mass spectrometry assay libraries is a successful method for the identification and quantification of proteins. However, it remains unclear if high quality assay libraries with more accurate peptide ion coordinates can improve peptide target identification rates in DIA analysis. In this study, we systematically improved and evaluated the common algorithmic steps for assay library generation and demonstrate that increased assay quality results in substantially higher identification rates of peptide targets from mouse organ protein lysates measured by DIA-MS. The introduced changes are (1) a new spectrum interpretation algorithm, (2) reapplication of segmented retention time normalization, (3) a ppm fragment mass error matching threshold, (4) usage of internal peptide fragments, and (5) a multilevel false discovery rate calculation. Taken together, these changes yielded 14-36% more identified peptide targets at 1% assay false discovery rate and are implemented in three new open source tools, Fraggle, Tramler, and Franklin, available at https://github.com/fickludd/eviltools . The improved algorithms provide ways to better utilize discovery MS data, translating to substantially increased DIA performance and ultimately better foundations for drawing biological conclusions in DIA-based experiments.
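The third change listed above, a ppm (parts-per-million) fragment mass error threshold, replaces a fixed absolute window with one that scales with m/z: at m/z 200 a 20 ppm window is ±0.004 Da, whereas at m/z 1000 it is ±0.02 Da. A minimal Python sketch of such matching follows. The 20 ppm default, the nearest-peak rule and both function names are assumptions for illustration, not values or logic taken from Fraggle, Tramler or Franklin.

```python
from typing import Dict, List, Optional


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_fragments(
    observed: List[float], theoretical: List[float], tolerance_ppm: float = 20.0
) -> Dict[float, Optional[float]]:
    """Map each theoretical fragment m/z to the observed peak with the smallest
    absolute ppm error within tolerance_ppm, or to None if no peak qualifies.
    (Hypothetical helper; the tolerance is an assumed default.)"""
    matches: Dict[float, Optional[float]] = {}
    for theo in theoretical:
        best_mz, best_err = None, tolerance_ppm
        for obs in observed:
            err = abs(ppm_error(obs, theo))
            if err <= best_err:  # keep the closest peak inside the window
                best_mz, best_err = obs, err
        matches[theo] = best_mz
    return matches


if __name__ == "__main__":
    print(match_fragments([200.003, 500.005, 801.0], [200.0, 500.0, 800.0]))
    # {200.0: 200.003, 500.0: 500.005, 800.0: None}
```

Here 200.003 is about 15 ppm from 200.0 and 500.005 about 10 ppm from 500.0, so both match, while 801.0 lies far outside a 20 ppm window around 800.0.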
Affiliation(s)
- Johan Teleman: Department of Clinical Sciences, Lund University, BMC D13, 221 84 Lund, Sweden; Department of Immunotechnology, Lund University, Medicon Village (Building 406), 223 81 Lund, Sweden
- Simon Hauri: Department of Clinical Sciences, Lund University, BMC D13, 221 84 Lund, Sweden
- Johan Malmström: Department of Clinical Sciences, Lund University, BMC D13, 221 84 Lund, Sweden
31
Watrous JD, Henglin M, Claggett B, Lehmann KA, Larson MG, Cheng S, Jain M. Visualization, Quantification, and Alignment of Spectral Drift in Population Scale Untargeted Metabolomics Data. Anal Chem 2017; 89:1399-1404. [PMID: 28208263 PMCID: PMC5455767 DOI: 10.1021/acs.analchem.6b04337] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 12/21/2022]
Abstract
Untargeted liquid-chromatography-mass spectrometry (LC-MS)-based metabolomics analysis of human biospecimens has become among the most promising strategies for probing the underpinnings of human health and disease. Analysis of spectral data across population scale cohorts, however, is precluded by day-to-day nonlinear signal drifts in LC retention time or batch effects that complicate comparison of thousands of untargeted peaks. To date, there exists no efficient means of visualization and quantitative assessment of signal drift, correction of drift when present, and automated filtering of unstable spectral features, particularly across thousands of data files in population scale experiments. Herein, we report the development of a set of R-based scripts that allow for pre- and postprocessing of raw LC-MS data. These methods can be integrated with existing data analysis workflows by providing initial preprocessing bulk nonlinear retention time correction at the raw data level. Further, this approach provides postprocessing visualization and quantification of peak alignment accuracy, as well as peak-reliability-based parsing of processed data through hierarchical clustering of signal profiles. In a metabolomics data set derived from ∼3000 human plasma samples, we find that application of our alignment tools resulted in substantial improvement in peak alignment accuracy, automated data filtering, and ultimately statistical power for detection of metabolite correlates of clinical measures. These tools will enable metabolomics studies of population scale cohorts.
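Nonlinear retention-time drift correction of the kind described above is commonly done by mapping each run onto a reference time scale using features detected in both. The Python sketch below uses piecewise-linear interpolation between such anchor features. This is a swapped-in illustrative technique, not the R scripts reported by Watrous et al., and `build_rt_correction` and the anchor values are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def build_rt_correction(
    observed_anchor_rts: Sequence[float], reference_anchor_rts: Sequence[float]
) -> Callable[[float], float]:
    """Return a function mapping a sample's retention times onto the reference
    time scale by piecewise-linear interpolation between anchor features
    (compounds detected in both the sample and the reference run).
    Outside the anchor range, the first or last segment is extrapolated."""
    pairs = sorted(zip(observed_anchor_rts, reference_anchor_rts))
    if len(pairs) < 2:
        raise ValueError("need at least two anchor features")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("anchor retention times must be distinct")

    def correct(rt: float) -> float:
        # Index of the segment [xs[i], xs[i + 1]] used for this retention time,
        # clamped so that values outside the anchor range use an end segment.
        i = min(max(bisect_right(xs, rt) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (rt - xs[i])

    return correct


if __name__ == "__main__":
    correct = build_rt_correction([1.0, 5.0, 10.0], [1.2, 5.0, 9.6])
    print(round(correct(3.0), 3), round(correct(7.5), 3))  # 3.1 7.3
```

Piecewise-linear interpolation passes exactly through every anchor, so a single mis-assigned anchor distorts the local correction; smoothing approaches such as LOESS trade that exactness for robustness.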
Affiliation(s)
- Jeramie D. Watrous: Departments of Medicine and Pharmacology, University of California San Diego, La Jolla, California 92093, United States
- Mir Henglin: Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, United States
- Brian Claggett: Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, United States
- Kim A. Lehmann: Departments of Medicine and Pharmacology, University of California San Diego, La Jolla, California 92093, United States
- Martin G. Larson: Framingham Heart Study, Framingham, Massachusetts 01702, United States; Biostatistics Department, School of Public Health, Boston University, Boston, Massachusetts 02118, United States
- Susan Cheng: Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, United States; Framingham Heart Study, Framingham, Massachusetts 01702, United States
- Mohit Jain: Departments of Medicine and Pharmacology, University of California San Diego, La Jolla, California 92093, United States
32
Covington BC, McLean JA, Bachmann BO. Comparative mass spectrometry-based metabolomics strategies for the investigation of microbial secondary metabolites. Nat Prod Rep 2017; 34:6-24. [PMID: 27604382 PMCID: PMC5214543 DOI: 10.1039/c6np00048g] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Indexed: 01/19/2023]
Abstract
Covering: 2000 to 2016. The labor-intensive process of microbial natural product discovery is contingent upon identifying discrete secondary metabolites of interest within complex biological extracts, which contain inventories of all extractable small molecules produced by an organism or consortium. Historically, compound isolation prioritization has been driven by observed biological activity and/or relative metabolite abundance and followed by dereplication via accurate mass analysis. Decades of discovery using variants of these methods has generated the natural pharmacopeia but also contributes to recent high rediscovery rates. However, genomic sequencing reveals substantial untapped potential in previously mined organisms, and can provide useful prescience of potentially new secondary metabolites that ultimately enables isolation. Recently, advances in comparative metabolomics analyses have been coupled to secondary metabolic predictions to accelerate bioactivity and abundance-independent discovery work flows. In this review we will discuss the various analytical and computational techniques that enable MS-based metabolomic applications to natural product discovery and discuss the future prospects for comparative metabolomics in natural product discovery.
Affiliation(s)
- Brett C Covington: Department of Chemistry, Vanderbilt University, 7330 Stevenson Center, Nashville, TN 37235, USA
- John A McLean: Department of Chemistry, Vanderbilt University, 7330 Stevenson Center, Nashville, TN 37235, USA; Center for Innovative Technology, Vanderbilt University, 5401 Stevenson Center, Nashville, TN 37235, USA
- Brian O Bachmann: Department of Chemistry, Vanderbilt University, 7330 Stevenson Center, Nashville, TN 37235, USA
33
Spicer R, Salek RM, Moreno P, Cañueto D, Steinbeck C. Navigating freely-available software tools for metabolomics analysis. Metabolomics 2017; 13:106. [PMID: 28890673 PMCID: PMC5550549 DOI: 10.1007/s11306-017-1242-7] [Citation(s) in RCA: 142] [Impact Index Per Article: 20.3] [Received: 04/11/2017] [Accepted: 07/25/2017] [Indexed: 12/21/2022]
Abstract
INTRODUCTION The field of metabolomics has expanded greatly over the past two decades, both as an experimental science with applications in many areas, as well as in regards to data standards and bioinformatics software tools. The diversity of experimental designs and instrumental technologies used for metabolomics has led to the need for distinct data analysis methods and the development of many software tools. OBJECTIVES To compile a comprehensive list of the most widely used freely available software and tools that are used primarily in metabolomics. METHODS The most widely used tools were selected for inclusion in the review by either ≥ 50 citations on Web of Science (as of 08/09/16) or the use of the tool being reported in the recent Metabolomics Society survey. Tools were then categorised by the type of instrumental data (i.e. LC-MS, GC-MS or NMR) and the functionality (i.e. pre- and post-processing, statistical analysis, workflow and other functions) they are designed for. RESULTS A comprehensive list of the most used tools was compiled. Each tool is discussed within the context of its application domain and in relation to comparable tools of the same domain. An extended list including additional tools is available at https://github.com/RASpicer/MetabolomicsTools which is classified and searchable via a simple controlled vocabulary. CONCLUSION This review presents the most widely used tools for metabolomics analysis, categorised based on their main functionality. As future work, we suggest a direct comparison of tools' abilities to perform specific data analysis tasks e.g. peak picking.
Affiliation(s)
- Rachel Spicer: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Reza M. Salek: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pablo Moreno: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Daniel Cañueto: Metabolomics Platform, IISPV, DEEEA, Universitat Rovira i Virgili, Campus Sescelades, Carretera de Valls, s/n, 43007 Tarragona, Catalonia, Spain
- Christoph Steinbeck: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Friedrich-Schiller-University Jena, Lessingstr. 8, Jena, 07743, Germany
34
Uppal K, Walker DI, Liu K, Li S, Go YM, Jones DP. Computational Metabolomics: A Framework for the Million Metabolome. Chem Res Toxicol 2016; 29:1956-1975. [PMID: 27629808 DOI: 10.1021/acs.chemrestox.6b00179] [Citation(s) in RCA: 171] [Impact Index Per Article: 21.4] [Indexed: 12/14/2022]
Abstract
"Sola dosis facit venenum." These words of Paracelsus, "the dose makes the poison", can lead to a cavalier attitude concerning potential toxicities of the vast array of low abundance environmental chemicals to which humans are exposed. Exposome research teaches that 80-85% of human disease is linked to environmental exposures. The human exposome is estimated to include >400,000 environmental chemicals, most of which are uncharacterized with regard to human health. In fact, mass spectrometry measures >200,000 m/z features (ions) in microliter volumes derived from human samples; most are unidentified. This crystallizes a grand challenge for chemical research in toxicology: to develop reliable and affordable analytical methods to understand health impacts of the extensive human chemical experience. To this end, there appears to be no choice but to abandon the limitations of measuring one chemical at a time. The present review looks at progress in computational metabolomics to provide probability-based annotation linking ions to known chemicals and serve as a foundation for unambiguous designation of unidentified ions for toxicologic study. We review methods to characterize ions in terms of accurate mass m/z, chromatographic retention time, correlation of adduct, isotopic and fragment forms, association with metabolic pathways and measurement of collision-induced dissociation products, collision cross section, and chirality. Such information can support a largely unambiguous system for documenting unidentified ions in environmental surveillance and human biomonitoring. Assembly of this data would provide a resource to characterize and understand health risks of the array of low-abundance chemicals to which humans are exposed.
Affiliation(s)
- Karan Uppal: Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States
- Douglas I Walker: Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States; Hercules Exposome Research Center, Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States; Department of Civil and Environmental Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Ken Liu: Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States
- Shuzhao Li: Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States; Hercules Exposome Research Center, Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Young-Mi Go: Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States
- Dean P Jones: Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia 30322, United States; Hercules Exposome Research Center, Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
35
Brunius C, Shi L, Landberg R. Large-scale untargeted LC-MS metabolomics data correction using between-batch feature alignment and cluster-based within-batch signal intensity drift correction. Metabolomics 2016; 12:173. [PMID: 27746707 PMCID: PMC5031781 DOI: 10.1007/s11306-016-1124-4] [Citation(s) in RCA: 115] [Impact Index Per Article: 14.4] [Received: 06/13/2016] [Accepted: 09/15/2016] [Indexed: 11/02/2022]
Abstract
INTRODUCTION Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique in untargeted metabolomics owing to broad coverage of metabolites, high sensitivity and simple sample preparation. However, data generated from multiple batches are affected by measurement errors inherent to alterations in signal intensity, drift in mass accuracy and retention times between samples both within and between batches. These measurement errors reduce repeatability and reproducibility and may thus decrease the power to detect biological responses and obscure interpretation. OBJECTIVE Our aim was to develop procedures to address and correct for within- and between-batch variability in processing multiple-batch untargeted LC-MS metabolomics data to increase their quality. METHODS Algorithms were developed for: (i) alignment and merging of features that are systematically misaligned between batches, through aggregating feature presence/missingness on batch level and combining similar features orthogonally present between batches; and (ii) within-batch drift correction using a cluster-based approach that allows multiple drift patterns within batch. Furthermore, a heuristic criterion was developed for the feature-wise choice of reference-based or population-based between-batch normalisation. RESULTS In authentic data, between-batch alignment resulted in picking 15 % more features and deconvoluting 15 % of features previously erroneously aligned. Within-batch correction provided a decrease in median quality control feature coefficient of variation from 20.5 to 15.1 %. Algorithms are open source and available as an R package ('batchCorr'). CONCLUSIONS The developed procedures provide unbiased measures of improved data quality, with implications for improved data analysis. Although developed for LC-MS based metabolomics, these methods are generic and can be applied to other data suffering from similar limitations.
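The within-batch result above is expressed as the median coefficient of variation (CV) of features measured in quality control (QC) samples. A minimal Python sketch of that metric follows, assuming a mapping from feature identifiers to QC intensities. It computes the quality measure only and does not implement batchCorr's cluster-based drift correction; the function names and example values are hypothetical.

```python
from statistics import mean, median, stdev
from typing import Dict, List


def feature_cv(intensities: List[float]) -> float:
    """CV (%) of one feature across QC injections, using the sample standard
    deviation. Requires at least two injections and a nonzero mean."""
    return stdev(intensities) / mean(intensities) * 100.0


def median_qc_cv(qc_matrix: Dict[str, List[float]]) -> float:
    """Median CV (%) over all features; qc_matrix maps feature id -> QC intensities."""
    return median(feature_cv(values) for values in qc_matrix.values())


if __name__ == "__main__":
    qc = {"f1": [100, 110, 90], "f2": [50, 50, 50], "f3": [200, 260, 140]}
    print(median_qc_cv(qc))  # 10.0
```

In the example the per-feature CVs are 10%, 0% and 30%, so the median is 10%. Comparing this value before and after a correction step gives the kind of summary the abstract reports (20.5% to 15.1%).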
Affiliation(s)
- Carl Brunius: Department of Food Science, Uppsala BioCenter, Swedish University of Agricultural Sciences, Box 7051, 750 07 Uppsala, Sweden; Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Lin Shi: Department of Food Science, Uppsala BioCenter, Swedish University of Agricultural Sciences, Box 7051, 750 07 Uppsala, Sweden; Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Rikard Landberg: Department of Food Science, Uppsala BioCenter, Swedish University of Agricultural Sciences, Box 7051, 750 07 Uppsala, Sweden; Unit of Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institutet, Box 210, 171 77 Stockholm, Sweden; Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
36
Wu L, Amon S, Lam H. A hybrid retention time alignment algorithm for SWATH-MS data. Proteomics 2016; 16:2272-83. [DOI: 10.1002/pmic.201500511] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/16/2015] [Revised: 05/06/2016] [Accepted: 06/10/2016] [Indexed: 11/09/2022]
Affiliation(s)
- Long Wu: Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China
- Sabine Amon: Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Henry Lam: Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China; Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China
37
Abstract
Metabolomics-based strategies have become an integral part of modern clinical research, allowing for a better understanding of pathophysiological conditions and disease mechanisms, as well as providing innovative tools for more adequate diagnostic and prognosis approaches. Metabolomics is considered an essential tool in precision medicine, which aims for personalized prevention and tailor-made treatments. Nevertheless, multiple pitfalls may be encountered in clinical metabolomics during the entire workflow, hampering the quality of the data and, thus, the biological interpretation. This review describes the challenges underlying metabolomics-based experiments, discussing step by step the potential pitfalls of the analytical process, including study design, sample collection, storage, as well as preparation, chromatographic and electrophoretic separation, detection and data analysis. Moreover, it offers practical solutions and strategies to tackle these challenges, ensuring the generation of high-quality data.
38
An improved pseudotargeted metabolomics approach using multiple ion monitoring with time-staggered ion lists based on ultra-high performance liquid chromatography/quadrupole time-of-flight mass spectrometry. Anal Chim Acta 2016; 927:82-8. [DOI: 10.1016/j.aca.2016.05.008] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 03/20/2016] [Revised: 04/30/2016] [Accepted: 05/02/2016] [Indexed: 01/13/2023]
39
An Integrated Metabolomic and Genomic Mining Workflow To Uncover the Biosynthetic Potential of Bacteria. mSystems 2016; 1:mSystems00028-15. [PMID: 27822535 PMCID: PMC5069768 DOI: 10.1128/msystems.00028-15] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 12/12/2015] [Accepted: 04/01/2016] [Indexed: 11/20/2022]
Abstract
Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in bacteria and mine the associated chemical diversity. Thirteen strains closely related to Pseudoalteromonas luteoviolacea isolated from all over the Earth were analyzed using an untargeted metabolomics strategy, and metabolomic profiles were correlated with whole-genome sequences of the strains. We found considerable diversity: only 2% of the chemical features and 7% of the biosynthetic genes were common to all strains, while 30% of all features and 24% of the genes were unique to single strains. The list of chemical features was reduced to 50 discriminating features using a genetic algorithm and support vector machines. Features were dereplicated by tandem mass spectrometry (MS/MS) networking to identify molecular families of the same biosynthetic origin, and the associated pathways were probed using comparative genomics. Most of the discriminating features were related to antibacterial compounds, including the thiomarinols that were reported from P. luteoviolacea here for the first time. By comparative genomics, we identified the biosynthetic cluster responsible for the production of the antibiotic indolmycin, which could not be predicted with standard methods. In conclusion, we present an efficient, integrative strategy for elucidating the chemical richness of a given set of bacteria and link the chemistry to biosynthetic genes. IMPORTANCE We here combine chemical analysis and genomics to probe for new bioactive secondary metabolites based on their pattern of distribution within bacterial species. 
We demonstrate the usefulness of this combined approach in a group of marine Gram-negative bacteria closely related to Pseudoalteromonas luteoviolacea, which is a species known to produce a broad spectrum of chemicals. The approach allowed us to identify new antibiotics and their associated biosynthetic pathways. Combining chemical analysis and genetics is an efficient "mining" workflow for identifying diverse pharmaceutical candidates in a broad range of microorganisms and therefore of great use in bioprospecting.
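The dereplication step above relies on MS/MS networking, in which spectra are linked when their fragmentation patterns are similar. The sketch below uses a plain cosine score on m/z-binned spectra. This is a simplification: molecular networking tools such as GNPS use a modified cosine that also accounts for precursor mass differences. The 0.01 bin width, the 0.7 edge threshold and all function names are illustrative assumptions, not the authors' settings.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def bin_spectrum(spectrum: Spectrum, bin_width: float = 0.01) -> Dict[int, float]:
    """Sum fragment intensities into fixed-width m/z bins keyed by bin index."""
    bins: Dict[int, float] = defaultdict(float)
    for mz, intensity in spectrum:
        bins[round(mz / bin_width)] += intensity
    return dict(bins)


def cosine_score(a: Spectrum, b: Spectrum, bin_width: float = 0.01) -> float:
    """Plain cosine similarity between two binned spectra (0.0 = no shared bins)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def network_edges(
    spectra: Dict[str, Spectrum], threshold: float = 0.7
) -> List[Tuple[str, str]]:
    """Return pairs of spectrum ids whose cosine score reaches the threshold."""
    ids = sorted(spectra)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if cosine_score(spectra[x], spectra[y]) >= threshold
    ]


if __name__ == "__main__":
    spectra = {
        "s1": [(100.0, 1.0), (150.0, 2.0)],
        "s2": [(100.0, 1.0), (150.0, 2.0), (175.0, 0.1)],
        "s3": [(300.0, 5.0)],
    }
    print(network_edges(spectra))  # [('s1', 's2')]
```

Connected components of the resulting graph correspond to candidate molecular families. Fixed-width binning can split two nearly identical fragment masses that straddle a bin boundary, which is one reason practical tools match peaks within a tolerance instead.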
40
Maansson M, Vynne NG, Klitgaard A, Nybo JL, Melchiorsen J, Nguyen DD, Sanchez LM, Ziemert N, Dorrestein PC, Andersen MR, Gram L. An Integrated Metabolomic and Genomic Mining Workflow To Uncover the Biosynthetic Potential of Bacteria. mSystems 2016. [PMID: 27822535 DOI: 10.1128/msystems.00038-00016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/13/2023]
Abstract
Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in bacteria and mine the associated chemical diversity. Thirteen strains closely related to Pseudoalteromonas luteoviolacea isolated from all over the Earth were analyzed using an untargeted metabolomics strategy, and metabolomic profiles were correlated with whole-genome sequences of the strains. We found considerable diversity: only 2% of the chemical features and 7% of the biosynthetic genes were common to all strains, while 30% of all features and 24% of the genes were unique to single strains. The list of chemical features was reduced to 50 discriminating features using a genetic algorithm and support vector machines. Features were dereplicated by tandem mass spectrometry (MS/MS) networking to identify molecular families of the same biosynthetic origin, and the associated pathways were probed using comparative genomics. Most of the discriminating features were related to antibacterial compounds, including the thiomarinols that were reported from P. luteoviolacea here for the first time. By comparative genomics, we identified the biosynthetic cluster responsible for the production of the antibiotic indolmycin, which could not be predicted with standard methods. In conclusion, we present an efficient, integrative strategy for elucidating the chemical richness of a given set of bacteria and link the chemistry to biosynthetic genes. IMPORTANCE We here combine chemical analysis and genomics to probe for new bioactive secondary metabolites based on their pattern of distribution within bacterial species. 
We demonstrate the usefulness of this combined approach in a group of marine Gram-negative bacteria closely related to Pseudoalteromonas luteoviolacea, which is a species known to produce a broad spectrum of chemicals. The approach allowed us to identify new antibiotics and their associated biosynthetic pathways. Combining chemical analysis and genetics is an efficient "mining" workflow for identifying diverse pharmaceutical candidates in a broad range of microorganisms and therefore of great use in bioprospecting.
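The shared and unique percentages quoted above are simple set statistics over a presence/absence table: 2% of features common to all strains and 30% unique to a single strain. The same counting applies to the biosynthetic genes (7% and 24%). The sketch below is minimal and assumes each strain's profile has already been reduced to a set of feature identifiers. The function name and input format are hypothetical, and the genetic-algorithm/SVM feature selection is not reproduced.

```python
from collections import Counter


def feature_sharing(profiles: dict[str, set[str]]) -> tuple[float, float]:
    """Return (fraction of features found in every strain, fraction found in exactly one).

    `profiles` maps a strain name to the set of feature IDs detected in it
    (hypothetical input format, e.g. "m/z@RT" strings or gene-cluster IDs).
    """
    # Count in how many strains each feature was detected.
    counts = Counter(f for feats in profiles.values() for f in feats)
    if not counts:
        return 0.0, 0.0
    n_strains = len(profiles)
    common = sum(1 for c in counts.values() if c == n_strains)
    unique = sum(1 for c in counts.values() if c == 1)
    return common / len(counts), unique / len(counts)


# feature_sharing({"S1": {"a", "b", "c"}, "S2": {"a", "b"}, "S3": {"a", "d"}}) -> (0.25, 0.5)
```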
Affiliation(s)
- Maria Maansson
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Nikolaj G Vynne
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Andreas Klitgaard
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jane L Nybo
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jette Melchiorsen
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Don D Nguyen
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, USA
- Laura M Sanchez
- Interfaculty Institute of Microbiology and Infection Medicine, University of Tübingen, Tübingen, Germany; Collaborative Mass Spectrometry Innovation Center, University of California at San Diego, La Jolla, California, USA
- Nadine Ziemert
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA; Interfaculty Institute of Microbiology and Infection Medicine, University of Tübingen, Tübingen, Germany
- Pieter C Dorrestein
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA; Collaborative Mass Spectrometry Innovation Center, University of California at San Diego, La Jolla, California, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, La Jolla, California, USA
- Mikael R Andersen
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Lone Gram
- Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
41
Codrea MC, Nahnsen S. Platforms and Pipelines for Proteomics Data Analysis and Management. Adv Exp Med Biol 2016; 919:203-215. [PMID: 27975218 DOI: 10.1007/978-3-319-41448-5_9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
Abstract
Since mass spectrometry was introduced as the core technology for large-scale analysis of the proteome, the speed of data acquisition, dynamic ranges of measurements, and data quality are continuously improving. These improvements are triggered by regular launches of new methodologies and instruments.
Affiliation(s)
- Marius Cosmin Codrea
- Quantitative Biology Center (QBiC), University of Tübingen, Auf der Morgenstelle 10, 72076, Tübingen, Germany
- Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Auf der Morgenstelle 10, 72076, Tübingen, Germany
42
Bari MG, Ramirez N, Wang Z, Zhang J(M). MZDASoft: a software architecture that enables large-scale comparison of protein expression levels over multiple samples based on liquid chromatography/tandem mass spectrometry. Rapid Commun Mass Spectrom 2015; 29:1841-1848. [PMID: 26331936 PMCID: PMC4560111 DOI: 10.1002/rcm.7272] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/25/2015] [Revised: 06/22/2015] [Accepted: 07/04/2015] [Indexed: 06/05/2023]
Abstract
RATIONALE Without accurate peak linking/alignment, only the expression levels of a small percentage of proteins can be compared across multiple samples in Liquid Chromatography/Mass Spectrometry/Tandem Mass Spectrometry (LC/MS/MS) due to the selective nature of tandem MS peptide identification. This greatly hampers biomedical research that aims at finding biomarkers for disease diagnosis, treatment, and the understanding of disease mechanisms. A recent algorithm, PeakLink, has allowed the accurate linking of LC/MS peaks without tandem MS identifications to their corresponding ones with identifications across multiple samples collected from different instruments, tissues and labs, which greatly enhanced the ability to compare proteins. However, PeakLink cannot be implemented practically for large numbers of samples based on existing software architectures, because it requires access to peak elution profiles from multiple LC/MS/MS samples simultaneously. METHODS We propose a new architecture based on parallel processing, which extracts LC/MS peak features and saves them in database files to enable the implementation of PeakLink for multiple samples. The software has been deployed in High-Performance Computing (HPC) environments. The core part of the software, MZDASoft Parallel Peak Extractor (PPE), can be downloaded with a user and developer's guide, and it can be run on HPC centers directly. The quantification applications, MZDASoft TandemQuant and MZDASoft PeakLink, are written in Matlab and compiled with a Matlab runtime compiler. A sample script that incorporates all necessary processing steps of MZDASoft for LC/MS/MS quantification in a parallel processing environment is available. The project webpage is http://compgenomics.utsa.edu/zgroup/MZDASoft. RESULTS The proposed architecture enables the implementation of PeakLink for multiple samples.
Significantly more (100%-500%) proteins can be compared over multiple samples with better quantification accuracy in test cases. CONCLUSION MZDASoft enables large-scale comparison of protein expression levels over multiple samples with much larger protein comparison coverage and better quantification accuracy. It is an efficient implementation based on parallel processing which can be used to process large amounts of data.
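The architecture above extracts peak features from each sample in parallel and persists them to database files. A later linking step can then read peak profiles from many samples at once. The sketch below illustrates only that storage pattern, and it rests on several assumptions:

- one SQLite file per sample;
- a naive local-maximum peak picker standing in for the Parallel Peak Extractor;
- a thread pool standing in for HPC job scheduling.

All function and table names are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pick_peaks(trace):
    """Naive peak picking: (scan index, intensity) of local maxima in one intensity trace."""
    return [
        (i, trace[i])
        for i in range(1, len(trace) - 1)
        if trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]
    ]


def extract_to_db(sample_id, traces, out_dir):
    """Extract peaks from one sample and persist them to that sample's own SQLite file.

    `traces` maps an m/z value to its intensity trace across scans (hypothetical format).
    """
    db_path = Path(out_dir) / f"{sample_id}.sqlite"
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS peaks (mz REAL, scan INTEGER, intensity REAL)"
            )
            for mz, trace in traces.items():
                con.executemany(
                    "INSERT INTO peaks VALUES (?, ?, ?)",
                    [(mz, scan, y) for scan, y in pick_peaks(trace)],
                )
    finally:
        con.close()
    return db_path


def extract_all(samples, out_dir, workers=4):
    """Run extraction for every sample concurrently; returns the database paths in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(extract_to_db, sid, traces, out_dir)
            for sid, traces in samples.items()
        ]
        return [f.result() for f in futures]
```

The thread pool keeps the sketch portable. CPU-bound extraction would normally run in separate processes or as separate HPC jobs, which the one-file-per-sample layout accommodates because no two workers write to the same database.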
Affiliation(s)
- Mehrab Ghanat Bari
- Department of Electrical and Computer Engineering, Univ. of Texas at San Antonio, One UTSA Circle, San Antonio, TX 782
- Nelson Ramirez
- Computational Biology Initiative, Univ. of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249
- Zhiwei Wang
- Computational Biology Initiative, Univ. of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249
- Jianqiu (Michelle) Zhang
- Department of Electrical and Computer Engineering, Univ. of Texas at San Antonio, One UTSA Circle, San Antonio, TX 782
43
Knolhoff AM, Croley TR. Non-targeted screening approaches for contaminants and adulterants in food using liquid chromatography hyphenated to high resolution mass spectrometry. J Chromatogr A 2015; 1428:86-96. [PMID: 26372444 DOI: 10.1016/j.chroma.2015.08.059] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 04/29/2015] [Revised: 08/14/2015] [Accepted: 08/27/2015] [Indexed: 12/22/2022]
Abstract
The majority of analytical methods for food safety monitor the presence of a specific compound or defined set of compounds. Non-targeted screening methods are complementary to these approaches by detecting and identifying unexpected compounds present in food matrices that may be harmful to public health. However, the development and implementation of generalized non-targeted screening workflows are particularly challenging, especially for food matrices due to inherent sample complexity and diversity and a large analyte concentration range. One approach that can be implemented is liquid chromatography coupled to high-resolution mass spectrometry, which serves to reduce this complexity and is capable of generating molecular formulae for compounds of interest. Current capabilities, strategies, and challenges will be reviewed for sample preparation, mass spectrometry, chromatography, and data processing workflows. Considerations to increase the accuracy and speed of identifying unknown molecular species will also be addressed, including suggestions for achieving sufficient data quality for non-targeted screening applications.
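Generating molecular formulae from high-resolution data rests on one comparison. A measured m/z is checked against the theoretical m/z of each candidate formula, and candidates within a ppm tolerance are kept, where error (ppm) = (measured − theoretical) / theoretical × 10^6. This is a minimal sketch under three assumptions: singly protonated [M+H]+ ions, a 5 ppm window, and a short element table. The candidate list and function names are hypothetical and are not taken from the review.

```python
# Monoisotopic masses (u) of the most abundant isotopes; extend as needed.
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207117,
}
PROTON = 1.00727646688  # proton mass, added to form [M+H]+ ions


def mono_mass(formula: dict[str, int]) -> float:
    """Neutral monoisotopic mass of a formula such as {"C": 8, "H": 10, "N": 4, "O": 2}."""
    return sum(MONO[el] * n for el, n in formula.items())


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def candidates_within(measured_mz, formulas, tol_ppm=5.0):
    """Return (name, ppm error) for candidates whose [M+H]+ m/z lies within tol_ppm."""
    hits = []
    for name, formula in formulas.items():
        theo = mono_mass(formula) + PROTON  # assumes singly protonated ions only
        err = ppm_error(measured_mz, theo)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return hits
```

For example, caffeine (C8H10N4O2) has a neutral monoisotopic mass of about 194.0804, so a measured m/z of 195.0877 matches its [M+H]+ ion to within 1 ppm.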
Affiliation(s)
- Ann M Knolhoff
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, 5100 Paint Branch Parkway, College Park, MD 20740, United States.
- Timothy R Croley
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, 5100 Paint Branch Parkway, College Park, MD 20740, United States
44
Abate-Pella D, Freund DM, Ma Y, Simón-Manso Y, Hollender J, Broeckling CD, Huhman DV, Krokhin OV, Stoll DR, Hegeman AD, Kind T, Fiehn O, Schymanski EL, Prenni JE, Sumner LW, Boswell PG. Retention projection enables accurate calculation of liquid chromatographic retention times across labs and methods. J Chromatogr A 2015; 1412:43-51. [PMID: 26292625 DOI: 10.1016/j.chroma.2015.07.108] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 05/14/2015] [Revised: 07/24/2015] [Accepted: 07/31/2015] [Indexed: 10/23/2022]
Abstract
Identification of small molecules by liquid chromatography-mass spectrometry (LC-MS) can be greatly improved if the chromatographic retention information is used along with mass spectral information to narrow down the lists of candidates. Linear retention indexing remains the standard for sharing retention data across labs, but it is unreliable because it cannot properly account for differences in the experimental conditions used by various labs, even when the differences are relatively small and unintentional. On the other hand, an approach called "retention projection" properly accounts for many intentional differences in experimental conditions, and when combined with a "back-calculation" methodology described recently, it also accounts for unintentional differences. In this study, the accuracy of this methodology is compared with linear retention indexing across eight different labs. When each lab ran a test mixture under a range of multi-segment gradients and flow rates they selected independently, retention projections averaged 22-fold more accurate for uncharged compounds because they properly accounted for these intentional differences, which were more pronounced in steep gradients. When each lab ran the test mixture under nominally the same conditions, which is the ideal situation to reproduce linear retention indices, retention projections still averaged 2-fold more accurate because they properly accounted for many unintentional differences between the LC systems. To the best of our knowledge, this is the most successful study to date aiming to calculate (or even just to reproduce) LC gradient retention across labs, and it is the only study in which retention was reliably calculated under various multi-segment gradients and flow rates chosen independently by labs.
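Linear retention indexing, the baseline this study compares against, places an analyte on a scale defined by reference standards. It interpolates linearly between the two standards that bracket the analyte: I_x = I_n + (I_{n+1} − I_n)(t_x − t_n)/(t_{n+1} − t_n). This is a minimal sketch, and the standard index values in the usage note are hypothetical. It is not the retention-projection method, which, per the abstract, accounts for differences in experimental conditions that this interpolation ignores.

```python
from bisect import bisect_right


def linear_retention_index(t_x, standards):
    """Linear retention index of an analyte eluting at t_x.

    `standards` is a list of (retention_time, index_value) pairs sorted by
    retention time, e.g. a homologous series assigned 100, 200, 300, ...
    """
    times = [t for t, _ in standards]
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("t_x must lie within the range of the standards")
    k = bisect_right(times, t_x) - 1  # last standard eluting at or before t_x
    if k == len(standards) - 1:  # analyte co-elutes with the last standard
        return float(standards[-1][1])
    (t_n, i_n), (t_next, i_next) = standards[k], standards[k + 1]
    return i_n + (i_next - i_n) * (t_x - t_n) / (t_next - t_n)
```

With hypothetical standards at 2.0, 4.0 and 7.0 min assigned 300, 400 and 500, an analyte at 5.5 min receives an index of 450. Because the result depends only on where the analyte falls relative to the standards, it silently assumes that every lab's gradient shifts analytes and standards alike, which is the weakness the study quantifies.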
Affiliation(s)
- Daniel Abate-Pella
- Department of Horticultural Science, University of Minnesota, 1970 Folwell Ave., St. Paul, MN 55108, USA.
- Dana M Freund
- Department of Horticultural Science, University of Minnesota, 1970 Folwell Ave., St. Paul, MN 55108, USA.
- Yan Ma
- UC Davis Genome Center, Metabolomics, University of California, Davis, Health Sciences Drive, Davis, CA 95616, USA.
- Yamil Simón-Manso
- Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, MD 20899-8380, USA.
- Juliane Hollender
- Eawag: Swiss Federal Institute for Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland.
- Corey D Broeckling
- Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, CO 80523, USA.
- David V Huhman
- The Samuel Roberts Noble Foundation, Ardmore, OK 73401, USA.
- Oleg V Krokhin
- Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg R3E 3P4, Canada.
- Dwight R Stoll
- Department of Chemistry, Gustavus Adolphus College, 800 West College Avenue, Saint Peter, MN 56082, USA.
- Adrian D Hegeman
- Department of Horticultural Science, University of Minnesota, 1970 Folwell Ave., St. Paul, MN 55108, USA.
- Tobias Kind
- UC Davis Genome Center, Metabolomics, University of California, Davis, Health Sciences Drive, Davis, CA 95616, USA.
- Oliver Fiehn
- UC Davis Genome Center, Metabolomics, University of California, Davis, Health Sciences Drive, Davis, CA 95616, USA; King Abdulaziz University, Department of Biochemistry, Jeddah, Saudi Arabia.
- Emma L Schymanski
- Eawag: Swiss Federal Institute for Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland.
- Jessica E Prenni
- Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, CO 80523, USA.
- Lloyd W Sumner
- The Samuel Roberts Noble Foundation, Ardmore, OK 73401, USA.
- Paul G Boswell
- Department of Horticultural Science, University of Minnesota, 1970 Folwell Ave., St. Paul, MN 55108, USA.
45
Chen G, Cui L, Teo GS, Ong CN, Tan CS, Choi H. MetTailor: dynamic block summary and intensity normalization for robust analysis of mass spectrometry data in metabolomics. Bioinformatics 2015. [PMID: 26220962 DOI: 10.1093/bioinformatics/btv434] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate cross-sample peak alignment and reliable intensity normalization are critical steps for robust quantitative analysis in untargeted metabolomics, since tandem mass spectrometry (MS/MS) is rarely used for compound identification. Therefore, shortcomings in the data processing steps can easily introduce false positives due to misalignments and erroneous normalization adjustments in large sample studies. RESULTS In this work, we developed a software package, MetTailor, featuring two novel data preprocessing steps to remedy drawbacks in the existing processing tools. First, we propose a novel dynamic block summarization (DBS) method for correcting misalignments from peak alignment algorithms, which alleviates the missing-data problem caused by misalignments. For the purpose of verifying correct re-alignments, we propose to use the cross-sample consistency in isotopic intensity ratios as a quality metric. Second, we developed a flexible intensity normalization procedure that adjusts normalizing factors against the temporal variations in total ion chromatogram (TIC) along the chromatographic retention time (RT). We first evaluated the DBS algorithm using a curated metabolomics dataset, illustrating that the algorithm identifies misaligned peaks and correctly realigns them with good sensitivity. We next demonstrated the DBS algorithm and the RT-based normalization procedure in a large-scale dataset featuring >100 serum samples from a primary dengue infection study. Although the initial alignment was successful for the majority of peaks, the DBS algorithm still corrected ∼7000 misaligned peaks in these data, and many recovered peaks showed isotopic patterns consistent with the peaks they were realigned to. In addition, the RT-based normalization algorithm efficiently removed visible local variations in TIC along the RT, without sacrificing the sensitivity of detecting differentially expressed metabolites.
AVAILABILITY AND IMPLEMENTATION The R package MetTailor is freely available at the SourceForge website http://mettailor.sourceforge.net/. CONTACT hyung_won_choi@nuhs.edu.sg SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
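The idea behind RT-based normalization can be illustrated with a deliberately simplified scheme in three steps:

1. Split the retention-time axis into fixed windows.
2. Compute each sample's local TIC in every window.
3. Rescale that sample's peaks so the local TIC matches the cross-sample median.

This is a sketch under stated assumptions, not MetTailor's procedure, which adjusts normalizing factors flexibly along RT and is distributed as an R package. The fixed window width, the median reference and the input layout are hypothetical choices.

```python
from statistics import median


def rt_block_normalize(samples, block_width):
    """Rescale intensities so each sample's local TIC per RT window matches the cross-sample median.

    `samples` maps a sample ID to a list of (rt, intensity) peaks (hypothetical format);
    `block_width` is a fixed RT window width in the same units as rt.
    """

    def block(rt):
        return int(rt // block_width)

    # Local TIC for every (sample, RT window) pair.
    tic = {sid: {} for sid in samples}
    for sid, peaks in samples.items():
        for rt, y in peaks:
            b = block(rt)
            tic[sid][b] = tic[sid].get(b, 0.0) + y

    # Reference TIC per window: median over samples (a missing window counts as 0).
    windows = {b for per_sample in tic.values() for b in per_sample}
    ref = {b: median(per_sample.get(b, 0.0) for per_sample in tic.values()) for b in windows}

    normalized = {}
    for sid, peaks in samples.items():
        out = []
        for rt, y in peaks:
            local = tic[sid][block(rt)]
            factor = ref[block(rt)] / local if local > 0 else 1.0
            out.append((rt, y * factor))
        normalized[sid] = out
    return normalized
```

Unlike a single global TIC factor per sample, this scheme can correct a drift in signal that affects only part of the run. The price is that a genuine biological change concentrated in one window is partly normalized away, one reason a more careful procedure is needed in practice.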
Affiliation(s)
- Gengbo Chen
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Liang Cui
- Interdisciplinary Research Group in Infectious Diseases, Singapore-MIT Alliance for Research & Technology, Singapore, Singapore and
- Guo Shou Teo
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Choon Nam Ong
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore, National University of Singapore Environment Research Institute, Singapore, Singapore
- Chuen Seng Tan
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
46
Smith R, Taylor RM, Prince JT. Current controlled vocabularies are insufficient to uniquely map molecular entities to mass spectrometry signal. BMC Bioinformatics 2015; 16 Suppl 7:S2. [PMID: 25952148 PMCID: PMC4423578 DOI: 10.1186/1471-2105-16-s7-s2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Background The comparison of analyte mass spectrometry precursor (MS1) signal is central to many proteomic (and other -omic) workflows. Standard vocabularies for mass spectrometry exist and provide good coverage for most experimental applications, yet are insufficient for concise and unambiguous description of data concepts spanning the range of signal provenance from a molecular perspective (e.g. from charged peptides down to fine isotopes). Without a standard unambiguous nomenclature, literature searches, algorithm reproducibility and algorithm evaluation for MS-omics data processing are nearly impossible. Results We show how terms from current official ontologies are too vague or ambiguous to explicitly map molecular entities to MS signals and we illustrate the inconsistency and ambiguity of current colloquially used terms. We also propose a set of terms for MS1 signal that uniquely, succinctly and intuitively describe data concepts spanning the range of signal provenance from the full molecule down to fine isotopes. We suggest that additional community discussion of these terms should precede any further standardization efforts. We propose a novel nomenclature that spans the range of the required granularity to describe MS data processing from the perspective of the molecular provenance of the MS signal. Conclusions The proposed nomenclature provides a chain of succinct and unique terms spanning the signal created by a charged molecule down through each of its constituent subsignals. We suggest that additional community discussion of these terms should precede any further standardization efforts.
47
Marshall DD, Lei S, Worley B, Huang Y, Garcia-Garcia A, Franco R, Dodds ED, Powers R. Combining DI-ESI-MS and NMR datasets for metabolic profiling. Metabolomics 2015; 11:391-402. [PMID: 25774104 PMCID: PMC4354777 DOI: 10.1007/s11306-014-0704-4] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Indexed: 12/13/2022]
Abstract
Metabolomics datasets are commonly acquired by either mass spectrometry (MS) or nuclear magnetic resonance spectroscopy (NMR), despite their fundamental complementarity. In fact, combining MS and NMR datasets greatly improves the coverage of the metabolome and enhances the accuracy of metabolite identification, providing a detailed and high-throughput analysis of metabolic changes due to disease, drug treatment, or a variety of other environmental stimuli. Ideally, a single metabolomics sample would be simultaneously used for both MS and NMR analyses, minimizing the potential for variability between the two datasets. This necessitates the optimization of sample preparation, data collection and data handling protocols to effectively integrate direct-infusion MS data with one-dimensional (1D) 1H NMR spectra. To achieve this goal, we report for the first time the optimization of (i) metabolomics sample preparation for dual analysis by NMR and MS, (ii) high throughput, positive-ion direct infusion electrospray ionization mass spectrometry (DI-ESI-MS) for the analysis of complex metabolite mixtures, and (iii) data handling protocols to simultaneously analyze DI-ESI-MS and 1D 1H NMR spectral data using multiblock bilinear factorizations, namely multiblock principal component analysis (MB-PCA) and multiblock partial least squares (MB-PLS). Finally, we demonstrate the combined use of backscaled loadings, accurate mass measurements and tandem MS experiments to identify metabolites significantly contributing to class separation in MB-PLS-DA scores. We show that integration of NMR and DI-ESI-MS datasets yields a substantial improvement in the analysis of neurotoxin involvement in dopaminergic cell death.
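The abstract names multiblock PCA (MB-PCA) as the tool for analyzing the NMR and DI-ESI-MS blocks jointly. One common formulation, assumed here and not taken from the paper, works as follows:

1. Mean-center each block.
2. Scale each block to unit total sum of squares, so that a block with many variables does not dominate.
3. Concatenate the blocks sample-wise and run ordinary PCA.

The resulting scores play the role of MB-PCA super scores. The sketch computes only the first component, by power iteration in pure Python. Function names are hypothetical, and backscaled loadings and MB-PLS are not shown.

```python
import math
import random


def block_scale(block):
    """Mean-center each column, then divide the block by its total root sum of squares."""
    n = len(block)
    means = [sum(col) / n for col in zip(*block)]
    centered = [[x - m for x, m in zip(row, means)] for row in block]
    norm = math.sqrt(sum(x * x for row in centered for x in row)) or 1.0
    return [[x / norm for x in row] for row in centered]


def first_super_score(blocks, iters=200, seed=0):
    """First-component scores of PCA on the concatenated, block-scaled data.

    `blocks` is a list of matrices (lists of rows) that share the same samples
    in the same row order, e.g. [nmr_block, ms_block].
    """
    scaled = [block_scale(b) for b in blocks]
    # Row i of X holds sample i's variables from every block, side by side.
    X = [sum(rows, []) for rows in zip(*scaled)]
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(iters):
        # Power iteration on X^T X: v <- X^T (X v) / ||X^T (X v)||
        t = [sum(a * b for a, b in zip(row, v)) for row in X]
        w = [sum(X[i][j] * t[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in X]
```

The sign of a principal component is arbitrary, so only the relative placement of samples (for example, which treatment groups fall on opposite sides of zero) is meaningful. Dividing each block by its root total sum of squares is one scaling convention; dividing by the square root of the number of variables in the block is another.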
Affiliation(s)
- Darrell D. Marshall
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304
- Shulei Lei
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304
- Bradley Worley
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304
- Yuting Huang
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304
- Aracely Garcia-Garcia
- Redox Biology Center, University of Nebraska-Lincoln, Lincoln, NE 68583-0905
- School of Veterinary Medicine and Biomedical Sciences, University of Nebraska-Lincoln, Lincoln, NE 68583-0905
- Rodrigo Franco
- Redox Biology Center, University of Nebraska-Lincoln, Lincoln, NE 68583-0905
- School of Veterinary Medicine and Biomedical Sciences, University of Nebraska-Lincoln, Lincoln, NE 68583-0905
- Eric D. Dodds
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304
- Robert Powers
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68588-0304
- Redox Biology Center, University of Nebraska-Lincoln, Lincoln, NE 68583-0905
48
Kuharev J, Navarro P, Distler U, Jahn O, Tenzer S. In-depth evaluation of software tools for data-independent acquisition based label-free quantification. Proteomics 2015; 15:3140-51. [PMID: 25545627 DOI: 10.1002/pmic.201400396] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 08/15/2014] [Revised: 10/22/2014] [Accepted: 12/17/2014] [Indexed: 12/21/2022]
Abstract
Label-free quantification (LFQ) based on data-independent acquisition workflows is currently gaining popularity. Several software tools have been recently published or are commercially available. The present study focuses on the evaluation of three different software packages (Progenesis, synapter, and ISOQuant) supporting ion mobility enhanced data-independent acquisition data. In order to benchmark the LFQ performance of the different tools, we generated two hybrid proteome samples of defined quantitative composition containing tryptically digested proteomes of three different species (mouse, yeast, Escherichia coli). This model dataset simulates complex biological samples containing large numbers of both unregulated (background) proteins as well as up- and downregulated proteins with exactly known ratios between samples. We determined the number and dynamic range of quantifiable proteins and analyzed the influence of applied algorithms (retention time alignment, clustering, normalization, etc.) on quantification results. Analysis of technical reproducibility revealed median coefficients of variation of reported protein abundances below 5% for MS(E) data for Progenesis and ISOQuant. Regarding accuracy of LFQ, evaluation with synapter and ISOQuant yielded superior results compared to Progenesis. In addition, we discuss reporting formats and user friendliness of the software packages. The data generated in this study have been deposited to the ProteomeXchange Consortium with identifier PXD001240 (http://proteomecentral.proteomexchange.org/dataset/PXD001240).
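The reproducibility figure quoted above is a median coefficient of variation (CV) of protein abundances, reported as below 5%. For each protein, CV = sample standard deviation / mean × 100 across technical replicates, and the median is then taken over proteins. This is a minimal sketch. The input layout and the filtering rules (at least two replicates, positive mean) are assumptions, and the paper's exact CV definition may differ.

```python
from statistics import mean, median, stdev


def median_cv(abundances):
    """Median coefficient of variation (%) across proteins.

    `abundances` maps a protein ID to its abundances in technical replicates
    (hypothetical layout). Proteins with fewer than two values or a non-positive
    mean are skipped; raises StatisticsError if no protein qualifies.
    """
    cvs = [
        stdev(values) / mean(values) * 100  # sample (n - 1) standard deviation
        for values in abundances.values()
        if len(values) >= 2 and mean(values) > 0
    ]
    return median(cvs)
```

Using the median rather than the mean keeps a handful of poorly quantified, low-abundance proteins from dominating the summary.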
Affiliation(s)
- Jörg Kuharev
- Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, Mainz, Germany
- Pedro Navarro
- Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, Mainz, Germany
- Ute Distler
- Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, Mainz, Germany
- Olaf Jahn
- Proteomics Group, Max-Planck-Institute of Experimental Medicine, Göttingen, Germany
- Stefan Tenzer
- Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, Mainz, Germany
49
Wandy J, Daly R, Breitling R, Rogers S. Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets. Bioinformatics 2015; 31:1999-2006. [PMID: 25649621 PMCID: PMC4760236 DOI: 10.1093/bioinformatics/btv072] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 09/21/2014] [Accepted: 01/28/2015] [Indexed: 11/24/2022]
Abstract
Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that coelute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related-peak information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pair-wise peak similarity score function. The similarity score matrix produced is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result. Results: We demonstrate that related peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools. Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment. Contact: Simon.Rogers@glasgow.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
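The method has two ingredients: a pairwise peak similarity that also rewards agreement among co-eluting related peaks, and an approximate solution to the resulting weighted matching problem. This toy sketch rests on explicit assumptions. The tolerance-based similarity, the `alpha` weighting of group support and the greedy matcher are illustrative choices, not the scoring function or approximation algorithm used in the paper or its Python application. Greedy selection of the highest-scoring compatible pairs is a standard 1/2-approximation for maximum-weight matching.

```python
def similarity(p, q, mz_tol=0.01, rt_tol=30.0):
    """Score two (mz, rt) peaks in [0, 1]; 0 if either difference exceeds its tolerance."""
    dmz, drt = abs(p[0] - q[0]), abs(p[1] - q[1])
    if dmz > mz_tol or drt > rt_tol:
        return 0.0
    return (1 - dmz / mz_tol) * (1 - drt / rt_tol)


def group_support(i, j, run_a, run_b, groups_a, groups_b):
    """Average best similarity between peak i's group-mates in run A and peak j's in run B."""
    mates_a = [k for k in range(len(run_a)) if k != i and groups_a[k] == groups_a[i]]
    mates_b = [k for k in range(len(run_b)) if k != j and groups_b[k] == groups_b[j]]
    if not mates_a or not mates_b:
        return 0.0
    return sum(
        max(similarity(run_a[a], run_b[b]) for b in mates_b) for a in mates_a
    ) / len(mates_a)


def align(run_a, run_b, groups_a, groups_b, alpha=0.3):
    """Match peaks of two runs; returns sorted (index_in_a, index_in_b) pairs."""
    scored = []
    for i, p in enumerate(run_a):
        for j, q in enumerate(run_b):
            s = similarity(p, q)
            if s > 0:
                support = group_support(i, j, run_a, run_b, groups_a, groups_b)
                scored.append(((1 - alpha) * s + alpha * support, i, j))
    # Greedy maximum-weight matching: repeatedly take the best remaining compatible pair.
    scored.sort(reverse=True)
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in scored:
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)
```

Consider two runs that each contain two peaks with the same m/z. With `alpha = 0`, each peak simply pairs with whichever partner is closest in RT. With `alpha > 0`, an isotope peak that matches cleanly can pull its group-mate toward the partner in the corresponding group, which is the kind of effect the abstract attributes to related-peak information.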
Affiliation(s)
- Joe Wandy
- School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Rónán Daly
- School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Rainer Breitling
- School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Simon Rogers
- School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
50
Shao Y, Zhu B, Zheng R, Zhao X, Yin P, Lu X, Jiao B, Xu G, Yao Z. Development of urinary pseudotargeted LC-MS-based metabolomics method and its application in hepatocellular carcinoma biomarker discovery. J Proteome Res 2014; 14:906-16. [PMID: 25483141 DOI: 10.1021/pr500973d] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Indexed: 02/06/2023]
Abstract
Hepatocellular carcinoma (HCC) is one of the deadliest malignancies and a major cause of cancer-related death. Effective biomarkers for HCC diagnosis are urgently needed. To identify potential metabolite biomarkers, we developed a urinary pseudotargeted method based on liquid chromatography-hybrid triple quadrupole linear ion trap mass spectrometry (LC-QTRAP MS). Compared with the nontargeted method, the pseudotargeted method achieves better data quality, which benefits the discovery of differential metabolites. The established method was applied to the investigation of cirrhosis (CIR) and HCC. Urinary nucleosides, bile acids, citric acid, and several amino acids were significantly changed in the liver disease groups compared with the controls, reflecting dysregulation of purine metabolism, energy metabolism, and amino acid metabolism in liver diseases. Furthermore, some metabolites, such as cyclic adenosine monophosphate, glutamine, and short- and medium-chain acylcarnitines, differed between HCC and CIR. On the basis of binary logistic regression, butyrylcarnitine (carnitine C4:0) and hydantoin-5-propionic acid were defined as combinational markers to distinguish HCC from CIR. The area under the curve (AUC) was 0.786 and 0.773 for the discovery-stage and validation-stage samples, respectively. These data show that the established pseudotargeted method complements targeted and nontargeted methods in metabolomics studies.
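The AUC values reported above (0.786 for discovery, 0.773 for validation) summarize how well the combined marker separates HCC from CIR. AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, so it can be computed directly from per-sample scores. The sketch assumes that HCC is the positive class and that each sample's score is the logistic-regression output or its linear predictor. This is a minimal sketch, and no study data are used.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve from classifier scores (Mann-Whitney form).

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher; ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Because this statistic depends only on the ordering of scores, any monotonic transform leaves the AUC unchanged. This includes the logistic link applied to a linear predictor, so the fitted probabilities and the raw linear combination of the two markers give the same value.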
Affiliation(s)
- Yaping Shao
- Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences , 457 Zhongshan Road, Dalian 116023, China