1
Sun Y, Xing Z, Liang S, Miao Z, Zhuo LB, Jiang W, Zhao H, Gao H, Xie Y, Zhou Y, Yue L, Cai X, Chen YM, Zheng JS, Guo T. metaExpertPro: A Computational Workflow for Metaproteomics Spectral Library Construction and Data-Independent Acquisition Mass Spectrometry Data Analysis. Mol Cell Proteomics 2024; 23:100840. [PMID: 39278598 DOI: 10.1016/j.mcpro.2024.100840] [Received: 04/08/2024] [Revised: 08/04/2024] [Accepted: 09/11/2024] [Indexed: 09/18/2024]
Abstract
Analysis of large-scale data-independent acquisition mass spectrometry metaproteomics data remains a computational challenge. Here, we present a computational pipeline called metaExpertPro for metaproteomics data analysis. This pipeline encompasses spectral library generation using data-dependent acquisition MS, protein identification and quantification using data-independent acquisition mass spectrometry, functional and taxonomic annotation, as well as quantitative matrix generation for both microbiota and hosts. By integrating FragPipe and DIA-NN, metaExpertPro offers compatibility with both Orbitrap and timsTOF MS instruments. To evaluate the depth and accuracy of identification and quantification, we conducted extensive assessments using human fecal samples and benchmark tests. Performance tests conducted on human fecal samples indicated that metaExpertPro quantified an average of 45,000 peptides in a 60-min diaPASEF injection. Notably, metaExpertPro outperformed three existing software tools by characterizing a higher number of peptides and proteins. Importantly, metaExpertPro maintained a low factual false discovery rate of approximately 5% for protein groups across four benchmark tests. Applying a filter of five peptides per genus, metaExpertPro achieved relatively high accuracy (F-score = 0.67-0.90) in genus diversity and showed a high correlation (rSpearman = 0.73-0.82) between the measured and true genus relative abundance in benchmark tests. Additionally, the quantitative results at the protein, taxonomy, and function levels exhibited high reproducibility and consistency across the commonly adopted public human gut microbial protein databases IGC and UHGP. In a metaproteomic analysis of dyslipidemia patients, metaExpertPro revealed characteristic alterations in microbial functions and potential interactions between the microbiota and the host.
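The benchmark metrics named in this abstract (a peptides-per-genus filter, an F-score for genus detection, and a Spearman correlation between measured and true relative abundance) can be illustrated with a minimal sketch. This is not metaExpertPro code: the function names are hypothetical, and reading the filter as "at least five peptides per genus" is an assumption.

```python
def filter_genera(peptide_counts, min_peptides=5):
    """Keep genera supported by at least `min_peptides` peptides (assumed reading of the filter)."""
    return {g for g, n in peptide_counts.items() if n >= min_peptides}


def f_score(detected, truth):
    """F-score (harmonic mean of precision and recall) of detected vs. true genera."""
    detected, truth = set(detected), set(truth)
    tp = len(detected & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def _ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For example, detecting three genera of which two are among four true genera gives a precision of 2/3, a recall of 1/2, and an F-score of 4/7.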
Affiliation(s)
- Yingying Sun
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Ziyuan Xing
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Shuang Liang
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Zelei Miao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Lai-Bao Zhuo
- Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Wenhao Jiang
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Hui Zhao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Huanhuan Gao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yuting Xie
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yan Zhou
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Liang Yue
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Xue Cai
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yu-Ming Chen
- Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-sen University, Guangzhou, China.
- Ju-Sheng Zheng
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China.
- Tiannan Guo
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China.
2
Tariq U, Saeed F. Predicting peptide properties from mass spectrometry data using deep attention-based multitask network and uncertainty quantification. bioRxiv 2024:2024.08.21.609035. [PMID: 39229185 PMCID: PMC11370541 DOI: 10.1101/2024.08.21.609035] [Indexed: 09/05/2024]
Abstract
Database search algorithms reduce the number of potential candidate peptides against which scoring needs to be performed using a single (i.e. mass) property for filtering. While useful, filtering based on one property may lead to exclusion of non-abundant spectra and uncharacterized peptides - potentially exacerbating the streetlight effect. Here we present ProteoRift, a novel attention and multitask deep-network, which can predict multiple peptide properties (length, missed cleavages, and modification status) directly from spectra. We demonstrate that ProteoRift can predict these properties with up to 97% accuracy resulting in search-space reduction by more than 90%. As a result, our end-to-end pipeline is shown to exhibit 8x to 12x speedups with peptide deduction accuracy comparable to algorithmic techniques. We also formulate two uncertainty estimation metrics, which can distinguish between in-distribution and out-of-distribution data (ROC-AUC 0.99) and predict high-scoring mass spectra against correct peptide (ROC-AUC 0.94). These models and metrics are integrated in an end-to-end ML pipeline available at https://github.com/pcdslab/ProteoRift.
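The search-space reduction described in this abstract can be pictured as two filtering stages: the conventional precursor-mass window, followed by a filter on predicted properties (length, missed cleavages, modification status). The sketch below is illustrative only and is not the ProteoRift implementation. The `Candidate` type, the function names, the simplified trypsin rule, and the toy masses are all assumptions, and the property values are passed in by hand rather than predicted by a network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sequence: str
    mass: float      # toy value, not a computed peptide mass
    modified: bool


def missed_cleavages(sequence):
    """Count K/R before the C-terminal residue (simplified trypsin rule, ignores proline)."""
    return sum(1 for aa in sequence[:-1] if aa in "KR")


def filter_by_mass(candidates, precursor_mass, tolerance):
    """Conventional single-property filter: keep candidates inside the mass window."""
    return [c for c in candidates if abs(c.mass - precursor_mass) <= tolerance]


def filter_by_properties(candidates, length, missed, modified):
    """Second stage: keep candidates matching the properties inferred for the spectrum."""
    return [
        c for c in candidates
        if len(c.sequence) == length
        and missed_cleavages(c.sequence) == missed
        and c.modified == modified
    ]


candidates = [
    Candidate("PEPTIDEK", 1000.0, False),
    Candidate("PEPKTIDER", 1000.2, False),
    Candidate("AAAAAAAK", 1000.1, True),
    Candidate("GGGGGGGGGGK", 1500.0, False),
]
in_window = filter_by_mass(candidates, precursor_mass=1000.1, tolerance=0.5)
shortlist = filter_by_properties(in_window, length=8, missed=0, modified=False)
```

In this toy case the mass window keeps three of the four candidates, and the property filter narrows them to one, so only that candidate would need to be scored.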
Affiliation(s)
- Usman Tariq
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
- Biomolecular Sciences Institute (BSI), Florida International University, Miami, FL, USA
- Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA
3
Xue M, Xie Y, Zang X, Zhong Y, Ma X, Sun H, Liu J. Deciphering functional groups of rumen microbiome and their underlying potentially causal relationships in shaping host traits. iMeta 2024; 3:e225. [PMID: 39135684 PMCID: PMC11316931 DOI: 10.1002/imt2.225] [Received: 05/31/2024] [Revised: 06/25/2024] [Accepted: 06/26/2024] [Indexed: 08/15/2024]
Abstract
Over the years, microbiome research has achieved tremendous advancements driven by culture-independent meta-omics approaches. Despite extensive research, our understanding of the functional roles and causal effects of the microbiome on phenotypes remains limited. In this study, we focused on the rumen metaproteome, combining it with metatranscriptome and metabolome data to accurately identify the active functional distributions of rumen microorganisms and specific functional groups that influence feed efficiency. By integrating host genetics data, we established the potentially causal relationships between microbes-proteins/metabolites-phenotype, and identified specific patterns in which functional groups of rumen microorganisms influence host feed efficiency. We found a causal link between Selenomonas bovis and rumen carbohydrate metabolism, potentially mediated by bacterial chemotaxis and a two-component regulatory system, impacting feed utilization efficiency of dairy cows. Our study on the nutrient utilization functional groups in the rumen of high-feed-efficiency dairy cows, along with the identification of key microbiota functional proteins and their potentially causal relationships, will help move from correlation to causation in rumen microbiome research. This will ultimately enable precise regulation of the rumen microbiota for optimized ruminant production.
Affiliation(s)
- Ming‐Yuan Xue
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Xianghu Laboratory, Hangzhou, China
- Yun‐Yi Xie
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Xin‐Wei Zang
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Yi‐Fan Zhong
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Xiao‐Jiao Ma
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Hui‐Zeng Sun
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Ministry of Education Key Laboratory of Molecular Animal Nutrition, Zhejiang University, Hangzhou, China
- Jian‐Xin Liu
- Institute of Dairy Science, College of Animal Sciences, Zhejiang University, Hangzhou, China
- Ministry of Education Key Laboratory of Molecular Animal Nutrition, Zhejiang University, Hangzhou, China
4
Wu E, Xu G, Xie D, Qiao L. Data-independent acquisition in metaproteomics. Expert Rev Proteomics 2024; 21:271-280. [PMID: 39152734 DOI: 10.1080/14789450.2024.2394190] [Received: 03/11/2024] [Revised: 08/12/2024] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
INTRODUCTION Metaproteomics offers insights into the function of complex microbial communities, while it is also capable of revealing microbe-microbe and host-microbe interactions. Data-independent acquisition (DIA) mass spectrometry is an emerging technology, which holds great potential to achieve deep and accurate metaproteomics with higher reproducibility yet still facing a series of challenges due to the inherent complexity of metaproteomics and DIA data. AREAS COVERED This review offers an overview of the DIA metaproteomics approaches, covering aspects such as database construction, search strategy, and data analysis tools. Several cases of current DIA metaproteomics studies are presented to illustrate the procedures. Important ongoing challenges are also highlighted. Future perspectives of DIA methods for metaproteomics analysis are further discussed. Cited references are searched through and collected from Google Scholar and PubMed. EXPERT OPINION Considering the inherent complexity of DIA metaproteomics data, data analysis strategies specifically designed for interpretation are imperative. From this point of view, we anticipate that deep learning methods and de novo sequencing methods will become more prevalent in the future, potentially improving protein coverage in metaproteomics. Moreover, the advancement of metaproteomics also depends on the development of sample preparation methods, data analysis strategies, etc. These factors are key to unlocking the full potential of metaproteomics.
Affiliation(s)
- Enhui Wu
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Department of Chemistry, Fudan University, Shanghai, China
- Guanyang Xu
- Department of Chemistry, Fudan University, Shanghai, China
- Dong Xie
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Liang Qiao
- Department of Chemistry, Fudan University, Shanghai, China
5
Alves G, Ogurtsov AY, Porterfield H, Maity T, Jenkins LM, Sacks DB, Yu YK. Multiplexing the Identification of Microorganisms via Tandem Mass Tag Labeling Augmented by Interference Removal through a Novel Modification of the Expectation Maximization Algorithm. J Am Soc Mass Spectrom 2024; 35:1138-1155. [PMID: 38740383 PMCID: PMC11157548 DOI: 10.1021/jasms.3c00445] [Received: 12/19/2023] [Revised: 04/12/2024] [Accepted: 04/17/2024] [Indexed: 05/16/2024]
Abstract
Having fast, accurate, and broad spectrum methods for the identification of microorganisms is of paramount importance to public health, research, and safety. Bottom-up mass spectrometer-based proteomics has emerged as an effective tool for the accurate identification of microorganisms from microbial isolates. However, one major hurdle that limits the deployment of this tool for routine clinical diagnosis, and other areas of research such as culturomics, is the instrument time required for the mass spectrometer to analyze a single sample, which can take ∼1 h per sample, when using mass spectrometers that are presently used in most institutes. To address this issue, in this study, we employed, for the first time, tandem mass tags (TMTs) in multiplex identifications of microorganisms from multiple TMT-labeled samples in one MS/MS experiment. A difficulty encountered when using TMT labeling is the presence of interference in the measured intensities of TMT reporter ions. To correct for interference, we employed in the proposed method a modified version of the expectation maximization (EM) algorithm that redistributes the signal from ion interference back to the correct TMT-labeled samples. We have evaluated the sensitivity and specificity of the proposed method using 94 MS/MS experiments (covering a broad range of protein concentration ratios across TMT-labeled channels and experimental parameters), containing a total of 1931 true positive TMT-labeled channels and 317 true negative TMT-labeled channels. The results of the evaluation show that the proposed method has an identification sensitivity of 93-97% and a specificity of 100% at the species level. 
Furthermore, as a proof of concept, using an in-house-generated data set composed of some of the most common urinary tract pathogens, we demonstrated that by using the proposed method the mass spectrometer time required per sample, using a 1 h LC-MS/MS run, can be reduced to 10 and 6 min when samples are labeled with TMT-6 and TMT-10, respectively. The proposed method can also be used along with Orbitrap mass spectrometers that have faster MS/MS acquisition rates, like the recently released Orbitrap Astral mass spectrometer, to further reduce the mass spectrometer time required per sample.
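The per-sample instrument times quoted in this abstract follow from dividing one run among the multiplexed channels. A minimal sketch of that arithmetic, together with the standard sensitivity and specificity definitions, is given below. The function names are hypothetical, and the calculation assumes one sample per TMT channel and ignores any labeling or pooling overhead.

```python
def minutes_per_sample(run_minutes, plex):
    """Instrument time attributed to each sample when `plex` samples share one run."""
    if plex <= 0:
        raise ValueError("plex must be a positive number of channels")
    return run_minutes / plex


def sensitivity(true_positives, false_negatives):
    """Fraction of truly present organisms/channels that were identified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of truly absent organisms/channels that were correctly not identified."""
    return true_negatives / (true_negatives + false_positives)


tmt6 = minutes_per_sample(60, 6)     # 1 h run shared by 6 channels
tmt10 = minutes_per_sample(60, 10)   # 1 h run shared by 10 channels
```

With a 60 min run this gives 10 min per sample for TMT-6 and 6 min per sample for TMT-10, matching the figures in the abstract.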
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Aleksey Y. Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Harry Porterfield
- Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, Maryland 20892, United States
- Tapan Maity
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Lisa M. Jenkins
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- David B. Sacks
- Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, Maryland 20892, United States
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
6
Nebauer DJ, Pearson LA, Neilan BA. Critical steps in an environmental metaproteomics workflow. Environ Microbiol 2024; 26:e16637. [PMID: 38760994 DOI: 10.1111/1462-2920.16637] [Received: 02/01/2024] [Accepted: 04/30/2024] [Indexed: 05/20/2024]
Abstract
Environmental metaproteomics is a rapidly advancing field that provides insights into the structure, dynamics, and metabolic activity of microbial communities. As the field is still maturing, it lacks consistent workflows, making it challenging for non-expert researchers to navigate. This review aims to introduce the workflow of environmental metaproteomics. It outlines the standard practices for sample collection, processing, and analysis, and offers strategies to overcome the unique challenges presented by common environmental matrices such as soil, freshwater, marine environments, biofilms, sludge, and symbionts. The review also highlights the bottlenecks in data analysis that are specific to metaproteomics samples and provides suggestions for researchers to obtain high-quality datasets. It includes recent benchmarking studies and descriptions of software packages specifically built for metaproteomics analysis. The article is written without assuming the reader's familiarity with single-organism proteomic workflows, making it accessible to those new to proteomics or mass spectrometry in general. This primer for environmental metaproteomics aims to improve accessibility to this exciting technology and empower researchers to tackle challenging and ambitious research questions. While it is primarily a resource for those new to the field, it should also be useful for established researchers looking to streamline or troubleshoot their metaproteomics experiments.
Affiliation(s)
- Daniel J Nebauer
- School of Environmental and Life Sciences, The University of Newcastle, Callaghan, New South Wales, Australia
- Centre of Excellence in Synthetic Biology, Australian Research Council, Sydney, New South Wales, Australia
- Leanne A Pearson
- School of Environmental and Life Sciences, The University of Newcastle, Callaghan, New South Wales, Australia
- Centre of Excellence in Synthetic Biology, Australian Research Council, Sydney, New South Wales, Australia
- Brett A Neilan
- School of Environmental and Life Sciences, The University of Newcastle, Callaghan, New South Wales, Australia
- Centre of Excellence in Synthetic Biology, Australian Research Council, Sydney, New South Wales, Australia
7
Ogurtsov A, Alves G, Rubio A, Joyce B, Andersson B, Karlsson R, Moore ER, Yu YK. MiCId GUI: The Graphical User Interface for MiCId, a Fast Microorganism Classification and Identification Workflow with Accurate Statistics and High Recall. J Comput Biol 2024; 31:175-178. [PMID: 38301204 PMCID: PMC10874827 DOI: 10.1089/cmb.2023.0149] [Indexed: 02/03/2024]
Abstract
Although many user-friendly workflows exist for identifying peptides and proteins in mass-spectrometry-based proteomics, there is a need for easy-to-use, fast, and accurate workflows for identifying microorganisms and antimicrobial-resistant proteins and for estimating biomass. Identification of microorganisms is a computationally demanding task that requires querying thousands of MS/MS spectra in a database containing thousands to tens of thousands of microorganisms. Existing software cannot handle such a task in a time-efficient manner, taking hours to process a single MS/MS experiment. Another paramount factor is the need for accurate statistical significance to properly control the proportion of false discoveries among the identified microorganisms and antimicrobial-resistant proteins, and to provide robust biomass estimation. Recently, we developed the Microorganism Classification and Identification (MiCId) workflow, which assigns accurate statistical significance to identified microorganisms, antimicrobial-resistant proteins, and biomass estimation. MiCId's workflow is also computationally efficient, taking about 6-17 minutes to process a tandem mass-spectrometry (MS/MS) experiment using computer resources that are available in most laptop and desktop computers, making it a portable workflow. To make data analysis accessible to a broader range of users, beyond users familiar with the Linux environment, we have developed a graphical user interface (GUI) for MiCId's workflow. The GUI brings to users all the functionality of MiCId's workflow in a friendly interface along with tools for data analysis, visualization, and exporting results.
Affiliation(s)
- Aleksey Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Alex Rubio
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Brendan Joyce
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Björn Andersson
- Bioinformatics Core Facility, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Roger Karlsson
- Department of Infectious Diseases, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Nanoxis Consulting AB, Gothenburg, Sweden
- Edward R.B. Moore
- Department of Infectious Diseases, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Culture Collection University of Gothenburg, Sahlgrenska Academy, University of Gothenburg, Sweden
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
8
Holstein T, Muth T. Bioinformatic Workflows for Metaproteomics. Methods Mol Biol 2024; 2820:187-213. [PMID: 38941024 DOI: 10.1007/978-1-0716-3910-8_16] [Indexed: 06/29/2024]
Abstract
The strong influence of microbiomes on areas such as ecology and human health has become widely recognized in the past years. Accordingly, various techniques for the investigation of the composition and function of microbial community samples have been developed. Metaproteomics, the comprehensive analysis of the proteins from microbial communities, allows for the investigation of not only the taxonomy but also the functional and quantitative composition of microbiome samples. Due to the complexity of the investigated communities, methods developed for single organism proteomics cannot be readily applied to metaproteomic samples. For this purpose, methods specifically tailored to metaproteomics are required. In this work, a detailed overview of current bioinformatic solutions and protocols in metaproteomics is given. After an introduction to the proteomic database search, the metaproteomic post-processing steps are explained in detail. Ten specific bioinformatic software solutions are focused on, covering various steps including database-driven identification and quantification as well as taxonomic and functional assignment.
Affiliation(s)
- Tanja Holstein
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
- VIB-UGent Center for Medical Biotechnology, VIB and Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Data Competence Center, Robert Koch Institute, Berlin, Germany
- Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany.
- Data Competence Center, Robert Koch Institute, Berlin, Germany.
9
Haseeb M, Saeed F. GPU-acceleration of the distributed-memory database peptide search of mass spectrometry data. Sci Rep 2023; 13:18713. [PMID: 37907498 PMCID: PMC10618243 DOI: 10.1038/s41598-023-43033-w] [Received: 06/02/2023] [Accepted: 09/18/2023] [Indexed: 11/02/2023]
Abstract
Database peptide search is the primary computational technique for identifying peptides from the mass spectrometry (MS) data. Graphical Processing Units (GPU) computing is now ubiquitous in the current-generation of high-performance computing (HPC) systems, yet its application in the database peptide search domain remains limited. Part of the reason is the use of sub-optimal algorithms in the existing GPU-accelerated methods resulting in significantly inefficient hardware utilization. In this paper, we design and implement a new-age CPU-GPU HPC framework, called GiCOPS, for efficient and complete GPU-acceleration of the modern database peptide search algorithms on supercomputers. Our experimentation shows that the GiCOPS exhibits between 1.2× and 5× speed improvement over its CPU-only predecessor, HiCOPS, and over 10× improvement over several existing GPU-based database search algorithms for sufficiently large experiment sizes. We further assess and optimize the performance of our framework using the Roofline Model and report near-optimal results for several metrics including computations per second, occupancy rate, memory workload, branch efficiency and shared memory performance. Finally, the CPU-GPU methods and optimizations proposed in our work for complex integer- and memory-bounded algorithmic pipelines can also be extended to accelerate the existing and future peptide identification algorithms. GiCOPS is now integrated with our umbrella HPC framework HiCOPS and is available at: https://github.com/pcdslab/gicops.
Affiliation(s)
- Muhammad Haseeb
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA.
- Biomolecular Sciences Institute (BSI), Miami, FL, USA.
- Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA.
10
Hao C, Elias JE, Lee PKH, Lam H. metaSpectraST: an unsupervised and database-independent analysis workflow for metaproteomic MS/MS data using spectrum clustering. Microbiome 2023; 11:176. [PMID: 37550758 PMCID: PMC10405559 DOI: 10.1186/s40168-023-01602-1] [Received: 05/26/2022] [Accepted: 06/18/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND The high diversity and complexity of the microbial community make it a formidable challenge to identify and quantify the large number of proteins expressed in the community. Conventional metaproteomics approaches largely rely on accurate identification of the MS/MS spectra to their corresponding short peptides in the digested samples, followed by protein inference and subsequent taxonomic and functional analysis of the detected proteins. These approaches are dependent on the availability of protein sequence databases derived either from sample-specific metagenomic data or from public repositories. Due to the incompleteness and imperfections of these protein sequence databases, and the preponderance of homologous proteins expressed by different bacterial species in the community, this computational process of peptide identification and protein inference is challenging and error-prone, which hinders the comparison of metaproteomes across multiple samples. RESULTS We developed metaSpectraST, an unsupervised and database-independent metaproteomics workflow, which quantitatively profiles and compares metaproteomics samples by clustering experimentally observed MS/MS spectra based on their spectral similarity. We applied metaSpectraST to fecal samples collected from littermates of two different mother mice right after weaning. Quantitative proteome profiles of the microbial communities of different mice were obtained without any peptide-spectrum identification and used to evaluate the overall similarity between samples and highlight any differentiating markers. Compared to the conventional database-dependent metaproteomics analysis, metaSpectraST is more successful in classifying the samples and detecting the subtle microbiome changes of mouse gut microbiomes post-weaning. metaSpectraST could also be used as a tool to select the suitable biological replicates from samples with wide inter-individual variation. 
CONCLUSIONS metaSpectraST enables rapid profiling of metaproteomic samples quantitatively, without the need for constructing the protein sequence database or identification of the MS/MS spectra. It maximally preserves information contained in the experimental MS/MS spectra by clustering all of them first and thus is able to better profile the complex microbial communities and highlight their functional changes, as compared with conventional approaches. Video Abstract.
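The idea of profiling samples by clustering spectra on spectral similarity, without identifying them, can be illustrated with a minimal sketch. This is not the metaSpectraST algorithm: the unit-width binning, the cosine score, the greedy assignment, and the 0.7 threshold are all assumptions chosen for illustration, and the peak lists are made-up toy data.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins, then L2-normalize."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in binned.values()))
    return {k: v / norm for k, v in binned.items()} if norm else {}


def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def cluster_spectra(spectra, threshold=0.7):
    """Greedy clustering: a spectrum joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    representatives = []  # binned vector of the first member of each cluster
    clusters = []         # lists of spectrum indices
    for i, peaks in enumerate(spectra):
        vec = bin_spectrum(peaks)
        for c, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                clusters[c].append(i)
                break
        else:
            representatives.append(vec)
            clusters.append([i])
    return clusters


# Toy (m/z, intensity) peak lists; the first two are near-duplicates.
spectra = [
    [(100.1, 50.0), (200.2, 100.0), (300.3, 25.0)],
    [(100.2, 45.0), (200.1, 110.0), (300.4, 20.0)],
    [(150.5, 80.0), (250.7, 60.0)],
]
clusters = cluster_spectra(spectra)
cluster_sizes = [len(c) for c in clusters]  # a crude per-cluster spectral count
```

In a sketch like this, the per-cluster spectral counts computed for each sample would serve as the identification-free quantitative profile that samples are compared on.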
Affiliation(s)
- Chunlin Hao
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- Patrick K. H. Lee
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory of Marine Pollution, City University of Hong Kong, Hong Kong SAR, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
|
11
|
Simopoulos CMA, Figeys D, Lavallée-Adam M. Novel Bioinformatics Strategies Driving Dynamic Metaproteomic Studies. Methods Mol Biol 2022; 2456:319-338. [PMID: 35612752 DOI: 10.1007/978-1-0716-2124-0_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Constant improvements in mass spectrometry technologies and laboratory workflows have enabled the proteomics investigation of biological samples of growing complexity. Microbiomes represent such complex samples for which metaproteomics analyses are becoming increasingly popular. Metaproteomics experimental procedures create large amounts of data from which biologically relevant signal must be efficiently extracted to draw meaningful conclusions. Such data processing requires appropriate bioinformatics tools specifically developed for, or capable of handling, metaproteomics data. In this chapter, we outline current and novel tools that can perform the most commonly used steps in the analysis of cutting-edge metaproteomics data, such as peptide and protein identification and quantification, as well as data normalization, imputation, mining, and visualization. We also provide details about the experimental setups in which these tools should be used.
Affiliation(s)
- Caitlin M A Simopoulos
- Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
- Daniel Figeys
- Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
- School of Pharmaceutical Sciences, University of Ottawa, Ottawa, ON, Canada
- Mathieu Lavallée-Adam
- Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada.
|
12
|
Pettersen VK, Antunes LCM, Dufour A, Arrieta MC. Inferring early-life host and microbiome functions by mass spectrometry-based metaproteomics and metabolomics. Comput Struct Biotechnol J 2021; 20:274-286. [PMID: 35024099 PMCID: PMC8718658 DOI: 10.1016/j.csbj.2021.12.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/14/2021] [Revised: 12/08/2021] [Accepted: 12/08/2021] [Indexed: 12/17/2022] Open
Abstract
Humans have a long-standing coexistence with microorganisms. In particular, the microbial community that populates the human gastrointestinal tract has emerged as a critical player in governing human health and disease. DNA and RNA sequencing techniques that map taxonomical composition and genomic potential of the gut community have become invaluable for microbiome research. However, deriving a biochemical understanding of how activities of the gut microbiome shape host development and physiology requires an expanded experimental design that goes beyond these approaches. In this review, we explore advances in high-throughput techniques based on liquid chromatography-mass spectrometry. These omics methods for the identification of proteins and metabolites have enabled direct characterisation of gut microbiome functions and the crosstalk with the host. We discuss current metaproteomics and metabolomics workflows for producing functional profiles, the existing methodological challenges and limitations, and recent studies utilising these techniques with a special focus on early life gut microbiome.
Affiliation(s)
- Veronika Kuchařová Pettersen
- Research Group for Host-Microbe Interactions, Department of Medical Biology, UiT The Arctic University of Norway, Tromsø, Norway
- Pediatric Research Group, Department of Clinical Medicine, UiT The Arctic University of Norway, Tromsø, Norway
- Centre for New Antibacterial Strategies, UiT The Arctic University of Norway, Tromsø, Norway
- Luis Caetano Martha Antunes
- Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, RJ, Brazil
- National Institute of Science and Technology of Innovation on Diseases of Neglected Populations, Center for Technological Development in Health, Oswaldo Cruz Foundation, Rio de Janeiro, RJ, Brazil
- Antoine Dufour
- Department of Physiology & Pharmacology, University of Calgary, Calgary, Canada
- Marie-Claire Arrieta
- Department of Physiology & Pharmacology, University of Calgary, Calgary, Canada
- Department of Pediatrics, University of Calgary, Calgary, AB, Canada
- International Microbiome Centre, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
|
13
|
Haseeb M, Saeed F. High Performance Computing Framework for Tera-Scale Database Search of Mass Spectrometry Data. Nat Comput Sci 2021; 1:550-561. [PMID: 34723198 PMCID: PMC8554525 DOI: 10.1038/s43588-021-00113-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2020] [Accepted: 07/16/2021] [Indexed: 05/09/2023]
Abstract
Database peptide search algorithms deduce peptides from mass spectrometry (MS) data. There has been substantial effort in improving their computational efficiency to achieve larger and more complex systems biology studies. However, modern serial and high-performance computing (HPC) algorithms exhibit sub-optimal performance mainly due to their ineffective parallel designs (low resource utilization), and high overhead costs. We present an HPC framework, called HiCOPS, for efficient acceleration of the database peptide search algorithms on distributed-memory supercomputers. HiCOPS provides, on average, more than 10-fold improvement in speed, and superior parallel performance over several existing HPC database search software. We also formulate a mathematical model for performance analysis and optimization, and report near-optimal results for several key metrics including strong-scale efficiency, hardware utilization, load-balance, inter-process communication and I/O overheads. The core parallel design, techniques, and optimizations presented in HiCOPS are search-algorithm independent and can be extended to efficiently accelerate the existing and future algorithms and software.
Affiliation(s)
- Muhammad Haseeb
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, USA
- Fahad Saeed
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, USA
- Biomolecular Sciences Institute (BSI), Florida International University, Miami, FL, USA
- Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA
|
14
|
Stamboulian M, Li S, Ye Y. Using high-abundance proteins as guides for fast and effective peptide/protein identification from human gut metaproteomic data. Microbiome 2021; 9:80. [PMID: 33795009 PMCID: PMC8017886 DOI: 10.1186/s40168-021-01035-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/03/2020] [Accepted: 02/11/2021] [Indexed: 05/23/2023]
Abstract
BACKGROUND A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and their relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies, when matched metagenomic/metatranscriptomic data are unavailable. However, a greater collection of reference genomes may not necessarily result in better peptide/protein identification because the increase of search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase of computation time. METHODS Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification). RESULTS We tested our approach using human gut metaproteomic datasets from a previous study and compared it to the state-of-the-art reference database search method MetaPro-IQ for metaproteomic identification in studying human gut microbiota. Our results show that our two-step method not only performed significantly faster but also was able to identify more peptides.
We further demonstrated the application of HAPiID to revealing protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data. CONCLUSIONS The HAP guided profiling approach presents a novel, effective way of constructing a target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data. Video Abstract.
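The two-step strategy described in this abstract (search the high-abundance proteins first, then expand the database to all proteins of the species they reveal) can be sketched as follows. This is an illustrative toy, not the HAPiID pipeline: `identify` is a hypothetical stand-in that treats a peptide as matched when it is an exact substring of a protein, whereas a real workflow would score MS/MS spectra with a search engine. The record layout, the `min_peptides` cut-off, and all sequences are assumptions.

```python
def identify(peptides, proteins):
    """Toy stand-in for a database search: exact substring matching."""
    hits = {}
    for pep in peptides:
        matched = [p["id"] for p in proteins if pep in p["seq"]]
        if matched:
            hits[pep] = matched
    return hits


def two_step_search(peptides, database, min_peptides=1):
    """Step 1: search HAPs only and infer the species present.
    Step 2: search all proteins of the inferred species."""
    species_of = {p["id"]: p["species"] for p in database}

    # Step 1: restrict the search space to high-abundance proteins.
    haps = [p for p in database if p["hap"]]
    hap_hits = identify(peptides, haps)

    # Count supporting peptides per species (each peptide counts once per species).
    support = {}
    for ids in hap_hits.values():
        for sp in {species_of[i] for i in ids}:
            support[sp] = support.get(sp, 0) + 1
    kept = {sp for sp, n in support.items() if n >= min_peptides}

    # Step 2: expand to every protein of the retained species.
    expanded = [p for p in database if p["species"] in kept]
    return kept, identify(peptides, expanded)


# Hypothetical two-species database; "hap" marks ribosomal-like proteins.
database = [
    {"id": "A_rib", "species": "A", "hap": True, "seq": "MKVLAAGR"},
    {"id": "A_enz", "species": "A", "hap": False, "seq": "MTTPEPTIDEK"},
    {"id": "B_rib", "species": "B", "hap": True, "seq": "MSSQWERTYK"},
    {"id": "B_enz", "species": "B", "hap": False, "seq": "MGGHHNNPEPK"},
]
peptides = ["VLAAGR", "PEPTIDEK", "HHNNPEPK"]
species, hits = two_step_search(peptides, database)
```

Because no peptide supports species B in the first step, its proteins never enter the second search. That keeps the search space small, at the cost of missing proteins from species whose high-abundance proteins went undetected.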
Affiliation(s)
- Moses Stamboulian
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408 United States
- Sujun Li
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408 United States
- Yuzhen Ye
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408 United States
|
15
|
Bassignani A, Plancade S, Berland M, Blein-Nicolas M, Guillot A, Chevret D, Moritz C, Huet S, Rizkalla S, Clément K, Doré J, Langella O, Juste C. Benefits of Iterative Searches of Large Databases to Interpret Large Human Gut Metaproteomic Data Sets. J Proteome Res 2021; 20:1522-1534. [PMID: 33528260 DOI: 10.1021/acs.jproteome.0c00669] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/27/2022]
Abstract
The gut microbiota are increasingly considered as a main partner of human health. Metaproteomics enables us to move from the functional potential revealed by metagenomics to the functions actually operating in the microbiome. However, metaproteome deciphering remains challenging. In particular, confident interpretation of a myriad of MS/MS spectra can only be pursued with smart database searches. Here, we compare the interpretation of MS/MS data sets from 48 individual human gut microbiomes using three interrogation strategies of the dedicated Integrated nonredundant Gene Catalog (IGC 9.9 million genes from 1267 individual fecal samples) together with the Homo sapiens database: the classical single-step interrogation strategy and two iterative strategies (in either two or three steps) aimed at preselecting a reduced-sized, more targeted search space for the final peptide spectrum matching. Both iterative searches outperformed the single-step classical search in terms of the number of peptides and protein clusters identified and the depth of taxonomic and functional knowledge, and this was the most convincing with the three-step approach. However, iterative searches do not help in reducing variability of repeated analyses, which is inherent to the traditional data-dependent acquisition mode, but this variability did not affect the hierarchical relationship between replicates and all other samples.
Affiliation(s)
- Ariane Bassignani
- Université Paris-Saclay, INRAE, MGP, 78350, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- MaIAGE, INRAE, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Sandra Plancade
- MaIAGE, INRAE, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- INRAE, UR875 MIAT, F-31326 Castanet-Tolosan, France
- Magali Berland
- Université Paris-Saclay, INRAE, MGP, 78350, Jouy-en-Josas, France
- Melisande Blein-Nicolas
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Alain Guillot
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- Didier Chevret
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- Chloé Moritz
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- Sylvie Huet
- MaIAGE, INRAE, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Salwa Rizkalla
- Sorbonne Université, Inserm, UMRS Nutrition et Obésités; approches systémiques, Paris 75006, France
- Assistance Publique Hôpitaux de Paris, Service de Nutrition, CRNH Ile-de-France, Pitié-Salpêtrière Hospital, Paris 75013, France
- Karine Clément
- Sorbonne Université, Inserm, UMRS Nutrition et Obésités; approches systémiques, Paris 75006, France
- Assistance Publique Hôpitaux de Paris, Service de Nutrition, CRNH Ile-de-France, Pitié-Salpêtrière Hospital, Paris 75013, France
- Joël Doré
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- Olivier Langella
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Catherine Juste
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
|
16
|
Yan Z, He F, Xiao F, He H, Li D, Cong L, Lin L, Zhu H, Wu Y, Yan R, Li X, Shan H. A semi-tryptic peptide centric metaproteomic mining approach and its potential utility in capturing signatures of gut microbial proteolysis. Microbiome 2021; 9:12. [PMID: 33436102 PMCID: PMC7805185 DOI: 10.1186/s40168-020-00967-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/24/2020] [Accepted: 12/06/2020] [Indexed: 05/05/2023]
Abstract
BACKGROUND Proteolysis regulation allows gut microbes to respond rapidly to dynamic intestinal environments by fast degradation of misfolded proteins and activation of regulatory proteins. However, alterations of gut microbial proteolytic signatures under complex disease status such as inflammatory bowel disease (IBD, including Crohn's disease (CD) and ulcerative colitis (UC)), have not been investigated. Metaproteomics holds the potential to investigate gut microbial proteolysis because semi-tryptic peptides mainly derive from endogenous proteolysis. RESULTS We have developed a semi-tryptic peptide centric metaproteomic mining approach to obtain a snapshot of human gut microbial proteolysis signatures. This approach employed a comprehensive meta-database, two-step multiengine database search, and datasets with high-resolution fragmentation spectra to increase the confidence of semi-tryptic peptide identification. The approach was validated by discovering altered proteolysis signatures of Escherichia coli heat shock response. Utilizing two published large-scale metaproteomics datasets containing 623 metaproteomes from 447 fecal and 176 mucosal luminal interface (MLI) samples from IBD patients and healthy individuals, we obtain potential signatures of altered gut microbial proteolysis at taxonomic, functional, and cleavage site motif levels. The functional alterations mainly involved microbial carbohydrate transport and metabolism, oxidative stress, cell motility, protein synthesis, and maturation. Altered microbial proteolysis signatures of CD and UC mainly occurred in terminal ileum and descending colon, respectively. Microbial proteolysis patterns exhibited low correlations with β-diversity and moderate correlations with microbial protease and chaperones levels, respectively. 
Human protease inhibitors and immunoglobulins were mainly negatively associated with microbial proteolysis patterns, probably because of the inhibitory effects of these host factors on gut microbial proteolysis events. CONCLUSIONS This semi-tryptic peptide centric mining strategy offers a label-free approach to discover signatures of in vivo gut microbial proteolysis events if experimental conditions are well controlled. It can also capture in vitro proteolysis signatures to facilitate the evaluation and optimization of experimental conditions. Our findings highlight the complex and diverse proteolytic events of gut microbiome, providing a unique layer of information beyond taxonomic and proteomic abundance. Video abstract.
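What makes a peptide "semi-tryptic" can be made concrete with a small sketch. It assumes the simplest trypsin rule (cleavage C-terminal to K or R, ignoring the proline exception and missed cleavages) and looks only at the first occurrence of the peptide in the protein. The function name and sequences are hypothetical and not taken from the authors' workflow.

```python
def classify_peptide(peptide, protein):
    """Classify a peptide by how many of its termini match trypsin specificity.

    A terminus is counted as tryptic when it coincides with a protein terminus
    or with a cleavage site C-terminal to K or R.
    """
    start = protein.find(peptide)  # first occurrence only
    if start < 0:
        return "not found"
    end = start + len(peptide)
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or peptide[-1] in "KR"
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semi-tryptic"
    return "non-tryptic"


protein = "MAKTGLSRPEVDKLLA"  # made-up sequence
labels = {
    pep: classify_peptide(pep, protein)
    for pep in ["TGLSR", "GLSR", "PEVD", "GLS", "LLA"]
}
```

In this toy case `GLSR` and `PEVD` each have one terminus that trypsin could not have produced. Termini of that kind are what the abstract attributes mainly to endogenous proteolysis.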
Affiliation(s)
- Zhixiang Yan
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China.
- Feixiang He
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Fei Xiao
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Huanhuan He
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Dan Li
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Li Cong
- Department of Endocrinology and Metabolism, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Lu Lin
- Department of Gastroenterology, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Huijin Zhu
- Department of Gastroenterology, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Yanyan Wu
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China
- Ru Yan
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Taipa, Macao, China.
- Xiaofeng Li
- Department of Gastroenterology, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China.
- Hong Shan
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China.
- Center for Interventional Medicine, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, 519000, Guangdong Province, China.
|
17
|
Kumar Awasthi M, Ravindran B, Sarsaiya S, Chen H, Wainaina S, Singh E, Liu T, Kumar S, Pandey A, Singh L, Zhang Z. Metagenomics for taxonomy profiling: tools and approaches. Bioengineered 2020; 11:356-374. [PMID: 32149573 PMCID: PMC7161568 DOI: 10.1080/21655979.2020.1736238] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 12/24/2019] [Revised: 02/20/2020] [Accepted: 02/21/2020] [Indexed: 12/25/2022] Open
Abstract
The study of metagenomics is an emerging field that identifies the total genetic materials in an organism along with the set of all genetic materials like deoxyribonucleic acid and ribonucleic acid, which play a key role in the maintenance of cellular functions. The best part of this technology is that it gives more flexibility to environmental microbiologists to instantly pioneer the immense genetic variability of microbial communities. However, it is intensively complex to identify the suitable sequencing measures of any specific gene that can exclusively indicate the involvement of microbial metagenomes and be able to advance valuable results about these communities. This review provides an overview of the metagenomic advancement that has been advantageous for aggregation of more knowledge about specific genes, microbial communities and their metabolic pathways. More specific drawbacks of metagenomes technology mainly depend on sequence-based analysis. Therefore, this 'targeted based metagenomics' approach will give comprehensive knowledge about the ecological, evolutionary and functional sequence of significantly important genes that naturally exist in living beings, whether human, animal, or microorganism, from distinctive ecosystems.
Affiliation(s)
- Mukesh Kumar Awasthi
- College of Natural Resources and Environment, Northwest A&F University, Yangling, Shaanxi Province, China
- Swedish Centre for Resource Recovery, University of Borås, Borås, Sweden
- B. Ravindran
- Department of Environmental Energy and Engineering, Kyonggi University Youngtong-Gu, Suwon, South Korea
- Surendra Sarsaiya
- Key Laboratory of Basic Pharmacology of Ministry of Education, Zunyi Medical University, Zunyi, Guizhou, China
- Hongyu Chen
- Institute of Biology, Freie Universität Berlin Altensteinstr, Berlin, Germany
- Steven Wainaina
- Swedish Centre for Resource Recovery, University of Borås, Borås, Sweden
- Ekta Singh
- CSIR-National Environmental Engineering Research Institute, Nagpur, India
- Tao Liu
- College of Natural Resources and Environment, Northwest A&F University, Yangling, Shaanxi Province, China
- Sunil Kumar
- CSIR-National Environmental Engineering Research Institute, Nagpur, India
- Ashok Pandey
- Centre for Innovation and Translational Research CSIR-Indian Institute of Toxicology Research, Lucknow, India
- Lal Singh
- CSIR-National Environmental Engineering Research Institute, Nagpur, India
- Zengqiang Zhang
- College of Natural Resources and Environment, Northwest A&F University, Yangling, Shaanxi Province, China
|
18
|
Schiebenhoefer H, Schallert K, Renard BY, Trappe K, Schmid E, Benndorf D, Riedel K, Muth T, Fuchs S. A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nat Protoc 2020; 15:3212-3239. [PMID: 32859984 DOI: 10.1038/s41596-020-0368-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/02/2019] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in the analysis of corresponding datasets differ from those of pure-culture proteomics, e.g., due to the higher complexity of the samples and the larger reference databases demanding specific computing pipelines. Corresponding data analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on, among other factors, the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kay Schallert
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Emanuel Schmid
- ID Computational & Data Science Support, Eidgenössische Technische Hochschule, Zurich, Switzerland
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Katharina Riedel
- Center for Functional Genomics of Microbes (CFGM), Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
- Stephan Fuchs
- Department of Infectious Diseases, Robert Koch Institute, Wernigerode, Germany.
|
19
|
Yang L, Fan W, Xu Y. Metaproteomics insights into traditional fermented foods and beverages. Compr Rev Food Sci Food Saf 2020; 19:2506-2529. [PMID: 33336970 DOI: 10.1111/1541-4337.12601] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 03/08/2020] [Revised: 06/14/2020] [Accepted: 06/17/2020] [Indexed: 12/13/2022]
Abstract
Traditional fermented foods and beverages (TFFB) are important dietary components. Multi-omics techniques have been applied to all aspects of TFFB research to clarify the composition and nutritional value of TFFB, and to reveal the microbial community, microbial interactions, fermentative kinetics, and metabolic profiles during the fermentation process of TFFB. Because of the advantages of metaproteomics in providing functional information, this technology has increasingly been used in research to assess the functional diversity of microbial communities. Metaproteomics is gradually gaining attention in the field of TFFB research because it can reveal the nature of microorganism function at the protein level. This paper reviews the common methods of metaproteomics applied in TFFB research; systematically summarizes the results of metaproteomics research on TFFB, such as sauces, wines, fermented tea, cheese, and fermented fish; and compares the differences in conclusions reached through metaproteomics versus other omics methods. Metaproteomics has great advantages in revealing the microbial functions in TFFB and the interaction between the materials and microbial community. In the future, metaproteomics should be further applied to the study of functional protein markers and protein interaction in TFFB; multi-omics technology requires further integration to reveal the molecular nature of TFFB fermentation.
Affiliation(s)
- Liang Yang
- Key Laboratory of Industrial Biotechnology of Ministry of Education, Laboratory of Brewing Microbiology and Applied Enzymology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Wenlai Fan
- Key Laboratory of Industrial Biotechnology of Ministry of Education, Laboratory of Brewing Microbiology and Applied Enzymology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Yan Xu
- Key Laboratory of Industrial Biotechnology of Ministry of Education, Laboratory of Brewing Microbiology and Applied Enzymology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
|
20
|
Rumen metaproteomics: Closer to linking rumen microbial function to animal productivity traits. Methods 2020; 186:42-51. [PMID: 32758682 DOI: 10.1016/j.ymeth.2020.07.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/01/2020] [Revised: 06/12/2020] [Accepted: 07/27/2020] [Indexed: 12/28/2022] Open
Abstract
The rumen microbiome constitutes a dense and complex mixture of anaerobic bacteria, archaea, protozoa, viruses and fungi. Collectively, rumen microbial populations interact closely in order to degrade and ferment complex plant material into nutrients for host metabolism, a process which also produces other by-products, such as methane gas. Our understanding of the rumen microbiome and its functions is of both scientific and industrial interest, as the metabolic functions are connected to animal health and nutrition, but at the same time contribute significantly to global greenhouse gas emissions. While many of the major microbial members of the rumen microbiome are acknowledged, advances in modern culture-independent meta-omic techniques, such as metaproteomics, enable deep exploration into active microbial populations involved in essential rumen metabolic functions. Meaningful and accurate metaproteomic analyses are highly dependent on representative samples, precise protein extraction and fractionation, as well as a comprehensive and high-quality protein sequence database that enables precise protein identification and quantification. This review focuses on the application of rumen metaproteomics, and its potential towards understanding the complex rumen microbiome and its metabolic functions. We present and discuss current methods in sample handling, protein extraction and data analysis for rumen metaproteomics, and finally emphasize the potential of (meta)genome-integrated metaproteomics for accurate reconstruction of active microbial populations in the rumen.
|
21
|
Li S, Tang H, Ye Y. A Meta-proteogenomic Approach to Peptide Identification Incorporating Assembly Uncertainty and Genomic Variation. Mol Cell Proteomics 2019; 18:S183-S192. [PMID: 31142575 PMCID: PMC6692780 DOI: 10.1074/mcp.tir118.001233] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/21/2018] [Revised: 04/25/2019] [Indexed: 01/07/2023] Open
Abstract
Matching metagenomic and/or metatranscriptomic data, currently often under-used, can be a useful reference for metaproteomic tandem mass spectra (MS/MS) data analysis. Here we developed a software pipeline for identification of peptides and proteins from metaproteomic MS/MS data using proteins derived from matching metagenomic (and metatranscriptomic) data as the search database, based on two novel approaches: Graph2Pro (published) and Var2Pep (new). Graph2Pro retains and uses uncertainties of metagenome assembly for reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports the use of metagenome assemblies from commonly used assemblers including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline improved the identification rate of the metaproteomic MS/MS spectra about two-fold compared with conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that identified variant peptides are important for functional profiling of microbiomes. All results suggested that it is important to take assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.
Affiliation(s)
- Sujun Li
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
- Haixu Tang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
- Yuzhen Ye
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
|
22
|
Peters DL, Wang W, Zhang X, Ning Z, Mayne J, Figeys D. Metaproteomic and Metabolomic Approaches for Characterizing the Gut Microbiome. Proteomics 2019; 19:e1800363. [PMID: 31321880 DOI: 10.1002/pmic.201800363] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/28/2019] [Revised: 06/27/2019] [Indexed: 12/14/2022]
Abstract
The gut microbiome has been shown to play a significant role in human healthy and diseased states. The dynamic signaling that occurs between the host and microbiome is critical for the maintenance of host homeostasis. Analyzing the human microbiome with metaproteomics, metabolomics, and integrative multi-omics analyses can provide significant information on markers for healthy and diseased states, allowing for the eventual creation of microbiome-targeted treatments for diseases associated with dysbiosis. Metaproteomics enables functional activity information to be gained from the microbiome samples, while metabolomics provides insight into the overall metabolic states affecting/representing the host-microbiome interactions. Combining these functional -omic platforms together with microbiome composition profiling allows for a holistic overview on the functional and metabolic state of the microbiome and its influence on human health. Here the benefits of metaproteomics, metabolomics, and the integrative multi-omic approaches to investigating the gut microbiome in the context of human health and diseases are reviewed.
Affiliation(s)
- Danielle L Peters
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
- Wenju Wang
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
- Xu Zhang
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
- Zhibin Ning
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
- Janice Mayne
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
- Daniel Figeys
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada; Canadian Institute for Advanced Research, 661 University Ave, Toronto, ON, M5G 1M1, Canada; The University of Ottawa and Shanghai Institute of Materia Medica Joint Research Center on Systems and Personalized Pharmacology, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
|
23
|
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 02/07/2023]
Abstract
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data - called metaproteogenomics - has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications on human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
Affiliation(s)
- Henning Schiebenhoefer
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Tim Van Den Bossche
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Stephan Fuchs
  - FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
- Bernhard Y Renard
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thilo Muth
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Lennart Martens
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
|
24
|
Abstract
The microbiome is emerging as a prominent factor affecting human health, and its dysbiosis is associated with various diseases. Compositional profiling of the microbiome is increasingly being supplemented with functional characterization. Metaproteomics is intrinsically focused on functional changes and therefore will be an important tool in those studies of the human microbiome. In the past decade, development of new experimental and bioinformatic approaches for metaproteomics has enabled large-scale human metaproteomic studies. However, challenges still exist, and there remains a lack of standardization and guidelines for properly performing metaproteomic studies on the human microbiome. Herein, we provide a perspective of recent developments, the challenges faced, and the future directions of metaproteomics and its applications. In addition, we propose a set of guidelines/recommendations for performing and reporting the results from metaproteomic experiments for the study of human microbiomes. We anticipate that these guidelines will be optimized further as more metaproteomic questions are raised and addressed, and metaproteomic applications are published, so that they are eventually recognized and applied in the field.
Affiliation(s)
- Xu Zhang
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada
- Daniel Figeys
  - Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada
|
25
|
Abstract
Urinary tract infections (UTIs) are one of the most common bacterial infections. Conventional approaches to diagnose these infections rely on microbial urine culture, urine sediment microscopy and basic molecular urinalysis tests, in combination with assessments of patient symptoms that are indicative of UTI. The last decade has seen a more widespread clinical use of standardized MALDI-TOF methods to identify UTI-causing microbial agents. Shotgun proteomics methods to determine the extent of inflammation and types of immune cell effectors in urine have not become part of routine clinical tests. However, such methods are useful to investigate UTI pathogenesis, identify difficult-to-culture pathogens and understand antimicrobial effector mechanisms. The present chapter describes these approaches in order to gain quantitative and qualitative insights into inflammation and immune responses in patients with UTI and simultaneously profile the causative agents. The methods are also applicable to examine catheter-associated UTIs and vaginal infections from urine samples. Protocols provided here pertain to direct analyses of clinical specimens including urine sediments and urethral catheter biofilms.
|