1
Sequeira JC, Pereira V, Alves MM, Pereira MA, Rocha M, Salvador AF. MOSCA 2.0: A bioinformatics framework for metagenomics, metatranscriptomics and metaproteomics data analysis and visualization. Mol Ecol Resour 2024:e13996. [PMID: 39099161 DOI: 10.1111/1755-0998.13996] [Received: 01/19/2024] [Revised: 06/14/2024] [Accepted: 07/15/2024] [Indexed: 08/06/2024]
Abstract
The analysis of meta-omics data requires the utilization of several bioinformatics tools and proficiency in informatics. The integration of multiple meta-omics data is even more challenging, and the outputs of existing bioinformatics solutions are not always easy to interpret. Here, we present a meta-omics bioinformatics pipeline, Meta-Omics Software for Community Analysis (MOSCA), which aims to overcome these limitations. MOSCA was initially developed for analysing metagenomics (MG) and metatranscriptomics (MT) data. Now, it also performs MG and metaproteomics (MP) integrated analysis, and MG/MT analysis was upgraded with an additional iterative binning step, metabolic pathways mapping, and several improvements regarding functional annotation and data visualization. MOSCA handles raw sequencing data and mass spectra and performs pre-processing, assembly, annotation, binning and differential gene/protein expression analysis. MOSCA shows taxonomic and functional analysis in large tables, performs metabolic pathways mapping, generates Krona plots and shows gene/protein expression results in heatmaps, improving omics data visualization. MOSCA is easily run from a single command while also providing a web interface (MOSGUITO). Relevant features include an extensive set of customization options, allowing tailored analyses to suit specific research objectives, and the ability to restart the pipeline from intermediary checkpoints using alternative configurations. Two case studies showcased MOSCA results, giving a complete view of the anaerobic microbial communities from anaerobic digesters and insights on the role of specific microorganisms. MOSCA represents a pivotal advancement in meta-omics research, offering an intuitive, comprehensive, and versatile solution for researchers seeking to unravel the intricate tapestry of microbial communities.
Affiliation(s)
- João C Sequeira
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Vítor Pereira
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
- M Madalena Alves
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
- M Alcina Pereira
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
  - LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
- Miguel Rocha
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
  - LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
- Andreia F Salvador
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
  - LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
2
Kruk ME, Mehta S, Murray K, Higgins L, Do K, Johnson JE, Wagner R, Wendt CH, O’Connor JB, Harris JK, Laguna TA, Jagtap PD, Griffin TJ. An integrated metaproteomics workflow for studying host-microbe dynamics in bronchoalveolar lavage samples applied to cystic fibrosis disease. mSystems 2024; 9:e0092923. [PMID: 38934598 PMCID: PMC11264604 DOI: 10.1128/msystems.00929-23] [Received: 09/01/2023] [Accepted: 05/13/2024] [Indexed: 06/28/2024] [Open Access]
Abstract
Airway microbiota are known to contribute to lung diseases, such as cystic fibrosis (CF), but their contributions to pathogenesis are still unclear. To improve our understanding of host-microbe interactions, we have developed an integrated analytical and bioinformatic mass spectrometry (MS)-based metaproteomics workflow to analyze clinical bronchoalveolar lavage (BAL) samples from people with airway disease. Proteins from BAL cellular pellets were processed and pooled together in groups categorized by disease status (CF vs. non-CF) and bacterial diversity, based on previously performed small subunit rRNA sequencing data. Proteins from each pooled sample group were digested and subjected to liquid chromatography tandem mass spectrometry (MS/MS). MS/MS spectra were matched to human and bacterial peptide sequences leveraging a bioinformatic workflow using a metagenomics-guided protein sequence database and rigorous evaluation. Label-free quantification revealed differentially abundant human peptides from proteins with known roles in CF, like neutrophil elastase and collagenase, and proteins with lesser-known roles in CF, including apolipoproteins. Differentially abundant bacterial peptides were identified from known CF pathogens (e.g., Pseudomonas), as well as other taxa with potentially novel roles in CF. We used this host-microbe peptide panel for targeted parallel-reaction monitoring validation, demonstrating for the first time an MS-based assay effective for quantifying host-microbe protein dynamics within BAL cells from individual CF patients. Our integrated bioinformatic and analytical workflow combining discovery, verification, and validation should prove useful for diverse studies to characterize microbial contributors in airway diseases. 
Furthermore, we describe a promising preliminary panel of differentially abundant microbe and host peptide sequences for further study as potential markers of host-microbe relationships in CF disease pathogenesis. IMPORTANCE Identifying microbial pathogenic contributors and dysregulated human responses in airway disease, such as CF, is critical to understanding disease progression and developing more effective treatments. To this end, characterizing the proteins expressed from bacterial microbes and human host cells during disease progression can provide valuable new insights. We describe here a new method to confidently detect and monitor abundance changes of both microbe and host proteins from challenging BAL samples commonly collected from CF patients. Our method uses both state-of-the-art mass spectrometry-based instrumentation to detect proteins present in these samples and customized bioinformatic software tools to analyze the data and characterize detected proteins and their association with CF. We demonstrate the use of this method to characterize microbe and host proteins from individual BAL samples, paving the way for a new approach to understand molecular contributors to CF and other diseases of the airway.
Affiliation(s)
- Monica E. Kruk
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Kevin Murray
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
  - Center for Metabolomics and Proteomics, University of Minnesota, Minneapolis, Minnesota, USA
- LeeAnn Higgins
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
  - Center for Metabolomics and Proteomics, University of Minnesota, Minneapolis, Minnesota, USA
- Katherine Do
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- James E. Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Reid Wagner
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Chris H. Wendt
  - Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Medical School, University of Minnesota, Minneapolis, Minnesota, USA
  - Minneapolis VA Health Care System, Minneapolis, Minnesota, USA
- John B. O’Connor
  - Department of Pediatrics, Division of Pulmonary and Sleep Medicine, Seattle Children’s Hospital, Seattle, Washington, USA
- J. Kirk Harris
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Theresa A. Laguna
  - Department of Pediatrics, Division of Pulmonary and Sleep Medicine, Seattle Children’s Hospital, Seattle, Washington, USA
  - Department of Pediatrics, University of Washington School of Medicine, Seattle, Washington, USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
3
Wu E, Qiao L. [Microbial metaproteomics--From sample processing to data acquisition and analysis]. Se Pu 2024; 42:658-668. [PMID: 38966974 PMCID: PMC11224941 DOI: 10.3724/sp.j.1123.2024.02009] [Received: 02/13/2024] [Indexed: 07/06/2024] [Open Access]
Abstract
Microorganisms are closely associated with human diseases and health. Understanding the composition and function of microbial communities requires extensive research. Metaproteomics has recently become an important method for thorough and in-depth study of microorganisms. However, major challenges in terms of sample processing, mass spectrometric data acquisition, and data analysis limit the development of metaproteomics owing to the complexity and high heterogeneity of microbial community samples. In metaproteomic analysis, optimizing the preprocessing method for different types of samples and adopting different microbial isolation, enrichment, extraction, and lysis schemes are often necessary. Similar to those for single-species proteomics, the mass spectrometric data acquisition modes for metaproteomics include data-dependent acquisition (DDA) and data-independent acquisition (DIA). DIA can collect comprehensive peptide information from a sample and holds great potential for future development. However, data analysis for DIA is challenged by the complexity of metaproteome samples, which hinders the deeper coverage of metaproteomes. The most important step in data analysis is the construction of a protein sequence database. The size and completeness of the database strongly influence not only the number of identifications, but also analyses at the species and functional levels. The current gold standard for metaproteome database construction is the metagenomic sequencing-based protein sequence database. A public database-filtering method based on an iterative database search has been proven to have strong practical value. The peptide-centric DIA data analysis method is a mainstream data analysis strategy. The development of deep learning and artificial intelligence will greatly promote the accuracy, coverage, and speed of metaproteomic analysis. 
In terms of downstream bioinformatics analysis, a series of annotation tools that can perform species annotation at the protein, peptide, and gene levels has been developed in recent years to determine the composition of microbial communities. The functional analysis of microbial communities is a unique feature of metaproteomics compared with other omics approaches. Metaproteomics has become an important component of the multi-omics analysis of microbial communities, and has great development potential in terms of depth of coverage, sensitivity of detection, and completeness of data analysis.
4
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. mSphere 2024; 9:e0079323. [PMID: 38780289 PMCID: PMC11332332 DOI: 10.1128/msphere.00793-23] [Received: 12/18/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies. IMPORTANCE Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results. 
To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.
Affiliation(s)
- Katherine Do
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Reid Wagner
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Dechen Bhuming
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Andrew T. Rajczewski
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Amy P. N. Skubitz
  - Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA
- James E. Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
5
Kotimoole CN, Ramya VK, Kaur P, Reiling N, Shandil RK, Narayanan S, Flo TH, Prasad TSK. Discovery of Species-Specific Proteotypic Peptides To Establish a Spectral Library Platform for Identification of Nontuberculosis Mycobacteria from Mass Spectrometry-Based Proteomics. J Proteome Res 2024; 23:1102-1117. [PMID: 38358903 DOI: 10.1021/acs.jproteome.3c00850] [Indexed: 02/17/2024]
Abstract
Nontuberculous mycobacteria (NTMs) are opportunistic bacteria that cause pulmonary and extra-pulmonary infections in humans and closely resemble Mycobacterium tuberculosis. Although genome sequencing strategies helped determine NTMs, a common assay for the detection of coinfection by multiple NTMs with M. tuberculosis in the primary attempt of diagnosis is still elusive. Such a lack of efficiency leads to delayed therapy, an inappropriate choice of drugs, drug resistance, disease complications, morbidity, and mortality. Although a high-resolution LC-MS/MS-based multiprotein panel assay can be developed due to its specificity and sensitivity, it needs a library of species-specific peptides as a platform. Toward this, we performed an analysis of proteomes of 9 NTM species with more than 20 million peptide spectrum matches gathered from 26 proteome data sets. Our metaproteomic analyses determined 48,172 species-specific proteotypic peptides across 9 NTMs. Notably, M. smegmatis (26,008), M. abscessus (12,442), M. vaccae (6487), M. fortuitum (1623), M. avium subsp. paratuberculosis (844), M. avium subsp. hominissuis (580), and M. marinum (112) displayed >100 species-specific proteotypic peptides. Finally, these peptides and corresponding spectra have been compiled into spectral library, FASTA, and JSON formats for future reference and validation in clinical cohorts by the biomedical community for further translation.
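The idea of a species-specific peptide in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the function name `species_specific_peptides`, the input layout, and the toy sequences are all assumptions made for illustration, and the sketch covers only the uniqueness criterion (a peptide observed in exactly one species), not the detectability that "proteotypic" also implies.

```python
from collections import defaultdict


def species_specific_peptides(peptides_by_species):
    """Return, per species, the peptides observed in that species only.

    `peptides_by_species` is a hypothetical input: a dict mapping a species
    name to the set of peptide sequences identified for it.
    """
    # Map each peptide sequence to the set of species in which it was seen.
    owners = defaultdict(set)
    for species, peptides in peptides_by_species.items():
        for peptide in peptides:
            owners[peptide].add(species)

    # Keep a peptide only when exactly one species owns it.
    return {
        species: {p for p in peptides if len(owners[p]) == 1}
        for species, peptides in peptides_by_species.items()
    }


# Toy, made-up sequences for illustration only (not data from the study).
toy = {
    "species_A": {"AAAK", "CCCR", "DDDK"},
    "species_B": {"AAAK", "EEEK"},
    "species_C": {"CCCR", "FFFR", "GGGK"},
}
unique = species_specific_peptides(toy)
```

In this toy input, `AAAK` and `CCCR` are shared between species and are therefore dropped, while the remaining peptides are retained for the single species that carries them.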
Affiliation(s)
- Chinmaya Narayana Kotimoole
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Vadageri Krishnamurthy Ramya
  - Foundation for Neglected Disease Research, 20A, KIADB Industrial Area, Veerapura Village, Doddaballapur, Bengaluru 561203, India
- Parvinder Kaur
  - Foundation for Neglected Disease Research, 20A, KIADB Industrial Area, Veerapura Village, Doddaballapur, Bengaluru 561203, India
- Norbert Reiling
  - Microbial Interface Biology, Research Center Borstel, Leibniz Lung Center, Parkallee 22, D-23845 Borstel, Germany
  - German Center for Infection Research (DZIF), Site Hamburg-Lübeck-Borstel-Riems, 23845 Borstel, Germany
- Radha Krishan Shandil
  - Foundation for Neglected Disease Research, 20A, KIADB Industrial Area, Veerapura Village, Doddaballapur, Bengaluru 561203, India
- Shridhar Narayanan
  - Foundation for Neglected Disease Research, 20A, KIADB Industrial Area, Veerapura Village, Doddaballapur, Bengaluru 561203, India
- Trude Helen Flo
  - Centre of Molecular Inflammation Research, Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Kunnskapssenteret, Øya 424.04.035, Norway
6
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. bioRxiv 2023:2023.11.21.568121. [PMID: 38045370 PMCID: PMC10690215 DOI: 10.1101/2023.11.21.568121] [Indexed: 12/05/2023]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification and prioritization of microbial and host proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant (to generate peptide-spectral matches (PSMs) and quantification), PepQuery2 (to verify the quality of PSMs), and Unipept and MSstatsTMT (for taxonomy and functional annotation). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
7
Jagtap PD, Hoopmann MR, Neely BA, Harvey A, Käll L, Perez-Riverol Y, Abajorga MK, Thomas JA, Weintraub ST, Palmblad M. The Association of Biomolecular Resource Facilities Proteome Informatics Research Group Study on Metaproteomics (iPRG-2020). J Biomol Tech 2023; 34:3fc1f5fe.a058bad4. [PMID: 37969874 PMCID: PMC10644979 DOI: 10.7171/3fc1f5fe.a058bad4] [Indexed: 11/17/2023]
Abstract
Metaproteomics research using mass spectrometry data has emerged as a powerful strategy to understand the mechanisms underlying microbiome dynamics and the interaction of microbiomes with their immediate environment. Recent advances in sample preparation, data acquisition, and bioinformatics workflows have greatly contributed to progress in this field. In 2020, the Association of Biomolecular Resource Facilities Proteome Informatics Research Group launched a collaborative study to assess the bioinformatics options available for metaproteomics research. The study was conducted in 2 phases. In the first phase, participants were provided with mass spectrometry data files and were asked to identify the taxonomic composition and relative taxa abundances in the samples without supplying any protein sequence databases. The most challenging question asked of the participants was to postulate the nature of any biological phenomena that may have taken place in the samples, such as interactions among taxonomic species. In the second phase, participants were provided a protein sequence database composed of the species present in the sample and were asked to answer the same set of questions as for phase 1. In this report, we summarize the data processing methods and tools used by participants, including database searching and software tools used for taxonomic and functional analysis. This study provides insights into the status of metaproteomics bioinformatics in participating laboratories and core facilities.
Affiliation(s)
- Benjamin A. Neely
  - National Institute of Standards and Technology, Charleston, South Carolina 29412, USA
- Lukas Käll
  - Royal Institute of Technology, 114 28 Stockholm, Sweden
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2000 RC Leiden, The Netherlands