1
|
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. mSphere 2024; 9:e0079323. [PMID: 38780289 DOI: 10.1128/msphere.00793-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024] Open
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies. IMPORTANCE Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results. To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.
Collapse
Affiliation(s)
- Katherine Do
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Reid Wagner
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Dechen Bhuming
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Amy P N Skubitz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| |
Collapse
|
2
|
Holstein T, Muth T. Bioinformatic Workflows for Metaproteomics. Methods Mol Biol 2024; 2820:187-213. [PMID: 38941024 DOI: 10.1007/978-1-0716-3910-8_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2024]
Abstract
The strong influence of microbiomes on areas such as ecology and human health has become widely recognized in the past years. Accordingly, various techniques for the investigation of the composition and function of microbial community samples have been developed. Metaproteomics, the comprehensive analysis of the proteins from microbial communities, allows for the investigation of not only the taxonomy but also the functional and quantitative composition of microbiome samples. Due to the complexity of the investigated communities, methods developed for single organism proteomics cannot be readily applied to metaproteomic samples. For this purpose, methods specifically tailored to metaproteomics are required. In this work, a detailed overview of current bioinformatic solutions and protocols in metaproteomics is given. After an introduction to the proteomic database search, the metaproteomic post-processing steps are explained in detail. Ten specific bioinformatic software solutions are focused on, covering various steps including database-driven identification and quantification as well as taxonomic and functional assignment.
Collapse
Affiliation(s)
- Tanja Holstein
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
- VIB-UGent Center for Medical Biotechnology, VIB and Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Data Competence Center, Robert Koch Institute, Berlin, Deutschland
| | - Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany.
- Data Competence Center, Robert Koch Institute, Berlin, Deutschland.
| |
Collapse
|
3
|
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.21.568121. [PMID: 38045370 PMCID: PMC10690215 DOI: 10.1101/2023.11.21.568121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2023]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification and prioritization of microbial and host proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant (to generate peptide-spectral matches (PSMs) and quantification), PepQuery2 (to verify the quality of PSMs), and Unipept and MSstatsTMT (for taxonomy and functional annotation). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
Collapse
|
4
|
Neely BA, Ellisor DL, Davis WC. Proteomics as a Metrological Tool to Evaluate Genome Annotation Accuracy Following De Novo Genome Assembly: A Case Study Using the Atlantic Bottlenose Dolphin ( Tursiops truncatus). Genes (Basel) 2023; 14:1696. [PMID: 37761836 PMCID: PMC10531373 DOI: 10.3390/genes14091696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 08/22/2023] [Accepted: 08/23/2023] [Indexed: 09/29/2023] Open
Abstract
The last decade has witnessed dramatic improvements in whole-genome sequencing capabilities coupled to drastically decreased costs, leading to an inundation of high-quality de novo genomes. For this reason, the continued development of genome quality metrics is imperative. Using the 2016 Atlantic bottlenose dolphin NCBI RefSeq annotation and mass spectrometry-based proteomic analysis of six tissues, we confirmed 10,402 proteins from 4711 protein groups, constituting nearly one-third of the possible predicted proteins. Since the identification of larger proteins with more identified peptides implies reduced database fragmentation and improved gene annotation accuracy, we propose the metric NP10, which attempts to capture this quality improvement. The NP10 metric is calculated by first stratifying proteomic results by identifying the top decile (or 10th 10-quantile) of identified proteins based on the number of peptides per protein and then returns the median molecular weight of the resulting proteins. When using the 2016 versus 2012 Tursiops truncatus genome annotation to search this proteomic data set, there was a 21% improvement in NP10. This metric was further demonstrated by using a publicly available proteomic data set to compare human genome annotations from 2004, 2013 and 2016, which showed a 33% improvement in NP10. These results demonstrate that proteomics may be a useful metrological tool to benchmark genome accuracy, though there is a need for reference proteomic datasets across species to facilitate the evaluation of new de novo and existing genome.
Collapse
Affiliation(s)
- Benjamin A. Neely
- National Institute of Standards and Technology, NIST Charleston, 331 Fort Johnson Road, Charleston, SC 29412, USA; (D.L.E.); (W.C.D.)
| | | | | |
Collapse
|
5
|
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements included availability of software that would span diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains these software and associated training resources to empower researcher-driven analyses. EXPERT OPINION The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Collapse
Affiliation(s)
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Matthias Bernt
- Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
| | | | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
| | - Valentin Loux
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
| | - Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | | | | | - W C Mike Thang
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
| | - Cameron Hyde
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Sippy Downs, University of the Sunshine Coast, Australia
| | - Gareth Price
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|
6
|
A Practical Bioinformatics Workflow for Routine Analysis of Bacterial WGS Data. Microorganisms 2022; 10:microorganisms10122364. [PMID: 36557617 PMCID: PMC9781918 DOI: 10.3390/microorganisms10122364] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Revised: 11/18/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022] Open
Abstract
The use of whole-genome sequencing (WGS) for bacterial characterisation has increased substantially in the last decade. Its high throughput and decreasing cost have led to significant changes in outbreak investigations and surveillance of a wide variety of microbial pathogens. Despite the innumerable advantages of WGS, several drawbacks concerning data analysis and management, as well as a general lack of standardisation, hinder its integration in routine use. In this work, a bioinformatics workflow for (Illumina) WGS data is presented for bacterial characterisation including genome annotation, species identification, serotype prediction, antimicrobial resistance prediction, virulence-related genes and plasmid replicon detection, core-genome-based or single nucleotide polymorphism (SNP)-based phylogenetic clustering and sequence typing. Workflow was tested using a collection of 22 in-house sequences of Salmonella enterica isolates belonging to a local outbreak, coupled with a collection of 182 Salmonella genomes publicly available. No errors were reported during the execution period, and all genomes were analysed. The bioinformatics workflow can be tailored to other pathogens of interest and is freely available for academic and non-profit use as an uploadable file to the Galaxy platform.
Collapse
|
7
|
Pinter N, Glätzer D, Fahrner M, Fröhlich K, Johnson J, Grüning BA, Warscheid B, Drepper F, Schilling O, Föll MC. MaxQuant and MSstats in Galaxy Enable Reproducible Cloud-Based Analysis of Quantitative Proteomics Experiments for Everyone. J Proteome Res 2022; 21:1558-1565. [PMID: 35503992 DOI: 10.1021/acs.jproteome.2c00051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Quantitative mass spectrometry-based proteomics has become a high-throughput technology for the identification and quantification of thousands of proteins in complex biological samples. Two frequently used tools, MaxQuant and MSstats, allow for the analysis of raw data and finding proteins with differential abundance between conditions of interest. To enable accessible and reproducible quantitative proteomics analyses in a cloud environment, we have integrated MaxQuant (including TMTpro 16/18plex), Proteomics Quality Control (PTXQC), MSstats, and MSstatsTMT into the open-source Galaxy framework. This enables the web-based analysis of label-free and isobaric labeling proteomics experiments via Galaxy's graphical user interface on public clouds. MaxQuant and MSstats in Galaxy can be applied in conjunction with thousands of existing Galaxy tools and integrated into standardized, sharable workflows. Galaxy tracks all metadata and intermediate results in analysis histories, which can be shared privately for collaborations or publicly, allowing full reproducibility and transparency of published analysis. To further increase accessibility, we provide detailed hands-on training materials. The integration of MaxQuant and MSstats into the Galaxy framework enables their usage in a reproducible way on accessible large computational infrastructures, hence realizing the foundation for high-throughput proteomics data science for everyone.
Collapse
Affiliation(s)
- Niko Pinter
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany.,Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Damian Glätzer
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany.,Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany.,Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Klemens Fröhlich
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany.,Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany.,Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany.,Spemann Graduate School of Biology and Medicine (SGBM), Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
| | - James Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | | | - Bettina Warscheid
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany.,Faculty of Chemistry and Pharmacy, Department of Biochemistry, Julius Maximilian University of Würzburg, 97074 Würzburg, Germany
| | - Friedel Drepper
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Oliver Schilling
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany.,Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany.,German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), 79106 Freiburg, Germany
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany.,Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany.,Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, United States
| |
Collapse
|
8
|
Rajczewski AT, Han Q, Mehta S, Kumar P, Jagtap PD, Knutson CG, Fox JG, Tretyakova NY, Griffin TJ. Quantitative Proteogenomic Characterization of Inflamed Murine Colon Tissue Using an Integrated Discovery, Verification, and Validation Proteogenomic Workflow. Proteomes 2022; 10:proteomes10020011. [PMID: 35466239 PMCID: PMC9036229 DOI: 10.3390/proteomes10020011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2022] [Revised: 03/27/2022] [Accepted: 04/07/2022] [Indexed: 11/24/2022] Open
Abstract
Chronic inflammation of the colon causes genomic and/or transcriptomic events, which can lead to expression of non-canonical protein sequences contributing to oncogenesis. To better understand these mechanisms, Rag2−/−Il10−/− mice were infected with Helicobacter hepaticus to induce chronic inflammation of the cecum and the colon. Transcriptomic data from harvested proximal colon samples were used to generate a customized FASTA database containing non-canonical protein sequences. Using a proteogenomic approach, mass spectrometry data for proximal colon proteins were searched against this custom FASTA database using the Galaxy for Proteomics (Galaxy-P) platform. In addition to the increased abundance in inflammatory response proteins, we also discovered several non-canonical peptide sequences derived from unique proteoforms. We confirmed the veracity of these novel sequences using an automated bioinformatics verification workflow with targeted MS-based assays for peptide validation. Our bioinformatics discovery workflow identified 235 putative non-canonical peptide sequences, of which 58 were verified with high confidence and 39 were validated in targeted proteomics assays. This study provides insights into challenges faced when identifying non-canonical peptides using a proteogenomics approach and demonstrates an integrated workflow addressing these challenges. Our bioinformatic discovery and verification workflow is publicly available and accessible via the Galaxy platform and should be valuable in non-canonical peptide identification using proteogenomics.
Collapse
Affiliation(s)
- Andrew T. Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; (A.T.R.); (Q.H.); (S.M.); (P.K.); (P.D.J.)
| | - Qiyuan Han
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; (A.T.R.); (Q.H.); (S.M.); (P.K.); (P.D.J.)
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; (A.T.R.); (Q.H.); (S.M.); (P.K.); (P.D.J.)
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; (A.T.R.); (Q.H.); (S.M.); (P.K.); (P.D.J.)
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; (A.T.R.); (Q.H.); (S.M.); (P.K.); (P.D.J.)
| | - Charles G. Knutson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; (C.G.K.); (J.G.F.)
| | - James G. Fox
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; (C.G.K.); (J.G.F.)
| | - Natalia Y. Tretyakova
- Department of Medicinal Chemistry, the Masonic Cancer Center, University of Minnesota, Minneapolis, MN 55455, USA;
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; (A.T.R.); (Q.H.); (S.M.); (P.K.); (P.D.J.)
- Correspondence:
| |
Collapse
|
9
|
Parmar BS, Peeters MKR, Boonen K, Clark EC, Baggerman G, Menschaert G, Temmerman L. Identification of Non-Canonical Translation Products in C. elegans Using Tandem Mass Spectrometry. Front Genet 2021; 12:728900. [PMID: 34759956 PMCID: PMC8575065 DOI: 10.3389/fgene.2021.728900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2021] [Accepted: 09/16/2021] [Indexed: 11/22/2022] Open
Abstract
Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we utilized diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were common with our meta-analysis of existing resources. Together, this permits us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.
Collapse
Affiliation(s)
- Bhavesh S. Parmar
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
| | - Marlies K. R. Peeters
- Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Kurt Boonen
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
| | - Ellie C. Clark
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
| | - Geert Baggerman
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
| | - Gerben Menschaert
- Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Liesbet Temmerman
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
| |
Collapse
|
10
|
Vitorino R, Choudhury M, Guedes S, Ferreira R, Thongboonkerd V, Sharma L, Amado F, Srivastava S. Peptidomics and proteogenomics: background, challenges and future needs. Expert Rev Proteomics 2021; 18:643-659. [PMID: 34517741 DOI: 10.1080/14789450.2021.1980388] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
INTRODUCTION With available genomic data and related information, it is becoming possible to better highlight mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development. AREAS COVERED Peptidomics and proteogenomics are the two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both technologies add a remarkably large amount of data to the pool of information on various peptides associated with gene mutations or genome remodeling. Literature search was performed in the PubMed database and is up to date. EXPERT OPINION This article lists various techniques used for peptidomic and proteogenomic analyses. It also explains various bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. Their role in deciphering disease pathways, cancer research, and biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements to overcome the current limitations for their effective clinical use are also discussed.
Collapse
Affiliation(s)
- Rui Vitorino
- Faculdade de Medicina da Universidade do Porto, Porto, Portugal.,iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal.,Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Manisha Choudhury
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
| | - Sofia Guedes
- Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Rita Ferreira
- Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Visith Thongboonkerd
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | | | - Francisco Amado
- Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
| |
Collapse
|
11
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic, and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most of such tools have not focused on scalability issues that are inherent in proteogenomic data analysis where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case of how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Collapse
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| |
Collapse
|
12
|
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Proteins are crucial for every cellular activity and unraveling their sequence and structure is a crucial step to fully understand their biology. Early methods of protein sequencing were mainly based on the use of enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and with the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques, along with the various software are currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with potential and shortcomings using databases for interpretation of protein sequence data. EXPERT OPINION As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de-novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as, unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy need to be addressed. Ideally, it should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which then would be able to provide improved identification.
Collapse
Affiliation(s)
- Rui Vitorino
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED , Aveiro, Portugal.,iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal.,Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto , Porto, Portugal
| | - Sofia Guedes
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED , Aveiro, Portugal
| | - Fabio Trindade
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto , Porto, Portugal
| | - Inês Correia
- iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal
| | - Gabriela Moura
- iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal
| | - Paulo Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering , Brazil
| | - Manuel A S Santos
- iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal
| | - Francisco Amado
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED , Aveiro, Portugal
| |
Collapse
|
13
|
Cesnik AJ, Miller RM, Ibrahim K, Lu L, Millikin RJ, Shortreed MR, Frey BL, Smith LM. Spritz: A Proteogenomic Database Engine. J Proteome Res 2020; 20:1826-1834. [PMID: 32967423 DOI: 10.1021/acs.jproteome.0c00407] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.
Collapse
Affiliation(s)
- Anthony J Cesnik
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.,Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 17121, Sweden.,Department of Genetics, Stanford University, Stanford, California 94305, United States.,Chan Zuckerberg Biohub, San Francisco, California 94158, United States
| | - Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Khairina Ibrahim
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lei Lu
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Robert J Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Brian L Frey
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| |
Collapse
|
14
|
Shukla N, Siva N, Malik B, Suravajhala P. Current Challenges and Implications of Proteogenomic Approaches in Prostate Cancer. Curr Top Med Chem 2020; 20:1968-1980. [PMID: 32703135 DOI: 10.2174/1568026620666200722112450] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2020] [Revised: 05/30/2020] [Accepted: 06/29/2020] [Indexed: 12/16/2022]
Abstract
In the recent past, next-generation sequencing (NGS) approaches have heralded the omics era. With NGS data burgeoning, there arose a need to disseminate the omic data better. Proteogenomics has been vividly used for characterising the functions of candidate genes and is applied in ascertaining various diseased phenotypes, including cancers. However, not much is known about the role and application of proteogenomics, especially Prostate Cancer (PCa). In this review, we outline the need for proteogenomic approaches, their applications and their role in PCa.
Collapse
Affiliation(s)
- Nidhi Shukla
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, Jaipur 302001, RJ, India.,Department of Chemistry, School of Basic Sciences, Manipal University Jaipur, Jaipur, India
| | - Narmadhaa Siva
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, Jaipur 302001, RJ, India
| | - Babita Malik
- Department of Chemistry, School of Basic Sciences, Manipal University Jaipur, Jaipur, India
| | - Prashanth Suravajhala
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, Jaipur 302001, RJ, India
| |
Collapse
|
15
|
Precursor Intensity-Based Label-Free Quantification Software Tools for Proteomic and Multi-Omic Analysis within the Galaxy Platform. Proteomes 2020; 8:proteomes8030015. [PMID: 32650610 PMCID: PMC7563855 DOI: 10.3390/proteomes8030015] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023] Open
Abstract
For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file-formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.
Collapse
|
16
|
Aggarwal S, Kumar A, Jamwal S, Midha MK, Talukdar NC, Yadav AK. HyperQuant-A Computational Pipeline for Higher Order Multiplexed Quantitative Proteomics. ACS OMEGA 2020; 5:10857-10867. [PMID: 32455206 PMCID: PMC7240821 DOI: 10.1021/acsomega.0c00515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Accepted: 04/09/2020] [Indexed: 06/11/2023]
Abstract
Quantitative proteomics has evolved considerably over the last decade with the advent of higher order multiplexing (HOM) techniques. With the development of methods such as-multitagging, cPILOT, hyperplexing, BONPlex, and MITNCAT, the HOM technique is rapidly taking the center stage in multiplexed quantitative proteomics. These studies combined MS1 and MS2 labels in a single experiment enabling higher sample throughput. While HOM is highly promising, the computational analysis is still a big challenge, as the available tools cannot harness its power completely. We have developed a new quantitative pipeline, HyperQuant to aid in accurately quantitating complex HOM data. The pipeline uses identification results from either MaxQuant or any other search engine and quantitation results from QuantWizIQ. The Mapper and Combiner modules of HyperQuant allow facile integration of the labeled data, along with peptide spectrum match (PSM) intensity/ratio integration for proteins, respectively, for each PSM label combination. This also includes appropriate combination of replicates/fractions before summarizing the protein intensity/ratio, leading to robust quantitation. To the best of our knowledge, this is the first tool for the quantitation of HOM data with flexibility for any combination of MS1 and MS2 labels. We demonstrate its utility in analyzing two 18-plex data sets from the hyperplexing and the BONplex studies. The tool is open source and freely available for noncommercial use. HyperQuant is a highly valuable tool that will help in advancing the field of multiplexed quantitative proteomics.
Collapse
Affiliation(s)
- Suruchi Aggarwal
- Translational
Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad−Gurgaon
Expressway, Faridabad 121001, Haryana, India
- Division
of Life Sciences, Institute of Advanced
Study in Science and Technology, Vigyan Path, Paschim Boragaon, Garchuk, Guwahati, Assam 781035, India
- Department
of Molecular Biology and Biotechnology, Cotton University, Panbazar, Guwahati, Assam 781001, India
| | - Ajay Kumar
- Translational
Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad−Gurgaon
Expressway, Faridabad 121001, Haryana, India
| | - Shilpa Jamwal
- Translational
Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad−Gurgaon
Expressway, Faridabad 121001, Haryana, India
| | - Mukul Kumar Midha
- Translational
Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad−Gurgaon
Expressway, Faridabad 121001, Haryana, India
| | - Narayan Chandra Talukdar
- Division
of Life Sciences, Institute of Advanced
Study in Science and Technology, Vigyan Path, Paschim Boragaon, Garchuk, Guwahati, Assam 781035, India
- Department
of Molecular Biology and Biotechnology, Cotton University, Panbazar, Guwahati, Assam 781001, India
| | - Amit Kumar Yadav
- Translational
Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad−Gurgaon
Expressway, Faridabad 121001, Haryana, India
| |
Collapse
|
17
|
Kuhring M, Doellinger J, Nitsche A, Muth T, Renard BY. TaxIt: An Iterative Computational Pipeline for Untargeted Strain-Level Identification Using MS/MS Spectra from Pathogenic Single-Organism Samples. J Proteome Res 2020; 19:2501-2510. [PMID: 32362126 DOI: 10.1021/acs.jproteome.9b00714] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Untargeted accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry is a challenging task. Reference databases often lack taxonomic depth, limiting peptide assignments to the species level. However, the extension with detailed strain information increases runtime and decreases statistical power. In addition, larger databases contain a higher number of similar proteomes. We present TaxIt, an iterative workflow to address the increasing search space required for MS/MS-based strain-level classification of samples with unknown taxonomic origin. TaxIt first applies reference sequence data for initial identification of species candidates, followed by automated acquisition of relevant strain sequences for low level classification. Furthermore, proteome similarities resulting in ambiguous taxonomic assignments are addressed with an abundance weighting strategy to increase the confidence in candidate taxa. For benchmarking the performance of our method, we apply our iterative workflow on several samples of bacterial and viral origin. In comparison to noniterative approaches using unique peptides or advanced abundance correction, TaxIt identifies microbial strains correctly in all examples presented (with one tie), thereby demonstrating the potential for untargeted and deeper taxonomic classification. TaxIt makes extensive use of public, unrestricted, and continuously growing sequence resources such as the NCBI databases and is available under open-source BSD license at https://gitlab.com/rki_bioinformatics/TaxIt.
Collapse
Affiliation(s)
- Mathias Kuhring
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany.,Core Unit Bioinformatics, Berlin Institute of Health (BIH), 10178 Berlin, Germany.,Berlin Institute of Health Metabolomics Platform, Berlin Institute of Health (BIH), 10178 Berlin, Germany.,Max Delbrück Center (MDC) for Molecular Medicine, 13125 Berlin, Germany
| | - Joerg Doellinger
- Centre for Biological Threats and Special Pathogens, Proteomics and Spectroscopy (ZBS 6), Robert Koch Institute, 13353 Berlin, Germany.,Centre for Biological Threats and Special Pathogens, Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, 13353 Berlin, Germany
| | - Andreas Nitsche
- Centre for Biological Threats and Special Pathogens, Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, 13353 Berlin, Germany
| | - Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany.,eScience Division (S.3), Federal Institute for Materials Research and Testing, 12489 Berlin, Germany
| | - Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany.,Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
| |
Collapse
|
18
|
Kumar P, Johnson JE, Easterly C, Mehta S, Sajulga R, Nunn B, Jagtap PD, Griffin TJ. A Sectioning and Database Enrichment Approach for Improved Peptide Spectrum Matching in Large, Genome-Guided Protein Sequence Databases. J Proteome Res 2020; 19:2772-2785. [DOI: 10.1021/acs.jproteome.0c00260] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Praveen Kumar
- Bioinformatics and Computational Biology, University of Minnesota−Rochester, Rochester, Minnesota 55904, United States
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| | - Caleb Easterly
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| | - Subina Mehta
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| | - Ray Sajulga
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| | - Brook Nunn
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Pratik D. Jagtap
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| | - Timothy J. Griffin
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
| |
Collapse
|
19
|
Blanco-Míguez A, Fdez-Riverola F, Sánchez B, Lourenço A. Resources and tools for the high-throughput, multi-omic study of intestinal microbiota. Brief Bioinform 2020; 20:1032-1056. [PMID: 29186315 DOI: 10.1093/bib/bbx156] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Revised: 10/23/2017] [Indexed: 12/18/2022] Open
Abstract
The human gut microbiome impacts several aspects of human health and disease, including digestion, drug metabolism and the propensity to develop various inflammatory, autoimmune and metabolic diseases. Many of the molecular processes that play a role in the activity and dynamics of the microbiota go beyond species and genic composition and thus, their understanding requires advanced bioinformatics support. This article aims to provide an up-to-date view of the resources and software tools that are being developed and used in human gut microbiome research, in particular data integration and systems-level analysis efforts. These efforts demonstrate the power of standardized and reproducible computational workflows for integrating and analysing varied omics data and gaining deeper insights into microbe community structure and function as well as host-microbe interactions.
Collapse
Affiliation(s)
| | | | | | - Anália Lourenço
- Dpto. de Informática - Universidade de Vigo, ESEI - Escuela Superior de Ingeniería Informática, Edificio politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain
| |
Collapse
|
20
|
McGowan T, Johnson JE, Kumar P, Sajulga R, Mehta S, Jagtap PD, Griffin TJ. Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration. Gigascience 2020; 9:giaa025. [PMID: 32236523 PMCID: PMC7102281 DOI: 10.1093/gigascience/giaa025] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Revised: 02/13/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate 'omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. FINDINGS MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input-a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrated Genomics Viewer JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. CONCLUSIONS MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.
Collapse
Affiliation(s)
- Thomas McGowan
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Bioinformatics and Computational Biology program, University of Minnesota-Rochester, 111 South Broadway, Suite 300, Rochester, MN 55904, USA
| | - Ray Sajulga
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| |
Collapse
|
21
|
Föll MC, Moritz L, Wollmann T, Stillger MN, Vockert N, Werner M, Bronsert P, Rohr K, Grüning BA, Schilling O. Accessible and reproducible mass spectrometry imaging data analysis in Galaxy. Gigascience 2019; 8:giz143. [PMID: 31816088 PMCID: PMC6901077 DOI: 10.1093/gigascience/giz143] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2019] [Revised: 09/10/2019] [Accepted: 11/10/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Mass spectrometry imaging is increasingly used in biological and translational research because it has the ability to determine the spatial distribution of hundreds of analytes in a sample. Being at the interface of proteomics/metabolomics and imaging, the acquired datasets are large and complex and often analyzed with proprietary software or in-house scripts, which hinders reproducibility. Open source software solutions that enable reproducible data analysis often require programming skills and are therefore not accessible to many mass spectrometry imaging (MSI) researchers. FINDINGS We have integrated 18 dedicated mass spectrometry imaging tools into the Galaxy framework to allow accessible, reproducible, and transparent data analysis. Our tools are based on Cardinal, MALDIquant, and scikit-image and enable all major MSI analysis steps such as quality control, visualization, preprocessing, statistical analysis, and image co-registration. Furthermore, we created hands-on training material for use cases in proteomics and metabolomics. To demonstrate the utility of our tools, we re-analyzed a publicly available N-linked glycan imaging dataset. By providing the entire analysis history online, we highlight how the Galaxy framework fosters transparent and reproducible research. CONCLUSION The Galaxy framework has emerged as a powerful analysis platform for the analysis of MSI data with ease of use and access, together with high levels of reproducibility and transparency.
Collapse
Affiliation(s)
- Melanie Christine Föll
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Biology, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
| | - Lennart Moritz
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
| | - Thomas Wollmann
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| | - Maren Nicole Stillger
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Biology, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine, University of Freiburg, Stefan-Meier-Straße 17, 79104 Freiburg, Germany
| | - Niklas Vockert
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| | - Martin Werner
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Medicine - University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany
- Tumorbank Comprehensive Cancer Center Freiburg, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
| | - Peter Bronsert
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Medicine - University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany
- Tumorbank Comprehensive Cancer Center Freiburg, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
| | - Karl Rohr
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| | - Björn Andreas Grüning
- Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Oliver Schilling
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Breisacher Straße 115a, 79106 Freiburg, Germany
- Faculty of Medicine - University of Freiburg, Breisacher Straße 153, 79110 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Hugstetter Straße 55, 79106 Freiburg, Germany
| |
Collapse
|
22
|
Palmblad M, Lamprecht AL, Ison J, Schwämmle V. Automated workflow composition in mass spectrometry-based proteomics. Bioinformatics 2019; 35:656-664. [PMID: 30060113 PMCID: PMC6378944 DOI: 10.1093/bioinformatics/bty646] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2018] [Revised: 07/06/2018] [Accepted: 07/26/2018] [Indexed: 11/28/2022] Open
Abstract
Motivation Numerous software utilities operating on mass spectrometry (MS) data are described in the literature and provide specific operations as building blocks for the assembly of on-purpose workflows. Working out which tools and combinations are applicable or optimal in practice is often hard. Thus researchers face difficulties in selecting practical and effective data analysis pipelines for a specific experimental design. Results We provide a toolkit to support researchers in identifying, comparing and benchmarking multiple workflows from individual bioinformatics tools. Automated workflow composition is enabled by the tools’ semantic annotation in terms of the EDAM ontology. To demonstrate the practical use of our framework, we created and evaluated a number of logically and semantically equivalent workflows for four use cases representing frequent tasks in MS-based proteomics. Indeed we found that the results computed by the workflows could vary considerably, emphasizing the benefits of a framework that facilitates their systematic exploration. Availability and implementation The project files and workflows are available from https://github.com/bio-tools/biotoolsCompose/tree/master/Automatic-Workflow-Composition. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, RC Leiden, The Netherlands
| | - Anna-Lena Lamprecht
- Department of Information and Computing Sciences, Utrecht University, CC Utrecht, The Netherlands
| | - Jon Ison
- National Life Science Supercomputing Center, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Veit Schwämmle
- Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Odense, Denmark
| |
Collapse
|
23
|
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
INTRODUCTION The study of microbial communities based on the combined analysis of genomic and proteomic data - called metaproteogenomics - has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications on human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
Collapse
Affiliation(s)
- Henning Schiebenhoefer
- a Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure , Robert Koch Institute , Berlin , Germany
| | - Tim Van Den Bossche
- b VIB - UGent Center for Medical Biotechnology, VIB , Ghent , Belgium.,c Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences , Ghent University , Ghent , Belgium
| | - Stephan Fuchs
- d FG13 Division of Nosocomial Pathogens and Antibiotic Resistances , Robert Koch Institute , Wernigerode , Germany
| | - Bernhard Y Renard
- a Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure , Robert Koch Institute , Berlin , Germany
| | - Thilo Muth
- a Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure , Robert Koch Institute , Berlin , Germany
| | - Lennart Martens
- b VIB - UGent Center for Medical Biotechnology, VIB , Ghent , Belgium.,c Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences , Ghent University , Ghent , Belgium
| |
Collapse
|
24
|
Saito MA, Bertrand EM, Duffy ME, Gaylord DA, Held NA, Hervey WJ, Hettich RL, Jagtap PD, Janech MG, Kinkade DB, Leary DH, McIlvin MR, Moore EK, Morris RM, Neely BA, Nunn BL, Saunders JK, Shepherd AI, Symmonds NI, Walsh DA. Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing. J Proteome Res 2019; 18:1461-1476. [PMID: 30702898 DOI: 10.1021/acs.jproteome.8b00761] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Ocean metaproteomics is an emerging field enabling discoveries about marine microbial communities and their impact on global biogeochemical processes. Recent ocean metaproteomic studies have provided insight into microbial nutrient transport, colimitation of carbon fixation, the metabolism of microbial biofilms, and dynamics of carbon flux in marine ecosystems. Future methodological developments could provide new capabilities such as characterizing long-term ecosystem changes, biogeochemical reaction rates, and in situ stoichiometries. Yet challenges remain for ocean metaproteomics due to the great biological diversity that produces highly complex mass spectra, as well as the difficulty in obtaining and working with environmental samples. This review summarizes the progress and challenges facing ocean metaproteomic scientists and proposes best practices for data sharing of ocean metaproteomic data sets, including the data types and metadata needed to enable intercomparisons of protein distributions and annotations that could foster global ocean metaproteomic capabilities.
Collapse
Affiliation(s)
- Mak A Saito
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | - Erin M Bertrand
- Department of Biology , Dalhousie University , Halifax , Nova Scotia B3H 4R2 , Canada
| | - Megan E Duffy
- School of Oceanography , University of Washington , Seattle , Washington 98195-7940 , United States
| | - David A Gaylord
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | - Noelle A Held
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | | | - Robert L Hettich
- Oak Ridge National Laboratory and Microbiology Department , University of Tennessee , Knoxville , Tennessee 37996 , United States
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics , University of Minnesota , Saint Paul , Minnesota 55108 , United States
| | - Michael G Janech
- College of Charleston , Charleston , South Carolina 29424 , United States
| | - Danie B Kinkade
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | - Dagmar H Leary
- U.S. Naval Research Laboratory , Washington , D.C. 20375 , United States
| | - Matthew R McIlvin
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | - Eli K Moore
- Department of Environmental Science , Rowan University , Glassboro , New Jersey 08028 , United States
| | - Robert M Morris
- School of Oceanography , University of Washington , Seattle , Washington 98195-7940 , United States
| | - Benjamin A Neely
- National Institute of Standards and Technology , Charleston , South Carolina 29412 , United States
| | - Brook L Nunn
- Department of Genome Sciences , University of Washington , Seattle , Washington 98195 , United States
| | - Jaclyn K Saunders
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States.,School of Oceanography , University of Washington , Seattle , Washington 98195-7940 , United States
| | - Adam I Shepherd
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | - Nicholas I Symmonds
- Woods Hole Oceanographic Institution , Woods Hole , Massachusetts 02543 , United States
| | - David A Walsh
- Department of Biology , Concordia University , Montreal , Quebec H4B 1R6 , Canada
| |
Collapse
|
25
|
Guillot L, Delage L, Viari A, Vandenbrouck Y, Com E, Ritter A, Lavigne R, Marie D, Peterlongo P, Potin P, Pineau C. Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes. BMC Genomics 2019; 20:56. [PMID: 30654742 PMCID: PMC6337836 DOI: 10.1186/s12864-019-5431-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 01/03/2019] [Indexed: 01/02/2023] Open
Abstract
Background Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein coding-genes and splice isoforms, assign correct start sites, and validate predicted exons and genes. Results Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PSTs locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files to use a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data. Conclusions Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework and for use by non-computer scientists at https://galaxy.protim.eu. Data are available via ProteomeXchange under identifier PXD010618. Electronic supplementary material The online version of this article (10.1186/s12864-019-5431-9) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Laetitia Guillot
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France.,Protim, Univ Rennes, F-35042, Rennes cedex, France
| | - Ludovic Delage
- Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
| | - Alain Viari
- INRIA Grenoble-Rhône-Alpes, F-38330, Montbonnot-Saint-Martin, France
| | - Yves Vandenbrouck
- University Grenoble Alpes, CEA, Inserm, BIG-BGE, 38000, Grenoble, France
| | - Emmanuelle Com
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France.,Protim, Univ Rennes, F-35042, Rennes cedex, France
| | - Andrés Ritter
- Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France.,Present address: Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
| | - Régis Lavigne
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France.,Protim, Univ Rennes, F-35042, Rennes cedex, France
| | - Dominique Marie
- Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
| | | | - Philippe Potin
- Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
| | - Charles Pineau
- Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France. .,Protim, Univ Rennes, F-35042, Rennes cedex, France.
| |
Collapse
|
26
|
Abstract
Affinity proteomics (AP-MS) is growing in importance for characterizing protein-protein interactions (PPIs) in the form of protein complexes and signaling networks. The AP-MS approach necessitates several different software tools, integrated into reproducible and accessible workflows. However, if the scientist (e.g., a bench biologist) lacks a computational background, then managing large AP-MS datasets can be challenging, manually formatting AP-MS data for input into analysis software can be error-prone, and data visualization involving dozens of variables can be laborious. One solution to address these issues is Galaxy, an open source and web-based platform for developing and deploying user-friendly computational pipelines or workflows. Here, we describe a Galaxy-based platform enabling AP-MS analysis. This platform enables researchers with no prior computational experience to begin with data from a mass spectrometer (e.g., peaklists in mzML format) and perform peak processing, database searching, assignment of interaction confidence scores, and data visualization with a few clicks of a mouse. We provide sample data and a sample workflow with step-by-step instructions to quickly acquaint users with the process.
Collapse
|
27
|
Kumar P, Panigrahi P, Johnson J, Weber WJ, Mehta S, Sajulga R, Easterly C, Crooker BA, Heydarian M, Anamika K, Griffin TJ, Jagtap PD. QuanTP: A Software Resource for Quantitative Proteo-Transcriptomic Comparative Data Analysis and Informatics. J Proteome Res 2018; 18:782-790. [DOI: 10.1021/acs.jproteome.8b00727] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Praveen Kumar
- Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, Minnesota 55904, United States
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | | | - James Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Wanda J. Weber
- Department of Animal Science, University of Minnesota, St. Paul, Minnesota 55108, United States
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Ray Sajulga
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Caleb Easterly
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Brian A. Crooker
- Department of Animal Science, University of Minnesota, St. Paul, Minnesota 55108, United States
| | - Mohammad Heydarian
- Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Krishanpal Anamika
- LABS, Persistent Systems, Aryabhata-Pingala, Erandwane, Pune 411004, India
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| |
Collapse
|
28
|
Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next‐Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2018] [Revised: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI)Universiti Kebangsaan Malaysia 56000 Kuala Lumpur Malaysia
| | - M. Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI)Universiti Kebangsaan Malaysia 56000 Kuala Lumpur Malaysia
| | - Mia Yang Ang
- UKM Medical Molecular Biology Institute (UMBI)Universiti Kebangsaan Malaysia 56000 Kuala Lumpur Malaysia
| | - Rahman Jamal
- UKM Medical Molecular Biology Institute (UMBI)Universiti Kebangsaan Malaysia 56000 Kuala Lumpur Malaysia
| |
Collapse
|
29
|
Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. Improve your Galaxy text life: The Query Tabular Tool. F1000Res 2018; 7:1604. [PMID: 30519459 PMCID: PMC6248266 DOI: 10.12688/f1000research.16450.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/02/2019] [Indexed: 11/20/2022] Open
Abstract
Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those which integrate software from different ‘omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats which are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, limitations to existing text manipulation are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as an example, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
Collapse
Affiliation(s)
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA.,Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, MN, 55904, USA
| | - Caleb Easterly
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| | - Mark Esler
- Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| | - Arthur C Eschenlauer
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA.,Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
| | - Adrian D Hegeman
- Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| |
Collapse
|
30
|
Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. Improve your Galaxy text life: The Query Tabular Tool. F1000Res 2018; 7:1604. [PMID: 30519459 PMCID: PMC6248266 DOI: 10.12688/f1000research.16450.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/02/2019] [Indexed: 10/04/2023] Open
Abstract
Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those which integrate software from different 'omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats which are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, limitations to existing text manipulation are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as an example, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
Collapse
Affiliation(s)
- James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, MN, 55904, USA
| | - Caleb Easterly
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| | - Mark Esler
- Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| | - Arthur C. Eschenlauer
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
| | - Adrian D. Hegeman
- Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
| |
Collapse
|
31
|
Sajulga R, Mehta S, Kumar P, Johnson JE, Guerrero CR, Ryan MC, Karchin R, Jagtap PD, Griffin TJ. Bridging the Chromosome-centric and Biology/Disease-driven Human Proteome Projects: Accessible and Automated Tools for Interpreting the Biological and Pathological Impact of Protein Sequence Variants Detected via Proteogenomics. J Proteome Res 2018; 17:4329-4336. [DOI: 10.1021/acs.jproteome.8b00404] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Affiliation(s)
- Ray Sajulga
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, Minnesota 55904, United States
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Candace R. Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Michael C. Ryan
- In-Silico Solutions, Falls Church, Virginia 22043, United States
| | - Rachel Karchin
- Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
- The Institute for Computational Medicine, The Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Oncology, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21217, United States
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
| |
Collapse
|
32
|
Afiuni-Zadeh S, Boylan KLM, Jagtap PD, Griffin TJ, Rudney JD, Peterson ML, Skubitz APN. Evaluating the potential of residual Pap test fluid as a resource for the metaproteomic analysis of the cervical-vaginal microbiome. Sci Rep 2018; 8:10868. [PMID: 30022083 PMCID: PMC6052116 DOI: 10.1038/s41598-018-29092-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2018] [Accepted: 07/04/2018] [Indexed: 01/30/2023] Open
Abstract
The human cervical-vaginal area contains proteins derived from microorganisms that may prevent or predispose women to gynecological conditions. The liquid Pap test fixative is an unexplored resource for analysis of microbial communities and the microbe-host interaction. Previously, we showed that the residual cell-free fixative from discarded Pap tests of healthy women could be used for mass spectrometry (MS) based proteomic identification of cervical-vaginal proteins. In this study, we reprocessed these MS raw data files for metaproteomic analysis to characterize the microbial community composition and function of microbial proteins in the cervical-vaginal region. This was accomplished by developing a customized protein sequence database encompassing microbes likely present in the vagina. High-mass accuracy data were searched against the protein FASTA database using a two-step search method within the Galaxy for proteomics platform. Data was analyzed by MEGAN6 (MetaGenomeAnalyzer) for phylogenetic and functional characterization. We identified over 300 unique peptides from a variety of bacterial phyla and Candida. Peptides corresponding to proteins involved in carbohydrate metabolism, oxidation-reduction, and transport were identified. By identifying microbial peptides in Pap test supernatants it may be possible to acquire a functional signature of these microbes, as well as detect specific proteins associated with cervical health and disease.
Collapse
Affiliation(s)
- Somaieh Afiuni-Zadeh
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Kristin L M Boylan
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota, Minneapolis, MN, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota, Minneapolis, MN, USA
| | - Joel D Rudney
- Department of Diagnostic and Biological Sciences, School of Dentistry, University of Minnesota, Minneapolis, MN, USA
| | | | - Amy P N Skubitz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA.
| |
Collapse
|
33
|
Levitsky LI, Ivanov MV, Lobas AA, Bubis JA, Tarasova IA, Solovyeva EM, Pridatchenko ML, Gorshkov MV. IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics. J Proteome Res 2018; 17:2249-2255. [PMID: 29682971 DOI: 10.1021/acs.jproteome.7b00640] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
We present an open-source, extensible search engine for shotgun proteomics. Implemented in Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with the state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms and allows customization of the search workflow for specialized applications.
Collapse
Affiliation(s)
- Lev I Levitsky
- Moscow Institute of Physics and Technology , 9 Institutskiy per. , Dolgoprudny , Moscow Region 141700 , Russian Federation.,V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Anna A Lobas
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Julia A Bubis
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Irina A Tarasova
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Elizaveta M Solovyeva
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Marina L Pridatchenko
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| | - Mikhail V Gorshkov
- V.L. Talrose Institute for Energy Problems of Chemical Physics , Russian Academy of Sciences , 38 Leninsky Pr., Bld. 2 , Moscow 119334 , Russia
| |
Collapse
|
34
|
Feng J, Ding C, Qiu N, Ni X, Zhan D, Liu W, Xia X, Li P, Lu B, Zhao Q, Nie P, Song L, Zhou Q, Lai M, Guo G, Zhu W, Ren J, Shi T, Qin J. Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis. Nat Biotechnol 2018; 35:409-412. [PMID: 28486446 DOI: 10.1038/nbt.3825] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- Jinwen Feng
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China.,The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Chen Ding
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China.,State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
| | - Naiqi Qiu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China.,The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Xiaotian Ni
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China.,The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Dongdong Zhan
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China.,The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Wanlin Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Xia Xia
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Peng Li
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Bingxin Lu
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Qi Zhao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Peng Nie
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Lei Song
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Quan Zhou
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Mi Lai
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Gaigai Guo
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Weimin Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China
| | - Jian Ren
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Jun Qin
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences (The PHOENIX Center, Beijing), Beijing, China.,State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
| |
Collapse
|
35
|
Blank C, Easterly C, Gruening B, Johnson J, Kolmeder CA, Kumar P, May D, Mehta S, Mesuere B, Brown Z, Elias JE, Hervey WJ, McGowan T, Muth T, Nunn B, Rudney J, Tanca A, Griffin TJ, Jagtap PD. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework. Proteomes 2018; 6:proteomes6010007. [PMID: 29385081 PMCID: PMC5874766 DOI: 10.3390/proteomes6010007] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2017] [Revised: 01/26/2018] [Accepted: 01/26/2018] [Indexed: 01/12/2023] Open
Abstract
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software.
Collapse
Affiliation(s)
- Clemens Blank
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg im Breisgau, Germany.
| | - Caleb Easterly
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg im Breisgau, Germany.
| | - James Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Carolin A Kolmeder
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland.
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Damon May
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Bart Mesuere
- Computational Biology Group, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium.
| | - Zachary Brown
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Joshua E Elias
- Department of Chemical & Systems Biology, Stanford University, Stanford, CA 94305, USA.
| | - W Judson Hervey
- Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC 20375, USA.
| | - Thomas McGowan
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany.
| | - Brook Nunn
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
| | - Joel Rudney
- Department of Diagnostic and Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Alessandro Tanca
- Porto Conte Ricerche Science and Technology Park of Sardinia, 07041 Alghero, Italy.
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
| |
Collapse
|
36
|
Guitton Y, Tremblay-Franco M, Le Corguillé G, Martin JF, Pétéra M, Roger-Mele P, Delabrière A, Goulitquer S, Monsoor M, Duperier C, Canlet C, Servien R, Tardivel P, Caron C, Giacomoni F, Thévenot EA. Create, run, share, publish, and reference your LC–MS, FIA–MS, GC–MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics. Int J Biochem Cell Biol 2017; 93:89-101. [DOI: 10.1016/j.biocel.2017.07.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2017] [Revised: 06/14/2017] [Accepted: 07/10/2017] [Indexed: 12/11/2022]
|
37
|
Chambers MC, Jagtap PD, Johnson JE, McGowan T, Kumar P, Onsongo G, Guerrero CR, Barsnes H, Vaudel M, Martens L, Grüning B, Cooke IR, Heydarian M, Reddy KL, Griffin TJ. An Accessible Proteogenomics Informatics Resource for Cancer Researchers. Cancer Res 2017; 77:e43-e46. [PMID: 29092937 DOI: 10.1158/0008-5472.can-17-0331] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2017] [Revised: 04/07/2017] [Accepted: 06/30/2017] [Indexed: 11/16/2022]
Abstract
Proteogenomics has emerged as a valuable approach in cancer research, which integrates genomic and transcriptomic data with mass spectrometry-based proteomics data to directly identify expressed, variant protein sequences that may have functional roles in cancer. This approach is computationally intensive, requiring integration of disparate software tools into sophisticated workflows, challenging its adoption by nonexpert, bench scientists. To address this need, we have developed an extensible, Galaxy-based resource aimed at providing more researchers access to, and training in, proteogenomic informatics. Our resource brings together software from several leading research groups to address two foundational aspects of proteogenomics: (i) generation of customized, annotated protein sequence databases from RNA-Seq data; and (ii) accurate matching of tandem mass spectrometry data to putative variants, followed by filtering to confirm their novelty. Directions for accessing software tools and workflows, along with instructional documentation, can be found at z.umn.edu/canresgithub. Cancer Res; 77(21); e43-46. ©2017 AACR.
Collapse
Affiliation(s)
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota
| | - Thomas McGowan
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota.,Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, Minnesota
| | - Getiria Onsongo
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota
| | - Candace R Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota
| | - Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Marc Vaudel
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway.,Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.,Department of Biochemistry, Ghent University, Ghent, Belgium.,Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| | - Björn Grüning
- Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany.,Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
| | - Ira R Cooke
- Comparative Genomics Centre and Department of Molecular and Cell Biology, James Cook University, Queensland, Australia
| | | | - Karen L Reddy
- Department of Biological Chemistry, Center for Epigenetics and Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, Maryland
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota.
| |
Collapse
|
38
|
Proffitt JM, Glenn J, Cesnik AJ, Jadhav A, Shortreed MR, Smith LM, Kavanagh K, Cox LA, Olivier M. Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys. BMC Genomics 2017; 18:877. [PMID: 29132314 PMCID: PMC5683380 DOI: 10.1186/s12864-017-4279-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Accepted: 11/03/2017] [Indexed: 01/05/2023] Open
Abstract
Background Shotgun proteomics utilizes a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease where genomic annotation is incomplete, such as non-human primates, proteogenomic methods can improve the detection of proteins by incorporating transcriptional data from RNA-Seq to improve proteomics search databases used for peptide spectral matching. Customized search databases derived from RNA-Seq data are capable of identifying unannotated genetic and splice variants while simultaneously reducing the number of comparisons to only those transcripts actively expressed in the tissue. Results We collected RNA-Seq and proteomic data from 10 vervet monkey liver samples and used the RNA-Seq data to curate sample-specific search databases which were analyzed in the program Morpheus. We compared these results against those from a search database generated from the reference vervet genome. A total of 284 previously unannotated splice junctions were predicted by the RNA-Seq data, 92 of which were confirmed by peptide spectral matches. More than half (53/92) of these unannotated splice variants had orthologs in other non-human primates, suggesting that failure to match these peptides in the reference analyses likely arose from incomplete gene model information. The sample-specific databases also identified 101 unique peptides containing single amino acid substitutions which were missed by the reference database. Because the sample-specific searches were restricted to actively expressed transcripts, the search databases were smaller, more computationally efficient, and identified more peptides at the empirically derived 1 % false discovery rate. Conclusion Proteogenomic approaches are ideally suited to facilitate the discovery and annotation of proteins in less widely studies animal models such as non-human primates. We expect that these approaches will help to improve existing genome annotations of non-human primate species such as vervet. Electronic supplementary material The online version of this article (doi: 10.1186/s12864-017-4279-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- J Michael Proffitt
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
| | - Jeremy Glenn
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
| | - Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin, USA
| | - Avinash Jadhav
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA.,Current address: Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, NRC Building, G-55, Winston-Salem, North Carolina, 27157, USA
| | | | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin, USA.,Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin, USA
| | - Kylie Kavanagh
- Department of Pathology and Comparative Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
| | - Laura A Cox
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA.,Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, Texas, USA
| | - Michael Olivier
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA. .,Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, Texas, USA. .,Current address: Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, NRC Building, G-55, Winston-Salem, North Carolina, 27157, USA.
| |
Collapse
|
39
|
Starr AE, Deeke SA, Li L, Zhang X, Daoud R, Ryan J, Ning Z, Cheng K, Nguyen LVH, Abou-Samra E, Lavallée-Adam M, Figeys D. Proteomic and Metaproteomic Approaches to Understand Host–Microbe Interactions. Anal Chem 2017; 90:86-109. [DOI: 10.1021/acs.analchem.7b04340] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Affiliation(s)
- Amanda E. Starr
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Shelley A. Deeke
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Leyuan Li
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Xu Zhang
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Rachid Daoud
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - James Ryan
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Zhibin Ning
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Kai Cheng
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Linh V. H. Nguyen
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Elias Abou-Samra
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Mathieu Lavallée-Adam
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
| | - Daniel Figeys
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Molecular Architecture of Life Program, Canadian Institute for Advanced Research, Toronto, Ontario, M5G 1M1, Canada
| |
Collapse
|
40
|
Menschaert G, David F. Proteogenomics from a bioinformatics angle: A growing field. MASS SPECTROMETRY REVIEWS 2017; 36:584-599. [PMID: 26670565 PMCID: PMC6101030 DOI: 10.1002/mas.21483] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2015] [Accepted: 09/01/2015] [Indexed: 05/16/2023]
Abstract
Proteogenomics is a research area that combines areas as proteomics and genomics in a multi-omics setup using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation or to unravel the proteome complexity. Mass spectrometry based identifications of matching or homologues peptides can further refine gene models. Also, the identification of novel proteoforms is also made possible based on detection of novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation or novel (small) open reading frames in intergenic or un-translated genic regions by analyzing high-throughput sequencing data from RNAseq or ribosome profiling experiments. Other proteogenomics studies using a combination of proteomics and genomics techniques focus on antibody sequencing, the identification of immunogenic peptides or venom peptides. Over the years, a growing amount of bioinformatics tools and databases became available to help streamlining these cross-omics studies. Some of these solutions only help in specific steps of the proteogenomics studies, e.g. building custom sequence databases (based on next generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years a handful integrative tools also became available that can execute complete proteogenomics analyses. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review we aimed at sketching a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.
Collapse
Affiliation(s)
- Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics, Department of
Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience
Engineering, Ghent University, Ghent, Belgium
- To whom correspondence should be addressed. Tel:
+32 9 264 99 22; Fax: +32 9 264 6220;
| | - Fenyö David
- Center for Health Informatics and Bioinformatics and Department of
Biochemistry and Molecular Pharmacology, New York University School of Medicine, New
York, New York, USA
| |
Collapse
|
41
|
Bronchoalveolar Lavage Fluid Protein Expression in Acute Respiratory Distress Syndrome Provides Insights into Pathways Activated in Subjects with Different Outcomes. Sci Rep 2017; 7:7464. [PMID: 28785034 PMCID: PMC5547130 DOI: 10.1038/s41598-017-07791-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2017] [Accepted: 07/04/2017] [Indexed: 02/06/2023] Open
Abstract
Acute respiratory distress syndrome (ARDS) is associated with high mortality. We sought to identify biological pathways in ARDS that differentiate survivors from non-survivors. We studied bronchoalveolar lavage fluid (BALF) from 36 patients with ARDS (20 survivors, 16 non-survivors). Each sample, obtained within seven days of ARDS onset, was depleted of high abundance proteins and labeled for iTRAQ LC-MS/MS separately. Protein identification and relative quantification was performed employing a target-decoy strategy. A variance weighted t-test was used to identify differential expression. Ingenuity Pathway Analysis was used to determine the canonical pathways that differentiated survivors from non-survivors. We identified 1115 high confidence proteins in the BALF out of which 142 were differentially expressed between survivors and non-survivors. These proteins mapped to multiple pathways distinguishing survivors from non-survivors, including several implicated in lung injury and repair such as coagulation/thrombosis, acute phase response signaling and complement activation. We also identified proteins assigned to fibrosis and ones involved in detoxification of lipid peroxide-mediated oxidative stress to be different in survivors and non-survivors. These results support our previous findings demonstrating early differences in the BALF protein expression in ARDS survivors vs. non-survivors, including proteins that counter oxidative stress and canonical pathways associated with fibrosis.
Collapse
|
42
|
|
43
|
Fu S, Liu X, Luo M, Xie K, Nice EC, Zhang H, Huang C. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification. Expert Rev Proteomics 2017; 14:351-362. [PMID: 28276747 DOI: 10.1080/14789450.2017.1299006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
INTRODUCTION Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated by genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we will present recent advances in proteogenomic investigations of cancer drug resistance with an emphasis on integrative proteogenomic pipelines and the biomarker discovery which contributes to achieving the goal of using precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Collapse
Affiliation(s)
- Shuyue Fu
- a State Key Laboratory of Biotherapy and Cancer Center , West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy , Chengdu , P.R. China
| | - Xiang Liu
- b Department of Pathology , Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital , Chengdu , P.R. China
| | - Maochao Luo
- c West China School of Public Health, Sichuan University , Chengdu , P.R.China
| | - Ke Xie
- d Department of Oncology , Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital , Chengdu , P.R. China
| | - Edouard C Nice
- e Department of Biochemistry and Molecular Biology , Monash University , Clayton , Australia
| | - Haiyuan Zhang
- f School of Medicine , Yangtze University , P. R. China
| | - Canhua Huang
- a State Key Laboratory of Biotherapy and Cancer Center , West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy , Chengdu , P.R. China
| |
Collapse
|
44
|
|
45
|
Guerrero CR, Jagtap PD, Johnson JE, Griffin TJ. Using Galaxy for Proteomics. PROTEOME INFORMATICS 2016. [DOI: 10.1039/9781782626732-00289] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
The area of informatics for mass spectrometry (MS)-based proteomics data has steadily grown over the last two decades. Numerous, effective software programs now exist for various aspects of proteomic informatics. However, many researchers still have difficulties in using these software. These difficulties arise from problems with running and integrating disparate software programs, scalability issues when dealing with large data volumes, and lack of ability to share and reproduce workflows comprised of different software. The Galaxy framework for bioinformatics provides an attractive option for solving many of these current issues in proteomic informatics. Originally developed as a workbench to enable genomic data analysis, numerous researchers are now turning to Galaxy to implement software for MS-based proteomics applications. Here, we provide an introduction to Galaxy and its features, and describe how software tools are deployed, published and shared via the scalable framework. We also describe some of the existing tools in Galaxy for basic MS-based proteomics data analysis and informatics. Finally, we describe how proteomics tools in Galaxy can be combined with other existing tools for genomic and transcriptomic data analysis to enable powerful multi-omic data analysis applications.
Collapse
Affiliation(s)
- Candace R. Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota 512 Walter Library, 117 Pleasant Street SE Minneapolis MN 55455 USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| |
Collapse
|
46
|
Weisser H, Wright JC, Mudge JM, Gutenbrunner P, Choudhary JS. Flexible Data Analysis Pipeline for High-Confidence Proteogenomics. J Proteome Res 2016; 15:4686-4695. [PMID: 27786492 PMCID: PMC5703597 DOI: 10.1021/acs.jproteome.6b00765] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
![]()
Proteogenomics leverages information
derived from proteomic data
to improve genome annotations. Of particular interest are “novel”
peptides that provide direct evidence of protein expression for genomic
regions not previously annotated as protein-coding. We present a modular,
automated data analysis pipeline aimed at detecting such “novel”
peptides in proteomic data sets. This pipeline implements criteria
developed by proteomics and genome annotation experts for high-stringency
peptide identification and filtering. Our pipeline is based on the
OpenMS computational framework; it incorporates multiple database
search engines for peptide identification and applies a machine-learning
approach (Percolator) to post-process search results. We describe
several new and improved software tools that we developed to facilitate
proteogenomic analyses that enhance the wealth of tools provided by
OpenMS. We demonstrate the application of our pipeline to a human
testis tissue data set previously acquired for the Chromosome-Centric
Human Proteome Project, which led to the addition of five new gene
annotations on the human reference genome.
Collapse
Affiliation(s)
| | | | | | - Petra Gutenbrunner
- School of Informatics, Communications, and Media, University of Applied Sciences Upper Austria , Hagenberg 4232, Austria
| | | |
Collapse
|
47
|
Kuenzi BM, Borne AL, Li J, Haura EB, Eschrich SA, Koomen JM, Rix U, Stewart PA. APOSTL: An Interactive Galaxy Pipeline for Reproducible Analysis of Affinity Proteomics Data. J Proteome Res 2016; 15:4747-4754. [PMID: 27680298 DOI: 10.1021/acs.jproteome.6b00660] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
With continuously increasing scale and depth of coverage in affinity proteomics (AP-MS) data, the analysis and visualization is becoming more challenging. A number of tools have been developed to identify high-confidence interactions; however, a cohesive and intuitive pipeline for analysis and visualization is still needed. Here we present Automated Processing of SAINT Templated Layouts (APOSTL), a freely available Galaxy-integrated software suite and analysis pipeline for reproducible, interactive analysis of AP-MS data. APOSTL contains a number of tools woven together using Galaxy workflows, which are intuitive for the user to move from raw data to publication-quality figures within a single interface. APOSTL is an evolving software project with the potential to customize individual analyses with additional Galaxy tools and widgets using the R web application framework, Shiny. The source code, data, and documentation are freely available from GitHub ( https://github.com/bornea/APOSTL ) and other sources.
Collapse
Affiliation(s)
- Brent M Kuenzi
- Department of Drug Discovery, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States.,Cancer Biology Ph.D. Program, University of South Florida , Tampa, Florida 33620, United States
| | - Adam L Borne
- Department of Thoracic Oncology, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| | - Jiannong Li
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| | - Eric B Haura
- Department of Thoracic Oncology, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| | - Steven A Eschrich
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| | - John M Koomen
- Molecular Oncology, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| | - Uwe Rix
- Department of Drug Discovery, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| | - Paul A Stewart
- Department of Thoracic Oncology, H. Lee Moffitt Cancer Center & Research Institute , Tampa, Florida 33612-9497, United States
| |
Collapse
|
48
|
Abstract
Omics approaches have become popular in biology as powerful discovery tools, and currently gain in interest for diagnostic applications. Establishing the accurate genome sequence of any organism is easy, but the outcome of its annotation by means of automatic pipelines remains imprecise. Some protein-encoding genes may be missed as soon as they are specific and poorly conserved in a given taxon, while important to explain the specific traits of the organism. Translational starts are also poorly predicted in a relatively important number of cases, thus impacting the protein sequence database used in proteomics, comparative genomics, and systems biology. The use of high-throughput proteomics data to improve genome annotation is an attractive option to obtain a more comprehensive molecular picture of a given organism. Here, protocols for reannotating prokaryote genomes are described based on shotgun proteomics and derivatization of protein N-termini with a positively charged reagent coupled to high-resolution tandem mass spectrometry.
Collapse
|
49
|
Luge T, Fischer C, Sauer S. Efficient Application of De Novo RNA Assemblers for Proteomics Informed by Transcriptomics. J Proteome Res 2016; 15:3938-3943. [DOI: 10.1021/acs.jproteome.6b00301] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Toni Luge
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
| | - Cornelius Fischer
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- BIMSB
and BIH Genomics Platforms, Laboratory of Functional Genomics, Nutrigenomics
and Systems Biology, Max-Delbrück-Center for Molecular Medicine, Robert-Rössle-Straße
10, 13125 Berlin, Germany
| | - Sascha Sauer
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- BIMSB
and BIH Genomics Platforms, Laboratory of Functional Genomics, Nutrigenomics
and Systems Biology, Max-Delbrück-Center for Molecular Medicine, Robert-Rössle-Straße
10, 13125 Berlin, Germany
- CU Systems
Medicine, University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
| |
Collapse
|
50
|
Li Y, Wang X, Cho JH, Shaw TI, Wu Z, Bai B, Wang H, Zhou S, Beach TG, Wu G, Zhang J, Peng J. JUMPg: An Integrative Proteogenomics Pipeline Identifying Unannotated Proteins in Human Brain and Cancer Cells. J Proteome Res 2016; 15:2309-20. [PMID: 27225868 DOI: 10.1021/acs.jproteome.6b00344] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
Proteogenomics is an emerging approach to improve gene annotation and interpretation of proteomics data. Here we present JUMPg, an integrative proteogenomics pipeline including customized database construction, tag-based database search, peptide-spectrum match filtering, and data visualization. JUMPg creates multiple databases of DNA polymorphisms, mutations, splice junctions, partially trypticity, as well as protein fragments translated from the whole transcriptome in all six frames upon RNA-seq de novo assembly. We use a multistage strategy to search these databases sequentially, in which the performance is optimized by re-searching only unmatched high-quality spectra and reusing amino acid tags generated by the JUMP search engine. The identified peptides/proteins are displayed with gene loci using the UCSC genome browser. Then, the JUMPg program is applied to process a label-free mass spectrometry data set of Alzheimer's disease postmortem brain, uncovering 496 new peptides of amino acid substitutions, alternative splicing, frame shift, and "non-coding gene" translation. The novel protein PNMA6BL specifically expressed in the brain is highlighted. We also tested JUMPg to analyze a stable-isotope labeled data set of multiple myeloma cells, revealing 991 sample-specific peptides that include protein sequences in the immunoglobulin light chain variable region. Thus, the JUMPg program is an effective proteogenomics tool for multiomics data integration.
Collapse
Affiliation(s)
| | | | | | | | | | | | - Hong Wang
- Integrated Biomedical Sciences Program, University of Tennessee Health Science Center , 920 Madison Avenue, Memphis, Tennessee 38163, United States
| | | | - Thomas G Beach
- Banner Sun Health Research Institute , Sun City, Arizona 85351, United States
| | | | | | | |
Collapse
|