1
Proteogenomic Analysis Provides Novel Insight into Genome Annotation and Nitrogen Metabolism in Nostoc sp. PCC 7120. Microbiol Spectr 2021; 9:e0049021. [PMID: 34523988 PMCID: PMC8557916 DOI: 10.1128/spectrum.00490-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023] Open
Abstract
Cyanobacteria, capable of oxygenic photosynthesis, play a vital role in nitrogen and carbon cycles. Nostoc sp. PCC 7120 (Nostoc 7120) is a model cyanobacterium commonly used to study cell differentiation and nitrogen metabolism. Although its genome was released in 2002, a high-quality genome annotation remains unavailable for this model cyanobacterium. Therefore, in this study, we performed an in-depth proteogenomic analysis based on high-resolution mass spectrometry (MS) data to refine the genome annotation of Nostoc 7120. We unambiguously identified 5,519 predicted protein-coding genes and revealed 26 novel genes, 75 revised genes, and 27 different kinds of posttranslational modifications in Nostoc 7120. A subset of these novel proteins were further validated at both the mRNA and peptide levels. Functional analysis suggested that many newly annotated proteins may participate in nitrogen or cadmium/mercury metabolism in Nostoc 7120. Moreover, we constructed an updated Nostoc 7120 database based on our proteogenomic results and presented examples of how the updated database could be used to improve the annotation of proteomic data. Our study provides the most comprehensive annotation of the Nostoc 7120 genome thus far and will serve as a valuable resource for the study of nitrogen metabolism in Nostoc 7120. IMPORTANCE Cyanobacteria are a large group of prokaryotes capable of oxygenic photosynthesis and play a vital role in nitrogen and carbon cycles on Earth. Nostoc 7120 is a commonly used model cyanobacterium for studying cell differentiation and nitrogen metabolism. In this study, we presented the first comprehensive draft map of the Nostoc 7120 proteome and a wide range of posttranslational modifications. In addition, we constructed an updated database of Nostoc 7120 based on our proteogenomic results and presented examples of how the updated database could be used for system-level studies of Nostoc 7120. 
Our study provides the most comprehensive annotation of the Nostoc 7120 genome and a valuable resource for the study of nitrogen metabolism in this model cyanobacterium.
2
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues inherent in proteogenomic data analysis, where the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
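The survey abstract in this entry mentions searching MS spectra against a six-frame translation of a genome. As a rough illustration of what that database contains, the sketch below translates a DNA string in all six reading frames. It is a minimal sketch, not taken from the survey or any tool it reviews: the function names are hypothetical, the standard genetic code is assumed, and real pipelines would additionally split the frames at stop codons and handle ambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Because every genome position is translated six times, the resulting search space is several times larger than an annotated protein database, which is the scalability concern the abstract raises.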
3
Kuhring M, Doellinger J, Nitsche A, Muth T, Renard BY. TaxIt: An Iterative Computational Pipeline for Untargeted Strain-Level Identification Using MS/MS Spectra from Pathogenic Single-Organism Samples. J Proteome Res 2020; 19:2501-2510. [PMID: 32362126 DOI: 10.1021/acs.jproteome.9b00714] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Untargeted accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry is a challenging task. Reference databases often lack taxonomic depth, limiting peptide assignments to the species level. However, the extension with detailed strain information increases runtime and decreases statistical power. In addition, larger databases contain a higher number of similar proteomes. We present TaxIt, an iterative workflow to address the increasing search space required for MS/MS-based strain-level classification of samples with unknown taxonomic origin. TaxIt first applies reference sequence data for initial identification of species candidates, followed by automated acquisition of relevant strain sequences for low level classification. Furthermore, proteome similarities resulting in ambiguous taxonomic assignments are addressed with an abundance weighting strategy to increase the confidence in candidate taxa. For benchmarking the performance of our method, we apply our iterative workflow on several samples of bacterial and viral origin. In comparison to noniterative approaches using unique peptides or advanced abundance correction, TaxIt identifies microbial strains correctly in all examples presented (with one tie), thereby demonstrating the potential for untargeted and deeper taxonomic classification. TaxIt makes extensive use of public, unrestricted, and continuously growing sequence resources such as the NCBI databases and is available under open-source BSD license at https://gitlab.com/rki_bioinformatics/TaxIt.
Affiliation(s)
- Mathias Kuhring
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany; Core Unit Bioinformatics, Berlin Institute of Health (BIH), 10178 Berlin, Germany; Berlin Institute of Health Metabolomics Platform, Berlin Institute of Health (BIH), 10178 Berlin, Germany; Max Delbrück Center (MDC) for Molecular Medicine, 13125 Berlin, Germany
- Joerg Doellinger
- Centre for Biological Threats and Special Pathogens, Proteomics and Spectroscopy (ZBS 6), Robert Koch Institute, 13353 Berlin, Germany; Centre for Biological Threats and Special Pathogens, Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, 13353 Berlin, Germany
- Andreas Nitsche
- Centre for Biological Threats and Special Pathogens, Highly Pathogenic Viruses (ZBS 1), Robert Koch Institute, 13353 Berlin, Germany
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany; eScience Division (S.3), Federal Institute for Materials Research and Testing, 12489 Berlin, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany; Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
4
In-depth analysis of Bacillus subtilis proteome identifies new ORFs and traces the evolutionary history of modified proteins. Sci Rep 2018; 8:17246. [PMID: 30467398 PMCID: PMC6250715 DOI: 10.1038/s41598-018-35589-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/03/2018] [Accepted: 11/07/2018] [Indexed: 01/05/2023] Open
Abstract
Bacillus subtilis is a sporulating Gram-positive bacterium widely used in basic research and biotechnology. Despite being one of the best-characterized bacterial model organisms, recent proteomics studies identified only about 50% of its theoretical protein count. Here we combined several hundred MS measurements to obtain a comprehensive map of the proteome, phosphoproteome and acetylome of B. subtilis grown at 37 °C in minimal medium. We covered 75% of the theoretical proteome (3,159 proteins), detected 1,085 phosphorylation and 4,893 lysine acetylation sites and performed a systematic bioinformatic characterization of the obtained data. A subset of analyzed MS files allowed us to reconstruct a network of Hanks-type protein kinases, Ser/Thr/Tyr phosphatases and their substrates. We applied genomic phylostratigraphy to gauge the evolutionary age of B. subtilis protein classes and revealed that protein modifications were present on the oldest bacterial proteins. Finally, we performed a proteogenomic analysis by mapping all MS spectra onto a six-frame translation of the B. subtilis genome and found evidence for 19 novel ORFs. We provide the most extensive overview of the proteome and post-translational modifications for B. subtilis to date, with insights into functional annotation and evolutionary aspects of the B. subtilis genome.
5
Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017; 27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]
Abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
Affiliation(s)
- Ulrich Omasits
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Adithi R Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland; Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Michael Schmid
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Sandra Goetze
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Damianos Melidis
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Marc Bourqui
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Olga Nikolayeva
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Andrea Patrignani
- Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
- Juerg E Frey
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Mark D Robinson
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Bernd Wollscheid
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
6
Chapman B, Bellgard M. Plant Proteogenomics: Improvements to the Grapevine Genome Annotation. Proteomics 2017; 17. [DOI: 10.1002/pmic.201700197] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/20/2017] [Revised: 07/28/2017] [Indexed: 01/09/2023]
Affiliation(s)
- Brett Chapman
- Centre for Comparative Genomics; Murdoch University; Western Australia Australia
- Matthew Bellgard
- Centre for Comparative Genomics; Murdoch University; Western Australia Australia
7
Zhang J, Yang MK, Zeng H, Ge F. GAPP: A Proteogenomic Software for Genome Annotation and Global Profiling of Post-translational Modifications in Prokaryotes. Mol Cell Proteomics 2016; 15:3529-3539. [PMID: 27630248 DOI: 10.1074/mcp.m116.060046] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/07/2016] [Indexed: 11/06/2022] Open
Abstract
Although the number of sequenced prokaryotic genomes is growing rapidly, experimentally verified annotation of prokaryotic genomes remains patchy and challenging. To facilitate genome annotation efforts for prokaryotes, we developed an open-source software tool called GAPP for genome annotation and global profiling of post-translational modifications (PTMs) in prokaryotes. With a single command, it provides a standard workflow to validate and refine predicted gene models and discover diverse PTM events. We demonstrated the utility of GAPP using proteomic data from Helicobacter pylori, one of the major human pathogens that is responsible for many gastric diseases. Our results confirmed 84.9% of the existing predicted H. pylori proteins, identified 20 novel protein coding genes, and corrected four existing gene models with regard to translation initiation sites. In particular, GAPP revealed a large repertoire of PTMs using the same proteomic data and provided a rich resource that can be used to examine the functions of reversible modifications in this human pathogen. This software is a powerful tool for genome annotation and global discovery of PTMs and is applicable to any sequenced prokaryotic organism; we expect that it will become an integral part of ongoing genome annotation efforts for prokaryotes. GAPP is freely available at https://sourceforge.net/projects/gappproteogenomic/.
Affiliation(s)
- Jia Zhang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Ming-Kun Yang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Honghui Zeng
- Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
- Feng Ge
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
8
Sajjad W, Rafiq M, Ali B, Hayat M, Zada S, Sajjad W, Kumar T. Proteogenomics: New Emerging Technology. HAYATI JOURNAL OF BIOSCIENCES 2016. [DOI: 10.1016/j.hjb.2016.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022] Open
9
Armengaud J. Next-generation proteomics faces new challenges in environmental biotechnology. Curr Opin Biotechnol 2016; 38:174-82. [DOI: 10.1016/j.copbio.2016.02.025] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]
10
Sanchez-Lucas R, Mehta A, Valledor L, Cabello-Hurtado F, Romero-Rodrıguez MC, Simova-Stoilova L, Demir S, Rodriguez-de-Francisco LE, Maldonado-Alconada AM, Jorrin-Prieto AL, Jorrín-Novo JV. A year (2014-2015) of plants in Proteomics journal. Progress in wet and dry methodologies, moving from protein catalogs, and the view of classic plant biochemists. Proteomics 2016; 16:866-76. [PMID: 26621614 DOI: 10.1002/pmic.201500351] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/27/2015] [Revised: 10/26/2015] [Accepted: 11/04/2015] [Indexed: 12/23/2022]
Abstract
The present review is an update of the previous one published in the Proteomics 2015 Reviews special issue [Jorrin-Novo, J. V. et al., Proteomics 2015, 15, 1089-1112], covering the July 2014-2015 period. It has been written on the basis of the publications that appeared in the journal Proteomics during that period and the most relevant ones published in other high-impact journals. Methodological advances and the contribution of the field to the knowledge of plant biology processes and its translation to agroforestry and environmental sectors will be discussed. This review has been organized in four blocks, with a starting general introduction (literature survey) followed by sections focusing on the methodology (in vitro, in vivo, wet, and dry), proteomics integration with other approaches (systems biology and proteogenomics), biological information, and knowledge (cell communication, receptors, and signaling), ending with a brief mention of some other biological and translational topics to which proteomics has made some contribution.
Affiliation(s)
- Rosa Sanchez-Lucas
- Agroforestry and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba-CeiA3, Córdoba, Spain
- Angela Mehta
- Embrapa Recursos Genéticos e Biotecnologia (CENARGEN), Brasília, DF, Brazil
- Luis Valledor
- Department of Biology of Organisms and Systems (BOS), University of Oviedo, Oviedo, Spain
- M Cristina Romero-Rodrıguez
- Centro Multidisciplinario de Investigaciones Tecnológicas, and Departamento de Fitoquímica, Facultad de Ciencias Químicas, Universidad Nacional de Asunción, San Lorenzo, Paraguay
- Lyudmila Simova-Stoilova
- Plant Molecular Biology Department, Institute of Plant Physiology and Genetics, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Sekvan Demir
- Agroforestry and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba-CeiA3, Córdoba, Spain
- Luis E Rodriguez-de-Francisco
- Agroforestry and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba-CeiA3, Córdoba, Spain; INTEC-Sto. Domingo, Santo Domingo, República Dominicana
- Ana M Maldonado-Alconada
- Agroforestry and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba-CeiA3, Córdoba, Spain
- Ana L Jorrin-Prieto
- Agroforestry and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba-CeiA3, Córdoba, Spain
- Jesus V Jorrín-Novo
- Agroforestry and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba-CeiA3, Córdoba, Spain
11
Kumar D, Mondal AK, Kutum R, Dash D. Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes. Proteomics 2015; 16:226-40. [PMID: 26773550 DOI: 10.1002/pmic.201500263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/30/2015] [Revised: 09/18/2015] [Accepted: 09/28/2015] [Indexed: 01/04/2023]
Abstract
Sustainable innovations in sequencing technologies have resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unequally distributed along their phylogenetic tree; a few phyla contain the majority, the rest only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, a substantial fraction of these annotations are erroneous. Proteogenomics utilizes protein-level experimental observations to annotate protein coding genes on a genome-wide scale. Benefits of proteogenomics include discovery and correction of gene annotations regardless of their phylogenetic conservation. This not only allows detection of common, conserved proteins but also the discovery of protein products of rare genes that may be horizontally transferred or taxonomy specific. The chances of encountering such genes are higher in rare phyla that comprise a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies and their taxonomic distribution, and urge targeted proteogenomics of underexplored taxa to build an extensive reference of protein coding genes.
Affiliation(s)
- Dhirendra Kumar
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Anupam Kumar Mondal
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Rintu Kutum
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Debasis Dash
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
12
Jagtap PD, Blakely A, Murray K, Stewart S, Kooren J, Johnson JE, Rhodus NL, Rudney J, Griffin TJ. Metaproteomic analysis using the Galaxy framework. Proteomics 2015; 15:3553-65. [DOI: 10.1002/pmic.201500074] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 02/19/2015] [Revised: 04/25/2015] [Accepted: 06/04/2015] [Indexed: 12/22/2022]
Affiliation(s)
- Pratik D. Jagtap
- Center for Mass Spectrometry and Proteomics; University of Minnesota; Minneapolis MN USA
- Department of Biochemistry; Molecular Biology and Biophysics; University of Minnesota; Minneapolis MN USA
- Kevin Murray
- Department of Biochemistry; Molecular Biology and Biophysics; University of Minnesota; Minneapolis MN USA
- Joel Kooren
- Department of Biochemistry; Molecular Biology and Biophysics; University of Minnesota; Minneapolis MN USA
- Nelson L. Rhodus
- School of Dentistry; University of Minnesota; Minneapolis MN USA
- Joel Rudney
- School of Dentistry; University of Minnesota; Minneapolis MN USA
- Timothy J. Griffin
- Center for Mass Spectrometry and Proteomics; University of Minnesota; Minneapolis MN USA
- Department of Biochemistry; Molecular Biology and Biophysics; University of Minnesota; Minneapolis MN USA
13
Muth T, Kolmeder CA, Salojärvi J, Keskitalo S, Varjosalo M, Verdam FJ, Rensen SS, Reichl U, de Vos WM, Rapp E, Martens L. Navigating through metaproteomics data: a logbook of database searching. Proteomics 2015; 15:3439-53. [PMID: 25778831 DOI: 10.1002/pmic.201400560] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/28/2014] [Revised: 02/13/2015] [Accepted: 03/06/2015] [Indexed: 11/12/2022]
Abstract
Metaproteomic research involves various computational challenges during the identification of fragmentation spectra acquired from the proteome of a complex microbiome. These issues are manifold and range from the construction of customized sequence databases, the optimal setting of search parameters to limitations in the identification search algorithms themselves. In order to assess the importance of these individual factors, we studied the effect of strategies to combine different search algorithms, explored the influence of chosen database search settings, and investigated the impact of the size of the protein sequence database used for identification. Furthermore, we applied de novo sequencing as a complementary approach to classic database searching. All evaluations were performed on a human intestinal metaproteome dataset. Pyrococcus furiosus proteome data were used to contrast database searching of metaproteomic data to a classic proteomic experiment. Searching against subsets of metaproteome databases and the use of multiple search engines increased the number of identifications. The integration of P. furiosus sequences in a metaproteomic sequence database showcased the limitation of the target-decoy-controlled false discovery rate approach in combination with large sequence databases. The selection of varying search engine parameters and the application of de novo sequencing represented useful methods to increase the reliability of the results. Based on our findings, we provide recommendations for the data analysis that help researchers to establish or improve analysis workflows in metaproteomics.
Affiliation(s)
- Thilo Muth
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Carolin A Kolmeder
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Jarkko Salojärvi
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Salla Keskitalo
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Markku Varjosalo
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Froukje J Verdam
- Department of General Surgery, NUTRIM, Maastricht University Medical Center, Maastricht, The Netherlands
- Sander S Rensen
- Department of General Surgery, NUTRIM, Maastricht University Medical Center, Maastricht, The Netherlands
- Udo Reichl
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany; Otto-von-Guericke University, Bioprocess Engineering, Magdeburg, Germany
- Willem M de Vos
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland; Department of Bacteriology and Immunology, University of Helsinki, Helsinki, Finland; Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
- Erdmann Rapp
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Lennart Martens
- Department of Biochemistry, Ghent University, Ghent, Belgium; Department of Medical Protein Research, VIB, Ghent, Belgium
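The abstract in this entry refers to the target-decoy-controlled false discovery rate (FDR) and its limits with large sequence databases. For readers unfamiliar with the idea, the sketch below shows one common way such an estimate is computed from a combined target and decoy search: rank peptide-spectrum matches (PSMs) by score and estimate the FDR at each cutoff as decoys divided by targets. This is an illustrative sketch only, not the procedure used in the cited study; the function name, the input layout `(score, is_decoy)`, and the simple decoys/targets estimator are assumptions.

```python
def target_decoy_fdr_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match
    (hypothetical input layout). Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at least `score`.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    example = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, False)]
    print(target_decoy_fdr_threshold(example, max_fdr=0.2))
```

As the database grows, more decoy sequences score well by chance, so a stricter score cutoff is needed to hold the same estimated FDR and fewer true matches pass it. That is one way to read the limitation the abstract describes.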
14
Faulkner S, Dun MD, Hondermarck H. Proteogenomics: emergence and promise. Cell Mol Life Sci 2015; 72:953-7. [PMID: 25609363 PMCID: PMC11113406 DOI: 10.1007/s00018-015-1837-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 11/10/2014] [Revised: 01/08/2015] [Accepted: 01/12/2015] [Indexed: 12/14/2022]
Abstract
Proteogenomics, or the integration of proteomics with genomics and transcriptomics, is emerging as the next step towards a unified understanding of cellular functions. Looking globally and simultaneously at gene structure, RNA expression, protein synthesis and post-translational modifications has become technically feasible and offers a new perspective on molecular processes. Recent publications have highlighted the value of proteogenomics in oncology for defining the molecular signature of human tumors, and translation to other areas of biomedicine and life sciences is anticipated. This mini-review will discuss recent developments, challenges and perspectives in proteogenomics.
Affiliation(s)
- Sam Faulkner
- Faculty of Health and Medicine, School of Biomedical Sciences and Pharmacy and Hunter Medical Research Institute, Life Science Building, University of Newcastle, Callaghan, NSW 2308 Australia
- Matthew D. Dun
- Faculty of Health and Medicine, School of Biomedical Sciences and Pharmacy and Hunter Medical Research Institute, Life Science Building, University of Newcastle, Callaghan, NSW 2308 Australia
- Hubert Hondermarck
- Faculty of Health and Medicine, School of Biomedical Sciences and Pharmacy and Hunter Medical Research Institute, Life Science Building, University of Newcastle, Callaghan, NSW 2308 Australia