Reference Citation Analysis

For: Blakeley P, Overton IM, Hubbard SJ. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J Proteome Res 2012;11:5221-34. [PMID: 23025403 PMCID: PMC3703792 DOI: 10.1021/pr300411q] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 11/29/2022]

Cited by Other Article(s)

Ariffin N, Newman DW, Nelson MG, O'Cualain R, Hubbard SJ. Proteogenomic Gene Structure Validation in the Pineapple Genome. J Proteome Res 2024;23:1583-1592. [PMID: 38651221 PMCID: PMC11077482 DOI: 10.1021/acs.jproteome.3c00675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/15/2024] [Accepted: 04/12/2024] [Indexed: 04/25/2024]

Provencher N, Leblanc S, Jacques JF, Roucou X. Exploring the Alternative Proteome with OpenProt and Mass Spectrometry. Methods Mol Biol 2024;2836:3-17. [PMID: 38995532 DOI: 10.1007/978-1-0716-4007-4_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]

Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023;67:101758. [PMID: 37027981 DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]

Malekos E, Carpenter S. Short open reading frame genes in innate immunity: from discovery to characterization. Trends Immunol 2022;43:741-756. [PMID: 35965152 PMCID: PMC10118063 DOI: 10.1016/j.it.2022.07.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/22/2022] [Revised: 07/11/2022] [Accepted: 07/13/2022] [Indexed: 12/27/2022]

Fancello L, Burger T. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics. Genome Biol 2022;23:132. [PMID: 35725496 PMCID: PMC9208142 DOI: 10.1186/s13059-022-02701-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 06/09/2022] [Indexed: 12/03/2022]

Abstract

Background

Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach.

Results

We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible.

Conclusions

In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable for excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13059-022-02701-2.
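The target-decoy competition (TDC) procedure that this abstract critiques can be sketched as follows. This is a minimal illustration, not Fancello and Burger's code: it assumes each spectrum has already been scored against its best target and best decoy peptide, that higher scores are better, that ties go to the target, and that the common (decoys + 1) / targets estimator is used. The `PSM` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match kept after target-decoy competition."""
    score: float
    is_decoy: bool


def compete(pairs):
    """Keep the better of (best target score, best decoy score) per spectrum.

    Ties are resolved in favour of the target (an assumption of this sketch).
    """
    return [PSM(t, False) if t >= d else PSM(d, True) for t, d in pairs]


def tdc_qvalues(psms):
    """Return (psm, q_value) pairs sorted by descending score.

    Score ties are not grouped here (a simplification).
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR at this score threshold: (decoys + 1) / targets, capped at 1.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))
    # A q-value is the smallest estimated FDR at this or any lower threshold.
    qvals = [0.0] * len(fdrs)
    running = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return list(zip(ranked, qvals))


def accepted_targets(psms, alpha=0.01):
    """Target PSMs whose q-value is at or below the FDR threshold alpha."""
    return [p for p, q in tdc_qvalues(psms) if not p.is_decoy and q <= alpha]


# Toy data: (best target score, best decoy score) for eight spectra.
pairs = [(10, 2), (9, 3), (8, 1), (7, 4), (6, 5), (2, 5), (4, 1), (1, 3)]
psms = compete(pairs)
print(len(accepted_targets(psms, alpha=0.25)))  # prints 5
```

The whole estimate rests on decoy wins being a faithful proxy for incorrect target wins; the abstract's claim is that this proxy degrades when the searched database is made excessively small.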

Tay AP, Hamey JJ, Martyn GE, Wilson LOW, Wilkins MR. Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing. J Proteome Res 2022;21:1628-1639. [PMID: 35612954 DOI: 10.1021/acs.jproteome.1c00968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]

Aggarwal S, Raj A, Kumar D, Dash D, Yadav AK. False discovery rate: the Achilles' heel of proteogenomics. Brief Bioinform 2022;23:6582880. [PMID: 35534181 DOI: 10.1093/bib/bbac163] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/30/2021] [Revised: 03/14/2022] [Accepted: 04/12/2022] [Indexed: 12/25/2022]

Zhu H, Jiang S, Zhou W, Chi H, Sun J, Shi J, Zhang Z, Chang L, Yu L, Zhang L, Lyu Z, Xu P, Zhang Y. Ac-LysargiNase efficiently helps genome reannotation of Mycolicibacterium smegmatis MC2 155. J Proteomics 2022;264:104622. [DOI: 10.1016/j.jprot.2022.104622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 05/10/2022] [Accepted: 05/16/2022] [Indexed: 10/18/2022]

IntroSpect: Motif-Guided Immunopeptidome Database Building Tool to Improve the Sensitivity of HLA I Binding Peptide Identification by Mass Spectrometry. Biomolecules 2022;12:biom12040579. [PMID: 35454168 PMCID: PMC9025654 DOI: 10.3390/biom12040579] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 01/02/2023]

Ahrens CH, Wade JT, Champion MM, Langer JD. A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry. J Bacteriol 2022;204:e0035321. [PMID: 34748388 PMCID: PMC8765459 DOI: 10.1128/jb.00353-21] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 11/30/2022]

Salz R, Bouwmeester R, Gabriels R, Degroeve S, Martens L, Volders PJ, 't Hoen PAC. Personalized Proteome: Comparing Proteogenomics and Open Variant Search Approaches for Single Amino Acid Variant Detection. J Proteome Res 2021;20:3353-3364. [PMID: 33998808 PMCID: PMC8280751 DOI: 10.1021/acs.jproteome.1c00264] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/02/2021] [Indexed: 12/30/2022]

Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021;20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022]

Ruiz Cuevas MV, Hardy MP, Hollý J, Bonneil É, Durette C, Courcelles M, Lanoix J, Côté C, Staudt LM, Lemieux S, Thibault P, Perreault C, Yewdell JW. Most non-canonical proteins uniquely populate the proteome or immunopeptidome. Cell Rep 2021;34:108815. [PMID: 33691108 PMCID: PMC8040094 DOI: 10.1016/j.celrep.2021.108815] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 08/03/2020] [Revised: 01/29/2021] [Accepted: 02/10/2021] [Indexed: 12/16/2022]

Affiliation(s)

Maria Virginia Ruiz Cuevas Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, Montreal, QC H3C 3J7, Canada
Marie-Pierre Hardy Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
Jaroslav Hollý Cellular Biology Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
Éric Bonneil Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
Chantal Durette Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
Mathieu Courcelles Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
Joël Lanoix Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
Caroline Côté Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada
Louis M Staudt Lymphoid Malignancies Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
Sébastien Lemieux Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, Montreal, QC H3C 3J7, Canada
Pierre Thibault Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Chemistry, Université de Montréal, Montreal, QC H3C 3J7, Canada
Claude Perreault Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montreal, QC H3C 3J7, Canada; Department of Medicine, Université de Montréal, Montreal, QC H3C 3J7, Canada
Jonathan W Yewdell Cellular Biology Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA

Petruschke H, Schori C, Canzler S, Riesbeck S, Poehlein A, Daniel R, Frei D, Segessemann T, Zimmerman J, Marinos G, Kaleta C, Jehmlich N, Ahrens CH, von Bergen M. Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome. Microbiome 2021;9:55. [PMID: 33622394 PMCID: PMC7903761 DOI: 10.1186/s40168-020-00981-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/25/2020] [Accepted: 12/16/2020] [Indexed: 05/13/2023]

Abstract

BACKGROUND

The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx), consisting of eight bacterial species, with a particular focus on the discovery of novel small proteins with fewer than 100 amino acids (sProteins), some of which may help shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities.

RESULTS

We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches, including two previously optimized sProtein enrichment strategies, were applied to specifically increase the chances of novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community, indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx.

CONCLUSIONS

We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. Further analysis of the novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to provide new insights into the role of sProteins in the functionality of bacterial communities such as those of the human intestinal tract.

Affiliation(s)

Hannes Petruschke Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
Christian Schori Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
Sebastian Canzler Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
Sarah Riesbeck Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
Anja Poehlein Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
Rolf Daniel Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
Daniel Frei Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
Tina Segessemann Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
Johannes Zimmerman Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
Georgios Marinos Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
Christoph Kaleta Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
Nico Jehmlich Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
Christian H Ahrens Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
Martin von Bergen Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany; Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany

Schiebenhoefer H, Schallert K, Renard BY, Trappe K, Schmid E, Benndorf D, Riedel K, Muth T, Fuchs S. A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nat Protoc 2020;15:3212-3239. [PMID: 32859984 DOI: 10.1038/s41596-020-0368-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/02/2019] [Accepted: 05/29/2020] [Indexed: 12/14/2022]

Abstract

Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in the analysis of corresponding datasets differ from those of pure-culture proteomics, e.g., due to the higher complexity of the samples and the larger reference databases demanding specific computing pipelines. Corresponding data analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on, among other factors, the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.

Bouwmeester R, Gabriels R, Van Den Bossche T, Martens L, Degroeve S. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows. Proteomics 2020;20:e1900351. [PMID: 32267083 DOI: 10.1002/pmic.201900351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/26/2020] [Revised: 03/21/2020] [Indexed: 12/30/2022]

Zhang Z, Zhang S, Li X, Zhao Z, Chen C, Zhang J, Li M, Wei Z, Jiang W, Pan B, Li Y, Liu Y, Cao Y, Zhao W, Gu Y, Yu Y, Meng Q, Qi L. Reference genome and annotation updates lead to contradictory prognostic predictions in gene expression signatures: a case study of resected stage I lung adenocarcinoma. Brief Bioinform 2020;22:5834482. [PMID: 32383445 DOI: 10.1093/bib/bbaa081] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/24/2020] [Revised: 04/02/2020] [Accepted: 04/18/2020] [Indexed: 12/28/2022]

Chong C, Müller M, Pak H, Harnett D, Huber F, Grun D, Leleu M, Auger A, Arnaud M, Stevenson BJ, Michaux J, Bilic I, Hirsekorn A, Calviello L, Simó-Riudalbas L, Planet E, Lubiński J, Bryśkiewicz M, Wiznerowicz M, Xenarios I, Zhang L, Trono D, Harari A, Ohler U, Coukos G, Bassani-Sternberg M. Integrated proteogenomic deep sequencing and analytics accurately identify non-canonical peptides in tumor immunopeptidomes. Nat Commun 2020;11:1293. [PMID: 32157095 PMCID: PMC7064602 DOI: 10.1038/s41467-020-14968-9] [Citation(s) in RCA: 173] [Impact Index Per Article: 43.3] [Received: 06/17/2019] [Accepted: 02/12/2020] [Indexed: 12/20/2022]

Affiliation(s)

Chloe Chong Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Markus Müller Vital IT, Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015, Lausanne, Switzerland
HuiSong Pak Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Dermot Harnett Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
Florian Huber Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Delphine Grun School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
Marion Leleu School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015, Lausanne, Switzerland
Aymeric Auger Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Marion Arnaud Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Brian J Stevenson Vital IT, Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015, Lausanne, Switzerland
Justine Michaux Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Ilija Bilic Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
Antje Hirsekorn Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
Lorenzo Calviello Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
Laia Simó-Riudalbas School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
Evarist Planet School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
Jan Lubiński Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, ul. Rybacka 1, 70-204, Szczecin, Poland; International Institute for Molecular Oncology, Jakuba Krauthofera 23, 60-203, Poznań, Poland
Marta Bryśkiewicz Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, ul. Rybacka 1, 70-204, Szczecin, Poland; International Institute for Molecular Oncology, Jakuba Krauthofera 23, 60-203, Poznań, Poland
Maciej Wiznerowicz International Institute for Molecular Oncology, Jakuba Krauthofera 23, 60-203, Poznań, Poland; Poznan University of Medical Sciences, Fredry 10, 61-701, Poznań, Poland
Ioannis Xenarios Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Genome Center Health 2030, Chemin de Mines 9, 1202, Genève, Switzerland; Department of Training and Research, CHUV/UNIL Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
Lin Zhang Center for Research on Reproduction and Women's Health, University of Pennsylvania, 421 Curie Boulevard, Philadelphia, PA, 19104, USA; Department of Obstetrics and Gynecology, University of Pennsylvania, 3400 Civic Center Boulevard, Philadelphia, PA, 19104, USA
Didier Trono School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
Alexandre Harari Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Uwe Ohler Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany; Departments of Biology and Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
George Coukos Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
Michal Bassani-Sternberg Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland; Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland

Developing Well-Annotated Species-Specific Protein Databases Using Comparative Proteogenomics. Adv Exp Med Biol 2019;1140:389-400. [PMID: 31347060 DOI: 10.1007/978-3-030-15950-4_22] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]

Machado KCT, Fortuin S, Tomazella GG, Fonseca AF, Warren RM, Wiker HG, de Souza SJ, de Souza GA. On the Impact of the Pangenome and Annotation Discrepancies While Building Protein Sequence Databases for Bacteria Proteogenomics. Front Microbiol 2019;10:1410. [PMID: 31281302 PMCID: PMC6596428 DOI: 10.3389/fmicb.2019.01410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/26/2019] [Accepted: 06/05/2019] [Indexed: 01/19/2023]

Weldatsadik R, Datta N, Kolmeder C, Vuopio J, Kere J, Wilkman S, Flatt J, Vuento R, Haapasalo K, Keskitalo S, Varjosalo M, Jokiranta T. Pool-seq driven proteogenomic database for Group G Streptococcus. J Proteomics 2019;201:84-92. [DOI: 10.1016/j.jprot.2019.04.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/24/2019] [Revised: 03/29/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023]

Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019;16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 02/07/2023]

Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017;27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]

Abstract

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
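The "modified six-frame translation considering alternative start codons" mentioned in this abstract can be illustrated with a short sketch. This is not the iPtgxDB software: it assumes NCBI translation table 11 amino acids, restricts starts to ATG, GTG and TTG (the most common bacterial start codons; table 11 permits a few others), reports only stop-terminated ORFs, keeps the most upstream start per stop codon, and uses a hypothetical minimum length `min_aa`. All names are illustrative.

```python
# Standard genetic code in TCAG order; NCBI translation table 11 (bacterial)
# uses the same amino acids and differs only in which codons may start a protein.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_orfs(dna, starts=("ATG", "GTG", "TTG"), min_aa=10):
    """Return (strand, frame, protein) for stop-terminated ORFs in all six frames.

    Minus-strand frames are counted on the reverse complement. For each stop
    codon only the most upstream start is kept, so nested ORFs that begin at
    downstream starts are not reported.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
            start = None
            for idx, codon in enumerate(codons):
                if start is None and codon in starts:
                    start = idx
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    # The initiator codon is read as Met whatever its sequence.
                    body = "".join(CODON_TABLE.get(c, "X") for c in codons[start + 1:idx])
                    protein = "M" + body
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, protein))
                    start = None
    return orfs


print(six_frame_orfs("ATGAAATTTTAA", min_aa=3))  # prints [('+', 0, 'MKF')]
```

Keeping only the longest ORF per stop keeps the sketch small; a database aimed at finding novel start sites, as the abstract describes, would likely need to retain candidate downstream starts as well.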

Affiliation(s)

Ulrich Omasits Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
Adithi R Varadarajan Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland; Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
Michael Schmid Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
Sandra Goetze Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
Damianos Melidis Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
Marc Bourqui Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
Olga Nikolayeva Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
Maxime Québatte Biozentrum, University of Basel, CH-4056 Basel, Switzerland
Andrea Patrignani Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
Christoph Dehio Biozentrum, University of Basel, CH-4056 Basel, Switzerland
Juerg E Frey Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
Mark D Robinson Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
Bernd Wollscheid Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
Christian H Ahrens Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland


Heunis T, Dippenaar A, Warren RM, van Helden PD, van der Merwe RG, Gey van Pittius NC, Pain A, Sampson SL, Tabb DL. Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates. J Proteome Res 2017;16:3841-3851. [PMID: 28820946 DOI: 10.1021/acs.jproteome.7b00483] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 01/12/2023]

Affiliation(s)

Tiaan Heunis DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
Anzaan Dippenaar DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
Robin M Warren DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
Paul D van Helden DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
Ruben G van der Merwe DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
Nicolaas C Gey van Pittius DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
Arnab Pain Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
Samantha L Sampson DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
David L Tabb DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa


Wingo TS, Duong DM, Zhou M, Dammer EB, Wu H, Cutler DJ, Lah JJ, Levey AI, Seyfried NT. Integrating Next-Generation Genomic Sequencing and Mass Spectrometry To Estimate Allele-Specific Protein Abundance in Human Brain. J Proteome Res 2017;16:3336-3347. [PMID: 28691493 DOI: 10.1021/acs.jproteome.7b00324] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/28/2022]

Kroll JE, da Silva VL, de Souza SJ, de Souza GA. A tool for integrating genetic and mass spectrometry-based peptide data: Proteogenomics Viewer. Bioessays 2017;39. [DOI: 10.1002/bies.201700015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/29/2022]

Li H, Park J, Kim H, Hwang KB, Paek E. Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments. J Proteome Res 2017;16:2231-2239. [PMID: 28452485 DOI: 10.1021/acs.jproteome.7b00033] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]

Willems P, Ndah E, Jonckheere V, Stael S, Sticker A, Martens L, Van Breusegem F, Gevaert K, Van Damme P. N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana. Mol Cell Proteomics 2017;16:1064-1080. [PMID: 28432195 PMCID: PMC5461538 DOI: 10.1074/mcp.m116.066662] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 12/22/2016] [Revised: 04/11/2017] [Indexed: 01/05/2023]

Affiliation(s)

Patrick Willems VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium; Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium; VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
Elvis Ndah VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium; Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
Veronique Jonckheere VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
Simon Stael VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium; Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium; VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
Adriaan Sticker VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium; Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
Lennart Martens VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium; Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
Frank Van Breusegem VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium; Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium
Kris Gevaert VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
Petra Van Damme VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium


O'Neill JR, Pak HS, Pairo-Castineira E, Save V, Paterson-Brown S, Nenutil R, Vojtěšek B, Overton I, Scherl A, Hupp TR. Quantitative Shotgun Proteomics Unveils Candidate Novel Esophageal Adenocarcinoma (EAC)-specific Proteins. Mol Cell Proteomics 2017;16:1138-1150. [PMID: 28336725 PMCID: PMC5461543 DOI: 10.1074/mcp.m116.065078] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/01/2016] [Revised: 02/26/2017] [Indexed: 12/11/2022]

Li H, Joh YS, Kim H, Paek E, Lee SW, Hwang KB. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification. BMC Genomics 2016;17:1031. [PMID: 28155652 PMCID: PMC5259817 DOI: 10.1186/s12864-016-3327-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]

Abstract

Background

Proteogenomics is a promising approach for tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such database inflation makes it possible to identify novel peptides, but it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may cause the false discovery rate (FDR) to be underestimated. Conversely, inflated decoy databases could decrease the sensitivity of peptide identification because they produce more high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have been lacking.

Results

To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no single method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches.

Conclusions

We propose to use a set of search result validation methods with separate filtering, for sensitive and reliable identification of peptides in proteogenomic search.
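The separate-filtering idea in this abstract can be sketched with a plain target-decoy cutoff applied independently to known and novel peptide-spectrum matches (PSMs). This is a minimal illustration, not the authors' code: the `PSM` record, the function names, the decoys/targets FDR estimator, and the assumption that each decoy carries the known/novel label of the database part it was derived from are all hypothetical choices, and the mixture-model validation the study also tested is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record."""
    score: float      # search-engine score; higher is better
    is_decoy: bool
    is_novel: bool    # True if from the proteogenomic (non-reference) part


def score_threshold(psms: List[PSM], fdr: float) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR (#decoys / #targets at or
    above the cutoff) is <= fdr; None if no cutoff qualifies."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    threshold = None
    for i, p in enumerate(ranked):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last member of a group of tied scores.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1].score != p.score
        if last_of_tie and targets > 0 and decoys / targets <= fdr:
            threshold = p.score
    return threshold


def accepted_targets(psms: List[PSM], fdr: float) -> List[PSM]:
    """Target PSMs at or above the FDR-controlled score cutoff."""
    cutoff = score_threshold(psms, fdr)
    if cutoff is None:
        return []
    return [p for p in psms if not p.is_decoy and p.score >= cutoff]


def separate_filtering(psms: List[PSM], fdr: float = 0.01) -> List[PSM]:
    """Apply the target-decoy cutoff to known and novel PSMs independently."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return accepted_targets(known, fdr) + accepted_targets(novel, fdr)
```

Calling `accepted_targets` on the pooled list instead of `separate_filtering` shows the concern raised in the Background: well-scoring known PSMs keep the pooled decoy ratio low, so a weak novel PSM can pass a pooled cutoff that it would fail when novel PSMs are filtered on their own.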

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-3327-5) contains supplementary material, which is available to authorized users.


Reannotation of Genomes by Means of Proteomics Data. Methods Enzymol 2016;585:201-216. [PMID: 28109430 DOI: 10.1016/bs.mie.2016.09.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]

Muth T, Renard BY, Martens L. Metaproteomic data analysis at a glance: advances in computational microbial community proteomics. Expert Rev Proteomics 2016;13:757-69. [DOI: 10.1080/14789450.2016.1209418] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 02/07/2023]

Wen B, Xu S, Zhou R, Zhang B, Wang X, Liu X, Xu X, Liu S. PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq. BMC Bioinformatics 2016;17:244. [PMID: 27316337 PMCID: PMC4912784 DOI: 10.1186/s12859-016-1133-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 06/27/2015] [Accepted: 06/09/2016] [Indexed: 11/27/2022]

Abstract

Background

Peptide identification based on mass spectrometry (MS) is generally achieved by comparing the experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot, however, identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has therefore been proposed to assist with and improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with such a customized protein database is needed.

Results

A pipeline built around an R package named PGA was developed that automates the processing of tandem mass spectrometry (MS/MS) data acquired on different MS platforms and the construction of customized protein databases from RNA-Seq data, with or without a reference genome as a guide. PGA then identifies novel peptides and generates an HTML-based report with a visual interface. Applied to a published dataset, PGA identified 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and the example reports are available at http://wenbostar.github.io/PGA/.

Conclusions

The pipeline of PGA, aimed at being platform-independent and easy-to-use, was successfully developed and shown to be capable of identifying novel peptides by searching the customized protein database derived from RNA-Seq data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1133-3) contains supplementary material, which is available to authorized users.


Li Y, Wang X, Cho JH, Shaw TI, Wu Z, Bai B, Wang H, Zhou S, Beach TG, Wu G, Zhang J, Peng J. JUMPg: An Integrative Proteogenomics Pipeline Identifying Unannotated Proteins in Human Brain and Cancer Cells. J Proteome Res 2016;15:2309-20. [PMID: 27225868 DOI: 10.1021/acs.jproteome.6b00344] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Indexed: 01/18/2023]

Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annu Rev Anal Chem (Palo Alto Calif) 2016;9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 05/09/2023]

Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow. Nat Commun 2016;7:11778. [PMID: 27250503 PMCID: PMC4895710 DOI: 10.1038/ncomms11778] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 12/22/2015] [Accepted: 04/28/2016] [Indexed: 12/16/2022]

Global proteogenomic analysis of human MHC class I-associated peptides derived from non-canonical reading frames. Nat Commun 2016;7:10238. [PMID: 26728094 PMCID: PMC4728431 DOI: 10.1038/ncomms10238] [Citation(s) in RCA: 174] [Impact Index Per Article: 21.8] [Received: 04/13/2015] [Accepted: 11/16/2015] [Indexed: 12/21/2022]

Olexiouk V, Menschaert G. Identification of Small Novel Coding Sequences, a Proteogenomics Endeavor. Adv Exp Med Biol 2016;926:49-64. [PMID: 27686805 DOI: 10.1007/978-3-319-42316-6_4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]

Zickmann F, Renard BY. MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms. Bioinformatics 2015;31:i106-15. [PMID: 26072472 PMCID: PMC4765881 DOI: 10.1093/bioinformatics/btv236] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023]

Shanmugam AK, Nesvizhskii AI. Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics. J Proteome Res 2015;14:5169-78. [PMID: 26569054 DOI: 10.1021/acs.jproteome.5b00504] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]

Sun H, Chen C, Shi M, Wang D, Liu M, Li D, Yang P, Li Y, Xie L. Integration of mass spectrometry and RNA-Seq data to confirm human ab initio predicted genes and lncRNAs. Proteomics 2014;14:2760-8. [PMID: 25339270 DOI: 10.1002/pmic.201400174] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 04/29/2014] [Revised: 09/22/2014] [Accepted: 10/16/2014] [Indexed: 12/14/2022]

Jagtap PD, Blakely A, Murray K, Stewart S, Kooren J, Johnson JE, Rhodus NL, Rudney J, Griffin TJ. Metaproteomic analysis using the Galaxy framework. Proteomics 2015;15:3553-65. [DOI: 10.1002/pmic.201500074] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 02/19/2015] [Revised: 04/25/2015] [Accepted: 06/04/2015] [Indexed: 12/22/2022]

Zhang K, Fu Y, Zeng WF, He K, Chi H, Liu C, Li YC, Gao Y, Xu P, He SM. A note on the false discovery rate of novel peptides in proteogenomics. Bioinformatics 2015;31:3249-53. [PMID: 26076724 PMCID: PMC4595894 DOI: 10.1093/bioinformatics/btv340] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 02/02/2015] [Accepted: 05/27/2015] [Indexed: 11/15/2022]

Affiliation(s)

Kun Zhang Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
Yan Fu National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190
Wen-Feng Zeng Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
Kun He Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
Hao Chi Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190
Chao Liu Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190
Yan-Chang Li State Key Laboratory of Proteomics, National Engineering Research Center for Protein Drugs, Beijing Proteome Research Center, National Center for Protein Sciences Beijing, Beijing Institute of Radiation Medicine, Beijing 102206, China
Yuan Gao State Key Laboratory of Proteomics, National Engineering Research Center for Protein Drugs, Beijing Proteome Research Center, National Center for Protein Sciences Beijing, Beijing Institute of Radiation Medicine, Beijing 102206, China
Ping Xu State Key Laboratory of Proteomics, National Engineering Research Center for Protein Drugs, Beijing Proteome Research Center, National Center for Protein Sciences Beijing, Beijing Institute of Radiation Medicine, Beijing 102206, China
Si-Min He Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190


Tay AP, Pang CNI, Twine NA, Hart-Smith G, Harkness L, Kassem M, Wilkins MR. Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data. J Proteome Res 2015;14:3541-54. [PMID: 25961807 DOI: 10.1021/pr5011394] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]

Muth T, Kolmeder CA, Salojärvi J, Keskitalo S, Varjosalo M, Verdam FJ, Rensen SS, Reichl U, de Vos WM, Rapp E, Martens L. Navigating through metaproteomics data: a logbook of database searching. Proteomics 2015;15:3439-53. [PMID: 25778831 DOI: 10.1002/pmic.201400560] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/28/2014] [Revised: 02/13/2015] [Accepted: 03/06/2015] [Indexed: 11/12/2022]

Gonnelli G, Stock M, Verwaeren J, Maddelein D, De Baets B, Martens L, Degroeve S. A Decoy-Free Approach to the Identification of Peptides. J Proteome Res 2015;14:1792-8. [DOI: 10.1021/pr501164r] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/29/2022]

Lee DCH, Jones AR, Hubbard SJ. Computational phosphoproteomics: from identification to localization. Proteomics 2015;15:950-63. [PMID: 25475148 PMCID: PMC4384807 DOI: 10.1002/pmic.201400372] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/01/2014] [Revised: 10/31/2014] [Accepted: 11/26/2014] [Indexed: 01/08/2023]

Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2014;11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 505] [Impact Index Per Article: 56.1] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]

Crappé J, Ndah E, Koch A, Steyaert S, Gawron D, De Keulenaer S, De Meester E, De Meyer T, Van Criekinge W, Van Damme P, Menschaert G. PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration. Nucleic Acids Res 2014;43:e29. [PMID: 25510491 PMCID: PMC4357689 DOI: 10.1093/nar/gku1283] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Indexed: 12/30/2022]

Affiliation(s)

Jeroen Crappé Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Elvis Ndah Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; Department of Medical Protein Research, Flemish Institute of Biotechnology, Ghent, Belgium; Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
Alexander Koch Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Sandra Steyaert Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Daria Gawron Department of Medical Protein Research, Flemish Institute of Biotechnology, Ghent, Belgium; Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
Sarah De Keulenaer Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Ellen De Meester Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Tim De Meyer Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Wim Van Criekinge Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
Petra Van Damme Department of Medical Protein Research, Flemish Institute of Biotechnology, Ghent, Belgium; Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
Gerben Menschaert Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium


Kucharova V, Wiker HG. Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics. Proteomics 2014;14:2360-75. [PMID: 25263021 DOI: 10.1002/pmic.201400168] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]