1. Paramasivan S, Ashick M, Dudley KJ, Satake N, Mills PC, Sadowski P, Nagaraj SH. VPBrowse: Genome-based representation of MS/MS spectra to quantify 10,000 bovine proteins. Proteomics 2024; 24:e2300431. [PMID: 38468111 DOI: 10.1002/pmic.202300431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 02/11/2024] [Accepted: 02/26/2024] [Indexed: 03/13/2024]
Abstract
SWATH is a data acquisition strategy acclaimed for generating quantitatively accurate and consistent measurements of proteins across multiple samples. Its utility for proteomics studies in nonlaboratory animals, however, is currently compromised by the lack of sufficiently comprehensive and reliable public libraries, either experimental or predicted, and of relevant platforms that support their sharing and utilization in an intuitive manner. Here we describe the development of the Veterinary Proteome Browser, VPBrowse (http://browser.proteo.cloud/), an online platform for genome-based representation of the Bos taurus proteome, which is equipped with an interactive database and tools for searching, visualization, and building quantitative mass spectrometry assays. In its current version (VPBrowse 1.0), it contains high-quality fragmentation spectra acquired on a QToF instrument for over 36,000 proteotypic peptides, providing experimental evidence for over 10,000 proteins. Data can be downloaded in different formats to enable analysis using popular software packages for SWATH data processing, whilst normalization to the iRT scale ensures compatibility with diverse chromatography systems. When applied to a published blood plasma dataset from a biomarker discovery study, the resource supported label-free quantification of additional proteins not previously reported by the authors, including PSMA4, a tissue leakage protein and a promising candidate biomarker of an animal's response to dehorning-related injury.
Affiliation(s)
- Selvam Paramasivan
  - School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
  - Central Analytical Research Facility, Queensland University of Technology, Brisbane, Queensland, Australia
- Mohamed Ashick
  - LifeBytes India Private Limited, Bengaluru, Karnataka, India
- Kevin J Dudley
  - Central Analytical Research Facility, Queensland University of Technology, Brisbane, Queensland, Australia
- Nana Satake
  - School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
- Paul C Mills
  - School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
- Pawel Sadowski
  - Central Analytical Research Facility, Queensland University of Technology, Brisbane, Queensland, Australia
- Shivashankar H Nagaraj
  - Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, Australia
  - Translational Research Institute, Brisbane, Queensland, Australia
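The abstract of entry 1 notes that normalization to the iRT scale keeps the spectral library compatible with different chromatography systems. As a rough illustration only (this is not taken from VPBrowse), retention-time normalization of this kind is commonly done by fitting a straight line through a few reference peptides whose iRT values are known and then applying that line to every other peptide. The function names `fit_irt_line` and `to_irt` and all numbers below are hypothetical, chosen just for the sketch.

```python
from typing import Sequence, Tuple


def fit_irt_line(observed_rt: Sequence[float],
                 reference_irt: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of iRT = slope * RT + intercept over reference peptides."""
    n = len(observed_rt)
    if n < 2 or n != len(reference_irt):
        raise ValueError("need at least two paired reference peptides")
    mean_rt = sum(observed_rt) / n
    mean_irt = sum(reference_irt) / n
    # Sum of squared deviations of the observed retention times.
    sxx = sum((x - mean_rt) ** 2 for x in observed_rt)
    if sxx == 0:
        raise ValueError("reference retention times must not all be equal")
    # Sum of cross-deviations between observed RT and reference iRT.
    sxy = sum((x - mean_rt) * (y - mean_irt)
              for x, y in zip(observed_rt, reference_irt))
    slope = sxy / sxx
    intercept = mean_irt - slope * mean_rt
    return slope, intercept


def to_irt(rt: float, slope: float, intercept: float) -> float:
    """Convert an observed retention time to the iRT scale."""
    return slope * rt + intercept


# Hypothetical reference peptides: observed RT in minutes and their known iRT values.
slope, intercept = fit_irt_line([10.0, 20.0, 30.0, 40.0], [0.0, 25.0, 50.0, 75.0])
normalized = to_irt(24.0, slope, intercept)
```

Because the fit is redone for each chromatographic setup, a library stored in iRT units can be mapped back to any gradient for which the same reference peptides were measured.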
2. Tay AP, Hamey JJ, Martyn GE, Wilson LOW, Wilkins MR. Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing. J Proteome Res 2022; 21:1628-1639. [PMID: 35612954 DOI: 10.1021/acs.jproteome.1c00968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
Affiliation(s)
- Aidan P Tay
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia
  - Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Joshua J Hamey
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gabriella E Martyn
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Laurence O W Wilson
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia
  - Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Marc R Wilkins
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
3. Saha S, Matthews DA, Bessant C. High throughput discovery of protein variants using proteomics informed by transcriptomics. Nucleic Acids Res 2018; 46:4893-4902. [PMID: 29718325 PMCID: PMC6007231 DOI: 10.1093/nar/gky295] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/28/2017] [Accepted: 04/11/2018] [Indexed: 11/13/2022]
Abstract
Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of known proteins, and which are simply artefacts of poor sequence assembly, is challenging. We have designed and implemented an automated solution that classifies putative TGEs by comparing to reference proteome sequences. This allows large-scale identification of sequence polymorphisms, splice isoforms and novel TGEs supported by presence or absence of variant-specific peptide evidence. Unlike previously reported methods, ours does not require a catalogue of known variants, making it more applicable to non-model organisms. The method was validated on human PIT data, then applied to Mus musculus, Pteropus alecto and Aedes aegypti. Novel discoveries included 60 human protein isoforms, 32 392 polymorphisms in P. alecto, and TGEs with non-methionine start sites including tyrosine.
Affiliation(s)
- Shyamasree Saha
  - School of Biological and Chemical Sciences, Queen Mary University of London, Mile End, London E1 4NS, UK
- David A Matthews
  - School of Cellular and Molecular Medicine, University of Bristol, University Walk, Bristol BS8 1TD, UK
- Conrad Bessant
  - School of Biological and Chemical Sciences, Queen Mary University of London, Mile End, London E1 4NS, UK
  - Centre for Computational Biology, Life Sciences Initiative, Queen Mary University of London, Mile End, London E1 4NS, UK
4. Schlaffner CN, Pirklbauer GJ, Bender A, Choudhary JS. Fast, Quantitative and Variant Enabled Mapping of Peptides to Genomes. Cell Syst 2017; 5:152-156.e4. [PMID: 28837811 PMCID: PMC5571441 DOI: 10.1016/j.cels.2017.07.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/05/2016] [Revised: 03/24/2017] [Accepted: 07/26/2017] [Indexed: 12/24/2022]
Abstract
Current tools for visualization and integration of proteomics with other omics datasets are inadequate for large-scale studies and capture only basic sequence identity information. Furthermore, the frequent reformatting of annotations for reference genomes required by these tools is known to be highly error prone. We developed PoGo for mapping peptides identified through mass spectrometry to overcome these limitations. PoGo reduced runtime and memory usage by 85% and 20%, respectively, and exhibited overall superior performance over other tools on benchmarking with large-scale human tissue and cancer phosphoproteome datasets comprising ∼3 million peptides. In addition, extended functionality enables representation of single-nucleotide variants, post-translational modifications, and quantitative features. PoGo has been integrated in established frameworks such as the PRIDE tool suite and OpenMS, as well as a standalone tool with user-friendly graphical interface. With the rapid increase of quantitative high-resolution datasets capturing proteomes and global modifications to complement orthogonal genomics platforms, PoGo provides a central utility enabling large-scale visualization and interpretation of transomics datasets.
Affiliation(s)
- Christoph N Schlaffner
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, UK
- Georg J Pirklbauer
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, UK
- Jyoti S Choudhary
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
5. Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next-Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/11/2018] [Revised: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Teck Yew Low
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- M. Aiman Mohtar
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Mia Yang Ang
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
6. Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
  - Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Felix Hartkopf
  - Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Marc Vaudel
  - K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway
  - Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
- Bernhard Y Renard
  - Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
7. Sajulga R, Mehta S, Kumar P, Johnson JE, Guerrero CR, Ryan MC, Karchin R, Jagtap PD, Griffin TJ. Bridging the Chromosome-centric and Biology/Disease-driven Human Proteome Projects: Accessible and Automated Tools for Interpreting the Biological and Pathological Impact of Protein Sequence Variants Detected via Proteogenomics. J Proteome Res 2018; 17:4329-4336. [DOI: 10.1021/acs.jproteome.8b00404] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Affiliation(s)
- Ray Sajulga
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, Minnesota 55904, United States
- James E. Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Candace R. Guerrero
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Michael C. Ryan
  - In-Silico Solutions, Falls Church, Virginia 22043, United States
- Rachel Karchin
  - Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
  - The Institute for Computational Medicine, The Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Department of Oncology, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21217, United States
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
8. Schlaffner CN, Pirklbauer GJ, Bender A, Steen JAJ, Choudhary JS. A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes. J Vis Exp 2018. [PMID: 29889196 PMCID: PMC6101353 DOI: 10.3791/57633] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
Abstract
Cross-talk between genes, transcripts, and proteins is the key to cellular responses; hence, analysis of molecular levels as distinct entities is slowly being extended to integrative studies to enhance the understanding of molecular dynamics within cells. Current tools for the visualization and integration of proteomics with other omics datasets are inadequate for large-scale studies. Furthermore, they only capture basic sequence identity, discarding post-translational modifications and quantitation. To address these issues, we developed PoGo to map peptides with associated post-translational modifications and quantification to reference genome annotation. In addition, the tool was developed to enable the mapping of peptides identified from customized sequence databases incorporating single amino acid variants. While PoGo is a command line tool, the graphical interface PoGoGUI enables non-bioinformatics researchers to easily map peptides to 25 species supported by Ensembl genome annotation. The generated output borrows file formats from the genomics field and, therefore, visualization is supported in most genome browsers. For large-scale studies, PoGo is supported by TrackHubGenerator to create web-accessible repositories of data mapped to genomes that also enable easy sharing of proteogenomics data. With little effort, this tool can map millions of peptides to reference genomes within only a few minutes, outperforming other available sequence-identity based tools. This protocol demonstrates the best approaches for proteogenomics mapping through PoGo with publicly available datasets of quantitative and phosphoproteomics, as well as large-scale studies.
Affiliation(s)
- Christoph N Schlaffner
  - Department of Neurobiology, F. M. Kirby Neurobiology Center, Boston Children's Hospital, Harvard Medical School
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Genome Campus
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
- Georg J Pirklbauer
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Genome Campus
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
- Judith A J Steen
  - Department of Neurobiology, F. M. Kirby Neurobiology Center, Boston Children's Hospital, Harvard Medical School
- Jyoti S Choudhary
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Genome Campus
  - Functional Proteomics Group, Chester Beatty Laboratories, Institute of Cancer Research
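Entries 4 and 8 describe mapping peptides onto a reference genome annotation. The sketch below does not reproduce PoGo; it only illustrates the coordinate arithmetic that any such mapping has to perform, under simplifying assumptions: a forward-strand transcript, coding exons given as 1-based inclusive intervals, and a peptide whose position in the protein is already known. The function name `peptide_to_genome` and the example coordinates are hypothetical.

```python
from typing import List, Tuple


def peptide_to_genome(exons: List[Tuple[int, int]],
                      aa_start: int,
                      aa_len: int) -> List[Tuple[int, int]]:
    """Map a peptide to genomic blocks on the forward strand.

    exons:    coding exon intervals, 1-based inclusive, sorted by position.
    aa_start: 0-based index of the peptide's first residue in the protein.
    aa_len:   peptide length in residues.
    """
    cds_start = aa_start * 3           # 0-based offset of the first coding base
    cds_end = cds_start + aa_len * 3   # exclusive end offset
    blocks = []
    consumed = 0                       # coding bases in exons already walked
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start + 1
        # Overlap between the peptide's coding span and this exon, in CDS offsets.
        lo = max(cds_start, consumed)
        hi = min(cds_end, consumed + ex_len)
        if lo < hi:
            blocks.append((ex_start + lo - consumed,
                           ex_start + hi - consumed - 1))
        consumed += ex_len
    if consumed < cds_end:
        raise ValueError("peptide extends beyond the coding sequence")
    return blocks


# Two hypothetical exons of 9 and 12 coding bases; a 3-residue peptide
# starting at residue index 2 spans the splice junction.
junction_blocks = peptide_to_genome([(101, 109), (201, 212)], aa_start=2, aa_len=3)
```

A peptide that crosses a splice junction comes back as more than one block, which is the shape that block-based genome-browser formats such as BED expect.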
9. Tse SPK, Beauchemin M, Morse D, Lo SCL. Refining Transcriptome Gene Catalogs by MS-Validation of Expressed Proteins. Proteomics 2017; 18. [PMID: 29152876 DOI: 10.1002/pmic.201700271] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/07/2017] [Revised: 10/17/2017] [Indexed: 11/11/2022]
Abstract
Protein sequence identification by tandem mass spectrometry (LC-MS/MS) identifies thousands of protein sequences even in complex mixtures, and provides valuable insight into the biological functions of different cells. For non-model organisms, transcriptomes are generally used to allow peptide identification, an important addition to their use as a gene catalog allowing the potential metabolic activities of cells to be determined. We used LC-MS/MS data to identify which of the six possible reading frames in the transcriptome was actually used by the cell to make protein, and asked whether this would have an impact on downstream analyses using the dataset. We combined results from several LC-MS/MS experiments designed to identify peptide sequences in extracts from the dinoflagellate Lingulodinium polyedra using a 74 655-sequence transcriptome. We compiled a list of 6628 translated nucleic acid sequences that contained the ensemble of peptide matches (termed MS-validated sequences) and assessed the similarity in downstream analyses between this data set and the 6628 nucleic acid sequences from which they were derived. When compared with BLASTx analyses of the DNA sequences, the MS-validated protein sequences analyzed using BLASTp showed differences in gene ontology, had more identified BLAST hits, and contained more KEGG pathway enzymes. The MS-validated protein sequences also differ from datasets containing longest open reading frame (ORF) protein sequences. We also note a poor correlation between the levels of protein and mRNA abundance, a comparison not previously performed for dinoflagellates. The differences observed between analyses of MS-validated protein sequence and nucleic acid sequence datasets suggest use of the former may provide a more accurate representation of cellular capacity than the latter. Developing MS-validated protein sequence datasets may also speed interpretation of MS/MS spectra in bottom-up proteomics experiments.
Affiliation(s)
- Sirius P K Tse
  - Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University, Kowloon, Hong Kong
  - Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Mathieu Beauchemin
  - Département de Sciences Biologiques, Institut de Recherche en biologie Végétale, Université de Montréal, Montreal, Canada
- David Morse
  - Département de Sciences Biologiques, Institut de Recherche en biologie Végétale, Université de Montréal, Montreal, Canada
- Samuel C L Lo
  - Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University, Kowloon, Hong Kong
  - Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong
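The abstract of entry 9 turns on choosing, for each transcript, which of the six possible reading frames is supported by peptide evidence. A minimal sketch of that idea follows, assuming the standard genetic code and exact substring matching of peptides; the authors' actual pipeline is not described at this level of detail, and the names `six_frame_translate` and `frames_containing` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translate(seq: str) -> dict:
    """Translate a DNA sequence in all six reading frames (+1..+3, -1..-3)."""
    seq = seq.upper()
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


def frames_containing(seq: str, peptide: str) -> list:
    """List the reading frames whose translation contains the given peptide."""
    return [name for name, protein in six_frame_translate(seq).items()
            if peptide in protein]


example_frames = six_frame_translate("ATGGCCAAATGA")
```

In practice, peptide matches from several LC-MS/MS runs would be pooled per transcript before a frame is accepted; the exact-match test here stands in for that scoring step.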
10. Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017; 27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]
Abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
Affiliation(s)
- Ulrich Omasits
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Adithi R Varadarajan
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
  - Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Michael Schmid
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Sandra Goetze
  - Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Damianos Melidis
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Marc Bourqui
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Olga Nikolayeva
  - Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Andrea Patrignani
  - Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
- Juerg E Frey
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Mark D Robinson
  - Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Bernd Wollscheid
  - Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Christian H Ahrens
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
11. Chapman B, Bellgard M. Plant Proteogenomics: Improvements to the Grapevine Genome Annotation. Proteomics 2017; 17. [DOI: 10.1002/pmic.201700197] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/20/2017] [Revised: 07/28/2017] [Indexed: 01/09/2023]
Affiliation(s)
- Brett Chapman
  - Centre for Comparative Genomics, Murdoch University, Western Australia, Australia
- Matthew Bellgard
  - Centre for Comparative Genomics, Murdoch University, Western Australia, Australia
12. Kroll JE, da Silva VL, de Souza SJ, de Souza GA. A tool for integrating genetic and mass spectrometry-based peptide data: Proteogenomics Viewer. Bioessays 2017; 39. [DOI: 10.1002/bies.201700015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/29/2022]
Affiliation(s)
- José Eduardo Kroll
  - Institute of Bioinformatics and Biotechnology, Natal-RN, Brazil
  - Brain Institute, Universidade Federal do Rio Grande do Norte, Natal-RN, Brazil
  - Bioinformatics Multidisciplinary Environment, Instituto Metrópole Digital, UFRN, Natal-RN, Brazil
- Vandeclécio Lira da Silva
  - Brain Institute, Universidade Federal do Rio Grande do Norte, Natal-RN, Brazil
  - Bioinformatics Multidisciplinary Environment, Instituto Metrópole Digital, UFRN, Natal-RN, Brazil
- Sandro José de Souza
  - Brain Institute, Universidade Federal do Rio Grande do Norte, Natal-RN, Brazil
  - Bioinformatics Multidisciplinary Environment, Instituto Metrópole Digital, UFRN, Natal-RN, Brazil
- Gustavo Antonio de Souza
  - Brain Institute, Universidade Federal do Rio Grande do Norte, Natal-RN, Brazil
  - Bioinformatics Multidisciplinary Environment, Instituto Metrópole Digital, UFRN, Natal-RN, Brazil
  - Department of Immunology and Centre for Immune Regulation, Oslo University Hospital HF Rikshospitalet, University of Oslo, Oslo, Norway
13. Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 04/13/2017] [Indexed: 12/20/2022]
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Affiliation(s)
- Kelly V Ruggles
- From the ‡Department of Medicine, New York University School of Medicine, New York, New York 10016
| | - Karsten Krug
- §The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Xiaojing Wang
- ¶Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030.,‖Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - Karl R Clauser
- §The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Jing Wang
- ¶Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030.,‖Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - Samuel H Payne
- **Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
| | - David Fenyö
- ‡‡Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016; .,§§Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
| | - Bing Zhang
- ¶Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; .,‖Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - D R Mani
- §The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142;
14
Guerrero CR, Jagtap PD, Johnson JE, Griffin TJ. Using Galaxy for Proteomics. Proteome Informatics 2016. [DOI: 10.1039/9781782626732-00289] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Abstract
The area of informatics for mass spectrometry (MS)-based proteomics data has steadily grown over the last two decades. Numerous effective software programs now exist for various aspects of proteomic informatics. However, many researchers still have difficulties in using this software. These difficulties arise from problems with running and integrating disparate software programs, scalability issues when dealing with large data volumes, and the inability to share and reproduce workflows composed of different software. The Galaxy framework for bioinformatics provides an attractive option for solving many of these current issues in proteomic informatics. Originally developed as a workbench to enable genomic data analysis, numerous researchers are now turning to Galaxy to implement software for MS-based proteomics applications. Here, we provide an introduction to Galaxy and its features, and describe how software tools are deployed, published and shared via the scalable framework. We also describe some of the existing tools in Galaxy for basic MS-based proteomics data analysis and informatics. Finally, we describe how proteomics tools in Galaxy can be combined with other existing tools for genomic and transcriptomic data analysis to enable powerful multi-omic data analysis applications.
Affiliation(s)
- Candace R. Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota 512 Walter Library, 117 Pleasant Street SE Minneapolis MN 55455 USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
15
Proteomics progresses in microbial physiology and clinical antimicrobial therapy. Eur J Clin Microbiol Infect Dis 2016; 36:403-413. [PMID: 27812806 PMCID: PMC5309286 DOI: 10.1007/s10096-016-2816-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 08/12/2016] [Accepted: 10/16/2016] [Indexed: 02/05/2023]
Abstract
Clinical microbial identification plays an important role in optimizing the management of infectious diseases and provides diagnostic and therapeutic support for clinical management. Microbial proteomic research is aimed at identifying proteins associated with microbial activity, which has facilitated the discovery of microbial physiology changes and host–pathogen interactions during bacterial infection and antimicrobial therapy. Here, we summarize proteomics-driven progress in understanding host–microbial pathogen interactions at multiple levels, mass spectrometry-based microbial proteome identification for clinical diagnosis, and antimicrobial therapy. Advances in proteomic techniques pave new ways towards effective prevention and drug discovery for microbial-induced infectious diseases.
16
Zhang J, Yang MK, Zeng H, Ge F. GAPP: A Proteogenomic Software for Genome Annotation and Global Profiling of Post-translational Modifications in Prokaryotes. Mol Cell Proteomics 2016; 15:3529-3539. [PMID: 27630248 DOI: 10.1074/mcp.m116.060046] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/07/2016] [Indexed: 11/06/2022] Open
Abstract
Although the number of sequenced prokaryotic genomes is growing rapidly, experimentally verified annotation of prokaryotic genomes remains patchy and challenging. To facilitate genome annotation efforts for prokaryotes, we developed open-source software called GAPP for genome annotation and global profiling of post-translational modifications (PTMs) in prokaryotes. With a single command, it provides a standard workflow to validate and refine predicted genetic models and discover diverse PTM events. We demonstrated the utility of GAPP using proteomic data from Helicobacter pylori, one of the major human pathogens that is responsible for many gastric diseases. Our results confirmed 84.9% of the existing predicted H. pylori proteins, identified 20 novel protein coding genes, and corrected four existing gene models with regard to translation initiation sites. In particular, GAPP revealed a large repertoire of PTMs using the same proteomic data and provided a rich resource that can be used to examine the functions of reversible modifications in this human pathogen. This software is a powerful tool for genome annotation and global discovery of PTMs and is applicable to any sequenced prokaryotic organism; we expect that it will become an integral part of ongoing genome annotation efforts for prokaryotes. GAPP is freely available at https://sourceforge.net/projects/gappproteogenomic/.
Affiliation(s)
- Jia Zhang
- From the ‡Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Ming-Kun Yang
- From the ‡Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Honghui Zeng
- §Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
| | - Feng Ge
- From the ‡Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; .,§Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
17
Li Y, Wang X, Cho JH, Shaw TI, Wu Z, Bai B, Wang H, Zhou S, Beach TG, Wu G, Zhang J, Peng J. JUMPg: An Integrative Proteogenomics Pipeline Identifying Unannotated Proteins in Human Brain and Cancer Cells. J Proteome Res 2016; 15:2309-20. [PMID: 27225868 DOI: 10.1021/acs.jproteome.6b00344] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Indexed: 01/18/2023]
Abstract
Proteogenomics is an emerging approach to improve gene annotation and interpretation of proteomics data. Here we present JUMPg, an integrative proteogenomics pipeline including customized database construction, tag-based database search, peptide-spectrum match filtering, and data visualization. JUMPg creates multiple databases of DNA polymorphisms, mutations, splice junctions, partial trypticity, as well as protein fragments translated from the whole transcriptome in all six frames upon RNA-seq de novo assembly. We use a multistage strategy to search these databases sequentially, in which the performance is optimized by re-searching only unmatched high-quality spectra and reusing amino acid tags generated by the JUMP search engine. The identified peptides/proteins are displayed with gene loci using the UCSC genome browser. Then, the JUMPg program is applied to process a label-free mass spectrometry data set of Alzheimer's disease postmortem brain, uncovering 496 new peptides of amino acid substitutions, alternative splicing, frame shift, and "non-coding gene" translation. The novel protein PNMA6BL specifically expressed in the brain is highlighted. We also tested JUMPg to analyze a stable-isotope labeled data set of multiple myeloma cells, revealing 991 sample-specific peptides that include protein sequences in the immunoglobulin light chain variable region. Thus, the JUMPg program is an effective proteogenomics tool for multiomics data integration.
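The six-frame translation step mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not JUMPg's code: the function names `translate` and `six_frame_translate` are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G, or T is reported as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# organisms using alternative codes would need a different table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translate("ATGGCCTAA").items():
        print(name, protein)
```

In a real pipeline the translated frames would typically be split at stop codons (`*`) and written to a FASTA database for searching; that step is omitted here.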
Affiliation(s)
- Hong Wang
- Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, 920 Madison Avenue, Memphis, Tennessee 38163, United States
| | - Thomas G Beach
- Banner Sun Health Research Institute, Sun City, Arizona 85351, United States
18
Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annu Rev Anal Chem (Palo Alto Calif) 2016; 9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 05/09/2023]
Abstract
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Affiliation(s)
- Gloria M Sheynkman
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215;
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706; ,
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706; ,
| | - Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706; ,
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706; ,
- Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706;
19
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1405] [Impact Index Per Article: 175.6] [Indexed: 12/27/2022] Open
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
- Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA. .,Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain.
| | - Pedro Madrigal
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. .,Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK.
| | - Sonia Tarazona
- Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain.,Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
| | - David Gomez-Cabrero
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden.,Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden.,Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden.,Science for Life Laboratory, 17121, Solna, Sweden
| | - Alejandra Cervera
- Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
| | - Andrew McPherson
- School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
| | - Michał Wojciech Szcześniak
- Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
| | - Daniel J Gaffney
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
| | - Xuegong Zhang
- Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China.,School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA. .,Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA.
20
Locard-Paulet M, Pible O, Gonzalez de Peredo A, Alpha-Bazin B, Almunia C, Burlet-Schiltz O, Armengaud J. Clinical implications of recent advances in proteogenomics. Expert Rev Proteomics 2016; 13:185-99. [DOI: 10.1586/14789450.2016.1132169] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023]
21
Abstract
Every molecular player in the cast of biology’s central dogma is being sequenced and quantified with increasing ease and coverage. To bring the resulting genomic, transcriptomic, and proteomic data sets into coherence, tools must be developed that do not constrain data acquisition and analytics in any way but rather provide simple links across previously acquired data sets with minimal preprocessing and hassle. Here we present such a tool: PGx, which supports proteogenomic integration of mass spectrometry proteomics data with next-generation sequencing by mapping identified peptides onto their putative genomic coordinates.
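As an illustration of what mapping a peptide onto genomic coordinates involves, here is a minimal Python sketch. It is not PGx's implementation: the function `peptide_to_genome` and its arguments are hypothetical, coordinates are assumed to be 0-based and half-open, and the exon list is assumed to contain only coding sequence in ascending genomic order.

```python
def peptide_to_genome(pep_start, pep_len, exons, strand="+"):
    """Map a peptide to the genomic intervals that encode it.

    pep_start: 0-based index of the peptide's first residue in the protein.
    pep_len:   number of residues in the peptide.
    exons:     coding intervals as (start, end), 0-based half-open,
               sorted by ascending genomic position.
    strand:    "+" or "-".
    Returns a sorted list of (start, end) genomic intervals.
    """
    # Walk the exons in transcription order.
    ordered = exons if strand == "+" else exons[::-1]
    cds_start = pep_start * 3            # first coding base of the peptide
    cds_end = cds_start + pep_len * 3    # one past its last coding base
    blocks = []
    offset = 0  # position in the coding sequence where the current exon begins
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start
        lo = max(cds_start, offset)
        hi = min(cds_end, offset + ex_len)
        if lo < hi:  # the peptide overlaps this exon
            if strand == "+":
                blocks.append((ex_start + lo - offset, ex_start + hi - offset))
            else:
                blocks.append((ex_end - (hi - offset), ex_end - (lo - offset)))
        offset += ex_len
    return sorted(blocks)


if __name__ == "__main__":
    exons = [(100, 106), (200, 212)]  # 18 coding bases, i.e. 6 residues
    print(peptide_to_genome(1, 3, exons, "+"))  # peptide spanning the splice junction
    print(peptide_to_genome(1, 3, exons, "-"))
```

A peptide that spans a splice junction comes back as two intervals, which is the form a genome browser track would need in order to draw it as a spliced feature.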
Affiliation(s)
- Manor Askenazi
- Biomedical Hosting LLC, 33 Lewis Avenue, Arlington, Massachusetts 02474, United States
| | - Kelly V Ruggles
- NYU Langone Medical Center , 227 East 30th Street, New York, New York 10016, United States
| | - David Fenyö
- NYU Langone Medical Center , 227 East 30th Street, New York, New York 10016, United States
22
Wang X, Slebos RJC, Chambers MC, Tabb DL, Liebler DC, Zhang B. proBAMsuite, a Bioinformatics Framework for Genome-Based Representation and Analysis of Proteomics Data. Mol Cell Proteomics 2015; 15:1164-75. [PMID: 26657539 PMCID: PMC4813696 DOI: 10.1074/mcp.m115.052860] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/16/2015] [Indexed: 01/13/2023] Open
Abstract
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation, and integration of proteomics data. First, the interpretation of proteomics data is significantly enhanced with the rich genomic annotation information. Second, PSMs can be easily reannotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, using the genome as a common reference, proBAMsuite facilitates seamless proteomics and proteogenomics data integration. Finally, proBAM files can be readily visualized in genome browsers and thus bring proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research.
Affiliation(s)
| | - Robbert J C Slebos
- §Department of Biochemistry, ¶Jim Ayers Institute for Precancer Detection and Diagnosis, Vanderbilt-Ingram Cancer Center, Nashville, TN 37232
| | | | - David L Tabb
- From the ‡Department of Biomedical Informatics, §Department of Biochemistry
| | - Daniel C Liebler
- From the ‡Department of Biomedical Informatics, §Department of Biochemistry, ¶Jim Ayers Institute for Precancer Detection and Diagnosis, Vanderbilt-Ingram Cancer Center, Nashville, TN 37232
| | - Bing Zhang
- From the ‡Department of Biomedical Informatics, ‖Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232;
23
Abraham PE, Wang X, Ranjan P, Nookaew I, Zhang B, Tuskan GA, Hettich RL. Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa. J Proteome Res 2015; 14:5318-26. [DOI: 10.1021/acs.jproteome.5b00823] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Affiliation(s)
- Paul E. Abraham
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
| | - Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
| | - Priya Ranjan
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
| | - Intawat Nookaew
- Biological Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
| | - Bing Zhang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
| | - Gerald A. Tuskan
- Biological Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
| | - Robert L. Hettich
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
24
Tay AP, Pang CNI, Twine NA, Hart-Smith G, Harkness L, Kassem M, Wilkins MR. Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data. J Proteome Res 2015; 14:3541-54. [PMID: 25961807 DOI: 10.1021/pr5011394] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data, and proteomic analysis of the same sample, can identify protein isoforms. RNA-seq data from human mesenchymal (hMSC) stem cells were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously as they do not have peptides that are sufficiently distinguishing. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data is available via ProteomeXchange (PXD001449).
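The idea of isoform-specific peptides in the abstract above can be sketched in Python. This is a simplified illustration, not the TranscriptCoder or PG Nexus method: the function names are hypothetical, digestion is assumed to be fully tryptic with no missed cleavages (cleave after K or R unless the next residue is P), and a peptide counts as isoform-specific if it occurs in exactly one of the supplied sequences.

```python
import re
from collections import defaultdict


def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest: cleave after K/R unless followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def isoform_specific_peptides(isoforms, min_len=6):
    """Return, for each isoform, the peptides found in no other isoform.

    isoforms: dict mapping an isoform name to its protein sequence.
    """
    owners = defaultdict(set)  # peptide -> names of isoforms containing it
    for name, seq in isoforms.items():
        for pep in tryptic_peptides(seq, min_len):
            owners[pep].add(name)
    specific = {name: set() for name in isoforms}
    for pep, names in owners.items():
        if len(names) == 1:
            specific[next(iter(names))].add(pep)
    return specific


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    toy = {
        "iso1": "MAAAAAKCCCCCCRDDDDDDK",
        "iso2": "MAAAAAKEEEEEERDDDDDDK",
    }
    print(isoform_specific_peptides(toy))
```

An isoform whose set comes back empty corresponds to the difficult case the abstract describes, where no single peptide distinguishes it from the others.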
Affiliation(s)
- Aidan P Tay
- Systems Biology Initiative, The University of New South Wales , Sydney, New South Wales 2052, Australia.,School of Biotechnology and Biomolecular Sciences, The University of New South Wales , Sydney, New South Wales 2052, Australia
| | - Chi Nam Ignatius Pang
- Systems Biology Initiative, The University of New South Wales , Sydney, New South Wales 2052, Australia.,School of Biotechnology and Biomolecular Sciences, The University of New South Wales , Sydney, New South Wales 2052, Australia
| | - Natalie A Twine
- Systems Biology Initiative, The University of New South Wales , Sydney, New South Wales 2052, Australia.,School of Biotechnology and Biomolecular Sciences, The University of New South Wales , Sydney, New South Wales 2052, Australia
| | - Gene Hart-Smith
- Systems Biology Initiative, The University of New South Wales , Sydney, New South Wales 2052, Australia.,School of Biotechnology and Biomolecular Sciences, The University of New South Wales , Sydney, New South Wales 2052, Australia
| | - Linda Harkness
- Endocrine Research Laboratory (KMEB), Department of Endocrinology and Metabolism, Odense University Hospital & University of Southern Denmark , Odense 5230, Denmark
| | - Moustapha Kassem
- Endocrine Research Laboratory (KMEB), Department of Endocrinology and Metabolism, Odense University Hospital & University of Southern Denmark , Odense 5230, Denmark
| | - Marc R Wilkins
- Systems Biology Initiative, The University of New South Wales , Sydney, New South Wales 2052, Australia.,School of Biotechnology and Biomolecular Sciences, The University of New South Wales , Sydney, New South Wales 2052, Australia
25
Winter DL, Abeygunawardena D, Hart-Smith G, Erce MA, Wilkins MR. Lysine methylation modulates the protein-protein interactions of yeast cytochrome C Cyc1p. Proteomics 2015; 15:2166-76. [PMID: 25755154 DOI: 10.1002/pmic.201400521] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/03/2014] [Revised: 02/02/2015] [Accepted: 03/02/2015] [Indexed: 12/21/2022]
Abstract
In recent years, protein methylation has been established as a major intracellular PTM. It has also been proposed to modulate protein-protein interactions (PPIs) in the interactome. To investigate the effect of PTMs on PPIs, we recently developed the conditional two-hybrid (C2H) system. With this, we demonstrated that arginine methylation can modulate PPIs in the yeast interactome. Here, we used the C2H system to investigate the effect of lysine methylation. Specifically, we asked whether Ctm1p-mediated trimethylation of yeast cytochrome c Cyc1p, on lysine 78, modulates its interactions with Erv1p, Ccp1p, Cyc2p and Cyc3p. We show that the interactions between Cyc1p and Erv1p, and between Cyc1p and Cyc3p, are significantly increased upon trimethylation of lysine 78. This increase of interaction helps explain the reported facilitation of Cyc1p import into the mitochondrial intermembrane space upon methylation. This first application of the C2H system to the study of methyllysine-modulated interactions further confirms its robustness and flexibility.
Affiliation(s)
- Daniel L Winter
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
| | - Dhanushi Abeygunawardena
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
| | - Gene Hart-Smith
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
| | - Melissa A Erce
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
| | - Marc R Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
26
Tavares R, Scherer NM, Ferreira CG, Costa FF, Passetti F. Splice variants in the proteome: a promising and challenging field to targeted drug discovery. Drug Discov Today 2015; 20:353-60. [DOI: 10.1016/j.drudis.2014.11.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/01/2014] [Revised: 10/19/2014] [Accepted: 11/07/2014] [Indexed: 12/15/2022]
27
Ghali F, Krishna R, Perkins S, Collins A, Xia D, Wastling J, Jones AR. ProteoAnnotator - Open source proteogenomics annotation software supporting PSI standards. Proteomics 2014; 14:2731-41. [DOI: 10.1002/pmic.201400265] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 06/10/2014] [Revised: 09/10/2014] [Accepted: 10/02/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Fawaz Ghali
- Institute of Integrative Biology; University of Liverpool; Liverpool UK
| | - Ritesh Krishna
- Institute of Integrative Biology; University of Liverpool; Liverpool UK
| | - Simon Perkins
- Institute of Integrative Biology; University of Liverpool; Liverpool UK
| | - Andrew Collins
- Institute of Integrative Biology; University of Liverpool; Liverpool UK
| | - Dong Xia
- Department of Infection Biology; Institute of Infection and Global Health; University of Liverpool; Liverpool UK
| | - Jonathan Wastling
- Department of Infection Biology; Institute of Infection and Global Health; University of Liverpool; Liverpool UK
- Health Protection Research Unit in Emerging and Zoonotic Infections; The National Institute for Health Research; University of Liverpool; Liverpool UK
| | - Andrew R. Jones
- Institute of Integrative Biology; University of Liverpool; Liverpool UK
28
Kucharova V, Wiker HG. Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics. Proteomics 2014; 14:2360-675. [PMID: 25263021 DOI: 10.1002/pmic.201400168] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]
Abstract
High-accuracy and high-throughput proteomic methods have completely changed the way we can identify and characterize proteins. MS-based proteomics can now provide a unique supplement to genomic data and add a new level of information to the interpretation of genomic sequences. Proteomics-driven genome annotation has become especially relevant in microbiology where genomes are sequenced on a daily basis and limitations of an in silico driven annotation process are well recognized. In this review paper, we outline different strategies on how one can design a proteogenomic experiment, for example on genome-sequenced (synonymous proteogenomics) versus unsequenced organisms (ortho-proteogenomics) or with the aid of other "omic" data such as RNA-seq. We touch upon many challenges that are encountered during a typical proteogenomic study, mostly concerning bioinformatics methods and downstream data analysis, but also related to creation and use of sequence databases. A large list of proteogenomic case studies of different microorganisms is provided to illustrate the mapping of MS/MS-derived peptide spectra to genomic DNA sequences. These investigations have led to accurate determination of translational initiation sites, pointed out eventual read-throughs or programmed frameshifts, detected signal peptide processing or other protein maturation events, removed questionable annotation assignments, and provided evidence for predicted hypothetical proteins.
Affiliation(s)
- Veronika Kucharova
- Department of Clinical Science, The Gade Research Group for Infection and Immunity, University of Bergen, Norway
29
Jagtap PD, Johnson JE, Onsongo G, Sadler FW, Murray K, Wang Y, Sheynkman GM, Bandhakavi S, Smith LM, Griffin TJ. Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J Proteome Res 2014; 13:5898-908. [PMID: 25301683 PMCID: PMC4261978 DOI: 10.1021/pr500812t] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 12/30/2022]
Abstract
Proteogenomics combines large-scale genomic and transcriptomic data with mass-spectrometry-based proteomic data to discover novel protein sequence variants and improve genome annotation. In contrast with conventional proteomic applications, proteogenomic analysis requires a number of additional data processing steps. Ideally, these required steps would be integrated and automated via a single software platform offering accessibility for wet-bench researchers as well as flexibility for user-specific customization and integration of new software tools as they emerge. Toward this end, we have extended the Galaxy bioinformatics framework to facilitate proteogenomic analysis. Using analysis of whole human saliva as an example, we demonstrate Galaxy’s flexibility through the creation of a modular workflow incorporating both established and customized software tools that improve depth and quality of proteogenomic results. Our customized Galaxy-based software includes automated, batch-mode BLASTP searching and a Peptide Sequence Match Evaluator tool, both useful for evaluating the veracity of putative novel peptide identifications. Our complex workflow (approximately 140 steps) can be easily shared using built-in Galaxy functions, enabling their use and customization by others. Our results provide a blueprint for the establishment of the Galaxy framework as an ideal solution for the emerging field of proteogenomics.
Affiliation(s)
- Pratik D Jagtap
- Center for Mass Spectrometry and Proteomics, University of Minnesota, 43 Gortner Laboratory, 1479 Gortner Avenue, St. Paul, Minnesota 55108, United States
30
Dharmasiri U, Isenberg SL, Glish GL, Armistead PM. Differential ion mobility spectrometry coupled to tandem mass spectrometry enables targeted leukemia antigen detection. J Proteome Res 2014; 13:4356-62. [PMID: 25184817 PMCID: PMC4184456 DOI: 10.1021/pr500527c] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/16/2023]
Abstract
Differential ion mobility spectrometry (DIMS) can be used as a filter to remove undesired background ions from reaching the mass spectrometer. The ability to use DIMS as a filter for known analytes makes DIMS coupled to tandem mass spectrometry (DIMS-MS/MS) a promising technique for the detection of cancer antigens that can be predicted by computational algorithms. In experiments using DIMS-MS/MS that were performed without the use of high-performance liquid chromatography (HPLC), a predicted model antigen, GLR (FLSSANEHL), was detected at a concentration of 10 pM (20 amol) in a mixture containing 94 competing model peptide antigens, each at a concentration of 1 μM. Without DIMS filtering, the GLR peptide was undetectable in the mixture even at 100 nM. Again, without using HPLC, DIMS-MS/MS was used to detect 2 of 3 previously characterized antigens produced by the leukemia cell line U937.A2. Because of its sensitivity, a targeted DIMS-MS/MS methodology can likely be used to probe for predicted cancer antigens from cancer cell lines as well as human tumor samples.
Affiliation(s)
- Udara Dharmasiri
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, 21-244, Chapel Hill, North Carolina 27599, United States
31
Cooke IR, Jones D, Bowen JK, Deng C, Faou P, Hall NE, Jayachandran V, Liem M, Taranto AP, Plummer KM, Mathivanan S. Proteogenomic analysis of the Venturia pirina (Pear Scab Fungus) secretome reveals potential effectors. J Proteome Res 2014; 13:3635-44. [PMID: 24965097 DOI: 10.1021/pr500176c] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
A proteogenomic analysis is presented for Venturia pirina, a fungus that causes scab disease on European pear (Pyrus communis). V. pirina is host-specific, and the infection is thought to be mediated by secreted effector proteins. Currently, only 36 V. pirina proteins are catalogued in GenBank, and the genome sequence is not publicly available. To identify putative effectors, V. pirina was grown in vitro on and in cellophane sheets mimicking its growth in infected leaves. Secreted extracts were analyzed by tandem mass spectrometry, and the data (ProteomeXchange identifier PXD000710) was queried against a protein database generated by combining in silico predicted transcripts with six frame translations of a whole genome sequence of V. pirina (GenBank Accession JEMP00000000). We identified 1088 distinct V. pirina protein groups (FDR 1%) including 1085 detected for the first time. Thirty novel (not in silico predicted) proteins were found, of which 14 were identified as potential effectors based on characteristic features of fungal effector protein sequences. We also used evidence from semitryptic peptides at the protein N-terminus to corroborate in silico signal peptide predictions for 22 proteins, including several potential effectors. The analysis highlights the utility of proteogenomics in the study of secreted effectors.
Affiliation(s)
- Ira R Cooke
- Department of Biochemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
32
Com E, Melaine N, Chalmel F, Pineau C. Proteomics and integrative genomics for unraveling the mysteries of spermatogenesis: the strategies of a team. J Proteomics 2014; 107:128-43. [PMID: 24751586 DOI: 10.1016/j.jprot.2014.04.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/05/2014] [Accepted: 04/09/2014] [Indexed: 11/25/2022]
Abstract
The strikingly complex structural organization of the mammalian testis in vivo creates particular difficulties for studies of its organization, function and regulation. These difficulties are particularly pronounced for investigations of the molecular communication networks within the seminiferous tubules that govern spermatogenesis. The use of classical molecular and cell biology approaches to unravel this complexity has proved problematic, due to difficulties in maintaining differentiated germ cells in vitro, in particular. The lack of a suitable testing ground has led to a greater reliance on high-quality proteomic and genomic analyses as a prelude to the in vitro and in vivo testing of hypotheses. In this study, we highlight the options currently available for research, as used in our laboratory, in which proteomic and integrative genomic strategies are applied to the study of spermatogenesis in mammals. We will comment on results providing insight into the molecular mechanisms underlying normal and pathological spermatogenesis and new perspectives for the treatment of male infertility in humans. Finally, we will discuss the relevance of our strategies and the unexpected potential and perspectives they offer to teams involved in the study of male reproduction, within the framework of the Human Proteome Project.
SIGNIFICANCE: Integrative genomics is becoming a powerful strategy for discovering the biological significance hidden in proteomic datasets. This work introduces some of the integrative genomic concepts and works used by our team to gain new insight into mammalian spermatogenesis, a remarkably sophisticated process. We demonstrate the relevance of these integrative approaches to understand the cellular cross talks established between the somatic Sertoli cells and the germ cell lineage, within the seminiferous epithelium. Our work also contributes to new knowledge on the pathophysiology of testicular function, with promising clinical applications. This article is part of a Special Issue entitled: 20 years of Proteomics in memory of Vitaliano Pallini. Guest Editors: Luca Bini, Juan J. Calvete, Natacha Turck, Denis Hochstrasser and Jean-Charles Sanchez.
Affiliation(s)
- Emmanuelle Com
- IRSET, Inserm U1085, Campus de Beaulieu, Rennes F-35042, France; Proteomics Core Facility Biogenouest, Campus de Beaulieu, Rennes F-35042, France
- Nathalie Melaine
- IRSET, Inserm U1085, Campus de Beaulieu, Rennes F-35042, France; Proteomics Core Facility Biogenouest, Campus de Beaulieu, Rennes F-35042, France
- Charles Pineau
- IRSET, Inserm U1085, Campus de Beaulieu, Rennes F-35042, France; Proteomics Core Facility Biogenouest, Campus de Beaulieu, Rennes F-35042, France
33
Armengaud J, Trapp J, Pible O, Geffard O, Chaumot A, Hartmann EM. Non-model organisms, a species endangered by proteogenomics. J Proteomics 2014; 105:5-18. [PMID: 24440519 DOI: 10.1016/j.jprot.2014.01.007] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Received: 11/25/2013] [Revised: 12/24/2013] [Accepted: 01/07/2014] [Indexed: 10/25/2022]
Abstract
Previously, large-scale proteomics was possible only for organisms whose genomes were sequenced, meaning the most common model organisms. The use of next-generation sequencers is now changing the deal. With "proteogenomics", the use of experimental proteomics data to refine genome annotations, a higher integration of omics data is gaining ground. By extension, combining genomic and proteomic data is becoming routine in many research projects. "Proteogenomic"-flavored approaches are currently expanding, enabling the molecular studies of non-model organisms at an unprecedented depth. Today draft genomes can be obtained using next-generation sequencers in a rather straightforward way and at a reasonable cost for any organism. Unfinished genome sequences can be used to interpret tandem mass spectrometry proteomics data without the need for time-consuming genome annotation, and the use of RNA-seq to establish nucleotide sequences that are directly translated into protein sequences appears promising. There are, however, certain drawbacks that deserve further attention for RNA-seq to become more efficient. Here, we discuss the opportunities of working with non-model organisms, the proteomic methods that have been used until now, and the dramatic improvements proffered by proteogenomics. These put the distinction between model and non-model organisms in great danger, at least in terms of proteomics!
BIOLOGICAL SIGNIFICANCE: Model organisms have been crucial for in-depth analysis of cellular and molecular processes of life. Focusing the efforts of thousands of researchers on the Escherichia coli bacterium, Saccharomyces cerevisiae yeast, Arabidopsis thaliana plant, Danio rerio fish and other models for which genetic manipulation was possible was certainly worthwhile in terms of fundamental and invaluable biological insights. Until recently, proteomics of non-model organisms was limited to tedious, homology-based techniques, but today draft genomes or RNA-seq data can be straightforwardly obtained using next-generation sequencers, allowing the establishment of a draft protein database for any organism. Thus, proteogenomics opens new perspectives for molecular studies of non-model organisms, although they are still difficult experimental organisms. This article is part of a Special Issue entitled: Proteomics of non-model organisms.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze F-30207, France
- Judith Trapp
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze F-30207, France; Irstea, UR MALY, F-69626 Villeurbanne, France
- Olivier Pible
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze F-30207, France
- Erica M Hartmann
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze F-30207, France