401
|
Zhang C, Tao T, Yuan W, Zhang L, Zhang X, Yao J, Zhang Y, Lu H. Fluorous Solid-Phase Extraction Technique Based on Nanographite Fluoride. Anal Chem 2017; 89:4566-4572. [DOI: 10.1021/acs.analchem.6b05071] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 02/07/2023]
Affiliation(s)
- Cheng Zhang
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
- Department of Chemistry, Fudan University, Shanghai, 200433, P. R. China
| | - Tao Tao
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
- Department of Chemistry, Fudan University, Shanghai, 200433, P. R. China
| | - Wenjuan Yuan
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
- Department of Chemistry, Fudan University, Shanghai, 200433, P. R. China
| | - Lei Zhang
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
| | - Xiaoqin Zhang
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
- Department of Chemistry, Fudan University, Shanghai, 200433, P. R. China
| | - Jun Yao
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
| | - Ying Zhang
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
| | - Haojie Lu
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
- Department of Chemistry, Fudan University, Shanghai, 200433, P. R. China
- Key Laboratory of Glycoconjugates Research Ministry of Public Health, Fudan University, Shanghai, 200032, P. R. China
| |
|
402
|
Cho JY, Lee HJ, Jeong SK, Paik YK. Epsilon-Q: An Automated Analyzer Interface for Mass Spectral Library Search and Label-Free Protein Quantification. J Proteome Res 2017; 16:4435-4445. [DOI: 10.1021/acs.jproteome.6b01019] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
|
404
|
Ivanov MV, Lobas AA, Karpov DS, Moshkovskii SA, Gorshkov MV. Comparison of False Discovery Rate Control Strategies for Variant Peptide Identifications in Shotgun Proteogenomics. J Proteome Res 2017; 16:1936-1943. [DOI: 10.1021/acs.jproteome.6b01014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Mark V. Ivanov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Moscow Region, Dolgoprudny 141700, Russia
| | - Anna A. Lobas
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Moscow Region, Dolgoprudny 141700, Russia
| | - Dmitry S. Karpov
- Institute of Biomedical Chemistry, Moscow 119121, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
| | - Sergei A. Moshkovskii
- Institute of Biomedical Chemistry, Moscow 119121, Russia
- Pirogov Russian National Research Medical University, Moscow 117997, Russia
| | - Mikhail V. Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Moscow Region, Dolgoprudny 141700, Russia
| |
|
405
|
Otte KA, Schlötterer C. Polymorphism-aware protein databases - a prerequisite for an unbiased proteomic analysis of natural populations. Mol Ecol Resour 2017; 17:1148-1155. [DOI: 10.1111/1755-0998.12656] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/11/2016] [Revised: 01/12/2017] [Accepted: 01/20/2017] [Indexed: 11/30/2022]
Affiliation(s)
- Kathrin A. Otte
- Institut für Populationsgenetik; Vetmeduni Vienna; Veterinärplatz 1 1210 Vienna Austria
| | - Christian Schlötterer
- Institut für Populationsgenetik; Vetmeduni Vienna; Veterinärplatz 1 1210 Vienna Austria
| |
|
406
|
Jean Beltran PM, Federspiel JD, Sheng X, Cristea IM. Proteomics and integrative omic approaches for understanding host-pathogen interactions and infectious diseases. Mol Syst Biol 2017; 13:922. [PMID: 28348067 PMCID: PMC5371729 DOI: 10.15252/msb.20167062] [Citation(s) in RCA: 127] [Impact Index Per Article: 18.1] [Indexed: 12/13/2022] Open
Abstract
Organisms are constantly exposed to microbial pathogens in their environments. When a pathogen meets its host, a series of intricate intracellular interactions shape the outcome of the infection. The understanding of these host–pathogen interactions is crucial for the development of treatments and preventive measures against infectious diseases. Over the past decade, proteomic approaches have become prime contributors to the discovery and understanding of host–pathogen interactions that represent anti‐ and pro‐pathogenic cellular responses. Here, we review these proteomic methods and their application to studying viral and bacterial intracellular pathogens. We examine approaches for defining spatial and temporal host–pathogen protein interactions upon infection of a host cell. Further expanding the understanding of proteome organization during an infection, we discuss methods that characterize the regulation of host and pathogen proteomes through alterations in protein abundance, localization, and post‐translational modifications. Finally, we highlight bioinformatic tools available for analyzing such proteomic datasets, as well as novel strategies for integrating proteomics with other omic tools, such as genomics, transcriptomics, and metabolomics, to obtain a systems‐level understanding of infectious diseases.
Affiliation(s)
- Pierre M Jean Beltran
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, NJ, USA
| | - Joel D Federspiel
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, NJ, USA
| | - Xinlei Sheng
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, NJ, USA
| | - Ileana M Cristea
- Department of Molecular Biology, Lewis Thomas Laboratory, Princeton University, Princeton, NJ, USA
| |
|
407
|
van Ooijen MP, Jong VL, Eijkemans MJC, Heck AJR, Andeweg AC, Binai NA, van den Ham HJ. Identification of differentially expressed peptides in high-throughput proteomics data. Brief Bioinform 2017; 19:971-981. [DOI: 10.1093/bib/bbx031] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/06/2016] [Indexed: 12/25/2022] Open
Affiliation(s)
| | - Victor L Jong
- Department of Biostatistics and Research Support, Julius Center, UMC Utrecht, Netherlands
| | - Marinus J C Eijkemans
- Julius Center for Health Sciences and Primary Care of the University Medical Center Utrecht, Netherlands
| | - Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Utrecht University, Netherlands
| | - Arno C Andeweg
- Department of Viroscience, Erasmus MC, CA Rotterdam, Netherlands
| | - Nadine A Binai
- Biomolecular Mass Spectrometry Group, Utrecht University, Netherlands
| | | |
|
408
|
Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS. Comprehensive de Novo Peptide Sequencing from MS/MS Pairs Generated through Complementary Collision Induced Dissociation and 351 nm Ultraviolet Photodissociation. Anal Chem 2017; 89:3747-3753. [PMID: 28234449 PMCID: PMC5480239 DOI: 10.1021/acs.analchem.7b00130] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/01/2023]
Abstract
We describe a strategy for de novo peptide sequencing based on matched pairs of tandem mass spectra (MS/MS) obtained by collision induced dissociation (CID) and 351 nm ultraviolet photodissociation (UVPD). Each precursor ion is isolated twice with the mass spectrometer switching between CID and UVPD activation modes to obtain a complementary MS/MS pair. To interpret these paired spectra, we modified the UVnovo de novo sequencing software to automatically learn from and interpret fragmentation spectra, provided a representative set of training data. This machine learning procedure, using random forests, synthesizes information from one or multiple complementary spectra, such as the CID/UVPD pairs, into peptide fragmentation site predictions. In doing so, the burden of fragmentation model definition shifts from programmer to machine and opens up the model parameter space for inclusion of nonobvious features and interactions. This spectral synthesis also serves to transform distinct types of spectra into a common representation for subsequent activation-independent processing steps. Then, independent from precursor activation constraints, UVnovo's de novo sequencing procedure generates and scores sequence candidates for each precursor. We demonstrate the combined experimental and computational approach for de novo sequencing using whole cell E. coli lysate. In benchmarks on the CID/UVPD data, UVnovo assigned correct full-length sequences to 83% of the spectral pairs of doubly charged ions with high-confidence database identifications. Considering only top-ranked de novo predictions, 70% of the pairs were deciphered correctly. This de novo sequencing performance exceeds that of PEAKS and PepNovo on the CID spectra and that of UVnovo on CID or UVPD spectra alone. As presented here, the methods for paired CID/UVPD spectral acquisition and interpretation constitute a powerful workflow for high-throughput and accurate de novo peptide sequencing.
Affiliation(s)
- Andrew P Horton
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States
| | - Scott A Robotham
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
| | - Joe R Cannon
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
| | - Dustin D Holden
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States
| | - Jennifer S Brodbelt
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
| |
|
409
|
Murray HC, Dun MD, Verrills NM. Harnessing the power of proteomics for identification of oncogenic, druggable signalling pathways in cancer. Expert Opin Drug Discov 2017; 12:431-447. [PMID: 28286965 DOI: 10.1080/17460441.2017.1304377] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022]
Abstract
INTRODUCTION: Genomic and transcriptomic profiling of tumours has revolutionised our understanding of cancer. However, the majority of tumours possess multiple mutations, and determining which oncogene, or even which pathway, to target is difficult. Proteomics is emerging as a powerful approach to identify the functionally important pathways driving these cancers, and how they can be targeted therapeutically. Areas covered: The authors provide a technical overview of mass spectrometry based approaches for proteomic profiling, and review the current and emerging strategies available for the identification of dysregulated networks, pathways, and drug targets in cancer cells, with a key focus on the ability to profile cancer kinomes. The potential applications of mass spectrometry in the clinic are also highlighted. Expert opinion: The addition of proteomic information to genomic platforms - 'proteogenomics' - is providing unparalleled insight in cancer cell biology. Application of improved mass spectrometry technology and methodology, in particular the ability to analyse post-translational modifications (the PTMome), is providing a more complete picture of the dysregulated networks in cancer, and uncovering novel therapeutic targets. While the application of proteomics to discovery research will continue to rise, improved workflow standardisation and reproducibility is required before mass spectrometry can enter routine clinical use.
Affiliation(s)
- Heather C Murray
- a School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, Priority Research Centre for Cancer Research, Innovation and Translation, University of Newcastle, Callaghan, NSW, Australia.,b Cancer Research Program, Hunter Medical Research Institute, Newcastle, NSW, Australia
| | - Matthew D Dun
- a School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, Priority Research Centre for Cancer Research, Innovation and Translation, University of Newcastle, Callaghan, NSW, Australia.,b Cancer Research Program, Hunter Medical Research Institute, Newcastle, NSW, Australia
| | - Nicole M Verrills
- a School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, Priority Research Centre for Cancer Research, Innovation and Translation, University of Newcastle, Callaghan, NSW, Australia.,b Cancer Research Program, Hunter Medical Research Institute, Newcastle, NSW, Australia
| |
|
410
|
Fu S, Liu X, Luo M, Xie K, Nice EC, Zhang H, Huang C. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification. Expert Rev Proteomics 2017; 14:351-362. [PMID: 28276747 DOI: 10.1080/14789450.2017.1299006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/05/2023]
Abstract
INTRODUCTION: Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated by genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we will present recent advances in proteogenomic investigations of cancer drug resistance with an emphasis on integrative proteogenomic pipelines and the biomarker discovery which contributes to achieving the goal of using precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Affiliation(s)
- Shuyue Fu
- a State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
| | - Xiang Liu
- b Department of Pathology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
| | - Maochao Luo
- c West China School of Public Health, Sichuan University, Chengdu, P.R. China
| | - Ke Xie
- d Department of Oncology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
| | - Edouard C Nice
- e Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
| | - Haiyuan Zhang
- f School of Medicine, Yangtze University, P.R. China
| | - Canhua Huang
- a State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
| |
|
411
|
Dambrun M, Dechavanne C, Emmanuel A, Aussenac F, Leduc M, Giangrande C, Vinh J, Dugoujon JM, Lefranc MP, Guillonneau F, Migot-Nabias F. Human Immunoglobulin Heavy Gamma Chain Polymorphisms: Molecular Confirmation Of Proteomic Assessment. Mol Cell Proteomics 2017; 16:824-839. [PMID: 28265047 DOI: 10.1074/mcp.m116.064733] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/15/2016] [Revised: 02/01/2017] [Indexed: 11/06/2022] Open
Abstract
Immunoglobulin G (IgG) proteins are known for the huge diversity of the variable domains of their heavy and light chains, aimed at protecting each individual against foreign antigens. The IgG also harbor specific polymorphism concentrated in the CH2 and CH3-CHS constant regions located on the Fc fragment of their heavy chains. But this individual particularity relies only on a few amino acids among which some could make accurate sequence determination a challenge for mass spectrometry-based techniques. The purpose of the study was to bring a molecular validation of proteomic results by the sequencing of encoding DNA fragments. It was performed using ten individual samples (DNA and sera) selected on the basis of their Gm (gamma marker) allotype polymorphism in order to cover the main immunoglobulin heavy gamma (IGHG) gene diversity. Gm allotypes, reflecting part of this diversity, were determined by a serological method. On its side, the IGH locus comprises four functional IGHG genes totalizing 34 alleles and encoding the four IgG subclasses. The genomic study focused on the nucleotide polymorphism of the CH2 and CH3-CHS exons and of the intron. Despite strong sequence identity, four pairs of specific gene amplification primers could be designed. Additional primers were identified to perform the subsequent sequencing. The nucleotide sequences obtained were first assigned to a specific IGHG gene, and then IGHG alleles were deduced using a home-made decision tree reading of the nucleotide sequences. IGHG amino acid (AA) alleles were determined by mass spectrometry. Identical results were found at 95% between alleles identified by proteomics and those deduced from genomics. These results validate the proteomic approach which could be used for diagnostic purposes, namely for a mother-and-child differential IGHG detection in a context of suspicion of congenital infection.
Affiliation(s)
- Magalie Dambrun
- From the ‡Institut de Recherche pour le Développement, UMR 216 Mère et enfant face aux infections tropicales, Paris, France.,§COMUE Sorbonne Paris Cité, Faculté de Pharmacie, Université Paris Descartes, Paris, France.,¶¶Magalie Dambrun, Célia Dechavanne and Alexandra Emmanuel contributed equally to this work
| | - Célia Dechavanne
- From the ‡Institut de Recherche pour le Développement, UMR 216 Mère et enfant face aux infections tropicales, Paris, France.,§COMUE Sorbonne Paris Cité, Faculté de Pharmacie, Université Paris Descartes, Paris, France.,¶¶Magalie Dambrun, Célia Dechavanne and Alexandra Emmanuel contributed equally to this work
| | - Alexandra Emmanuel
- From the ‡Institut de Recherche pour le Développement, UMR 216 Mère et enfant face aux infections tropicales, Paris, France.,§COMUE Sorbonne Paris Cité, Faculté de Pharmacie, Université Paris Descartes, Paris, France.,¶ESPCI Paris, PSL Research University, Spectrométrie de Masse Biologique et Protéomique (SMBP), CNRS USR 3149, Paris, France.,¶¶Magalie Dambrun, Célia Dechavanne and Alexandra Emmanuel contributed equally to this work
| | - Florentin Aussenac
- From the ‡Institut de Recherche pour le Développement, UMR 216 Mère et enfant face aux infections tropicales, Paris, France.,§COMUE Sorbonne Paris Cité, Faculté de Pharmacie, Université Paris Descartes, Paris, France
| | - Marjorie Leduc
- ‖Plate-forme protéomique de l'Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Chiara Giangrande
- ¶ESPCI Paris, PSL Research University, Spectrométrie de Masse Biologique et Protéomique (SMBP), CNRS USR 3149, Paris, France
| | - Joëlle Vinh
- ¶ESPCI Paris, PSL Research University, Spectrométrie de Masse Biologique et Protéomique (SMBP), CNRS USR 3149, Paris, France
| | - Jean-Michel Dugoujon
- **Laboratoire d'Anthropologie Moléculaire et Imagerie de Synthèse, UMR 5288, CNRS et Université Paul Sabatier Toulouse III, Toulouse, France
| | - Marie-Paule Lefranc
- ‡‡IMGT®, the international ImMunoGeneTics information system®, Laboratoire d'ImmunoGénétique Moléculaire, LIGM, Institut de Génétique Humaine, IGH, UMR 9002, CNRS et Université de Montpellier, Montpellier, France.,§§Institut Universitaire de France, Paris, France
| | - François Guillonneau
- ‖Plate-forme protéomique de l'Université Paris Descartes, Sorbonne Paris Cité, Paris, France.,‖‖François Guillonneau and Florence Migot-Nabias contributed equally to this work
| | - Florence Migot-Nabias
- From the ‡Institut de Recherche pour le Développement, UMR 216 Mère et enfant face aux infections tropicales, Paris, France.,§COMUE Sorbonne Paris Cité, Faculté de Pharmacie, Université Paris Descartes, Paris, France.,‖‖François Guillonneau and Florence Migot-Nabias contributed equally to this work
| |
|
412
|
Mayers MD, Moon C, Stupp GS, Su AI, Wolan DW. Quantitative Metaproteomics and Activity-Based Probe Enrichment Reveals Significant Alterations in Protein Expression from a Mouse Model of Inflammatory Bowel Disease. J Proteome Res 2017; 16:1014-1026. [PMID: 28052195 DOI: 10.1021/acs.jproteome.6b00938] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Indexed: 12/30/2022]
Abstract
Tandem mass spectrometry based shotgun proteomics of distal gut microbiomes is exceedingly difficult due to the inherent complexity and taxonomic diversity of the samples. We introduce two new methodologies to improve metaproteomic studies of microbiome samples. These methods include the stable isotope labeling in mammals to permit protein quantitation across two mouse cohorts as well as the application of activity-based probes to enrich and analyze both host and microbial proteins with specific functionalities. We used these technologies to study the microbiota from the adoptive T cell transfer mouse model of inflammatory bowel disease (IBD) and compare these samples to an isogenic control, thereby limiting genetic and environmental variables that influence microbiome composition. The data generated highlight quantitative alterations in both host and microbial proteins due to intestinal inflammation and corroborates the observed phylogenetic changes in bacteria that accompany IBD in humans and mouse models. The combination of isotope labeling with shotgun proteomics resulted in the total identification of 4434 protein clusters expressed in the microbial proteomic environment, 276 of which demonstrated differential abundance between control and IBD mice. Notably, application of a novel cysteine-reactive probe uncovered several microbial proteases and hydrolases overrepresented in the IBD mice. Implementation of these methods demonstrated that substantial insights into the identity and dysregulation of host and microbial proteins altered in IBD can be accomplished and can be used in the interrogation of other microbiome-related diseases.
Affiliation(s)
- Michael D Mayers
- Department of Molecular and Experimental Medicine, ‡Department of Integrative Structural and Computational Biology, and §Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Clara Moon
- Department of Molecular and Experimental Medicine, ‡Department of Integrative Structural and Computational Biology, and §Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Gregory S Stupp
- Department of Molecular and Experimental Medicine, ‡Department of Integrative Structural and Computational Biology, and §Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Andrew I Su
- Department of Molecular and Experimental Medicine, ‡Department of Integrative Structural and Computational Biology, and §Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Dennis W Wolan
- Department of Molecular and Experimental Medicine, ‡Department of Integrative Structural and Computational Biology, and §Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| |
|
413
|
A Golden Age for Working with Public Proteomics Data. Trends Biochem Sci 2017; 42:333-341. [PMID: 28118949 PMCID: PMC5414595 DOI: 10.1016/j.tibs.2017.01.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 10/02/2016] [Revised: 12/13/2016] [Accepted: 01/02/2017] [Indexed: 11/23/2022]
Abstract
Data sharing in mass spectrometry (MS)-based proteomics is becoming a common scientific practice, as is now common in the case of other, more mature ‘omics’ disciplines like genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens a plethora of opportunities for data scientists. First, we explain in some detail some of the work already achieved, such as systematic reanalysis efforts. We also explain existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main existing challenges and mention the first attempts to combine public proteomics data with other types of omics data sets. The field of proteomics has matured and diversified substantially over the past 10 years. Proteomics data are increasingly shared through centralized, public repositories. Standardization efforts have ensured that a large proportion of these public data can be read and processed by any interested researcher. Because any proteomics data set is only partially understood, there is great opportunity for (orthogonal) reuse of public data. While public proteomics data has so far remained outside ethics and privacy discussions, recent work indicates that there is an inherent risk.
|
414
|
Tan Z, Nie S, McDermott SP, Wicha MS, Lubman DM. Single Amino Acid Variant Profiles of Subpopulations in the MCF-7 Breast Cancer Cell Line. J Proteome Res 2017; 16:842-851. [PMID: 28076950 DOI: 10.1021/acs.jproteome.6b00824] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
Cancers are initiated and developed from a small population of stem-like cells termed cancer stem cells (CSCs). There is heterogeneity among this CSC population that leads to multiple subpopulations with their own distinct biological features and protein expression. The protein expression and function may be impacted by amino acid variants that can occur largely due to single nucleotide changes. We have thus performed proteomic analysis of breast CSC subpopulations by mass spectrometry to study the presence of single amino acid variants (SAAVs) and their relation to breast cancer. We have used CSC markers to isolate pure breast CSC subpopulation fractions (ALDH+ and CD44+/CD24- cell populations) and the mature luminal cells (CD49f-EpCAM+) from the MCF-7 breast cancer cell line. By searching the Swiss-CanSAAVs database, 374 unique SAAVs were identified in total, where 27 are cancer-related SAAVs. 135 unique SAAVs were found in the CSC population compared with the mature luminal cells. The distribution of SAAVs detected in MCF-7 cells was compared with those predicted from the Swiss-CanSAAVs database, where we found distinct differences in the numbers of SAAVs detected relative to that expected from the Swiss-CanSAAVs database for several of the amino acids.
Affiliation(s)
- Zhijing Tan
- Department of Surgery, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Song Nie
- Department of Surgery, University of Michigan, Ann Arbor, Michigan 48109, United States.,Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Sean P McDermott
- Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan, Ann Arbor, Michigan 48109, United States.,Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Max S Wicha
- Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan, Ann Arbor, Michigan 48109, United States.,Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - David M Lubman
- Department of Surgery, University of Michigan, Ann Arbor, Michigan 48109, United States
| |
|
415
|
Datta KK, Patil AH, Patel K, Dey G, Madugundu AK, Renuse S, Kaviyil JE, Sekhar R, Arunima A, Daswani B, Kaur I, Mohanty J, Sinha R, Jaiswal S, Sivapriya S, Sonnathi Y, Chattoo BB, Gowda H, Ravikumar R, Prasad TSK. Proteogenomics of Candida tropicalis--An Opportunistic Pathogen with Importance for Global Health. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2017; 20:239-47. [PMID: 27093108 DOI: 10.1089/omi.2015.0197] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
The frequency of Candida infections is currently rising, and thus adversely impacting global health. The situation is exacerbated by azole resistance developed by fungal pathogens. Candida tropicalis is an opportunistic pathogen that causes candidiasis, for example, in immune-compromised individuals, cancer patients, and those who undergo organ transplantation. It is a member of the non-albicans group of Candida that are known to be azole-resistant, and is frequently seen in individuals being treated for cancers, HIV-infection, and those who underwent bone marrow transplantation. Although the genome of C. tropicalis was sequenced in 2009, the genome annotation has not been supported by experimental validation. In the present study, we have carried out proteomics profiling of C. tropicalis using high-resolution Fourier transform mass spectrometry. We identified 2743 proteins, thus mapping nearly 44% of the computationally predicted protein-coding genes with peptide level evidence. In addition to identifying 2591 proteins in the cell lysate of this yeast, we also analyzed the proteome of the conditioned media of C. tropicalis culture and identified several unique secreted proteins among a total of 780 proteins. By subjecting the mass spectrometry data derived from cell lysate and conditioned media to proteogenomic analysis, we identified 86 novel genes, 12 novel exons, and corrected 49 computationally-predicted gene models. To our knowledge, this is the first high-throughput proteomics study of C. tropicalis validating predicted protein coding genes and refining the current genome annotation. The findings may prove useful in future global health efforts to fight against Candida infections.
Affiliation(s)
- Keshava K Datta
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- School of Biotechnology, KIIT University, Bhubaneswar, India
| | - Arun H Patil
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- School of Biotechnology, KIIT University, Bhubaneswar, India
| | - Krishna Patel
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, India
| | - Gourav Dey
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Manipal University, Madhav Nagar, Manipal, India
| | - Anil K Madugundu
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, India
| | - Santosh Renuse
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, India
| | - Jyothi E Kaviyil
- Department of Neuromicrobiology, Neurobiology Research Centre, National Institute of Mental Health and Neurosciences, Bangalore, India
| | - Raja Sekhar
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, India
| | | | - Bhavna Daswani
- National Institute for Research in Reproductive Health (ICMR), Parel, Mumbai, India
| | - Inderjeet Kaur
- Malaria Research Group, International Center for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
| | - Jyotirmaya Mohanty
- ICAR-Central Institute of Freshwater Aquaculture, Kausalyaganga, Bhubaneswar, India
| | | | | | - S Sivapriya
- Department of Ocular Pathology, Vision Research Foundation, Chennai, India
| | | | - Bharat B Chattoo
- Centre for Genome Research, Department of Microbiology and Biotechnology Centre, Faculty of Science, The M. S. University of Baroda, Vadodara, India
| | - Harsha Gowda
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- School of Biotechnology, KIIT University, Bhubaneswar, India
- YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore, India
| | - Raju Ravikumar
- Department of Neuromicrobiology, Neurobiology Research Centre, National Institute of Mental Health and Neurosciences, Bangalore, India
| | - T S Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore, India
- NIMHANS-IOB Proteomics and Bioinformatics Laboratory, Neurobiology Research Centre, National Institute of Mental Health and Neurosciences, Bangalore, India
| |
|
416
|
|
417
|
Abstract
Recent advances in high-resolution tandem mass spectrometry (MS) have resulted in the accumulation of high-quality data. In parallel with these advances in instrumentation, bioinformatics software has been developed to analyze such datasets. Despite these advances, data analysis remains a critical step in MS-based protein identification. In addition, the complexity of the generated MS/MS spectra, the unpredictable nature of peptide fragmentation, sequence annotation errors, and posttranslational modifications have impeded the protein identification process. In a typical MS data analysis, about 60% of the MS/MS spectra remain unassigned. While some of these can be attributed to the low quality of the MS/MS spectra, a proportion can be classified as high quality. Further analysis may reveal how many of the unassigned spectra are attributable to search space, sequence annotation errors, mutations, and/or posttranslational modifications. In this chapter, the tools used to identify proteins and ways to assign unassigned tandem MS spectra are discussed.
Affiliation(s)
- Mohashin Pathan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
| | - Monisha Samuel
- Department of Physiology, Anatomy and Microbiology, La Trobe University, Bundoora, Melbourne, VIC, 3086, Australia
| | - Shivakumar Keerthikumar
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
| | - Suresh Mathivanan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia.
| |
|
418
|
Bai B, Tan H, Pagala VR, High AA, Ichhaporia VP, Hendershot L, Peng J. Deep Profiling of Proteome and Phosphoproteome by Isobaric Labeling, Extensive Liquid Chromatography, and Mass Spectrometry. Methods Enzymol 2016; 585:377-395. [PMID: 28109439 DOI: 10.1016/bs.mie.2016.10.007] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Mass spectrometry-based proteomics has experienced an unprecedented advance in comprehensive analysis of proteins and posttranslational modifications, with particular technical progress in liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) and isobaric labeling multiplexing capacity. Here, we introduce a deep proteomics profiling protocol that combines 10-plex tandem mass tag (TMT) labeling with an optimized LC-MS/MS platform to quantitate whole proteome and phosphoproteome. The major steps include protein extraction and digestion, TMT labeling, two-dimensional liquid chromatography, TiO2-mediated phosphopeptide enrichment, high-resolution mass spectrometry, and computational data processing. This protocol routinely leads to confident quantification of more than 10,000 proteins and approximately 30,000 phosphosites in mammalian samples. Quality control steps are implemented for troubleshooting and evaluating experimental variation. Such a multiplexed, robust method provides a powerful tool for dissecting proteomic signatures at the systems level in a variety of complex samples, ranging from cell culture and animal tissues to human clinical specimens.
Affiliation(s)
- B Bai
- St. Jude Children's Research Hospital, Memphis, TN, United States
| | - H Tan
- St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, TN, United States
| | - V R Pagala
- St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, TN, United States
| | - A A High
- St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, TN, United States
| | - V P Ichhaporia
- St. Jude Children's Research Hospital, Memphis, TN, United States
| | - L Hendershot
- St. Jude Children's Research Hospital, Memphis, TN, United States
| | - J Peng
- St. Jude Children's Research Hospital, Memphis, TN, United States; St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, TN, United States.
| |
|
419
|
Li H, Joh YS, Kim H, Paek E, Lee SW, Hwang KB. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification. BMC Genomics 2016; 17:1031. [PMID: 28155652 PMCID: PMC5259817 DOI: 10.1186/s12864-016-3327-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Background: Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification because of the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available.
Results: To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no one method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches.
Conclusions: We propose to use a set of search result validation methods with separate filtering, for sensitive and reliable identification of peptides in proteogenomic search.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3327-5) contains supplementary material, which is available to authorized users.
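The separate filtering of known and novel peptides described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `PSM` record, the convention that higher scores are better, and the simple decoys/targets FDR estimate are all assumptions made for illustration, and tied scores are not handled specially.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (names are illustrative)."""
    score: float     # assumed convention: higher score means a better match
    is_decoy: bool   # True if the match is to a decoy sequence
    is_novel: bool   # True if the peptide comes only from the inflated part of the database


def threshold_at_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    FDR is estimated as (#decoys) / (#targets) among matches at or above the
    cutoff, which is one common target-decoy estimator (an assumption here).
    """
    targets = decoys = 0
    cutoff = None
    for psm in sorted(psms, key=lambda p: p.score, reverse=True):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = psm.score
    return cutoff


def separate_filter(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Estimate the cutoff independently for known and novel peptides."""
    accepted: List[PSM] = []
    for novel in (False, True):
        group = [p for p in psms if p.is_novel == novel]
        cutoff = threshold_at_fdr(group, max_fdr)
        if cutoff is not None:
            accepted.extend(
                p for p in group if not p.is_decoy and p.score >= cutoff
            )
    return accepted
```

Calling `threshold_at_fdr` on the pooled list gives the conventional combined filter, so the two strategies can be compared on the same matches. Because each class gets its own cutoff, decoy hits among novel peptides do not tighten the threshold applied to known peptides, and vice versa.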
Affiliation(s)
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
| | - Yoon Sung Joh
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
| | - Hyunwoo Kim
- Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
| | - Eunok Paek
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
| | - Sang-Won Lee
- Department of Chemistry, Research Institute for Natural Sciences, Korea University, Seoul, 02841, Republic of Korea
| | - Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea.
| |
|
420
|
Bhat AR, Gupta MK, Krithivasan P, Dhas K, Nair J, Reddy RB, Sudheendra HV, Chavan S, Vardhan H, Darsi S, Balakrishnan L, Katragadda S, Kekatpure V, Suresh A, Tata P, Panda B, Kuriakose MA, Sirdeshmukh R. Sample preparation method considerations for integrated transcriptomic and proteomic analysis of tumors. Proteomics Clin Appl 2016; 11. [PMID: 27801551 DOI: 10.1002/prca.201600100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2016] [Revised: 08/16/2016] [Accepted: 10/26/2016] [Indexed: 01/09/2023]
Affiliation(s)
| | - Manoj Kumar Gupta
- Institute of Bioinformatics; International Tech Park; Bangalore India
- Manipal University; Madhav Nagar; Manipal India
| | - Priya Krithivasan
- Ganit Labs, Bio-IT Centre; Institute of Bioinformatics and Applied Biotechnology; Bangalore India
| | - Kunal Dhas
- Ganit Labs, Bio-IT Centre; Institute of Bioinformatics and Applied Biotechnology; Bangalore India
| | - Jayalakshmi Nair
- Ganit Labs, Bio-IT Centre; Institute of Bioinformatics and Applied Biotechnology; Bangalore India
| | - Ram Bhupal Reddy
- Head and Neck Oncology; Mazumdar Shaw Medical Centre; Narayana Health; Bangalore India
- Mazumdar Shaw Center for Translational Research; Mazumdar Shaw Medical Foundation; Narayana Health; Bangalore India
| | | | - Sandip Chavan
- Institute of Bioinformatics; International Tech Park; Bangalore India
| | - Harsha Vardhan
- Head and Neck Oncology; Mazumdar Shaw Medical Centre; Narayana Health; Bangalore India
- Mazumdar Shaw Center for Translational Research; Mazumdar Shaw Medical Foundation; Narayana Health; Bangalore India
| | - Sujatha Darsi
- Head and Neck Oncology; Mazumdar Shaw Medical Centre; Narayana Health; Bangalore India
| | | | | | - Vikram Kekatpure
- Head and Neck Oncology; Mazumdar Shaw Medical Centre; Narayana Health; Bangalore India
| | - Amritha Suresh
- Head and Neck Oncology; Mazumdar Shaw Medical Centre; Narayana Health; Bangalore India
- Mazumdar Shaw Center for Translational Research; Mazumdar Shaw Medical Foundation; Narayana Health; Bangalore India
| | | | - Binay Panda
- Ganit Labs, Bio-IT Centre; Institute of Bioinformatics and Applied Biotechnology; Bangalore India
| | - Moni A. Kuriakose
- Head and Neck Oncology; Mazumdar Shaw Medical Centre; Narayana Health; Bangalore India
- Mazumdar Shaw Center for Translational Research; Mazumdar Shaw Medical Foundation; Narayana Health; Bangalore India
| | - Ravi Sirdeshmukh
- Institute of Bioinformatics; International Tech Park; Bangalore India
- Mazumdar Shaw Center for Translational Research; Mazumdar Shaw Medical Foundation; Narayana Health; Bangalore India
| |
|
421
|
Nicolaou O, Kousios A, Hadjisavvas A, Lauwerys B, Sokratous K, Kyriacou K. Biomarkers of systemic lupus erythematosus identified using mass spectrometry-based proteomics: a systematic review. J Cell Mol Med 2016; 21:993-1012. [PMID: 27878954 PMCID: PMC5387176 DOI: 10.1111/jcmm.13031] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2016] [Accepted: 09/29/2016] [Indexed: 12/21/2022] Open
Abstract
Advances in mass spectrometry technologies have created new opportunities for discovering novel protein biomarkers in systemic lupus erythematosus (SLE). We performed a systematic review of published reports on proteomic biomarkers identified in SLE patients using mass spectrometry‐based proteomics and highlight their potential disease association and clinical utility. Two electronic databases, MEDLINE and EMBASE, were systematically searched up to July 2015. The methodological quality of studies included in the review was assessed according to Preferred Reporting Items for Systematic Reviews and Meta‐analyses guidelines. Twenty‐five studies were included in the review, identifying 241 SLE candidate proteomic biomarkers related to various aspects of the disease including disease diagnosis and activity or pinpointing specific organ involvement. Furthermore, 13 of the 25 studies validated their results for a selected number of biomarkers in an independent cohort, resulting in the validation of 28 candidate biomarkers. It is noteworthy that 11 candidate biomarkers were identified in more than one study. A significant number of potential proteomic biomarkers that are related to a number of aspects of SLE have been identified using mass spectrometry proteomic approaches. However, further studies are required to assess the utility of these biomarkers in routine clinical practice.
Affiliation(s)
- Orthodoxia Nicolaou
- Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Department of Electron Microscopy/Molecular Pathology, Cyprus School of Molecular Medicine, Nicosia, Cyprus
| | - Andreas Kousios
- Department of Electron Microscopy/Molecular Pathology, Cyprus School of Molecular Medicine, Nicosia, Cyprus
| | - Andreas Hadjisavvas
- Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Department of Electron Microscopy/Molecular Pathology, Cyprus School of Molecular Medicine, Nicosia, Cyprus
| | - Bernard Lauwerys
- Department of Rheumatology, Université catholique de Louvain, Bruxelles, Belgium
| | - Kleitos Sokratous
- Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Kyriacos Kyriacou
- Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Department of Electron Microscopy/Molecular Pathology, Cyprus School of Molecular Medicine, Nicosia, Cyprus
| |
|
422
|
Guerrero CR, Jagtap PD, Johnson JE, Griffin TJ. Using Galaxy for Proteomics. PROTEOME INFORMATICS 2016. [DOI: 10.1039/9781782626732-00289] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
The area of informatics for mass spectrometry (MS)-based proteomics data has steadily grown over the last two decades. Numerous effective software programs now exist for various aspects of proteomic informatics. However, many researchers still have difficulties in using this software. These difficulties arise from problems with running and integrating disparate software programs, scalability issues when dealing with large data volumes, and lack of ability to share and reproduce workflows composed of different software. The Galaxy framework for bioinformatics provides an attractive option for solving many of these current issues in proteomic informatics. Galaxy was originally developed as a workbench to enable genomic data analysis, and numerous researchers are now turning to it to implement software for MS-based proteomics applications. Here, we provide an introduction to Galaxy and its features, and describe how software tools are deployed, published and shared via the scalable framework. We also describe some of the existing tools in Galaxy for basic MS-based proteomics data analysis and informatics. Finally, we describe how proteomics tools in Galaxy can be combined with other existing tools for genomic and transcriptomic data analysis to enable powerful multi-omic data analysis applications.
Affiliation(s)
- Candace R. Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota 512 Walter Library, 117 Pleasant Street SE Minneapolis MN 55455 USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| |
|
423
|
An oligotrophic deep-subsurface community dependent on syntrophy is dominated by sulfur-driven autotrophic denitrifiers. Proc Natl Acad Sci U S A 2016; 113:E7927-E7936. [PMID: 27872277 DOI: 10.1073/pnas.1612244113] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Subsurface lithoautotrophic microbial ecosystems (SLiMEs) under oligotrophic conditions are typically supported by H2. Methanogens and sulfate reducers, and the respective energy processes, are thought to be the dominant players and have been the research foci. Recent investigations showed that, in some deep, fluid-filled fractures in the Witwatersrand Basin, South Africa, methanogens contribute <5% of the total DNA and appear to produce sufficient CH4 to support the rest of the diverse community. This paradoxical situation reflects our lack of knowledge about the in situ metabolic diversity and the overall ecological trophic structure of SLiMEs. Here, we show the active metabolic processes and interactions in one of these communities by combining metatranscriptomic assemblies, metaproteomic and stable isotopic data, and thermodynamic modeling. Dominating the active community are four autotrophic β-proteobacterial genera that are capable of oxidizing sulfur by denitrification, a process that was previously unnoticed in the deep subsurface. They co-occur with sulfate reducers, anaerobic methane oxidizers, and methanogens, which each comprise <5% of the total community. Syntrophic interactions between these microbial groups remove thermodynamic bottlenecks and enable diverse metabolic reactions to occur under the oligotrophic conditions that dominate in the subsurface. The dominance of sulfur oxidizers is explained by the availability of electron donors and acceptors to these microorganisms and the ability of sulfur-oxidizing denitrifiers to gain energy through concomitant S and H2 oxidation. We demonstrate that SLiMEs support taxonomically and metabolically diverse microorganisms, which, through developing syntrophic partnerships, overcome thermodynamic barriers imposed by the environmental conditions in the deep subsurface.
|
424
|
Weisser H, Wright JC, Mudge JM, Gutenbrunner P, Choudhary JS. Flexible Data Analysis Pipeline for High-Confidence Proteogenomics. J Proteome Res 2016; 15:4686-4695. [PMID: 27786492 PMCID: PMC5703597 DOI: 10.1021/acs.jproteome.6b00765] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such “novel” peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses that enhance the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
Affiliation(s)
| | | | | | - Petra Gutenbrunner
- School of Informatics, Communications, and Media, University of Applied Sciences Upper Austria , Hagenberg 4232, Austria
| | | |
|
425
|
Proteomics progresses in microbial physiology and clinical antimicrobial therapy. Eur J Clin Microbiol Infect Dis 2016; 36:403-413. [PMID: 27812806 PMCID: PMC5309286 DOI: 10.1007/s10096-016-2816-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2016] [Accepted: 10/16/2016] [Indexed: 02/05/2023]
Abstract
Clinical microbial identification plays an important role in optimizing the management of infectious diseases and provides diagnostic and therapeutic support for clinical management. Microbial proteomic research is aimed at identifying proteins associated with microbial activity, which has facilitated the discovery of microbial physiology changes and host–pathogen interactions during bacterial infection and antimicrobial therapy. Here, we summarize proteomics-driven progress in understanding host–microbial pathogen interactions at multiple levels, in mass spectrometry-based microbial proteome identification for clinical diagnosis, and in antimicrobial therapy. Advances in proteomic techniques pave new ways towards effective prevention and drug discovery for microbial-induced infectious diseases.
|
426
|
Proteomic analysis and translational perspective of hepatocellular carcinoma: Identification of diagnostic protein biomarkers by an onco-proteogenomics approach. Kaohsiung J Med Sci 2016; 32:535-544. [DOI: 10.1016/j.kjms.2016.09.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2016] [Revised: 09/07/2016] [Accepted: 09/08/2016] [Indexed: 02/07/2023] Open
|
427
|
Hahn J, Tsoy OV, Thalmann S, Čuklina J, Gelfand MS, Evguenieva-Hackenberg E. Small Open Reading Frames, Non-Coding RNAs and Repetitive Elements in Bradyrhizobium japonicum USDA 110. PLoS One 2016; 11:e0165429. [PMID: 27788207 PMCID: PMC5082802 DOI: 10.1371/journal.pone.0165429] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2016] [Accepted: 10/11/2016] [Indexed: 11/18/2022] Open
Abstract
Small open reading frames (sORFs) and genes for non-coding RNAs are poorly investigated components of most genomes. Our analysis of 1391 ORFs recently annotated in the soybean symbiont Bradyrhizobium japonicum USDA 110 revealed that 78% of them contain fewer than 80 codons. Twenty-one of these sORFs are conserved in or outside Alphaproteobacteria and most of them are similar to genes found in transposable elements, in line with their broad distribution. Stabilizing selection was demonstrated for sORFs with proteomic evidence and bll1319_ISGA which is conserved at the nucleotide level in 16 alphaproteobacterial species, 79 species from other taxa and 49 other Proteobacteria. Further, we used Northern blot hybridization to validate ten small RNAs (BjsR1 to BjsR10) belonging to new RNA families. We found that BjsR1 and BjsR3 have homologs outside the genus Bradyrhizobium, and BjsR5, BjsR6, BjsR7, and BjsR10 have up to four imperfect copies in Bradyrhizobium genomes. BjsR8, BjsR9, and BjsR10 are present exclusively in nodules, while the other sRNAs are also expressed in liquid cultures. We also found that the level of BjsR4 decreases after exposure to tellurite and iron, and this down-regulation contributes to survival under high iron conditions. Analysis of additional small RNAs overlapping with 3’-UTRs revealed two new repetitive elements named Br-REP1 and Br-REP2. These REP elements may play roles in genomic plasticity and gene regulation and could be useful for strain identification by PCR-fingerprinting. Furthermore, we studied two potential toxin genes in the symbiotic island and confirmed toxicity of the yhaV homolog bll1687 but not of the newly annotated higB homolog blr0229_ISGA in E. coli. Finally, we revealed transcription interference resulting in an antisense RNA complementary to blr1853, a gene induced in symbiosis. The presented results expand our knowledge of sORFs, non-coding RNAs and repetitive elements in B. japonicum and related bacteria.
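The codon-length criterion mentioned in this abstract (sORFs containing fewer than 80 codons) can be sketched as a small calculation. This is an illustrative sketch only: the function names are hypothetical, ORFs are assumed to be given as nucleotide strings running from the start codon through the stop codon, and excluding the stop codon from the count is an assumption, since the abstract does not say how codons were counted.

```python
from typing import Iterable


def codon_count(orf_nt: str) -> int:
    """Count sense codons in an ORF given as start codon through stop codon.

    Assumption: the stop codon is not counted.
    """
    if len(orf_nt) < 6 or len(orf_nt) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and include start and stop codons")
    return len(orf_nt) // 3 - 1


def fraction_short(orfs: Iterable[str], max_codons: int = 80) -> float:
    """Fraction of ORFs with fewer than max_codons sense codons."""
    counts = [codon_count(orf) for orf in orfs]
    if not counts:
        raise ValueError("no ORFs supplied")
    return sum(c < max_codons for c in counts) / len(counts)
```

Applied to a full set of annotated ORFs, `fraction_short` yields the kind of proportion reported in the abstract, subject to the counting convention assumed above.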
Affiliation(s)
- Julia Hahn
- Institute of Microbiology and Molecular Biology, Justus-Liebig-University, Giessen, Germany
| | - Olga V. Tsoy
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetny Ln. 19, Moscow, 127051, Russia
| | - Sebastian Thalmann
- Institute of Microbiology and Molecular Biology, Justus-Liebig-University, Giessen, Germany
| | - Jelena Čuklina
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetny Ln. 19, Moscow, 127051, Russia
- ETH, Institute of Molecular Systems Biology, Zürich, Switzerland
| | - Mikhail S. Gelfand
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetny Ln. 19, Moscow, 127051, Russia
- Skolkovo Institute of Science and Technology, Nobel Str. 3, Moscow, 143026, Russia
- Faculty of Bioengineering and Bioinformatics, M. V. Lomonosov Moscow State University, Vorobyevy Gory 1–73, Moscow, 119234, Russia
- Faculty of Computer Science, Higher School of Economics, Kochnovsky Dr. 3, Moscow, 125319, Russia
- * E-mail: (EEH); (MSG)
| | - Elena Evguenieva-Hackenberg
- Institute of Microbiology and Molecular Biology, Justus-Liebig-University, Giessen, Germany
- * E-mail: (EEH); (MSG)
| |
|
428
|
Abstract
A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.
Affiliation(s)
- Jonathan M Mudge
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
| | - Jennifer Harrow
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
- Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK
| |
|
429
|
Abstract
Omics approaches have become popular in biology as powerful discovery tools, and are currently gaining interest for diagnostic applications. Establishing the accurate genome sequence of any organism is easy, but the outcome of its annotation by means of automatic pipelines remains imprecise. Some protein-encoding genes may be missed when they are specific to a given taxon and poorly conserved, even though they are important for explaining the specific traits of the organism. Translational starts are also poorly predicted in a relatively large number of cases, thus impacting the protein sequence database used in proteomics, comparative genomics, and systems biology. The use of high-throughput proteomics data to improve genome annotation is an attractive option to obtain a more comprehensive molecular picture of a given organism. Here, protocols for reannotating prokaryote genomes are described based on shotgun proteomics and derivatization of protein N-termini with a positively charged reagent coupled to high-resolution tandem mass spectrometry.
|
430
|
Caron E, Kowalewski DJ, Chiek Koh C, Sturm T, Schuster H, Aebersold R. Analysis of Major Histocompatibility Complex (MHC) Immunopeptidomes Using Mass Spectrometry. Mol Cell Proteomics 2016; 14:3105-17. [PMID: 26628741 DOI: 10.1074/mcp.o115.052431] [Citation(s) in RCA: 164] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
The myriad of peptides presented at the cell surface by class I and class II major histocompatibility complex (MHC) molecules are referred to as the immunopeptidome and are of great importance for basic and translational science. For basic science, the immunopeptidome is a critical component for understanding the immune system; for translational science, exact knowledge of the immunopeptidome can directly fuel and guide the development of next-generation vaccines and immunotherapies against autoimmunity, infectious diseases, and cancers. In this mini-review, we summarize established isolation techniques as well as emerging mass spectrometry-based platforms (i.e. SWATH-MS) to identify and quantify MHC-associated peptides. We also highlight selected biological applications and discuss important current technical limitations that need to be solved to accelerate the development of this field.
Affiliation(s)
- Etienne Caron
- From the ‡Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland;
| | - Daniel J Kowalewski
- §Department of Immunology, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany
| | - Ching Chiek Koh
- From the ‡Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Theo Sturm
- From the ‡Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Heiko Schuster
- §Department of Immunology, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany
| | - Ruedi Aebersold
- From the ‡Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; ¶Faculty of Science, University of Zurich, Zurich, Switzerland
| |
|
431
|
Yeom J, Kabir MH, Lim B, Ahn HS, Kim SY, Lee C. A proteogenomic approach for protein-level evidence of genomic variants in cancer cells. Sci Rep 2016; 6:35305. [PMID: 27734975 PMCID: PMC5062161 DOI: 10.1038/srep35305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/20/2016] [Accepted: 09/27/2016] [Indexed: 11/20/2022] Open
Abstract
Variations in protein-coding sequences may play important roles in cancer development. However, since variants may not be expressed as proteins because of various cellular quality control systems, it is important to obtain protein-level evidence of genomic variations. We present a proteogenomic strategy for obtaining protein-level evidence of genomic variants, which we call sequential targeted LC-MS/MS based on prediction of peptide pI and Retention time (STaLPIR). Our approach shows improved peptide identification and has the potential for unbiased analysis of variant sequences as well as the corresponding reference sequences. Integrated analysis of DNA, mRNA and protein suggests that the protein expression level of a nonsynonymous variant is regulated either before or after translation, depending on the influence of the variant on protein function. In conclusion, our approach provides direct evidence for the expression of variant protein forms from genome sequence data.
Affiliation(s)
- Jeonghun Yeom
- Center for Theragnosis, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea.,Department of Biological Chemistry, Korea University of Science and Technology, Daejeon 34113 Republic of Korea
| | - Mohammad Humayun Kabir
- Center for Theragnosis, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea
| | - Byungho Lim
- Genome Structure Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
| | - Hee-Sung Ahn
- Center for Theragnosis, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea.,Department of Biological Chemistry, Korea University of Science and Technology, Daejeon 34113 Republic of Korea
| | - Seon-Young Kim
- Genome Structure Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea.,Department of Functional Genomics, Korea University of Science and Technology, Daejeon 34113, Republic of Korea
| | - Cheolju Lee
- Center for Theragnosis, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea.,Department of Biological Chemistry, Korea University of Science and Technology, Daejeon 34113 Republic of Korea
| |
|
432
|
Díez P, Fuentes M. Proteogenomics for the Comprehensive Analysis of Human Cellular and Serum Antibody Repertoires. Advances in Experimental Medicine and Biology 2016; 926:153-162. [PMID: 27686811 DOI: 10.1007/978-3-319-42316-6_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
The vast repertoire of immunoglobulins produced by the immune system is a consequence of the huge number of antigens to which we are exposed every day. The diversity of these immunoglobulins is due to different mechanisms (including VDJ recombination, somatic hypermutation, and antigen selection). Understanding how the immune system generates this diversity and what the molecular bases of immunoglobulin composition are remain key challenges in the immunological field. During the last decades, several techniques have emerged as promising strategies to achieve these goals, but it is their combination that appears to be the most fruitful way of increasing knowledge about human cellular and serum antibody repertoires. In this chapter, we address the diverse strategies focused on the analysis of immunoglobulin repertoires as well as the characterization of the genomic and peptide sequences. Moreover, the advantages of combining various -omics approaches are discussed through a review of different published studies, showing the benefits in clinical areas.
Affiliation(s)
- Paula Díez
- Department of Medicine and General Cytometry Service-Nucleus, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Avda. Universidad de Coimbra, S/N 37007, Salamanca, Spain.,Proteomics Unit, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Avda. Universidad de Coimbra, S/N 37007, Salamanca, Spain
| | - Manuel Fuentes
- Department of Medicine and General Cytometry Service-Nucleus, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Avda. Universidad de Coimbra, S/N 37007, Salamanca, Spain.,Proteomics Unit, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Avda. Universidad de Coimbra, S/N 37007, Salamanca, Spain
| |
|
433
|
Monte E, Rosa-Garrido M, Vondriska TM, Wang J. Undiscovered Physiology of Transcript and Protein Networks. Compr Physiol 2016; 6:1851-1872. [PMID: 27783861 PMCID: PMC10751805 DOI: 10.1002/cphy.c160003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
The past two decades have witnessed a rapid evolution in our ability to measure RNA and protein from biological systems. As a result, new principles have arisen regarding how information is processed in cells, how decisions are made, and the role of networks in biology. This essay examines this technological evolution, reviewing (and critiquing) the conceptual framework that has emerged to explain how RNA and protein networks control cellular function. We identify how future investigations into transcriptomes, proteomes, and other cellular networks will enable development of more robust, quantitative models of cellular behavior whilst also providing new avenues to use knowledge of biological networks to improve human health. © 2016 American Physiological Society. Compr Physiol 6:1851-1872, 2016.
Affiliation(s)
- Emma Monte
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
| | - Manuel Rosa-Garrido
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
| | - Thomas M. Vondriska
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, USA
- Department of Physiology, David Geffen School of Medicine, University of California, Los Angeles, USA
| | - Jessica Wang
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, USA
| |
|
434
|
Zhang J, Yang MK, Zeng H, Ge F. GAPP: A Proteogenomic Software for Genome Annotation and Global Profiling of Post-translational Modifications in Prokaryotes. Mol Cell Proteomics 2016; 15:3529-3539. [PMID: 27630248 DOI: 10.1074/mcp.m116.060046] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/07/2016] [Indexed: 11/06/2022] Open
Abstract
Although the number of sequenced prokaryotic genomes is growing rapidly, experimentally verified annotation of prokaryotic genomes remains patchy and challenging. To facilitate genome annotation efforts for prokaryotes, we developed an open-source software package called GAPP for genome annotation and global profiling of post-translational modifications (PTMs) in prokaryotes. With a single command, it provides a standard workflow to validate and refine predicted gene models and discover diverse PTM events. We demonstrated the utility of GAPP using proteomic data from Helicobacter pylori, a major human pathogen that is responsible for many gastric diseases. Our results confirmed 84.9% of the existing predicted H. pylori proteins, identified 20 novel protein-coding genes, and corrected four existing gene models with regard to translation initiation sites. In particular, GAPP revealed a large repertoire of PTMs using the same proteomic data and provided a rich resource that can be used to examine the functions of reversible modifications in this human pathogen. This software is a powerful tool for genome annotation and global discovery of PTMs and is applicable to any sequenced prokaryotic organism; we expect that it will become an integral part of ongoing genome annotation efforts for prokaryotes. GAPP is freely available at https://sourceforge.net/projects/gappproteogenomic/.
Affiliation(s)
- Jia Zhang
- From the ‡Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Ming-Kun Yang
- From the ‡Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Honghui Zeng
- §Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
| | - Feng Ge
- From the ‡Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.,§Wuhan Branch, Supercomputing Center, Chinese Academy of Sciences, China
| |
|
435
|
Deutsch EW, Sun Z, Campbell DS, Binz PA, Farrah T, Shteynberg D, Mendoza L, Omenn GS, Moritz RL. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics. J Proteome Res 2016; 15:4091-4100. [PMID: 27577934 DOI: 10.1021/acs.jproteome.6b00445] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences compiled for the predicted genes and their protein translation products. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/.
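The two-stage idea in the last paragraph (search a small database, then check peptide uniqueness against a larger one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the exact-substring matching, and the toy sequences are all assumptions made for illustration, and real workflows would also need to handle details such as isoleucine/leucine ambiguity.

```python
def proteins_containing(peptide, database):
    """Return the sorted names of all database entries whose sequence contains the peptide."""
    return sorted(name for name, seq in database.items() if peptide in seq)


def classify_peptides(peptides, small_db, large_db):
    """For each peptide, list its matches in both databases and flag uniqueness in the larger one."""
    report = {}
    for pep in peptides:
        large_hits = proteins_containing(pep, large_db)
        report[pep] = {
            "small_hits": proteins_containing(pep, small_db),
            "large_hits": large_hits,
            "unique_in_large": len(large_hits) == 1,
        }
    return report


# Hypothetical toy data, not taken from the THISP databases.
tier1 = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSHFSRQLEERLGLIEVQ"}
tier4 = dict(tier1, P1_iso2="MKTAYIAKQRGGSFVK", P3="AAALEERLGLIEVQKK")

report = classify_peptides(["TAYIAK", "QISFVK", "LGLIEVQ"], tier1, tier4)
for peptide, info in report.items():
    print(peptide, info)
```

In this toy case a peptide that looks unique in the small database can map to several entries in the larger one, which is the situation the uniqueness check is meant to expose.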
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - David S Campbell
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Pierre-Alain Binz
- CHUV Centre Universitaire Hospitalier Vaudois, 1011 Lausanne, Switzerland
| | - Terry Farrah
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - David Shteynberg
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Gilbert S Omenn
- Institute for Systems Biology, Seattle, Washington 98109, United States.,Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
|
436
|
Park GW, Hwang H, Kim KH, Lee JY, Lee HK, Park JY, Ji ES, Park SKR, Yates JR, Kwon KH, Park YM, Lee HJ, Paik YK, Kim JY, Yoo JS. Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate. J Proteome Res 2016; 15:4082-4090. [PMID: 27537616 DOI: 10.1021/acs.jproteome.6b00376] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid chromatography and mass spectrometry-based large-scale proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four sequential steps. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Fourth, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral patterns from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
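The final filtering step (two or more peptides per protein, protein-level FDR held under a threshold) can be sketched in Python as follows. The abstract does not say how the FDR is estimated, so this sketch assumes a simple target-decoy estimate (decoys divided by targets above a score cutoff); it is not ProteinInferencer, and the function name, record fields, and toy scores are hypothetical.

```python
def filter_proteins_by_fdr(proteins, max_fdr=0.01, min_peptides=2):
    """Return target protein names passing a peptide-count rule and a decoy-estimated FDR.

    Each protein is a dict with keys: name, score, n_peptides, is_decoy.
    The FDR at a given rank is estimated as decoys / targets (an assumption).
    """
    # Apply the "two or more peptides" rule, then rank by score (best first).
    candidates = sorted(
        (p for p in proteins if p["n_peptides"] >= min_peptides),
        key=lambda p: p["score"],
        reverse=True,
    )
    targets = decoys = best_cut = 0
    for rank, protein in enumerate(candidates, start=1):
        if protein["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            best_cut = rank  # keep the deepest rank that still meets the threshold
    return [p["name"] for p in candidates[:best_cut] if not p["is_decoy"]]


# Hypothetical toy identifications (T = target, D = decoy).
toy = [
    {"name": "T1", "score": 9.0, "n_peptides": 3, "is_decoy": False},
    {"name": "T2", "score": 8.0, "n_peptides": 2, "is_decoy": False},
    {"name": "T3", "score": 7.5, "n_peptides": 1, "is_decoy": False},
    {"name": "T4", "score": 7.0, "n_peptides": 2, "is_decoy": False},
    {"name": "T5", "score": 6.5, "n_peptides": 4, "is_decoy": False},
    {"name": "D1", "score": 6.0, "n_peptides": 2, "is_decoy": True},
    {"name": "T6", "score": 5.0, "n_peptides": 2, "is_decoy": False},
    {"name": "D2", "score": 4.0, "n_peptides": 3, "is_decoy": True},
]

print(filter_proteins_by_fdr(toy, max_fdr=0.01))
print(filter_proteins_by_fdr(toy, max_fdr=0.25))
```

With only a handful of entries, a 1% threshold tolerates no decoys at all, so the looser 25% threshold is included purely to show how the cutoff moves.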
Affiliation(s)
- Gun Wook Park
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.,Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
| | - Heeyoun Hwang
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
| | - Kwang Hoe Kim
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.,Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
| | - Ju Yeon Lee
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
| | - Hyun Kyoung Lee
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.,Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
| | - Ji Yeong Park
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.,Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
| | - Eun Sun Ji
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
| | - Sung-Kyu Robin Park
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States
| | - John R Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States
| | - Kyung-Hoon Kwon
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
| | - Young Mok Park
- Center for Cognition and Sociality, Institute for Basic Science, Daejeon 305-811, Republic of Korea
| | - Hyoung-Joo Lee
- Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Republic of Korea
| | - Young-Ki Paik
- Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Republic of Korea
| | - Jin Young Kim
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
| | - Jong Shin Yoo
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.,Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
| |
|
437
|
Poverennaya EV, Kopylov AT, Ponomarenko EA, Ilgisonis EV, Zgoda VG, Tikhonova OV, Novikova SE, Farafonova TE, Kiseleva YY, Radko SP, Vakhrushev IV, Yarygin KN, Moshkovskii SA, Kiseleva OI, Lisitsa AV, Sokolov AS, Mazur AM, Prokhortchouk EB, Skryabin KG, Kostrjukova ES, Tyakht AV, Gorbachev AY, Ilina EN, Govorun VM, Archakov AI. State of the Art of Chromosome 18-Centric HPP in 2016: Transcriptome and Proteome Profiling of Liver Tissue and HepG2 Cells. J Proteome Res 2016; 15:4030-4038. [PMID: 27527821 DOI: 10.1021/acs.jproteome.6b00380] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/15/2023]
Abstract
A gene-centric approach was applied for a large-scale study of expression products of a single chromosome. Transcriptome profiling of liver tissue and the HepG2 cell line was independently performed using two RNA-Seq platforms (SOLiD and Illumina) and also by Droplet Digital PCR (ddPCR) and quantitative RT-PCR. Proteome profiling was performed using shotgun LC-MS/MS as well as selected reaction monitoring with stable isotope-labeled standards (SRM/SIS) for liver tissue and HepG2 cells. On the basis of SRM/SIS measurements, protein copy numbers were estimated for the Chromosome 18 (Chr 18) encoded proteins in the selected types of biological material. These values were compared with expression levels of the corresponding mRNA. As a result, we obtained information about 158 and 142 transcripts for the HepG2 cell line and liver tissue, respectively. SRM/SIS measurements and shotgun LC-MS/MS allowed us to detect 91 Chr 18-encoded proteins in total, while the intersection between the HepG2 cell line and liver tissue proteomes was ∼66%. In total, 16 proteins were observed specifically in the HepG2 cell line, while 15 proteins were found solely in the liver tissue. Comparison between proteome and transcriptome revealed a poor correlation (R² ≈ 0.1) between corresponding mRNA and protein expression levels. The SRM and shotgun data sets (obtained during 2015-2016) are available in PASSEL (PASS00697) and ProteomeXchange/PRIDE (PXD004407). All measurements were also uploaded into the in-house Chr 18 Knowledgebase at http://kb18.ru/protein/matrix/416126.
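For readers unfamiliar with the R² figure quoted above, the following minimal Python sketch shows one common way such a value is obtained: the squared Pearson correlation between log-transformed mRNA and protein abundances. The abstract does not state how the authors computed it, so the log10 transform, the function names, and the toy values here are assumptions for illustration only.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def log_r_squared(mrna, protein):
    """R^2 after a log10 transform; assumes strictly positive abundances."""
    return r_squared([math.log10(v) for v in mrna], [math.log10(v) for v in protein])


# Hypothetical abundances for five genes (arbitrary units).
mrna_levels = [12.0, 150.0, 3.5, 48.0, 900.0]
protein_copies = [2.0e4, 1.5e3, 8.0e5, 6.0e4, 3.0e3]
print(round(log_r_squared(mrna_levels, protein_copies), 3))
```

A value near 1 would mean that mRNA level predicts protein level well on the log scale; a value near 0.1, as reported, means mRNA explains only about a tenth of the variance in protein abundance.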
Affiliation(s)
| | - Arthur T Kopylov
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Elena A Ponomarenko
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | | | - Victor G Zgoda
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Olga V Tikhonova
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Svetlana E Novikova
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Tatyana E Farafonova
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Yana Yu Kiseleva
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Sergey P Radko
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Igor V Vakhrushev
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Konstantin N Yarygin
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Sergei A Moshkovskii
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia.,Pirogov Russian National Research Medical University, Ostrovitianov Str. 1, Moscow 117997, Russia
| | - Olga I Kiseleva
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Andrey V Lisitsa
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| | - Alexey S Sokolov
- Center "Bioengineering" Russian Academy of Sciences, Prospect 60-let Oktyabrya, 7, Build.1, Moscow 119071, Russia
| | - Alexander M Mazur
- Center "Bioengineering" Russian Academy of Sciences, Prospect 60-let Oktyabrya, 7, Build.1, Moscow 119071, Russia
| | - Egor B Prokhortchouk
- Center "Bioengineering" Russian Academy of Sciences, Prospect 60-let Oktyabrya, 7, Build.1, Moscow 119071, Russia
| | - Konstantin G Skryabin
- Center "Bioengineering" Russian Academy of Sciences, Prospect 60-let Oktyabrya, 7, Build.1, Moscow 119071, Russia
| | - Elena S Kostrjukova
- Scientific Research Institute of Physical-Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow 119435, Russia
| | - Alexander V Tyakht
- Scientific Research Institute of Physical-Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow 119435, Russia
| | - Alexey Yu Gorbachev
- Scientific Research Institute of Physical-Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow 119435, Russia
| | - Elena N Ilina
- Scientific Research Institute of Physical-Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow 119435, Russia
| | - Vadim M Govorun
- Scientific Research Institute of Physical-Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow 119435, Russia
| | - Alexander I Archakov
- Institute of Biomedical Chemistry, Pogodinskaya Street, 10, Moscow 119121, Russia
| |
|
438
|
Kumar D, Bansal G, Narang A, Basak T, Abbas T, Dash D. Integrating transcriptome and proteome profiling: Strategies and applications. Proteomics 2016; 16:2533-2544. [PMID: 27343053 DOI: 10.1002/pmic.201600140] [Citation(s) in RCA: 108] [Impact Index Per Article: 13.5] [Received: 03/16/2016] [Revised: 06/12/2016] [Accepted: 06/23/2016] [Indexed: 12/17/2022]
Abstract
Discovering the gene expression signature associated with a cellular state is one of the basic quests in the majority of biological studies. For most clinical and cellular manifestations, these molecular differences may be exhibited across multiple layers of gene regulation, such as genomic variations, gene expression, protein translation and post-translational modifications. These system-wide variations are dynamic in nature and their crosstalk is overwhelmingly complex, so analyzing them separately may not be very informative. This necessitates the integrative analysis of such multiple layers of information to understand the interplay of the individual components of the biological system. Recent developments in high-throughput RNA sequencing and mass spectrometric (MS) technologies to probe transcripts and proteins have made these the preferred methods for understanding global gene regulation. Subsequently, improvements in "big-data" analysis techniques have enabled novel conclusions to be drawn from integrative transcriptomic-proteomic analysis. The unified analysis of both these data types has been rewarding for several biological objectives, such as improving genome annotation, predicting RNA-protein quantities, deciphering gene regulation, and discovering disease markers and drug targets. There are different ways in which transcriptomics and proteomics data can be integrated, each aiming at different research objectives. Here, we review various studies, approaches and computational tools targeted at integrative analysis of these two high-throughput omics methods.
Affiliation(s)
- Dhirendra Kumar
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, INDIA
| | - Gourja Bansal
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, INDIA
| | - Ankita Narang
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, INDIA
| | - Trayambak Basak
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, INDIA.,Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
| | - Tahseen Abbas
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, INDIA.,Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
| | - Debasis Dash
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, INDIA.,Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
| |
|
439
|
Deutsch EW, Overall CM, Van Eyk JE, Baker MS, Paik YK, Weintraub ST, Lane L, Martens L, Vandenbrouck Y, Kusebauch U, Hancock WS, Hermjakob H, Aebersold R, Moritz RL, Omenn GS. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J Proteome Res 2016; 15:3961-3970. [PMID: 27490519 DOI: 10.1021/acs.jproteome.6b00392] [Citation(s) in RCA: 134] [Impact Index Per Article: 16.8] [Indexed: 01/02/2023]
Abstract
Every data-rich community research effort requires a clear plan for ensuring the quality of the data interpretation and comparability of analyses. To address this need within the Human Proteome Project (HPP) of the Human Proteome Organization (HUPO), we have developed through broad consultation a set of mass spectrometry data interpretation guidelines that should be applied to all HPP data contributions. For submission of manuscripts reporting HPP protein identification results, the guidelines are presented as a one-page checklist containing 15 essential points followed by two pages of expanded description of each. Here we present an overview of the guidelines and provide an in-depth description of each of the 15 elements to facilitate understanding of the intentions and rationale behind the guidelines, for both authors and reviewers. Broadly, these guidelines provide specific directions regarding how HPP data are to be submitted to mass spectrometry data repositories, how error analysis should be presented, and how detection of novel proteins should be supported with additional confirmatory evidence. These guidelines, developed by the HPP community, are presented to the broader scientific community for further discussion.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
| | - Christopher M Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
| | - Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Mark S Baker
- Department of Biomedical Sciences, Faculty of Medicine and Health Science, Macquarie University, Sydney, New South Wales 2109, Australia
| | - Young-Ki Paik
- Yonsei Proteome Research Center and Department of Biochemistry, Yonsei University, 50 Yonsei-ro, Sudaemoon-ku, Seoul 120-749, Korea
| | - Susan T Weintraub
- The University of Texas, Health Science Center at San Antonio, San Antonio, Texas 78229, United States
| | - Lydie Lane
- SIB Swiss Institute of Bioinformatics and Department of Human Protein Science, Faculty of Medicine, University of Geneva, CMU, Michel Servet 1, 1211 Geneva 4, Switzerland
| | - Lennart Martens
- Department of Medical Protein Research, VIB, Ghent 9052, Belgium.,Department of Biochemistry, Ghent University, Ghent B-9000, Belgium
| | - Yves Vandenbrouck
- French Proteomics Infrastructure, Biosciences and Biotechnology Institute of Grenoble (BIG), Université Grenoble Alpes, CEA, INSERM, U1038 Grenoble, France
| | - Ulrike Kusebauch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
| | - William S Hancock
- Department of Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.,National Center for Protein Sciences, Beijing 102206, China
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland.,Faculty of Science, University of Zurich, 8006 Zurich, Switzerland
| | - Robert L Moritz
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
| | - Gilbert S Omenn
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States.,Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
| |
|
440
|
Chatterjee S, Stupp GS, Park SKR, Ducom JC, Yates JR, Su AI, Wolan DW. A comprehensive and scalable database search system for metaproteomics. BMC Genomics 2016; 17:642. [PMID: 27528457 PMCID: PMC4986259 DOI: 10.1186/s12864-016-2855-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 01/14/2016] [Accepted: 06/21/2016] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.
RESULTS: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples.
CONCLUSIONS: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified.
The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
Affiliation(s)
- Sandip Chatterjee
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Gregory S Stupp
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Sung Kyu Robin Park
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, USA
| | - Jean-Christophe Ducom
- High Performance Computing Technology Core, The Scripps Research Institute, La Jolla, USA
| | - John R Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, USA
| | - Andrew I Su
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA.
| | - Dennis W Wolan
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, USA.
| |
|
441
|
Suo T, Wang H, Li Z. Application of proteomics in research on traditional Chinese medicine. Expert Rev Proteomics 2016; 13:873-81. [DOI: 10.1080/14789450.2016.1220837] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 10/21/2022]
|
442
|
Klasberg S, Bitard-Feildel T, Mallet L. Computational Identification of Novel Genes: Current and Future Perspectives. Bioinform Biol Insights 2016; 10:121-31. [PMID: 27493475 PMCID: PMC4970615 DOI: 10.4137/bbi.s39950] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/15/2016] [Revised: 05/31/2016] [Accepted: 06/05/2016] [Indexed: 12/31/2022] Open
Abstract
While it has long been thought that all genomic novelties are derived from existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, evidence has accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or by searches against databases. Computational approaches are also prime methods; they can be based on existing models or can leverage biological evidence from experiments. Identification of novel genes, however, remains a challenging task. With constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with prospective approaches for further studies.
Affiliation(s)
- Steffen Klasberg
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
| | - Tristan Bitard-Feildel
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
| | - Ludovic Mallet
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, Muenster, Germany
| |
|
443
|
Caruana NJ, Cooke IR, Faou P, Finn J, Hall NE, Norman M, Pineda SS, Strugnell JM. A combined proteomic and transcriptomic analysis of slime secreted by the southern bottletail squid, Sepiadarium austrinum (Cephalopoda). J Proteomics 2016; 148:170-82. [PMID: 27476034 DOI: 10.1016/j.jprot.2016.07.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/04/2016] [Revised: 07/20/2016] [Accepted: 07/24/2016] [Indexed: 10/21/2022]
Abstract
Sepiadarium austrinum, the southern bottletail squid, is a small squid that inhabits soft sediments along Australia's south-east coast. When provoked, it rapidly secretes large volumes of slime, presumably as a form of chemical defense. We analyzed the proteomic composition of this slime using tandem mass spectrometry and transcriptomics and found that it was remarkably complex, with 1735 identified protein groups (FDR: 0.01). To investigate the chemical defense hypothesis we performed an Artemia toxicity assay and used sequence analysis to search for toxin-like molecules. Although the slime did not appear to be toxic to Artemia, we found 13 proteins in slime with the hallmarks of toxins, namely cysteine richness, short length, a signal peptide and/or homology to known toxins. These included three short (80-130 AA) cysteine-rich secreted proteins with no homology to proteins in the NCBI or UniProt databases. Other protein families found included CAP, phospholipase-B, ShKT-like peptides, peptidase S10, Kunitz BPTI and DNase II. Quantitative analysis using intensity-based absolute quantification (iBAQ via MaxQuant) revealed 20 highly abundant proteins, accounting for 67% of iBAQ signal, and three of these were toxin-like. No mucin homologues were found, suggesting that the structure of the slime gel may be formed by an unknown mechanism.
BIOLOGICAL SIGNIFICANCE: This study is the first known instance of a slime secretion from a cephalopod to be analyzed by proteomics methods and is the first investigation of a member of the family Sepiadariidae using proteomic methods. 1735 proteins were identified with 13 of these fitting criteria established for the identification of putative toxins. The slime is dominated by 20 highly abundant proteins with secreted, cysteine-rich proteins. The study highlights the importance of 'omics approaches in understanding novel organisms.
Affiliation(s)
- Nikeisha J Caruana
- Department of Ecology, Environment and Evolution, School of Life Sciences, La Trobe University, Melbourne, Vic 3086, Australia.
| | - Ira R Cooke
- Department of Molecular and Cell Biology, James Cook University, Townsville, Qld 4811, Australia; Department of Biochemistry and Genetics, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Vic 3086, Australia
| | - Pierre Faou
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Vic 3086, Australia
| | - Julian Finn
- Sciences, Museum Victoria, Carlton, Vic 3053, Australia
| | - Nathan E Hall
- Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative, Carlton, Vic 3053, Australia
| | - Mark Norman
- Sciences, Museum Victoria, Carlton, Vic 3053, Australia
| | - Sandy S Pineda
- Institute for Molecular Bioscience, The University of Queensland, QLD 4072, Australia
| | - Jan M Strugnell
- Department of Ecology, Environment and Evolution, School of Life Sciences, La Trobe University, Melbourne, Vic 3086, Australia
| |
|
444
|
Muth T, Renard BY, Martens L. Metaproteomic data analysis at a glance: advances in computational microbial community proteomics. Expert Rev Proteomics 2016; 13:757-69. [DOI: 10.1080/14789450.2016.1209418] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 02/07/2023]
|
445
|
Rosnow JJ, Anderson LN, Nair RN, Baker ES, Wright AT. Profiling microbial lignocellulose degradation and utilization by emergent omics technologies. Crit Rev Biotechnol 2016; 37:626-640. [PMID: 27439855 DOI: 10.1080/07388551.2016.1209158] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 10/24/2022]
Abstract
The use of plant materials to generate renewable biofuels and other high-value chemicals is a sustainable and preferable option, but will require considerable improvements to increase the rate and efficiency of lignocellulose depolymerization. This review highlights novel and emerging technologies that are being developed and deployed to characterize the process of lignocellulose degradation. The review will also illustrate how microbial communities deconstruct and metabolize lignocellulose by identifying the necessary genes and enzyme activities along with the reaction products. These technologies include multi-omic measurements, cell sorting and isolation, nuclear magnetic resonance spectroscopy (NMR), activity-based protein profiling, and direct measurement of enzyme activity. The recalcitrant nature of lignocellulose makes it necessary to characterize the methods microbes employ to deconstruct it, in order to inform new strategies for greatly improving biofuel conversion processes. New technologies are yielding important insights into microbial functions and strategies employed to degrade lignocellulose, providing a mechanistic blueprint for advancing biofuel production.
Affiliation(s)
- Joshua J Rosnow
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Lindsey N Anderson
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Reji N Nair
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Erin S Baker
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Aaron T Wright
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| |
|
446
|
Sajjad W, Rafiq M, Ali B, Hayat M, Zada S, Sajjad W, Kumar T. Proteogenomics: New Emerging Technology. HAYATI JOURNAL OF BIOSCIENCES 2016. [DOI: 10.1016/j.hjb.2016.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022] Open
|
447
|
Wright JC, Choudhary JS. DecoyPyrat: Fast Non-redundant Hybrid Decoy Sequence Generation for Large Scale Proteomics. JOURNAL OF PROTEOMICS & BIOINFORMATICS 2016; 9:176-180. [PMID: 27418748 PMCID: PMC4941923 DOI: 10.4172/jpb.1000404] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Abstract
Accurate statistical evaluation of sequence database peptide identifications from tandem mass spectra is essential in mass spectrometry-based proteomics experiments. These statistics are dependent on accurately modelling random identifications. The target-decoy approach has risen to become the de facto approach to calculating FDR in proteomic datasets. The main principle of this approach is to search a set of decoy protein sequences that emulate the size and composition of the target protein sequences searched whilst not matching real proteins in the sample. To do this, it is commonplace to reverse or shuffle the proteins and peptides in the target database. However, these approaches have their drawbacks and limitations. A key confounding issue is the peptide redundancy between target and decoy databases, leading to inaccurate FDR estimation. This inaccuracy is further amplified at the protein level and when searching large sequence databases such as those used for proteogenomics. Here, we present a unifying hybrid method to quickly and efficiently generate decoy sequences with minimal overlap between target and decoy peptides. We show that applying a reversed decoy approach can produce up to 5% peptide redundancy, and many more peptides will have the exact same precursor mass as a target peptide. Our hybrid method addresses both these issues by first switching proteolytic cleavage sites with the preceding amino acid, reversing the database and then shuffling any redundant sequences. This flexible hybrid method reduces the peptide overlap between target and decoy peptides to about 1% of peptides, making a more robust decoy model suitable for large search spaces. We also demonstrate the anti-conservative effect of redundant peptides on the calculation of q-values in mouse brain tissue data.
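The three steps named in this abstract (switch each cleavage site with the preceding residue, reverse, then shuffle any decoy peptide that still matches a target) can be illustrated with a minimal Python sketch. This is not the DecoyPyrat implementation: the function names, the simplified trypsin rule (cleave after every K or R, no proline exception), and the retry limit `max_tries` are all assumptions made for illustration.

```python
import random

def tryptic_peptides(protein, min_len=1):
    """Cleave after every K or R (simplified trypsin rule; an assumption)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]

def make_decoy(protein):
    """Swap each K/R with the residue before it, then reverse the sequence."""
    seq = list(protein)
    for i in range(1, len(seq)):
        if seq[i] in "KR":
            seq[i - 1], seq[i] = seq[i], seq[i - 1]
    return "".join(reversed(seq))

def shuffle_if_redundant(peptide, target_peptides, rng, max_tries=50):
    """Shuffle all residues except the last until the peptide is not a target.

    Gives up after max_tries (hypothetical limit), e.g. for very short peptides.
    """
    for _ in range(max_tries):
        if peptide not in target_peptides:
            break
        body = list(peptide[:-1])
        rng.shuffle(body)
        peptide = "".join(body) + peptide[-1]
    return peptide

def decoy_peptides(proteins, seed=0):
    """Return the target peptide set and the de-duplicated decoy peptide list."""
    rng = random.Random(seed)
    targets = {p for prot in proteins for p in tryptic_peptides(prot)}
    decoys = [
        shuffle_if_redundant(pep, targets, rng)
        for prot in proteins
        for pep in tryptic_peptides(make_decoy(prot))
    ]
    return targets, decoys
```

For example, `make_decoy("AAAXKBBBYR")` gives `"YRBBBXKAAA"`, whose simplified tryptic peptides are `YR`, `BBBXK` and `AAA`. Keeping the last residue fixed during shuffling is a design choice of this sketch so that a shuffled peptide still ends at its cleavage residue.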
Affiliation(s)
- James C Wright
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Jyoti S Choudhary
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| |
|
448
|
Lobas AA, Karpov DS, Kopylov AT, Solovyeva EM, Ivanov MV, Ilina IY, Lazarev VN, Kuznetsova KG, Ilgisonis EV, Zgoda VG, Gorshkov MV, Moshkovskii SA. Exome-based proteogenomics of HEK-293 human cell line: Coding genomic variants identified at the level of shotgun proteome. Proteomics 2016; 16:1980-91. [DOI: 10.1002/pmic.201500349] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/27/2015] [Revised: 03/30/2016] [Accepted: 05/24/2016] [Indexed: 12/17/2022]
Affiliation(s)
- Anna A. Lobas
- Institute of Biomedical Chemistry; Moscow Russia
- Institute for Energy Problems of Chemical Physics; Russian Academy of Sciences; Moscow Russia
- Moscow Institute of Physics and Technology (State University); Dolgoprudny Moscow Region Russia
| | - Dmitry S. Karpov
- Institute of Biomedical Chemistry; Moscow Russia
- Engelhardt Institute of Molecular Biology; Russian Academy of Sciences; Moscow Russia
| | | | - Elizaveta M. Solovyeva
- Institute for Energy Problems of Chemical Physics; Russian Academy of Sciences; Moscow Russia
- Moscow Institute of Physics and Technology (State University); Dolgoprudny Moscow Region Russia
| | - Mark V. Ivanov
- Institute for Energy Problems of Chemical Physics; Russian Academy of Sciences; Moscow Russia
- Moscow Institute of Physics and Technology (State University); Dolgoprudny Moscow Region Russia
| | | | - Vassily N. Lazarev
- Research Institute of Physico-Chemical Medicine; Federal Medical and Biological Agency; Moscow Russia
| | | | | | | | - Mikhail V. Gorshkov
- Institute for Energy Problems of Chemical Physics; Russian Academy of Sciences; Moscow Russia
- Moscow Institute of Physics and Technology (State University); Dolgoprudny Moscow Region Russia
| | - Sergei A. Moshkovskii
- Institute of Biomedical Chemistry; Moscow Russia
- Medico-Biological Faculty; Pirogov Russian National Research Medical University (RNRMU); Moscow Russia
| |
|
449
|
Kuznetsova KG, Trufanov PV, Moysa AA, Pyatnitskiy MA, Zgoda VG, Gorshkov MV, Moshkovskii SA. Threonine versus isothreonine in synthetic peptides analyzed by high-resolution liquid chromatography/tandem mass spectrometry. RAPID COMMUNICATIONS IN MASS SPECTROMETRY: RCM 2016; 30:1323-1331. [PMID: 27173114 DOI: 10.1002/rcm.7566] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/01/2016] [Revised: 03/15/2016] [Accepted: 03/15/2016] [Indexed: 06/05/2023]
Abstract
RATIONALE: One of the problems in proteogenomic research aimed at identification of variant peptides is the presence of peptides with amino acid isomers of different origin in the analyzed samples. Among the most challenging examples are peptides with threonine and isothreonine (homoserine) in their sequences. Indeed, the latter residue may appear in vitro as a methionine substitution during sample preparation for shotgun proteome analysis. Yet, this substitution of Met to isoThr is not encoded genetically and should be unambiguously distinguished from, e.g., point mutations in proteins that result in Met conversion to Thr.
METHODS: In this work we compared tandem mass (MS/MS) spectra produced by an Orbitrap mass spectrometer of Thr- and isoThr-containing tryptic peptides and found a distinctive feature in their collisionally activated fragmentation patterns.
RESULTS: Up to 84% of MS/MS spectra for peptides containing isoThr residues have been positively specified. We also studied the differences in retention times for peptides containing Thr isoforms that can be further used for their distinction.
CONCLUSIONS: Threonine can be distinguished from isothreonine by its retention time and HCD fragmentation pattern, specifically relative intensity of the bn - product ion, which can be further used in proteomic research. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
| | - Pavel V Trufanov
- Institute of Biomedical Chemistry, Moscow, Russia
- Moscow State University, Biological Faculty, Moscow, Russia
| | - Alexander A Moysa
- Institute of Biomedical Chemistry, Moscow, Russia
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
| | | | | | - Mikhail V Gorshkov
- Institute of Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow Region, Russia
| | - Sergei A Moshkovskii
- Institute of Biomedical Chemistry, Moscow, Russia
- Pirogov Russian National Medical University, Moscow, Russia
| |
|
450
|
Li Y, Wang X, Cho JH, Shaw TI, Wu Z, Bai B, Wang H, Zhou S, Beach TG, Wu G, Zhang J, Peng J. JUMPg: An Integrative Proteogenomics Pipeline Identifying Unannotated Proteins in Human Brain and Cancer Cells. J Proteome Res 2016; 15:2309-20. [PMID: 27225868 DOI: 10.1021/acs.jproteome.6b00344] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Indexed: 01/18/2023]
Abstract
Proteogenomics is an emerging approach to improve gene annotation and interpretation of proteomics data. Here we present JUMPg, an integrative proteogenomics pipeline including customized database construction, tag-based database search, peptide-spectrum match filtering, and data visualization. JUMPg creates multiple databases of DNA polymorphisms, mutations, splice junctions, and partial trypticity, as well as protein fragments translated from the whole transcriptome in all six frames upon RNA-seq de novo assembly. We use a multistage strategy to search these databases sequentially, in which the performance is optimized by re-searching only unmatched high-quality spectra and reusing amino acid tags generated by the JUMP search engine. The identified peptides/proteins are displayed with gene loci using the UCSC genome browser. The JUMPg program is then applied to process a label-free mass spectrometry data set of Alzheimer's disease postmortem brain, uncovering 496 new peptides of amino acid substitutions, alternative splicing, frameshift, and "non-coding gene" translation. The novel protein PNMA6BL specifically expressed in the brain is highlighted. We also tested JUMPg to analyze a stable-isotope labeled data set of multiple myeloma cells, revealing 991 sample-specific peptides that include protein sequences in the immunoglobulin light chain variable region. Thus, the JUMPg program is an effective proteogenomics tool for multiomics data integration.
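Translating a transcript "in all six frames", as this abstract describes, means reading three codon offsets on the forward strand and three on the reverse complement. The Python sketch below illustrates that general idea only. It is not taken from JUMPg: the function names and the frame labels (`+1` to `-3`) are assumptions, and it uses the standard genetic code with `*` for stop codons and `X` for codons containing ambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translate(dna):
    """Return a dict mapping frame label ('+1'..'-3') to its translation."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For the input `ATGGCCTAA`, frame `+1` reads `ATG GCC TAA` and yields `MA*`, while frame `-1` reads the reverse complement `TTA GGC CAT` and yields `LGH`. Splitting each translation at `*` would give the stop-to-stop protein fragments that a search database could be built from.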
Affiliation(s)
| | | | | | | | | | | | - Hong Wang
- Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, 920 Madison Avenue, Memphis, Tennessee 38163, United States
| | | | - Thomas G Beach
- Banner Sun Health Research Institute, Sun City, Arizona 85351, United States
| | | | | | | |
|