1
Bouwmeester R, Gabriels R, Van Den Bossche T, Martens L, Degroeve S. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows. Proteomics 2020; 20:e1900351. [PMID: 32267083 DOI: 10.1002/pmic.201900351] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/26/2020] [Revised: 03/21/2020] [Indexed: 12/30/2022]
Abstract
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.
Affiliation(s)
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
2
Chen Y, Zhou Z, Yang W, Bi N, Xu J, He J, Zhang R, Wang L, Abliz Z. Development of a Data-Independent Targeted Metabolomics Method for Relative Quantification Using Liquid Chromatography Coupled with Tandem Mass Spectrometry. Anal Chem 2017; 89:6954-6962. [PMID: 28574715 DOI: 10.1021/acs.analchem.6b04727] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 01/07/2023]
Abstract
Quantitative metabolomics approaches can significantly improve the repeatability and reliability of metabolomics investigations but face critical technical challenges, owing to the vast number of unknown endogenous metabolites and the lack of authentic standards. The present study contributes to the development of a novel method known as "data-independent targeted quantitative metabolomics" (DITQM), which was used to investigate the label-free quantitative metabolomics of multiple known and unknown metabolites in biofluid samples. This approach initially involved the acquisition of MS/MS data for all metabolites in biosamples using a sequentially stepped targeted MS/MS (sst-MS/MS) method, in which multiple product ion scans were performed by selecting all ions in the targeted mass ranges as the precursor ions. Subsequently, scheduled multiple reaction monitoring (MRM) by LC-MS/MS of the metabolome was established for 1658 characteristic ion pairs of 1324 metabolites. For sensitive and accurate quantification of these metabolites, mixed calibration curves were generated using sequentially diluted standard reference plasma samples using established MRM methods. Relative concentrations of all metabolites in each sample were calculated without using individual authentic standards. To evaluate the reliability and applicability of this new method, the performance of DITQM was validated by comparison to absolute quantification of 12 acylcarnitines using authentic standards and traditional metabolomics analysis for lung cancer. The results proved that the DITQM protocol is more reliable and can significantly improve clustering effects and repeatability in biomarker discovery. In this study, we established a novel methodology to standardize and quantify large-scale metabolome, providing a new choice for metabolomics research and its clinical applications.
Affiliation(s)
- Yanhua Chen
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
- Zhi Zhou
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
- Wei Yang
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
  - Center for DMPK Research of Herbal Medicines, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, P. R. China
- Nan Bi
  - Cancer Institute and Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P. R. China
- Jing Xu
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
- Jiuming He
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
- Ruiping Zhang
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
- Lvhua Wang
  - Cancer Institute and Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P. R. China
- Zeper Abliz
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P. R. China
  - Centre for Bioimaging & Systems Biology, Minzu University of China, Beijing 100081, P. R. China
3
Ezkurdia I, Calvo E, Del Pozo A, Vázquez J, Valencia A, Tress ML. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev Proteomics 2015; 12:579-93. [PMID: 26496066 PMCID: PMC4732427 DOI: 10.1586/14789450.2015.1103186] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/20/2022]
Abstract
The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this, these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.
Affiliation(s)
- Iakes Ezkurdia
  - Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Enrique Calvo
  - Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Angela Del Pozo
  - Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
- Jesús Vázquez
  - Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Alfonso Valencia
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Michael L. Tress
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
4
Abascal F, Ezkurdia I, Rodriguez-Rivas J, Rodriguez JM, del Pozo A, Vázquez J, Valencia A, Tress ML. Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level. PLoS Comput Biol 2015; 11:e1004325. [PMID: 26061177 PMCID: PMC4465641 DOI: 10.1371/journal.pcbi.1004325] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 12/16/2014] [Accepted: 05/08/2015] [Indexed: 11/19/2022] Open
Abstract
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved—all the homologous exons we identified evolved over 460 million years ago—and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles. 
Alternative splicing is thought to be one means for generating the protein diversity necessary for the whole range of cellular functions. While the presence of alternatively spliced transcripts in the cell has been amply demonstrated, the same cannot be said for alternatively spliced proteins. The quest for alternative protein isoforms has focused primarily on the analysis of peptides from large-scale mass spectroscopy experiments, but evidence for alternative isoforms has been patchy and contradictory. A careful analysis of the peptide evidence is needed to fully understand the scale of alternative splicing detectable at the protein level. Here we analysed peptides from eight large-scale data sets, identifying just 282 splice events among 12,716 genes. This suggests that most genes have a single dominant isoform. Many of the alternative isoforms that we identified were only subtly different from the main splice variant, and one in five was generated by substitution of homologous exons by swapping one related exon for another. Remarkably, the alternative isoforms generated from homologous exons were highly conserved, first appearing 460 million years ago, and several appear to have tissue-specific roles in the brain and heart. Our results suggest that these particular isoforms are likely to have important cellular roles.
Affiliation(s)
- Federico Abascal
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Iakes Ezkurdia
  - Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Juan Rodriguez-Rivas
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Jose Manuel Rodriguez
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Angela del Pozo
  - Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
- Jesús Vázquez
  - Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Alfonso Valencia
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  - * E-mail: (AV); (MLT)
- Michael L. Tress
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  - * E-mail: (AV); (MLT)
5
Proteomic approaches to identify substrates of the three Deg/HtrA proteases of the cyanobacterium Synechocystis sp. PCC 6803. Biochem J 2015; 468:373-84. [PMID: 25877158 DOI: 10.1042/bj20150097] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 01/23/2015] [Accepted: 04/16/2015] [Indexed: 12/21/2022]
Abstract
The family of Deg/HtrA proteases plays an important role in quality control of cellular proteins in a wide range of organisms. In the genome of the cyanobacterium Synechocystis sp. PCC 6803, a model organism for photosynthetic research and renewable energy products, three Deg proteases are encoded, termed HhoA, HhoB and HtrA. In the present study, we compared wild-type (WT) Synechocystis cells with the single insertion mutants ΔhhoA, ΔhhoB and ΔhtrA. Protein expression of the remaining Deg/HtrA proteases was strongly affected in the single insertion mutants. Detailed proteomic studies using DIGE (difference gel electrophoresis) and N-terminal COFRADIC (N-terminal combined fractional diagonal chromatography) revealed that inactivation of a single Deg protease has similar impact on the proteomes of the three mutants; differences to WT were observed in enzymes involved in the major metabolic pathways. Changes in the amount of phosphate permease system Pst-1 were observed only in the insertion mutant ΔhhoB. N-terminal COFRADIC analyses on cell lysates of ΔhhoB confirmed changed amounts of many cell envelope proteins, including the phosphate permease systems, compared with WT. In vitro COFRADIC studies were performed to identify the specificity profiles of the recombinant proteases rHhoA, rHhoB or rHtrA added to the Synechocystis WT proteome. The combined in vivo and in vitro N-terminal COFRADIC datasets propose RbcS as a natural substrate for HhoA, PsbO for HhoB and HtrA and Pbp8 for HtrA. We therefore suggest that each Synechocystis Deg protease protects the cell through different, but connected mechanisms.
6
Li SJ, Dhaenens M, Garmyn A, Verbrugghe E, Van Rooij P, De Saeger S, Eeckhout M, Ducatelle R, Croubels S, Haesebrouck F, Deforce D, Pasmans F, Martel A. Exposure of Aspergillus fumigatus to T-2 toxin results in a stress response associated with exacerbation of aspergillosis in poultry. World Mycotoxin J 2015. [DOI: 10.3920/wmj2014.1765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
Abstract
Aspergillus fumigatus is a ubiquitous airborne pathogen. Saprophytic growth in the presence of environmental mycotoxins might affect its fitness and virulence. T-2 toxin (T-2) is a trichothecene mycotoxin produced by Fusarium spp. in various substrates. This study aimed to evaluate the effects of T-2 on the fitness of A. fumigatus in vitro and its virulence in experimentally inoculated chickens. We cultured A. fumigatus on agar media containing T-2, and examined the changes in viability, morphology, growth rate, proteome expression, and susceptibility to antimycotics and oxidative stress of this fungus. Results showed that exposure to 1000 ng/ml T-2 in the substrate did not reduce the viability of A. fumigatus, but its growth was inhibited, with wrinkling and depigmentation of the colonies. Proteomic analysis revealed 21 upregulated proteins and 33 downregulated proteins, including those involved in stress response, pathogenesis, metabolism, and transcription. The proteome seems to have shifted to enhance glycolysis, catabolism of lipids, and amino acid conversion. Assays on fungal susceptibility to antimycotics and oxidative stress showed that T-2 exposure did not affect the minimal inhibitory concentrations of amphotericin B, itraconazole, voriconazole and terbinafine against A. fumigatus, but increased the susceptibility of A. fumigatus to H2O2 and menadione. Experimental inoculation of chickens with A. fumigatus showed that exposure of A. fumigatus to T-2 significantly exacerbated aspergillosis in chickens exposed to dietary T-2. In conclusion, A. fumigatus is capable of surviving and growing on substrates containing levels of T-2 up to 1000 ng/ml. Growth in the presence of T-2 induces a stress response in A. fumigatus, which is associated with exacerbation of aspergillosis in vivo.
Affiliation(s)
- S.-J. Li
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- M. Dhaenens
  - Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Sciences, Ghent University, Harelbekestraat 72, 9000 Ghent, Belgium
- A. Garmyn
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- E. Verbrugghe
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- P. Van Rooij
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- S. De Saeger
  - Department of Bio-analysis, Faculty of Pharmaceutical Sciences, Ghent University, Harelbekestraat 72, 9000 Ghent, Belgium
- M. Eeckhout
  - Department of Applied Biosciences, Faculty of Bio-science Engineering, Ghent University, Valentin Vaerwyckweg 1, 9000 Ghent, Belgium
- R. Ducatelle
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- S. Croubels
  - Department of Pharmacology, Toxicology and Biochemistry, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- F. Haesebrouck
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- D. Deforce
  - Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Sciences, Ghent University, Harelbekestraat 72, 9000 Ghent, Belgium
- F. Pasmans
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- A. Martel
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
7
Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, Vazquez J, Valencia A, Tress ML. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet 2014; 23:5866-78. [PMID: 24939910 PMCID: PMC4204768 DOI: 10.1093/hmg/ddu309] [Citation(s) in RCA: 333] [Impact Index Per Article: 30.3] [Indexed: 12/27/2022] Open
Abstract
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
Affiliation(s)
- David Juan
  - Structural Biology and Bioinformatics Programme and
- Jose Manuel Rodriguez
  - National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- Adam Frankish
  - Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK and
- Mark Diekhans
  - Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA
- Jennifer Harrow
  - Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK and
- Jesus Vazquez
  - Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- Alfonso Valencia
  - Structural Biology and Bioinformatics Programme and National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
8
Lukasse PNJ, America AHP. Protein inference using peptide quantification patterns. J Proteome Res 2014; 13:3191-9. [PMID: 24815921 DOI: 10.1021/pr401072g] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Determining the list of proteins present in a sample, based on the list of identified peptides, is a crucial step in the untargeted proteomics LC-MS/MS data-processing pipeline. This step, commonly referred to as protein inference, turns out to be a very challenging problem because many peptide sequences are found across multiple proteins. Current protein inference engines typically use peptide to spectrum match (PSM) quality measures and spectral count information to score protein identifications in LC-MS/MS data sets. This is, however, not enough to confidently validate or otherwise rule out many of the proteins. Here we introduce the basis for a new way of performing protein inference based on accurate quantification patterns of identified peptides using the correlation of these patterns to validate peptide to protein matches. For the first implementation of this new approach, we focused on (1) distinguishing between unambiguously and ambiguously identified proteins and (2) generating hypotheses for the discrimination of subsets of the ambiguously identified proteins. Our preprocessing pipelines support both labeled LC-MS/MS or label-free LC-MS followed by LC-MS/MS providing the peptide quantification. We apply our procedure to two published data sets and show that it is able to detect and infer proteins that would otherwise not be confidently inferred.
Affiliation(s)
- Pieter N J Lukasse
  - Plant Research International, Wageningen UR, P.O. Box 16, 6700AA Wageningen, The Netherlands
9
Perez-Riverol Y, Wang R, Hermjakob H, Müller M, Vesada V, Vizcaíno JA. Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective. Biochimica et Biophysica Acta 2014; 1844:63-76. [PMID: 23467006 PMCID: PMC3898926 DOI: 10.1016/j.bbapap.2013.02.032] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Received: 10/01/2012] [Revised: 02/05/2013] [Accepted: 02/22/2013] [Indexed: 12/23/2022]
Abstract
Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Yasset Perez-Riverol
  - EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
- Rui Wang
  - EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Henning Hermjakob
  - EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Markus Müller
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU - 1, rue Michel Servet CH-1211 Geneva, Switzerland
- Vladimir Vesada
  - Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
- Juan Antonio Vizcaíno
  - EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
10
Vaudel M, Sickmann A, Martens L. Introduction to opportunities and pitfalls in functional mass spectrometry based proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:12-20. [PMID: 23845992 DOI: 10.1016/j.bbapap.2013.06.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 12/17/2012] [Revised: 06/05/2013] [Accepted: 06/25/2013] [Indexed: 10/26/2022]
Abstract
With the advent of mass spectrometry based proteomics, the identification of thousands of proteins has become commonplace in biology nowadays. Increasingly, efforts have also been invested toward the detection and localization of posttranslational modifications. It is furthermore common practice to quantify the identified entities, a task supported by a panel of different methods. Finally, the results can also be enriched with functional knowledge gained on the proteins, detecting for instance differentially expressed gene ontology terms or biological pathways. In this study, we review the resources, methods and tools available for the researcher to achieve such a quantitative functional analysis. These include statistics for the post-processing of identification and quantification results, online resources and public repositories. With a focus on free but user-friendly software, preferably also open-source, we provide a list of tools designed to help the researcher manage the vast amount of data generated. We also indicate where such applications currently remain lacking. Moreover, we stress the eventual pitfalls of every step of such studies. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Marc Vaudel
  - Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
  - Proteomics Unit (PROBE), Department of Biomedicine, University of Bergen, Bergen, Norway
11
Sandin M, Teleman J, Malmström J, Levander F. Data processing methods and quality control strategies for label-free LC-MS protein quantification. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:29-41. [PMID: 23567904 DOI: 10.1016/j.bbapap.2013.03.026] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 09/14/2012] [Revised: 01/18/2013] [Accepted: 03/08/2013] [Indexed: 12/20/2022]
Abstract
Protein quantification using different LC-MS techniques is becoming a standard practice. However, with a multitude of experimental setups to choose from, as well as a wide array of software solutions for subsequent data processing, it is non-trivial to select the most appropriate workflow for a given biological question. In this review, we highlight different issues that need to be addressed by software for quantitative LC-MS experiments and describe different approaches that are available. With focus on label-free quantification, examples are discussed both for LC-MS/MS and LC-SRM data processing. We further elaborate on current quality control methodology for performing accurate protein quantification experiments. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Marianne Sandin
- Department of Immunotechnology, Lund University, BMC D13, 22184 Lund, Sweden
|
12
|
Sandin M, Ali A, Hansson K, Månsson O, Andreasson E, Resjö S, Levander F. An adaptive alignment algorithm for quality-controlled label-free LC-MS. Mol Cell Proteomics 2013; 12:1407-20. [PMID: 23306530 DOI: 10.1074/mcp.o112.021907] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022] Open
Abstract
Label-free quantification using precursor-based intensities is a versatile workflow for large-scale proteomics studies. The method however requires extensive computational analysis and is therefore in need of robust quality control during the data mining stage. We present a new label-free data analysis workflow integrated into a multiuser software platform. A novel adaptive alignment algorithm has been developed to minimize the possible systematic bias introduced into the analysis. Parameters are estimated on the fly from the data at hand, producing a user-friendly analysis suite. Quality metrics are output in every step of the analysis as well as actively incorporated into the parameter estimation. We furthermore show the improvement of this system by comprehensive comparison to classical label-free analysis methodology as well as current state-of-the-art software.
Affiliation(s)
- Marianne Sandin
- Department of Immunotechnology, Lund University, BMC D13, 22184 Lund, Sweden
|
13
|
MilQuant: A free, generic software tool for isobaric tagging-based quantitation. J Proteomics 2012; 75:5516-22. [DOI: 10.1016/j.jprot.2012.06.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/23/2012] [Revised: 06/16/2012] [Accepted: 06/29/2012] [Indexed: 11/21/2022]
|
14
|
Towards a human proteomics atlas. Anal Bioanal Chem 2012; 404:1069-77. [PMID: 22447219 DOI: 10.1007/s00216-012-5940-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/07/2012] [Revised: 03/07/2012] [Accepted: 03/08/2012] [Indexed: 01/01/2023]
Abstract
Proteomics research has taken up an increasingly important role in life sciences over the past few years. Due to a strong push from publishers and funders alike, the community has also started to freely share its data in earnest, making use of public repositories such as the highly popular PRIDE database at EMBL-EBI. Reuse of these publicly available data has so far been confined to rather specific, targeted reanalyses, but this limited reuse is set to expand dramatically as repositories continue to grow exponentially. Examples of large-scale reuse are readily found in other omics disciplines, where more comprehensive public data have already accumulated over longer periods. Here, a typical example of integrative data reuse is provided by the construction of so-called expression atlases. We here therefore investigate the issues involved in using the human data currently stored in the PRIDE database to construct a robust, tissue-specific protein expression atlas from tandem-MS based label-free quantification.
|
15
|
Verbrugghe E, Vandenbroucke V, Dhaenens M, Shearer N, Goossens J, De Saeger S, Eeckhout M, D'Herde K, Thompson A, Deforce D, Boyen F, Leyman B, Van Parys A, De Backer P, Haesebrouck F, Croubels S, Pasmans F. T-2 toxin induced Salmonella Typhimurium intoxication results in decreased Salmonella numbers in the cecum contents of pigs, despite marked effects on Salmonella-host cell interactions. Vet Res 2012; 43:22. [PMID: 22440148 PMCID: PMC3362764 DOI: 10.1186/1297-9716-43-22] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 11/04/2011] [Accepted: 03/22/2012] [Indexed: 02/06/2023] Open
Abstract
The mycotoxin T-2 toxin and Salmonella Typhimurium infections pose a significant threat to human and animal health. Interactions between both agents may result in a different outcome of the infection. Therefore, the aim of the presented study was to investigate the effects of low and relevant concentrations of T-2 toxin on the course of a Salmonella Typhimurium infection in pigs. We showed that the presence of 15 and 83 μg T-2 toxin per kg feed significantly decreased the amount of Salmonella Typhimurium bacteria present in the cecum contents, and a tendency to a reduced colonization of the jejunum, ileum, cecum, colon and colon contents was noticed. In vitro, proteomic analysis of porcine enterocytes revealed that a very low concentration of T-2 toxin (5 ng/mL) affects the protein expression of mitochondrial, endoplasmatic reticulum and cytoskeleton associated proteins, proteins involved in protein synthesis and folding, RNA synthesis, mitogen-activated protein kinase signaling and regulatory processes. Similarly low concentrations (1-100 ng/mL) promoted the susceptibility of porcine macrophages and intestinal epithelial cells to Salmonella Typhimurium invasion, in a SPI-1 independent manner. Furthermore, T-2 toxin (1-5 ng/mL) promoted the translocation of Salmonella Typhimurium over an intestinal porcine epithelial cell monolayer. Although these findings may seem in favour of Salmonella Typhimurium, microarray analysis showed that T-2 toxin (5 ng/mL) causes an intoxication of Salmonella Typhimurium, represented by a reduced motility and a downregulation of metabolic and Salmonella Pathogenicity Island 1 genes. This study demonstrates marked interactions of T-2 toxin with Salmonella Typhimurium pathogenesis, resulting in bacterial intoxication.
Affiliation(s)
- Elin Verbrugghe
- Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, 9820 Merelbeke, Belgium.
|
16
|
Colaert N, Barsnes H, Vaudel M, Helsens K, Timmerman E, Sickmann A, Gevaert K, Martens L. thermo-msf-parser: An Open Source Java Library to Parse and Visualize Thermo Proteome Discoverer msf Files. J Proteome Res 2011; 10:3840-3. [DOI: 10.1021/pr2005154] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
Affiliation(s)
- Niklaas Colaert
- Department of Medical Protein Research, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Uni Computing, University of Bergen, Bergen, Norway
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften - ISAS–e.V., Dortmund, Germany
- Kenny Helsens
- Department of Medical Protein Research, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Evy Timmerman
- Department of Medical Protein Research, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Albert Sickmann
- Leibniz-Institut für Analytische Wissenschaften - ISAS–e.V., Dortmund, Germany
- Medizinisches Proteom-Center (MPC), Ruhr-Universität, Bochum, Germany
- Kris Gevaert
- Department of Medical Protein Research, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Lennart Martens
- Department of Medical Protein Research, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
|