1
Sun Z, Ning Z, Figeys D. The Landscape and Perspectives of the Human Gut Metaproteomics. Mol Cell Proteomics 2024; 23:100763. [PMID: 38608842 PMCID: PMC11098955 DOI: 10.1016/j.mcpro.2024.100763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 02/26/2024] [Accepted: 04/09/2024] [Indexed: 04/14/2024]
Abstract
The human gut microbiome is closely associated with human health and diseases. Metaproteomics has emerged as a valuable tool for studying the functionality of the gut microbiome by analyzing the entire set of proteins present in microbial communities. Recent advancements in liquid chromatography and tandem mass spectrometry (LC-MS/MS) techniques have expanded the detection range of metaproteomics. However, the overall coverage of the proteome in metaproteomics is still limited. While metagenomics studies have revealed substantial microbial diversity and functional potential of the human gut microbiome, few studies have summarized and studied the human gut microbiome landscape revealed with metaproteomics. In this article, we present the current landscape of human gut metaproteomics studies by re-analyzing the identification results from 15 published studies. We quantified the limited proteome coverage in metaproteomics and revealed a high proportion of annotation coverage of metaproteomics-identified proteins. We conducted a preliminary comparison between the metaproteomics view and the metagenomics view of the human gut microbiome, identifying key areas of consistency and divergence. Based on the current landscape of human gut metaproteomics, we discuss the feasibility of using metaproteomics to study functionally unknown proteins and propose a whole-workflow peptide-centric analysis. Additionally, we suggest enhancing metaproteomics analysis by refining taxonomic classification and calculating confidence scores, as well as developing tools for analyzing the interaction between taxonomy and function.
Affiliation(s)
- Zhongzhi Sun
  - School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Zhibin Ning
  - School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Daniel Figeys
  - School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
2
Bouyssié D, Altıner P, Capella-Gutierrez S, Fernández JM, Hagemeijer YP, Horvatovich P, Hubálek M, Levander F, Mauri P, Palmblad M, Raffelsberger W, Rodríguez-Navas L, Di Silvestre D, Kunkli BT, Uszkoreit J, Vandenbrouck Y, Vizcaíno JA, Winkelhardt D, Schwämmle V. WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows. J Proteome Res 2024; 23:418-429. [PMID: 38038272 DOI: 10.1021/acs.jproteome.3c00636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
Affiliation(s)
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), 31062 Toulouse, France
  - Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
- Pınar Altıner
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), 31062 Toulouse, France
- José M Fernández
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Yanick Paco Hagemeijer
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, 9713 GZ Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
- Martin Hubálek
  - Institute of Organic Chemistry and Biochemistry, CAS, 160 00 Prague, Czech Republic
- Fredrik Levander
  - National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Immunotechnology, Lund University, 22100 Lund, Sweden
- Pierluigi Mauri
  - Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Magnus Palmblad
  - Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
- Wolfgang Raffelsberger
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire, Université de Strasbourg, CNRS UMR7104, INSERM U1258, Illkirch, 1 Rue Laurent Fries, 67404 Illkirch, France
- Laura Rodríguez-Navas
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Dario Di Silvestre
  - Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Balázs Tibor Kunkli
  - Department of Biochemistry and Molecular Biology, University of Debrecen, 4032 Debrecen, Hungary
- Julian Uszkoreit
  - Medical Faculty, Medical Bioinformatics, Ruhr University Bochum, 44801 Bochum, Germany
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
  - Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Yves Vandenbrouck
  - Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
  - CEA, Fundamental Research Division, Proteomics French Infrastructure, 91191 Gif-sur-Yvette, France
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Dirk Winkelhardt
  - Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
3
Uszkoreit J, Palmblad M, Schwämmle V. Tackling reproducibility: lessons for the proteomics community. Expert Rev Proteomics 2024; 21:9-11. [PMID: 38362700 DOI: 10.1080/14789450.2024.2320166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 02/03/2024] [Indexed: 02/17/2024]
Affiliation(s)
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
4
Lazear MR. Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale. J Proteome Res 2023; 22:3652-3659. [PMID: 37819886 DOI: 10.1021/acs.jproteome.3c00486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
The growing complexity and volume of proteomics data necessitate the development of efficient software tools for peptide identification and quantification from mass spectra. Given their central role in proteomics, it is imperative that these tools are auditable and extensible, requirements that are best fulfilled by open-source and permissively licensed software. This work presents Sage, a high-performance, open-source, and freely available proteomics pipeline. Scalable and cloud-ready, Sage matches the performance of state-of-the-art software tools while running an order of magnitude faster.
Affiliation(s)
- Michael R Lazear
  - Belharra Therapeutics, 3985 Sorrento Valley Boulevard Suite C, San Diego, California 92121, United States
5
De La Toba EA, Anapindi KDB, Sweedler JV. Assessment and Comparison of Database Search Engines for Peptidomic Applications. J Proteome Res 2023; 22:3123-3134. [PMID: 36809008 PMCID: PMC10440370 DOI: 10.1021/acs.jproteome.2c00307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Protein database search engines are an integral component of mass spectrometry-based peptidomic analyses. Given the unique computational challenges of peptidomics, many factors must be taken into consideration when optimizing search engine selection, as each platform has different algorithms by which tandem mass spectra are scored for subsequent peptide identifications. In this study, four different database search engines, PEAKS, MS-GF+, OMSSA, and X! Tandem, were compared with Aplysia californica and Rattus norvegicus peptidomics data sets, and various metrics were assessed such as the number of unique peptide and neuropeptide identifications, and peptide length distributions. Given the tested conditions, PEAKS was found to have the highest number of peptide and neuropeptide identifications out of the four search engines in both data sets. Furthermore, principal component analysis and multivariate logistic regression were employed to determine whether specific spectral features contribute to false C-terminal amidation assignments by each search engine. From this analysis, it was found that the primary features influencing incorrect peptide assignments were the precursor and fragment ion m/z errors. Finally, an assessment employing a mixed species protein database was performed to evaluate search engine precision and sensitivity when searched against an enlarged search space containing human proteins.
Affiliation(s)
- Eduardo A. De La Toba
  - Beckman Institute of Advanced Science and Technology, University of Illinois at Urbana-Champaign, 61801
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 61801
- Krishna D. B. Anapindi
  - Beckman Institute of Advanced Science and Technology, University of Illinois at Urbana-Champaign, 61801
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 61801
- Jonathan V. Sweedler
  - Beckman Institute of Advanced Science and Technology, University of Illinois at Urbana-Champaign, 61801
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 61801
6
Di Renzo T, Reale A, Nazzaro S, Siano F, Addeo F, Picariello G. Shotgun proteomics for the identification of yeasts responsible for pink/red discoloration in commercial dairy products. Food Res Int 2023; 169:112945. [PMID: 37254369 DOI: 10.1016/j.foodres.2023.112945] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/02/2023] [Revised: 04/27/2023] [Accepted: 05/02/2023] [Indexed: 06/01/2023]
Abstract
Pink/red discoloration encompasses a series of relatively common spoilage defects of commercial dairy products. In this study, we used shotgun proteomics to identify the microorganism responsible for the production of intensely red-coloured slimes found on the surface of freshly opened commercial spreadable cheese and yogurt samples. Proteome-wide characterization of microbial proteins allowed the identification of 1042 and 687 gene products from Rhodotorula spp. in spreadable cheese and yogurt samples, respectively, while no significant protein scores from other microorganisms were recorded. Subsequent microbiological analyses and sequencing of the 26S rRNA gene region supported the proteomic results, demonstrating that the microorganism involved was Rhodotorula mucilaginosa, a carotenoid-producing basidiomycetous yeast that can be potentially pathogenic to humans, especially immunocompromised individuals. This is the first time that shotgun proteomics has been used to identify a microorganism responsible for spoilage in dairy products, proposing it as a relatively fast, sensitive, and reliable alternative or complement to conventional methods for microbial identification.
Affiliation(s)
- Tiziana Di Renzo
  - Institute of Food Sciences, National Research Council, Via Roma, 64, 83100 Avellino, Italy
- Anna Reale
  - Institute of Food Sciences, National Research Council, Via Roma, 64, 83100 Avellino, Italy
- Stefania Nazzaro
  - Institute of Food Sciences, National Research Council, Via Roma, 64, 83100 Avellino, Italy
- Francesco Siano
  - Institute of Food Sciences, National Research Council, Via Roma, 64, 83100 Avellino, Italy
- Francesco Addeo
  - Department of Agricultural Sciences, University of Naples "Federico II", Via Università 100, Parco Gussone, Portici, 80055 Naples, Italy
- Gianluca Picariello
  - Institute of Food Sciences, National Research Council, Via Roma, 64, 83100 Avellino, Italy
7
Bai M, Deng J, Dai C, Pfeuffer J, Sachsenberg T, Perez-Riverol Y. LFQ-Based Peptide and Protein Intensity Differential Expression Analysis. J Proteome Res 2023. [PMID: 37220883 DOI: 10.1021/acs.jproteome.2c00812] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/25/2023]
Abstract
Testing for significant differences in quantities at the protein level is a common goal of many LFQ-based mass spectrometry proteomics experiments. Starting from a table of protein and/or peptide quantities from a given proteomics quantification software, many tools and R packages exist to perform the final tasks of imputation, summarization, normalization, and statistical testing. To evaluate the effects of packages and settings in their substeps on the final list of significant proteins, we studied several packages on three public data sets with known expected protein fold changes. We found that the results between packages and even across different parameters of the same package can vary significantly. In addition to usability aspects and feature/compatibility lists of different packages, this paper highlights sensitivity and specificity trade-offs that come with specific packages and settings.
Affiliation(s)
- Mingze Bai
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Jingwen Deng
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Chengxin Dai
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Julianus Pfeuffer
  - Algorithmic Bioinformatics, Freie Universität Berlin, Berlin 14195, Germany
  - Visualization and Data Analysis, Zuse Institute Berlin, Berlin 14195, Germany
- Timo Sachsenberg
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen 72076, Germany
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
8
Abstract
Proteins are the key biological actors within cells, driving many biological processes integral to both healthy and diseased states. Understanding the depth of complexity represented within the proteome is crucial to our scientific understanding of cellular biology and to provide disease specific insights for clinical applications. Mass spectrometry-based proteomics is the premier method for proteome analysis, with the ability to both identify and quantify proteins. Although proteomics continues to grow as a robust field of bioanalytical chemistry, advances are still necessary to enable a more comprehensive view of the proteome. In this review, we provide a broad overview of mass spectrometry-based proteomics in general, and highlight four developing areas of bottom-up proteomics: (1) protein inference, (2) alternative proteases, (3) sample-specific databases and (4) post-translational modification discovery.
Affiliation(s)
- Rachel M Miller
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
9
Reanalysis of ProteomicsDB Using an Accurate, Sensitive, and Scalable False Discovery Rate Estimation Approach for Protein Groups. Mol Cell Proteomics 2022; 21:100437. [PMID: 36328188 PMCID: PMC9718969 DOI: 10.1016/j.mcpro.2022.100437] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/01/2022] [Revised: 10/16/2022] [Accepted: 10/28/2022] [Indexed: 11/07/2022]
Abstract
Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach which is based on a target-decoy competition strategy on the protein level that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas. First, the picked group target-decoy and second, the rescued subset grouping strategies. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the data set. The validation analysis also uncovered that applying the commonly used Occam's razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
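The picked target-decoy idea mentioned in the abstract above can be illustrated with a minimal Python sketch. It covers only the basic protein-level picked competition, not the protein-group extension or the rescued subset grouping the paper introduces. The `DECOY_` prefix, the score dictionary, and the function name `picked_protein_fdr` are assumptions made for illustration and are not taken from the authors' software.

```python
def picked_protein_fdr(scores, decoy_prefix="DECOY_"):
    """Basic picked target-decoy competition (illustrative sketch).

    scores: dict mapping protein id -> best score (higher is better).
    Decoys are assumed to be named decoy_prefix + target id (hypothetical convention).
    Returns a list of (protein, score, is_decoy, running_fdr), best score first.
    """
    # Step 1: for every target/decoy pair, keep only the higher-scoring member.
    picked = {}
    for protein, score in scores.items():
        is_decoy = protein.startswith(decoy_prefix)
        base = protein[len(decoy_prefix):] if is_decoy else protein
        best = picked.get(base)
        if best is None or score > best[0]:
            picked[base] = (score, is_decoy)

    # Step 2: walk down the ranked list and estimate FDR as decoys / targets.
    ranked = sorted(picked.items(), key=lambda kv: kv[1][0], reverse=True)
    results, targets, decoys = [], 0, 0
    for base, (score, is_decoy) in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        results.append((base, score, is_decoy, decoys / max(targets, 1)))
    return results


# Toy input: P2 loses the competition to its own decoy.
example = {"P1": 10, "DECOY_P1": 3, "P2": 2, "DECOY_P2": 5, "P3": 8}
for row in picked_protein_fdr(example):
    print(row)
```

The running FDR here is not converted to monotone q-values; a real implementation would typically add that step and would also need a rule for score ties.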
10
Characterization of peptide-protein relationships in protein ambiguity groups via bipartite graphs. PLoS One 2022; 17:e0276401. [PMID: 36269744 PMCID: PMC9586388 DOI: 10.1371/journal.pone.0276401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 10/05/2022] [Indexed: 11/06/2022]
Abstract
In bottom-up proteomics, proteins are enzymatically digested into peptides before measurement with mass spectrometry. The relationship between proteins and their corresponding peptides can be represented by bipartite graphs. We conduct a comprehensive analysis of bipartite graphs using quantified peptides from measured data sets as well as theoretical peptides from an in silico digestion of the corresponding complete taxonomic protein sequence databases. The aim of this study is to characterize and structure the different types of graphs that occur and to compare them between data sets. We observed a large influence of the accepted minimum peptide length during in silico digestion. When changing from theoretical peptides to measured ones, the graph structures are subject to two opposite effects. On the one hand, the graphs based on measured peptides are on average smaller and less complex compared to graphs using theoretical peptides. On the other hand, the proportion of protein nodes without unique peptides, which are a complicated case for protein inference and quantification, is considerably larger for measured data. Additionally, the proportion of graphs containing at least one protein node without unique peptides rises when going from database to quantitative level. The fraction of shared peptides and proteins without unique peptides as well as the complexity and size of the graphs highly depends on the data set and organism. Large differences between the structures of bipartite peptide-protein graphs have been observed between database and quantitative level as well as between analyzed species. In the analyzed measured data sets, the proportion of protein nodes without unique peptides ranged from 6.4% to 55.0%. This highlights the need for novel methods that can quantify proteins without unique peptides. The knowledge about the structure of the bipartite peptide-protein graphs gained in this study will be useful for the development of such algorithms.
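To make the bipartite-graph view in the abstract above concrete, here is a small Python sketch that builds the peptide-protein graph from a protein-to-peptides mapping, splits it into connected components, and computes the fraction of protein nodes without unique peptides. The function names and the toy data are hypothetical; this is not the authors' code, only one straightforward way to compute the quantities the abstract discusses.

```python
from collections import defaultdict


def peptide_index(protein_to_peptides):
    """Invert the mapping: peptide -> set of proteins that contain it."""
    index = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for pep in peptides:
            index[pep].add(protein)
    return index


def connected_components(protein_to_peptides):
    """Return a list of (proteins, peptides) pairs, one per connected graph."""
    index = peptide_index(protein_to_peptides)
    seen, components = set(), []
    for start in protein_to_peptides:
        if start in seen:
            continue
        seen.add(start)
        stack, proteins, peptides = [start], set(), set()
        while stack:
            prot = stack.pop()
            proteins.add(prot)
            for pep in protein_to_peptides[prot]:
                peptides.add(pep)
                # Follow shared peptides to neighbouring protein nodes.
                for other in index[pep]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        components.append((proteins, peptides))
    return components


def fraction_without_unique_peptides(protein_to_peptides):
    """Share of protein nodes whose peptides are all shared with other proteins."""
    if not protein_to_peptides:
        return 0.0
    index = peptide_index(protein_to_peptides)
    no_unique = [
        prot for prot, peps in protein_to_peptides.items()
        if not any(len(index[pep]) == 1 for pep in peps)
    ]
    return len(no_unique) / len(protein_to_peptides)


toy = {"A": {"p1", "p2"}, "B": {"p2", "p3"}, "C": {"p2"}, "D": {"p4"}}
print(len(connected_components(toy)))         # two graphs: {A, B, C} and {D}
print(fraction_without_unique_peptides(toy))  # only C lacks a unique peptide
```

In the toy data, protein C is identified only through the shared peptide p2, which is the kind of node the abstract flags as difficult for inference and quantification.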
11
Tsiamis V, Schwämmle V. VIQoR: a web service for visually supervised protein inference and protein quantification. Bioinformatics 2022; 38:2757-2764. [PMID: 35561162 DOI: 10.1093/bioinformatics/btac182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Revised: 03/07/2022] [Accepted: 03/22/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION: In quantitative bottom-up mass spectrometry (MS)-based proteomics, the reliable estimation of protein concentration changes from peptide quantifications between different biological samples is essential. This estimation is not a single task but comprises the two processes of protein inference and protein abundance summarization. Furthermore, due to the high complexity of proteomics data and associated uncertainty about the performance of these processes, there is a demand for comprehensive visualization methods able to integrate protein with peptide quantitative data including their post-translational modifications. Hence, there is a lack of a suitable tool that provides post-identification quantitative analysis of proteins with simultaneous interactive visualization.
RESULTS: In this article, we present VIQoR, a user-friendly web service that accepts peptide quantitative data of both labeled and label-free experiments and accomplishes the crucial components protein inference and summarization and interactive visualization modules, including the novel VIQoR plot. We implemented two different parsimonious algorithms to solve the protein inference problem, while protein summarization is facilitated by a well-established factor analysis algorithm called fast-FARMS followed by a weighted average summarization function that minimizes the effect of missing values. In addition, summarization is optimized by the so-called Global Correlation Indicator (GCI). We test the tool on three publicly available ground truth datasets and demonstrate the ability of the protein inference algorithms to handle shared peptides. We furthermore show that GCI increases the accuracy of the quantitative analysis in datasets with replicated design.
AVAILABILITY AND IMPLEMENTATION: VIQoR is accessible at: http://computproteomics.bmb.sdu.dk/Apps/VIQoR/. The source code is available at: https://bitbucket.org/veitveit/viqor/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
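The abstract above does not specify which two parsimonious algorithms VIQoR implements. As a generic illustration of parsimonious protein inference, the Python sketch below uses a greedy set-cover heuristic: it repeatedly picks the protein that explains the most still-unexplained peptides. The function name `greedy_parsimony` and the alphabetical tie-breaking are assumptions for this example and should not be read as VIQoR's method.

```python
def greedy_parsimony(protein_to_peptides):
    """Greedy set cover: a small protein list that explains every observed peptide.

    protein_to_peptides: dict mapping protein id -> set of identified peptides.
    Ties are broken alphabetically (an arbitrary choice made for this sketch).
    """
    uncovered = set().union(*protein_to_peptides.values())
    selected = []
    while uncovered:
        # max() returns the first maximal item, so sorting fixes the tie-break.
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & uncovered),
        )
        selected.append(best)
        uncovered -= protein_to_peptides[best]
    return selected


toy = {"A": {"p1", "p2", "p3"}, "B": {"p2", "p3"}, "C": {"p3", "p4"}}
print(greedy_parsimony(toy))  # B is dropped: A already explains its peptides
```

Greedy set cover is a heuristic and does not always return the smallest possible protein list, which is one reason tools offer more than one inference strategy.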
Affiliation(s)
- Vasileios Tsiamis
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
12
Protocol for Increasing the Sensitivity of MS-Based Protein Detection in Human Chorionic Villi. Curr Issues Mol Biol 2022; 44:2069-2088. [PMID: 35678669 PMCID: PMC9164042 DOI: 10.3390/cimb44050140] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/09/2022] [Revised: 05/06/2022] [Accepted: 05/07/2022] [Indexed: 11/17/2022]
Abstract
An important step in the proteomic analysis of missing proteins is the use of a wide range of tissues, optimal extraction, and the processing of protein material in order to ensure the highest sensitivity in downstream protein detection. This work describes a purification protocol for identifying low-abundance proteins in human chorionic villi using the proposed “1DE-gel concentration” method. This involves the removal of SDS in a short electrophoresis run in a stacking gel without protein separation. Following the in-gel digestion of the obtained holistic single protein band, we used the peptide mixture for further LC–MS/MS analysis. Statistically significant results were derived from six datasets, containing three treatments, each from two tissue sources (elective or missed abortions). The 1DE-gel concentration increased the coverage of the chorionic villus proteome. Our approach allowed the identification of 15 low-abundance proteins, of which some had not been previously detected via the mass spectrometry of trophoblasts. In the post hoc data analysis, we found a dubious or uncertain protein (PSG7) encoded on human chromosome 19 according to neXtProt. A proteomic sample preparation workflow with the 1DE-gel concentration can be used as a prospective tool for uncovering the low-abundance part of the human proteome.
13
Miller RM, Jordan BT, Mehlferber MM, Jeffery ED, Chatzipantsiou C, Kaur S, Millikin RJ, Dai Y, Tiberi S, Castaldi PJ, Shortreed MR, Luckey CJ, Conesa A, Smith LM, Deslattes Mays A, Sheynkman GM. Enhanced protein isoform characterization through long-read proteogenomics. Genome Biol 2022; 23:69. [PMID: 35241129 PMCID: PMC8892804 DOI: 10.1186/s13059-022-02624-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/13/2021] [Accepted: 02/02/2022] [Indexed: 02/04/2023]
Abstract
BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms.
RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.
CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.
Collapse
Affiliation(s)
- Rachel M. Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Ben T. Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - Madison M. Mehlferber
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
| | - Erin D. Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | | | - Simi Kaur
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Robert J. Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Yunxiang Dai
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Simone Tiberi
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA, USA
| | - Michael R. Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Chance John Luckey
- Department of Pathology, University of Virginia, Charlottesville, VA, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
| | - Lloyd M. Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Anne Deslattes Mays
- Office of Data Science and Sharing, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD, USA
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
| |
|
14
|
Schallert K, Verschaffelt P, Mesuere B, Benndorf D, Martens L, Van Den Bossche T. Pout2Prot: An Efficient Tool to Create Protein (Sub)groups from Percolator Output Files. J Proteome Res 2022; 21:1175-1180. [PMID: 35143215 DOI: 10.1021/acs.jproteome.1c00685] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
In metaproteomics, the study of the collective proteome of microbial communities, the protein inference problem is more challenging than in single-species proteomics. Indeed, a peptide sequence can be present not only in multiple proteins or protein isoforms of the same species, but also in homologous proteins from closely related species. To assign the taxonomy and functions of the microbial species, specialized tools have been developed, such as Prophane. This tool, however, is not directly compatible with post-processing tools such as Percolator. In this manuscript we therefore present Pout2Prot, which takes Percolator Output (.pout) files from multiple experiments and creates protein group and protein subgroup output files (.tsv) that can be used directly with Prophane. We investigated different grouping strategies and compared existing protein grouping tools to develop an advanced protein grouping algorithm that offers a variety of different approaches, allows grouping for multiple files, and uses a weighted spectral count for protein (sub)groups to reflect abundance. Pout2Prot is available as a web application at https://pout2prot.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the Apache License 2.0 and is available at https://github.com/compomics/pout2prot.
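The grouping idea described in this abstract can be illustrated with a minimal Python sketch. It is not the Pout2Prot algorithm: it assumes the simplest possible rule (proteins with identical identified peptide sets form one group) and one plausible reading of a "weighted spectral count" (each peptide's PSM count is split equally among the groups that contain it). The function names and the input dictionaries are hypothetical.

```python
from collections import defaultdict


def group_by_peptide_set(protein_peptides):
    """Group proteins whose identified peptide sets are identical.

    protein_peptides: dict mapping protein ID -> iterable of peptide sequences.
    Returns a sorted list of groups, each a sorted list of protein IDs.
    """
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


def weighted_spectral_counts(groups, protein_peptides, psm_counts):
    """Assumed weighting: split each peptide's PSM count equally among
    all groups that contain the peptide, then sum per group."""
    # All members of a group share one peptide set, so the first member suffices.
    group_peptides = [set(protein_peptides[group[0]]) for group in groups]
    counts = []
    for peptides in group_peptides:
        total = 0.0
        for peptide in peptides:
            n_sharing = sum(1 for other in group_peptides if peptide in other)
            total += psm_counts.get(peptide, 0) / n_sharing
        counts.append(total)
    return counts
```

With three proteins where P1 and P2 share the same two peptides and P3 shares only one of them, the sketch yields two groups, and the shared peptide contributes half of its spectra to each group.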
Affiliation(s)
- Kay Schallert
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, 39104 Magdeburg, Germany.,Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, 39104 Magdeburg, Germany
| | - Pieter Verschaffelt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium.,VIB - UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
| | - Bart Mesuere
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium.,VIB - UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
| | - Dirk Benndorf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, 39104 Magdeburg, Germany.,Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, 39104 Magdeburg, Germany.,Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, 06366 Köthen, Germany
| | - Lennart Martens
- VIB - UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
| | - Tim Van Den Bossche
- VIB - UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
| |
|
15
|
Svecla M, Garrone G, Faré F, Aletti G, Norata GD, Beretta G. DDASSQ: an open-source, multiple peptide sequencing strategy for label free quantification based on an OpenMS pipeline in the KNIME analytics platform. Proteomics 2021; 21:e2000319. [PMID: 34312990 PMCID: PMC8459258 DOI: 10.1002/pmic.202000319] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/21/2020] [Revised: 07/08/2021] [Accepted: 07/12/2021] [Indexed: 11/16/2022]
Abstract
In this study we investigated the performance of a computational pipeline for protein identification and label free quantification (LFQ) of LC–MS/MS data sets from experimental animal tissue samples, as well as the impact of its specific peptide search combinatorial approach. The full pipeline workflow was composed of peptide search engine adapters based on different identification algorithms, in the frame of the open‐source OpenMS software running within the KNIME analytics platform. Two different in silico tryptic digestion, database‐search assisted approaches (X!Tandem and MS‐GF+), de novo peptide sequencing based on Novor and consensus library search (SpectraST), were tested for the processing of LC‐MS/MS raw data files obtained from proteomic LC‐MS experiments done on proteolytic extracts from mouse ex vivo liver samples. The results from proteomic LFQ were compared to those based on the application of the two software tools MaxQuant and Proteome Discoverer for protein inference and label‐free data analysis in shotgun proteomics. Data are available via ProteomeXchange with identifier PXD025097.
Affiliation(s)
- Monika Svecla
- Department of Excellence of Pharmacological and Biomolecular Sciences, University of Milan, Milan, Italy
| | | | | | - Giacomo Aletti
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
| | - Giuseppe Danilo Norata
- Department of Excellence of Pharmacological and Biomolecular Sciences, University of Milan, Milan, Italy.,Centro Studio Aterosclerosi, Bassini Hospital, Cinisello Balsamo, Milan, Italy
| | - Giangiacomo Beretta
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
| |
|
16
|
Important Issues in Planning a Proteomics Experiment: Statistical Considerations of Quantitative Proteomic Data. Methods Mol Biol 2021; 2228:1-20. [PMID: 33950479 DOI: 10.1007/978-1-0716-1024-4_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 03/15/2023]
Abstract
Mass spectrometry is frequently used in quantitative proteomics to detect differentially regulated proteins. A very important but unfortunately oftentimes neglected part in detecting differential proteins is the statistical analysis. Data from proteomics experiments are usually high-dimensional and hence require profound statistical methods. It is especially important to already correctly design a proteomic experiment before it is conducted in the laboratory. Only this can ensure that the statistical analysis is capable of detecting truly differential proteins afterward. This chapter thus covers aspects of both statistical planning as well as the actual analysis of quantitative proteomic experiments.
|
17
|
Comparative database search engine analysis on massive tandem mass spectra of pork-based food products for halal proteomics. J Proteomics 2021; 241:104240. [PMID: 33894373 DOI: 10.1016/j.jprot.2021.104240] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/01/2020] [Revised: 04/05/2021] [Accepted: 04/09/2021] [Indexed: 11/22/2022]
Abstract
Mass spectrometry-based proteomics relies on dedicated software for peptide and protein identification. These software include open-source or commercial-based search engines; wherein, they employ different algorithms to establish their scoring and identified proteins. Although previous comparative studies have differentiated the proteomics results from different software, no studies have yet been conducted specifically to compare and evaluate the search engines in the field of halal analysis. This is important because the halal analysis is often using commercial meat samples that have been subjected to various processing, further complicating its analysis. Thus, this study aimed to assess three open-source search engines (Comet, X! Tandem, and ProteinProspector) and a commercial-based search engine (ProteinPilot™) against 135 raw tandem mass spectrometry data files from 15 types of pork-based food products for halal analysis. Each database search engine contained high false-discovery rate (FDR); however, a post-searching algorithm called PeptideProphet managed to reduce the FDR, except for ProteinProspector and ProteinPilot™. From this study, the combined database search engine (executed by iProphet) reveals a thorough protein list for pork-based food products; wherein the most abundant proteins are myofibrillar proteins. Thus, this proteomics study will aid the identification of potential peptide and protein biomarkers for future precision halal analysis. SIGNIFICANCE: A critical challenge of halal proteomics is the availability of a database to confirm the inferential peptides as well as proteins. Currently, the established database such as UniProtKB is related to animal proteome; however, the halal proteomics is related to the highly processed meat-based food products. This study highlights the use of different database search engines (Comet, X! Tandem, ProteinProspector, and ProteinPilot™) and their respective algorithms to analyse 135 raw tandem mass spectrometry data files from 15 types of pork-based food products. This is the first attempt that has compared different database search engines in the context of halal proteomics to ensure the effectiveness of controlling the FDR. Previous studies were just focused on the advantages of a certain algorithm over another. Moreover, other previous studies also have mainly reported the use of mass spectrometry-based shotgun proteomics for meat authentication (the most similar field to halal analysis), but none of the studies were reported on halal aspects that used samples originated from highly processed food products. Hence, a systematic comparative study is duly needed for a more comprehensive and thorough proteomics analysis for such samples. In this study, our combinatorial approach for halal proteomics results from the different search engines used (Comet, X! Tandem, and ProteinProspector) has successfully generated a comprehensive spectral library for the pork-based meat products. This combined spectral library is freely available at https://data.mendeley.com/datasets/6dmm8659rm/3. Thus far, this is the first and new attempt at establishing a spectral library for halal proteomics. We also believe this study is a pioneer for halal proteomics that aimed at non-conventional and non-model organism proteomics, protein analytics, protein bioinformatics, and potential biomarker discovery.
|
18
|
Dutta D, Rahman S, Bhattacharje G, Bag S, Sing BC, Chatterjee J, Basak A, Das AK. Label-Free Method Development for Hydroxyproline PTM Mapping in Human Plasma Proteome. Protein J 2021; 40:741-755. [PMID: 33840009 DOI: 10.1007/s10930-021-09984-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/01/2021] [Indexed: 11/29/2022]
Abstract
Post-translational modifications (PTMs) impart structural heterogeneities that can alter plasma proteins' functions in various pathophysiological processes. However, the identification and mapping of PTMs in untargeted plasma proteomics is still a challenge due to the presence of diverse components in blood. Here, we report a label-free method for identifying and mapping hydroxylated proteins using tandem mass spectrometry (MS/MS) in the human plasma sample. Our untargeted proteomics approach led us to identify 676 de novo sequenced peptides in human plasma that correspond to 201 proteins, out of which 11 plasma proteins were found to be hydroxylated. Among these hydroxylated proteins, Immunoglobulin A1 (IgA1) heavy chain was found to be modified at residue 285 (Pro285 to Hyp285), which was further validated by MS/MS study. Molecular dynamics (MD) simulation analysis demonstrated that this proline hydroxylation in IgA1 caused both local and global structural changes. Overall, this study provides a comprehensive understanding of the protein profile containing Hyp PTMs in human plasma and shows the future perspective of identifying and discriminating Hyp PTM in the normal and the diseased proteomes.
Affiliation(s)
- Debabrata Dutta
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India.,Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Shakilur Rahman
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Gourab Bhattacharje
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Swarnendu Bag
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Bidhan Chandra Sing
- Central Research Facility, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Jyotirmoy Chatterjee
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Amit Basak
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India.,School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
| | - Amit Kumar Das
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India.,School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India.
| |
|
19
|
Meyer JG. Qualitative and Quantitative Shotgun Proteomics Data Analysis from Data-Dependent Acquisition Mass Spectrometry. Methods Mol Biol 2021; 2259:297-308. [PMID: 33687723 DOI: 10.1007/978-1-0716-1178-4_19] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022]
Abstract
Shotgun proteomics is the inferential analysis of proteoforms using peptide proxies produced by enzyme-catalyzed hydrolysis of entire proteomes. Such peptides are usually identified by nanoflow liquid chromatography coupled to tandem mass spectrometry analysis (nLC-MS/MS). Traditionally, MS/MS analysis is performed in data-dependent acquisition (DDA) mode, which usually produces a pattern of fragment masses unique to a single peptide's fragmentation. Here, I describe a statistically rigorous qualitative and quantitative computational analysis for shotgun proteomics DDA analysis using free open-source software tools. MS/MS data are used to identify peptides, and the area of peptide mass/charge over chromatographic elution is used to quantify peptides. All peptides that uniquely map to a protein sequence predicted from the genome are combined into a single protein quantity, which can then be compared across experimental conditions. Statistically significant protein changes can be summarized using gene ontology or pathway term enrichment analysis.
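The protein roll-up step mentioned in this abstract (peptides that map uniquely to one protein are combined into a single protein quantity) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the chapter's workflow: the combination rule is assumed to be a plain sum of peptide peak areas, and the function name and input dictionaries are hypothetical.

```python
from collections import defaultdict


def protein_quantities(peptide_areas, peptide_to_proteins):
    """Sum peak areas of peptides that map to exactly one protein.

    peptide_areas: dict mapping peptide sequence -> integrated peak area.
    peptide_to_proteins: dict mapping peptide sequence -> iterable of protein IDs.
    Peptides shared between proteins (or unmapped) are ignored.
    """
    totals = defaultdict(float)
    for peptide, area in peptide_areas.items():
        proteins = set(peptide_to_proteins.get(peptide, ()))
        if len(proteins) == 1:  # keep only uniquely mapping peptides
            (protein,) = proteins
            totals[protein] += area
    return dict(totals)
```

Other summaries, such as the median or the sum of the top few peptides, would slot into the same structure; the sum is used here only because it is the simplest choice.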
Affiliation(s)
- Jesse G Meyer
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States.
| |
|
20
|
Schiebenhoefer H, Schallert K, Renard BY, Trappe K, Schmid E, Benndorf D, Riedel K, Muth T, Fuchs S. A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nat Protoc 2020; 15:3212-3239. [PMID: 32859984 DOI: 10.1038/s41596-020-0368-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/02/2019] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in the analysis of corresponding datasets differ from those of pure-culture proteomics, e.g., due to the higher complexity of the samples and the larger reference databases demanding specific computing pipelines. Corresponding data analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on, among other factors, the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
| | - Kay Schallert
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
| | - Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
| | - Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Emanuel Schmid
- ID Computational & Data Science Support, Eidgenössische Technische Hochschule, Zurich, Switzerland
| | - Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
| | - Katharina Riedel
- Center for Functional Genomics of Microbes (CFGM), Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
| | - Stephan Fuchs
- Department of Infectious Diseases, Robert Koch Institute, Wernigerode, Germany.
| |
|
21
|
Winkler R. ProtyQuant: Comparing label-free shotgun proteomics datasets using accumulated peptide probabilities. J Proteomics 2020; 230:103985. [PMID: 32956841 DOI: 10.1016/j.jprot.2020.103985] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2020] [Revised: 08/07/2020] [Accepted: 09/10/2020] [Indexed: 11/20/2022]
Abstract
Comparing multiple label-free shotgun proteomics datasets requires various data processing and formatting steps, including peptide-spectrum matching, protein inference, and quantification, followed by the compilation of results files into a format that allows for downstream analyses. ProtyQuant performs protein inference and quantification calculations, and combines the results of individual datasets into plain text tables. These are lightweight, human-readable, and easy to import into databases or statistical software. ProtyQuant reads validated pepXML from proteomic workflows such as the Trans-Proteomic Pipeline (TPP), which makes it compatible with many commercial and free search engines. For protein inference and quantification, a modified version of the PIPQ program (He et al. 2016) was integrated. In contrast to simple spectral-counting, PIPQ sums up peptide probabilities. For assigning peptides to proteins, three algorithms are available: Multiple Counting, Equal Division, and Linear Programming. The accumulated peptide probabilities (app) are used for both tasks: protein probability estimation and quantification. ProtyQuant was tested using a reference dataset for label-free shotgun proteomics, obtained from different concentrations of 48 human UPS proteins spiked into yeast lysate. Compared to ProteinProphet, ProtyQuant detected up to 126 (15%) more proteins in the mixture, applying an equal false positive rate (FPR). Using the app values for label-free quantification showed suitable sensitivity and linearity. Strikingly, the app values represent a realistic measure of 'Protein Presence,' an integral concept of protein probability and quantity. ProtyQuant provides a graphical user interface (GUI) and scripts for console-based processing. It is available (GNU GPL v3) for Windows, Linux, and Docker from https://bitbucket.org/lababi/protyquant/. SIGNIFICANCE: Integrating data from multiple shot-gun proteomics experiments overwhelms non-expert researchers. ProtyQuant complements well-established workflows by aiding the comparison of proteins across samples. Importantly, the probability and abundance of proteins are seen from a holistic point of view. The accumulated peptide probability (app) as an integral measure of 'Protein Presence' demonstrated reliable performance for both protein identification and quantification. Using the app as a single measure facilitates the compilation of reports in comparative proteomics.
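The difference between two of the peptide assignment modes named in this abstract can be sketched as follows. This is not ProtyQuant or PIPQ code. It assumes that Multiple Counting credits a shared peptide's full probability to every protein containing it, and that Equal Division splits that probability evenly among those proteins; the Linear Programming mode is omitted. The function name and inputs are hypothetical.

```python
from collections import defaultdict


def accumulated_peptide_probabilities(peptide_probs, peptide_to_proteins,
                                      mode="equal_division"):
    """Accumulate peptide probabilities per protein.

    peptide_probs: dict mapping peptide -> identification probability (0..1).
    peptide_to_proteins: dict mapping peptide -> iterable of protein IDs.
    mode: "multiple_counting" or "equal_division" (assumed semantics, see above).
    """
    if mode not in ("multiple_counting", "equal_division"):
        raise ValueError(f"unknown mode: {mode}")
    app = defaultdict(float)
    for peptide, prob in peptide_probs.items():
        proteins = set(peptide_to_proteins.get(peptide, ()))
        if not proteins:
            continue  # peptide not mapped to any protein
        share = prob if mode == "multiple_counting" else prob / len(proteins)
        for protein in proteins:
            app[protein] += share
    return dict(app)
```

Under Equal Division the per-protein values sum to the total peptide probability, whereas Multiple Counting inflates the total whenever peptides are shared.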
Affiliation(s)
- Robert Winkler
- Center for Research and Advanced Studies (CINVESTAV) Irapuato, Department of Biochemistry and Biotechnology, Km. 9.6 Libramiento Norte Carr. Irapuato-León, 36824 Irapuato, GTO, Mexico.
| |
|
22
|
How Do the Different Proteomic Strategies Cope with the Complexity of Biological Regulations in a Multi-Omic World? Critical Appraisal and Suggestions for Improvements. Proteomes 2020; 8:proteomes8030023. [PMID: 32899323 PMCID: PMC7564458 DOI: 10.3390/proteomes8030023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/11/2020] [Revised: 08/30/2020] [Accepted: 09/01/2020] [Indexed: 12/12/2022] Open
Abstract
In this second decade of the 21st century, we are lucky enough to have different types of proteomic analyses at our disposal. Furthermore, other functional omics such as transcriptomics have also undergone major developments, resulting in mature tools. However, choice equals questions, and the major question is how each proteomic strategy is fit for which purpose. The aim of this opinion paper is to reposition the various proteomic strategies in the frame of what is known in terms of biological regulations in order to shed light on the power, limitations, and paths for improvement for the different proteomic setups. This should help biologists to select the best-suited proteomic strategy for their purposes in order not to be driven by raw availability or fashion arguments but rather by the best fitness for purpose. In particular, knowing the limitations of the different proteomic strategies helps in interpreting the results correctly and in devising the validation experiments that should be made downstream of the proteomic analyses.
|
23
|
Bartel J, Varadarajan AR, Sura T, Ahrens CH, Maaß S, Becher D. Optimized Proteomics Workflow for the Detection of Small Proteins. J Proteome Res 2020; 19:4004-4018. [DOI: 10.1021/acs.jproteome.0c00286] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Affiliation(s)
- Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Adithi R. Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Thomas Sura
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Christian H. Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| |
|
24
|
Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu DJ, Inuganti A, Griss J, Mayer G, Eisenacher M, Pérez E, Uszkoreit J, Pfeuffer J, Sachsenberg T, Yilmaz S, Tiwary S, Cox J, Audain E, Walzer M, Jarnuczak AF, Ternent T, Brazma A, Vizcaíno JA. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 2019; 47:D442-D450. [PMID: 30395289 PMCID: PMC6323896 DOI: 10.1093/nar/gky1106] [Citation(s) in RCA: 5032] [Impact Index Per Article: 1258.0] [Received: 09/22/2018] [Accepted: 10/22/2018] [Indexed: 02/06/2023] Open
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. At last, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Manuel Bernal-Llinares
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Avinash Inuganti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.,Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Vienna, 1090, Austria
| | - Gerhard Mayer
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
| | - Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
| | - Enrique Pérez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julian Uszkoreit
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
| | - Julianus Pfeuffer
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
| | - Timo Sachsenberg
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
| | - Sule Yilmaz
- Computational Systems Biochemistry, Max Planck Institute for Biochemistry, Martinsried, 82152, Germany
| | - Shivani Tiwary
- Computational Systems Biochemistry, Max Planck Institute for Biochemistry, Martinsried, 82152, Germany
| | - Jürgen Cox
- Computational Systems Biochemistry, Max Planck Institute for Biochemistry, Martinsried, 82152, Germany
| | - Enrique Audain
- Department of Congenital Heart Disease and Pediatric Cardiology, Universitätsklinikum Schleswig-Holstein Kiel, Kiel, 24105, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
25
|
Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020; 19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automatize this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. The development of an appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is highly needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
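The target-decoy estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes each peptide-spectrum match is a hypothetical `(score, is_decoy)` pair and uses the simple decoys/targets ratio. It is not the protein probability model or the refined picked method that the paper presents.

```python
from typing import List, Optional, Tuple

# Hypothetical PSM representation: (search score, True if the match is a decoy).
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted targets, so nothing to call false
    return decoys / targets


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, score) <= max_fdr:
            best = score
    return best


# Toy data (invented for illustration): four target hits and one decoy hit.
example_psms = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
cutoff = score_cutoff_for_fdr(example_psms, max_fdr=0.01)
```

Applying the same ratio to protein-level hits shows why the protein-level FDR can exceed the peptide-level FDR: false peptides tend to scatter across many proteins, while true peptides concentrate on fewer.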
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
| | - Jesús Vázquez
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain
| |
|
26
|
Pfeuffer J, Sachsenberg T, Dijkstra TMH, Serang O, Reinert K, Kohlbacher O. EPIFANY: A Method for Efficient High-Confidence Protein Inference. J Proteome Res 2020; 19:1060-1072. [PMID: 31975601 DOI: 10.1021/acs.jproteome.9b00566] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/07/2023]
Abstract
Accurate protein inference in the presence of shared peptides is still one of the key problems in bottom-up proteomics. Most protein inference tools employing simple heuristic inference strategies are efficient but exhibit reduced accuracy. More advanced probabilistic methods often exhibit better inference quality but tend to be too slow for large data sets. Here, we present a novel protein inference method, EPIFANY, combining a loopy belief propagation algorithm with convolution trees for efficient processing of Bayesian networks. We demonstrate that EPIFANY combines the reliable protein inference of Bayesian methods with significantly shorter runtimes. On the 2016 iPRG protein inference benchmark data, EPIFANY is the only tested method that finds all true-positive proteins at a 5% protein false discovery rate (FDR) without strict prefiltering on the peptide-spectrum match (PSM) level, yielding an increase in identification performance (+10% in the number of true positives and +14% in partial AUC) compared to previous approaches. Even very large data sets with hundreds of thousands of spectra (which are intractable with other Bayesian and some non-Bayesian tools) can be processed with EPIFANY within minutes. The increased inference quality including shared peptides results in better protein inference results and thus increased robustness of the biological hypotheses generated. EPIFANY is available as open-source software for all major platforms at https://OpenMS.de/epifany.
Affiliation(s)
- Julianus Pfeuffer
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany; Algorithmic Bioinformatics, Department of Bioinformatics, Freie Universität Berlin, 14195 Berlin, Germany
| | - Timo Sachsenberg
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Tjeerd M H Dijkstra
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Oliver Serang
- Department of Computer Science, University of Montana, Missoula, Montana 59812, United States
| | - Knut Reinert
- Algorithmic Bioinformatics, Department of Bioinformatics, Freie Universität Berlin, 14195 Berlin, Germany
| | - Oliver Kohlbacher
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany; Biomolecular Interactions, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany; Institute for Translational Bioinformatics, University Hospital Tübingen, 72076 Tübingen, Germany; Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
| |
|
27
|
Noor Z, Ahn SB, Baker MS, Ranganathan S, Mohamedali A. Mass spectrometry-based protein identification in proteomics-a review. Brief Bioinform 2020; 22:1620-1638. [PMID: 32047889 DOI: 10.1093/bib/bbz163] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 09/17/2019] [Revised: 11/05/2019] [Accepted: 11/21/2019] [Indexed: 12/21/2022] Open
Abstract
Statistically, accurate protein identification is a fundamental cornerstone of proteomics and underpins the understanding and application of this technology across all elements of medicine and biology. Proteomics, as a branch of biochemistry, has in recent years played a pivotal role in extending and developing the science of accurately identifying the biology and interactions of groups of proteins or proteomes. Proteomics has primarily used mass spectrometry (MS)-based techniques for identifying proteins, although other techniques including affinity-based identifications still play significant roles. Here, we outline the basics of MS to understand how data are generated and parameters used to inform computational tools used in protein identification. We then outline a comprehensive analysis of the bioinformatics and computational methodologies used in protein identification in proteomics including discussing the most current communally acceptable metrics to validate any identification.
|
28
|
Perez‐Riverol Y, Moreno P. Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines. Proteomics 2019; 20:e1900147. [DOI: 10.1002/pmic.201900147] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/12/2019] [Revised: 09/30/2019] [Indexed: 12/29/2022]
Affiliation(s)
- Yasset Perez‐Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
29
|
Muth T, Renard BY. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief Bioinform 2019; 19:954-970. [PMID: 28369237 DOI: 10.1093/bib/bbx033] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/08/2016] [Indexed: 01/24/2023] Open
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments nowadays offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has been rarely evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation on the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets acquired from different instrument types from collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation mode using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated from peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, the same tool outpaced the commercial competitor PEAKS in terms of running time speedup by factors of around 12-17. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, taken as a whole, the evaluated algorithms perform moderately on experimental data but show a significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. Finally, we discuss the potential of de novo sequencing for now becoming more widely used in the field.
Affiliation(s)
- Thilo Muth
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
| | - Bernhard Y Renard
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
| |
|
30
|
Uszkoreit J, Perez-Riverol Y, Eggers B, Marcus K, Eisenacher M. Protein Inference Using PIA Workflows and PSI Standard File Formats. J Proteome Res 2018; 18:741-747. [DOI: 10.1021/acs.jproteome.8b00723] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 02/06/2023]
Affiliation(s)
- Julian Uszkoreit
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
| | - Yasset Perez-Riverol
- EMBL Outstation, European Bioinformatics Institute, Proteomics Services, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Britta Eggers
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
| | - Katrin Marcus
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
| | - Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
| |
|
31
|
Perez‐Riverol Y, Vizcaíno JA, Griss J. Future Prospects of Spectral Clustering Approaches in Proteomics. Proteomics 2018; 18:e1700454. [PMID: 29882266 PMCID: PMC6099476 DOI: 10.1002/pmic.201700454] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/21/2018] [Revised: 05/23/2018] [Indexed: 12/14/2022]
Abstract
In this article, current and future applications of spectral clustering are discussed in the context of mass spectrometry-based proteomics approaches. First of all, the main algorithms and tools that can currently be used to perform spectral clustering are introduced. In addition, its main applications and their use in current computational proteomics workflows are explained, including the generation of spectral libraries and spectral archives. Finally, possible future directions for spectral clustering are outlined, including its potential use to achieve a deeper coverage of the proteome and the discovery of novel post-translational modifications and single amino acid variants.
Affiliation(s)
- Yasset Perez‐Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
| |
|
32
|
Andjelković U, Josić D. Mass spectrometry based proteomics as foodomics tool in research and assurance of food quality and safety. Trends Food Sci Technol 2018. [DOI: 10.1016/j.tifs.2018.04.008] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 12/25/2022]
|
33
|
Lee JY, Choi H, Colangelo CM, Davis D, Hoopmann MR, Käll L, Lam H, Payne SH, Perez-Riverol Y, The M, Wilson R, Weintraub ST, Palmblad M. ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data. J Biomol Tech 2018; 29:39-45. [PMID: 29977167 DOI: 10.7171/jbt.18-2902-003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/30/2023]
Abstract
This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 Escherichia coli samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates on the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received, although those who participated did well with no one method performing best on all samples. However, none of the submissions included a complete Markdown or Notebook, even though examples were provided. Future iPRG studies need to be more successful in promoting and encouraging participation. The VPS and submission validator easily scale to much larger numbers of participants in these types of studies. The unique "ground-truth" dataset for proteoform identification generated for this study is now available to the research community, as are the server-side scripts for validating and managing submissions.
Affiliation(s)
- Joon-Yong Lee
- Pacific Northwest National Laboratory, Richland, Washington 99352, USA
| | - Hyungwon Choi
- National University of Singapore, 117547 Singapore, Singapore
| | | | - Darryl Davis
- Janssen Research and Development, Spring House, Pennsylvania 19087, USA
| | | | - Lukas Käll
- Science for Life Laboratory, KTH - Royal Institute of Technology, 171 65 Solna, Sweden
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
| | - Samuel H Payne
- Pacific Northwest National Laboratory, Richland, Washington 99352, USA
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Matthew The
- Science for Life Laboratory, KTH - Royal Institute of Technology, 171 65 Solna, Sweden
| | - Ryan Wilson
- Pacific Northwest National Laboratory, Richland, Washington 99352, USA
| | - Susan T Weintraub
- Department of Biochemistry and Structural Biology, The University of Texas Health Science Center, San Antonio, Texas 78229, USA
| | - Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| |
|
34
|
The M, Edfors F, Perez-Riverol Y, Payne SH, Hoopmann MR, Palmblad M, Forsström B, Käll L. A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms. J Proteome Res 2018; 17:1879-1886. [PMID: 29631402 DOI: 10.1021/acs.jproteome.7b00899] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/28/2022]
Abstract
A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
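The shared-peptide problem described in this abstract can be made concrete with a small sketch. It assumes a hypothetical mapping from detected peptides to the proteins that contain them, and contrasts two naive evidence-gathering strategies. The peptide and protein names are invented, and neither function reproduces any of the inference procedures benchmarked in the paper.

```python
from collections import defaultdict
from typing import Dict, Set


def evidence_unique_only(peptide_to_proteins: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Collect, per protein, only detected peptides that map to that protein alone."""
    evidence = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        if len(proteins) == 1:
            (protein,) = proteins  # unpack the single matching protein
            evidence[protein].add(peptide)
    return dict(evidence)


def evidence_with_shared(peptide_to_proteins: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Collect, per protein, every detected peptide that maps to it, shared or not."""
    evidence = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            evidence[protein].add(peptide)
    return dict(evidence)


# Invented example: P1 is present; P2 and P3 are absent homologs sharing peptides.
detected = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P2", "P3"},
}
unique_view = evidence_unique_only(detected)
shared_view = evidence_with_shared(detected)
```

In this toy case the unique-only view supports P1 alone, while the shared view also credits P2 and P3. This is the kind of ambiguity a standard with known-absent homologs is designed to expose.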
Affiliation(s)
- Matthew The
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
| | - Fredrik Edfors
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Samuel H Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Michael R Hoopmann
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Björn Forsström
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
| |
|
35
|
|
36
|
MetaGOmics: A Web-Based Tool for Peptide-Centric Functional and Taxonomic Analysis of Metaproteomics Data. Proteomes 2017; 6:proteomes6010002. [PMID: 29280960 PMCID: PMC5874761 DOI: 10.3390/proteomes6010002] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/21/2017] [Revised: 12/19/2017] [Accepted: 12/21/2017] [Indexed: 12/22/2022] Open
Abstract
Metaproteomics is the characterization of all proteins being expressed by a community of organisms in a complex biological sample at a single point in time. Applications of metaproteomics range from the comparative analysis of environmental samples (such as ocean water and soil) to microbiome data from multicellular organisms (such as the human gut). Metaproteomics research is often focused on the quantitative functional makeup of the metaproteome and which organisms are making those proteins. That is: What are the functions of the currently expressed proteins? How much of the metaproteome is associated with those functions? And, which microorganisms are expressing the proteins that perform those functions? However, traditional protein-centric functional analysis is greatly complicated by the large size, redundancy, and lack of biological annotations for the protein sequences in the database used to search the data. To help address these issues, we have developed an algorithm and web application (dubbed "MetaGOmics") that automates the quantitative functional (using Gene Ontology) and taxonomic analysis of metaproteomics data and subsequent visualization of the results. MetaGOmics is designed to overcome the shortcomings of traditional proteomics analysis when used with metaproteomics data. It is easy to use, requires minimal input, and fully automates most steps of the analysis-including comparing the functional makeup between samples. MetaGOmics is freely available at https://www.yeastrc.org/metagomics/.
|
37
|
BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication. J Biotechnol 2017; 261:116-125. [DOI: 10.1016/j.jbiotec.2017.06.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/17/2017] [Revised: 06/04/2017] [Accepted: 06/08/2017] [Indexed: 01/12/2023]
|
38
|
Wippel HH, Santos MDM, Clasen MA, Kurt LU, Nogueira FCS, Carvalho CE, McCormick TM, Neto GPB, Alves LR, da Gloria da Costa Carvalho M, Carvalho PC, Fischer JDSDG. Comparing intestinal versus diffuse gastric cancer using a PEFF-oriented proteomic pipeline. J Proteomics 2017; 171:63-72. [PMID: 29032071 DOI: 10.1016/j.jprot.2017.10.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/28/2017] [Revised: 09/29/2017] [Accepted: 10/06/2017] [Indexed: 12/19/2022]
Abstract
Gastric cancer is the fifth most common malignant neoplasia and the third leading cause of cancer death worldwide. Mac-Cormick et al. recently showed the importance of considering the anatomical region of the tumor in proteomic gastric cancer studies; more differences were found between distinct anatomical regions than when comparing healthy versus diseased tissue. Thus, failing to consider the anatomical region could lead to differential proteins that are not disease specific. With this as motivation, we compared the proteomic profiles of intestinal and diffuse adenocarcinoma from the same anatomical region, the corpus. To achieve this, we used isobaric labeling (iTRAQ) of peptides, a 10-step HILIC fractionation, and reversed-phase nano-chromatography coupled online with a Q-Exactive Plus mass spectrometer. We updated PatternLab to take advantage of the new Comet-PEFF search engine that enables identifying post-translational modifications and mutations included in neXtProt's PSI Extended FASTA Format (PEFF) metadata. Our pipeline then uses a text-mining tool that automatically extracts PubMed IDs from the proteomic result metadata and drills down keywords from manuscripts related with the biological processes at hand. Our results disclose important proteins such as apolipoprotein B-100, S100 and 14-3-3 proteins, among many others, highlighting the different pathways enriched by each cancer type. SIGNIFICANCE: Gastric cancer is a heterogeneous and multifactorial disease responsible for a significant number of deaths every year. Despite the constant improvement of surgical techniques and multimodal treatments, survival rates are low, mostly due to limited diagnostic techniques and late symptoms. Intestinal and diffuse types of gastric cancer have distinct clinical and pathological characteristics; yet little is known about the molecular mechanisms regulating these two types of gastric tumors. Here we compared the proteomic profile of diffuse and intestinal types of gastric cancer from the same anatomical location, the corpus, from four male patients. This methodological design aimed to eliminate proteomic variations resulting from comparison of tumors from distinct anatomical regions. Our PEFF-tailored proteomic pipeline significantly increased the identifications compared with previous versions of PatternLab.
Affiliation(s)
- Helisa Helena Wippel
- Computational Mass Spectrometry & Proteomics Group, Carlos Chagas Institute, Fiocruz - Paraná, Brazil
| | | | - Milan Avila Clasen
- Computational Mass Spectrometry & Proteomics Group, Carlos Chagas Institute, Fiocruz - Paraná, Brazil
| | - Louise Ulrich Kurt
- Computational Mass Spectrometry & Proteomics Group, Carlos Chagas Institute, Fiocruz - Paraná, Brazil
| | - Fabio Cesar Sousa Nogueira
- Laboratory of Proteomics, Institute of Chemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Laboratory of Protein Chemistry, Proteomic Unit, Institute of Chemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Carlos Eduardo Carvalho
- Pathology Service of the Clementino Fraga Filho University Hospital (HUCFF-UFRJ), Rio de Janeiro, Brazil
| | | | - Guilherme Pinto Bravo Neto
- Division of Esophageal and Gastric Surgery, General Surgery Service of the HUCFF-UFRJ, Rio de Janeiro, Brazil
| | | | | | - Paulo Costa Carvalho
- Computational Mass Spectrometry & Proteomics Group, Carlos Chagas Institute, Fiocruz - Paraná, Brazil.
| | | |
|
39
|
Zhao P, Zhong J, Liu W, Zhao J, Zhang G. Protein-Level Integration Strategy of Multiengine MS Spectra Search Results for Higher Confidence and Sequence Coverage. J Proteome Res 2017; 16:4446-4454. [DOI: 10.1021/acs.jproteome.7b00463] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023]
Affiliation(s)
- Panpan Zhao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
| | - Jiayong Zhong
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
| | - Wanting Liu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
| | - Jing Zhao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
| | - Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
| |
|
40
|
Guruceaga E, Garin-Muga A, Prieto G, Bejarano B, Marcilla M, Marín-Vicente C, Perez-Riverol Y, Casal JI, Vizcaíno JA, Corrales FJ, Segura V. Enhanced Missing Proteins Detection in NCI60 Cell Lines Using an Integrative Search Engine Approach. J Proteome Res 2017; 16:4374-4390. [PMID: 28960077 PMCID: PMC5737412 DOI: 10.1021/acs.jproteome.7b00388] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/17/2022]
Abstract
The Human Proteome Project (HPP) aims to decipher the complete map of the human proteome. In the past few years, significant efforts of the HPP teams have been dedicated to the experimental detection of the missing proteins, which lack reliable mass spectrometry evidence of their existence. In this endeavor, an in-depth analysis of shotgun experiments might represent a valuable resource to select a biological matrix when designing validation experiments. In this work, we used all the proteomic experiments from the NCI60 cell lines and applied an integrative approach based on the results obtained from Comet, Mascot, OMSSA, and X!Tandem. This workflow benefits from the complementarity of these search engines to increase the proteome coverage. Five missing proteins compliant with the C-HPP guidelines were identified, although further validation is needed. Moreover, 165 missing proteins were detected with only one unique peptide, and their functional analysis supported their participation in cellular pathways, as was also proposed in other studies. Finally, we performed a combined analysis of the gene expression levels and the proteomic identifications from the cell lines common to the NCI60 and the CCLE project to suggest alternatives for further validation of missing protein observations.
Affiliation(s)
- Elizabeth Guruceaga
- Bioinformatics Unit, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain; IdiSNA, Navarra Institute for Health Research, Pamplona 31008, Spain
| | - Alba Garin-Muga
- Bioinformatics Unit, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain
| | - Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), Bilbao 48013, Spain
| | | | - Miguel Marcilla
- Proteomics Unit, Spanish National Biotechnology Centre, CSIC, Madrid 28049, Spain
| | | | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
| | | | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
| | - Fernando J Corrales
- Proteomics Unit, Spanish National Biotechnology Centre, CSIC, Madrid 28049, Spain
| | - Victor Segura
- Bioinformatics Unit, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain; IdiSNA, Navarra Institute for Health Research, Pamplona 31008, Spain
41
|
Tholey A, Becker A. Top-down proteomics for the analysis of proteolytic events - Methods, applications and perspectives. Biochim Biophys Acta Mol Cell Res 2017; 1864:2191-2199. [PMID: 28711385 DOI: 10.1016/j.bbamcr.2017.07.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 04/26/2017] [Revised: 07/07/2017] [Accepted: 07/09/2017] [Indexed: 02/06/2023]
Abstract
Mass spectrometry-based proteomics is an indispensable tool for almost all research areas relevant to the understanding of proteolytic processing, ranging from the identification of substrates, products and cleavage sites to the analysis of structural features influencing protease activity. The majority of methods for these studies are based on bottom-up proteomics, which performs the analysis at the peptide level. As this approach is characterized by a number of pitfalls, e.g. loss of molecular information, there is an ongoing effort to establish top-down proteomics, in which both separation and MS analysis are performed at the intact protein level. We briefly introduce major approaches of bottom-up proteomics used in the field of protease research and highlight the shortcomings of these methods. We then discuss the present state of the art of top-down proteomics. Together with the discussion of known challenges, we show the potential of this approach and present a number of successful applications of top-down proteomics in protease research. This article is part of a Special Issue entitled: Proteolysis as a Regulatory Event in Pathophysiology edited by Stefan Rose-John.
Affiliation(s)
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Alexander Becker
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
43
|
May DH, Tamura K, Noble WS. Param-Medic: A Tool for Improving MS/MS Database Search Yield by Optimizing Parameter Settings. J Proteome Res 2017; 16:1817-1824. [PMID: 28263070 PMCID: PMC5738039 DOI: 10.1021/acs.jproteome.7b00028] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/07/2023]
Abstract
In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs; in both cases, the yield of peptides detected at a given false discovery rate is lowered. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool (http://noble.gs.washington.edu/proj/param-medic/) and has been integrated into the Crux proteomics toolkit (http://crux.ms), providing automatic parameter selection for the Comet and Tide search engines.
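The two ideas in this abstract, a precursor tolerance that selects candidate peptides and a mass error inferred from paired spectra, can be illustrated with a short sketch. This is not Param-Medic's code or its actual estimator: the function names, the peptide labels, and the use of a median-absolute-deviation spread are all assumptions made for illustration, and the pairing of spectra is taken as given.

```python
from statistics import median


def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Signed difference between two m/z (or mass) values, in parts per million."""
    return (mz_a - mz_b) / mz_b * 1e6


def estimate_precursor_sigma_ppm(paired_mz):
    """Robust spread of precursor differences over spectrum pairs.

    `paired_mz` holds (mz_a, mz_b) tuples for spectra assumed to come from
    the same peptide ion. The MAD-based sigma is an illustrative stand-in,
    not the estimator used by Param-Medic.
    """
    diffs = [ppm_difference(a, b) for a, b in paired_mz]
    center = median(diffs)
    mad = median(abs(d - center) for d in diffs)
    return 1.4826 * mad  # scales MAD to a Gaussian standard deviation


def candidates_within_tolerance(observed_mass, peptide_masses, tol_ppm):
    """Names of peptides whose mass lies within tol_ppm of the observed mass."""
    return [
        name
        for name, mass in peptide_masses.items()
        if abs(ppm_difference(observed_mass, mass)) <= tol_ppm
    ]


if __name__ == "__main__":
    # Hypothetical peptide masses, chosen only to show the effect of tolerance.
    peptides = {"PEPTIDE_A": 1000.000, "PEPTIDE_B": 1000.020, "PEPTIDE_C": 1000.500}
    for tol in (10, 20, 1000):
        print(tol, candidates_within_tolerance(1000.005, peptides, tol))
```

Widening the tolerance admits more candidates per spectrum, which is the trade-off the abstract describes: more chances for a random high-scoring match when the window is too wide, and lost true matches when it is too narrow.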
Affiliation(s)
- Damon H May
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Kaipo Tamura
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
45
|
Langella O, Valot B, Balliau T, Blein-Nicolas M, Bonhomme L, Zivy M. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J Proteome Res 2016; 16:494-503. [DOI: 10.1021/acs.jproteome.6b00632] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 01/03/2023]
Affiliation(s)
- Olivier Langella
  - PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Benoît Valot
  - UMR 6249 Chrono-Environnement, CNRS, Université de Bourgogne Franche-Comté, 25030 Besançon, France
- Thierry Balliau
  - PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Mélisande Blein-Nicolas
  - PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Ludovic Bonhomme
  - INRA/UBP, UMR 1095, Genetics, Diversity and Ecophysiology of Cereals, F63100 Clermont-Ferrand, France
- Michel Zivy
  - PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France