1
|
Carvalho PC, Lima DB, Leprevost FV, Santos MDM, Fischer JSG, Aquino PF, Moresco JJ, Yates JR, Barbosa VC. Integrated analysis of shotgun proteomic data with PatternLab for proteomics 4.0. Nat Protoc 2016; 11:102-17. [PMID: 26658470 PMCID: PMC5722229 DOI: 10.1038/nprot.2015.133] [Citation(s) in RCA: 201] [Impact Index Per Article: 22.3] [Indexed: 12/27/2022]
Abstract
PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for the analysis of shotgun proteomic data. The contained modules allow for formatting of sequence databases, peptide spectrum matching, statistical filtering and data organization, extracting quantitative information from label-free and chemically labeled data, and analyzing statistics for differential proteomics. PatternLab also has modules to perform similarity-driven studies with de novo sequencing data, to evaluate time-course experiments and to highlight the biological significance of data with regard to the Gene Ontology database. The PatternLab for proteomics 4.0 package brings together all of these modules in a self-contained software environment, which allows for complete proteomic data analysis and the display of results in a variety of graphical formats. All updates to PatternLab, including new features, have been previously tested on millions of mass spectra. PatternLab is easy to install, and it is freely available from http://patternlabforproteomics.org.
|
Research Support, N.I.H., Extramural |
9 |
201 |
2
|
Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. Int J Mol Sci 2020; 21:ijms21082873. [PMID: 32326049 PMCID: PMC7216093 DOI: 10.3390/ijms21082873] [Citation(s) in RCA: 145] [Impact Index Per Article: 29.0] [Received: 03/18/2020] [Revised: 04/16/2020] [Accepted: 04/18/2020] [Indexed: 01/15/2023] Open
Abstract
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
|
Review |
5 |
145 |
3
|
Gupta N, Bandeira N, Keich U, Pevzner PA. Target-decoy approach and false discovery rate: when things may go wrong. J Am Soc Mass Spectrom 2011; 22:1111-20. [PMID: 21953092 PMCID: PMC3220955 DOI: 10.1007/s13361-011-0139-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Received: 11/01/2010] [Revised: 02/19/2011] [Accepted: 02/22/2011] [Indexed: 05/12/2023]
Abstract
The target-decoy approach (TDA) has done the field of proteomics a great service by filling in the need to estimate the false discovery rates (FDR) of peptide identifications. While TDA is often viewed as a universal solution to the problem of FDR evaluation, we argue that the time has come to critically re-examine TDA and to acknowledge not only its merits but also its demerits. We demonstrate that some popular MS/MS search tools are not TDA-compliant and that it is easy to develop a non-TDA compliant tool that outperforms all TDA-compliant tools. Since the distinction between TDA-compliant and non-TDA compliant tools remains elusive, we are concerned about a possible proliferation of non-TDA-compliant tools in the future (developed with the best intentions). We are also concerned that estimation of the FDR by TDA awkwardly depends on a virtual coin toss and argue that it is important to take the coin toss factor out of our estimation of the FDR. Since computing FDR via TDA suffers from various restrictions, we argue that TDA is not needed when accurate p-values of individual Peptide-Spectrum Matches are available.
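The target-decoy estimate that this abstract critiques can be illustrated with a minimal sketch. It assumes a concatenated target-decoy search in which every peptide-spectrum match (PSM) carries a score and a decoy flag, and it uses the simple decoys/targets estimator. The function name, the tuple layout, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each PSM is (score, is_decoy); higher scores are assumed to be better.
PSM = Tuple[float, bool]


def tda_fdr_at_threshold(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring >= threshold.

    Uses the simple estimator (#decoys / #targets) above the threshold,
    which assumes false target hits are as frequent as decoy hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted targets, so nothing to estimate
    return decoys / targets


# Hypothetical search results, for illustration only.
example_psms: List[PSM] = [
    (9.1, False), (8.7, False), (8.2, True), (7.9, False),
    (7.5, False), (6.0, True), (5.5, False),
]

if __name__ == "__main__":
    for cutoff in (9.0, 7.0, 5.0):
        print(cutoff, tda_fdr_at_threshold(example_psms, cutoff))
```

Because the estimate depends on which decoys happen to score above the cutoff, repeating the search with a different decoy database can change the result, which is the "coin toss" dependence the authors object to.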
|
Research Support, N.I.H., Extramural |
14 |
123 |
4
|
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
|
Research Support, N.I.H., Extramural |
2 |
55 |
5
|
Rudolph JD, Cox J. A Network Module for the Perseus Software for Computational Proteomics Facilitates Proteome Interaction Graph Analysis. J Proteome Res 2019; 18:2052-2064. [PMID: 30931570 PMCID: PMC6578358 DOI: 10.1021/acs.jproteome.8b00927] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/25/2022]
Abstract
Proteomics data analysis strongly benefits from not studying single proteins in isolation but taking their multivariate interdependence into account. We introduce PerseusNet, the new Perseus network module for the biological analysis of proteomics data. Proteomics is commonly used to generate networks, e.g., with affinity purification experiments, but networks are also used to explore proteomics data. PerseusNet supports the biomedical researcher for both modes of data analysis with a multitude of activities. For affinity purification, a volcano-plot-based statistical analysis method for network generation is featured which is scalable to large numbers of baits. For posttranslational modifications of proteins, such as phosphorylation, a collection of dedicated network analysis tools helps in elucidating cellular signaling events. Co-expression network analysis of proteomics data adopts established tools from transcriptome co-expression analysis. PerseusNet is extensible through a plugin architecture in a multi-lingual way, integrating analyses in C#, Python, and R, and is freely available at http://www.perseus-framework.org.
|
Research Support, Non-U.S. Gov't |
6 |
37 |
6
|
Na S, Paek E. Software eyes for protein post-translational modifications. Mass Spectrom Rev 2015; 34:133-147. [PMID: 24889695 DOI: 10.1002/mas.21425] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 06/26/2012] [Revised: 07/18/2013] [Accepted: 11/20/2013] [Indexed: 06/03/2023]
Abstract
Post-translational modifications (PTMs) are critical to almost all aspects of complex processes of the cell. Identification of PTMs is one of the biggest challenges for proteomics, and there have been many computational studies for the analysis of PTMs from tandem mass spectrometry (MS/MS). Most early PTM identification studies have been performed by matching MS/MS data to protein databases, using database search tools, but they are prohibitively slow when a large number of PTMs is given as a search parameter. In this article, we present recent developments to search for more types of PTMs and to speed up the search, and discuss many computational issues and solutions in terms of identifying multiply modified peptides or searching for all possible modifications at once in unrestrictive mode. Apart from the most common type of PTMs involving covalent addition of functional groups to proteins, PTMs such as disulfide linkage require dedicated software for the analysis because they may involve cross-linking between two different parts of proteins. Finally, methods for identification of protein disulfide bonds are presented.
|
Research Support, N.I.H., Extramural |
10 |
36 |
7
|
Olivella R, Chiva C, Serret M, Mancera D, Cozzuto L, Hermoso A, Borràs E, Espadas G, Morales J, Pastor O, Solé A, Ponomarenko J, Sabidó E. QCloud2: An Improved Cloud-based Quality-Control System for Mass-Spectrometry-based Proteomics Laboratories. J Proteome Res 2021; 20:2010-2013. [PMID: 33724836 DOI: 10.1021/acs.jproteome.0c00853] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 11/28/2022]
Abstract
QCloud is a cloud-based system to support proteomics laboratories in daily quality assessment using a user-friendly interface, easy setup, and automated data processing. Since its release, QCloud has facilitated automated quality control for proteomics experiments in many laboratories. QCloud provides a quick and effortless evaluation of instrument performance that helps to overcome many analytical challenges derived from clinical and translational research. Here we present an improved version of the system, QCloud2. This new version includes enhancements in the scalability and reproducibility of the quality-control pipelines, and it features an improved front end for data visualization, user management, and chart annotation. The QCloud2 system also includes programmatic access and a standalone local version.
|
Research Support, Non-U.S. Gov't |
4 |
35 |
8
|
Prost SA, Crowell KL, Baker ES, Ibrahim YM, Clowers BH, Monroe ME, Anderson GA, Smith RD, Payne SH. Detecting and removing data artifacts in Hadamard transform ion mobility-mass spectrometry measurements. J Am Soc Mass Spectrom 2014; 25:2020-2027. [PMID: 24796262 PMCID: PMC4223016 DOI: 10.1007/s13361-014-0895-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/17/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 05/11/2023]
Abstract
Applying Hadamard transform multiplexing to ion mobility separations (IMS) can significantly improve the signal-to-noise ratio and throughput for IMS coupled mass spectrometry (MS) measurements by increasing the ion utilization efficiency. However, it has been determined that fluctuations in ion intensity as well as spatial shifts in the multiplexed data lower the signal-to-noise ratios and appear as noise in downstream processing of the data. To address this problem, we have developed a novel algorithm that discovers and eliminates data artifacts. The algorithm employs an analytical approach to identify and remove artifacts from the data, decreasing the likelihood of false identifications in subsequent data processing. Following application of the algorithm, IMS-MS measurement sensitivity is greatly increased and artifacts that previously limited the utility of applying the Hadamard transform to IMS are avoided.
|
Research Support, N.I.H., Extramural |
11 |
32 |
9
|
Doblmann J, Dusberger F, Imre R, Hudecz O, Stanek F, Mechtler K, Dürnberger G. apQuant: Accurate Label-Free Quantification by Quality Filtering. J Proteome Res 2018; 18:535-541. [PMID: 30351950 DOI: 10.1021/acs.jproteome.8b00113] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 11/30/2022]
Abstract
Label-free quantification of shotgun proteomics data is a frequently used strategy, offering high dynamic range, sensitivity, and the ability to compare a high number of samples without additional labeling effort. Here, we present a bioinformatics approach that significantly improves label-free quantification results. We employ Percolator to assess the quality of quantified peptides. This allows accurate and reliable quantitative results to be extracted based on false discovery rate. Benchmarking our approach on previously published public data shows that it considerably outperforms currently available algorithms. apQuant is available free of charge as a node for Proteome Discoverer.
|
Research Support, Non-U.S. Gov't |
7 |
30 |
10
|
Boekweg H, Van Der Watt D, Truong T, Johnston SM, Guise AJ, Plowey ED, Kelly RT, Payne SH. Features of Peptide Fragmentation Spectra in Single-Cell Proteomics. J Proteome Res 2021; 21:182-188. [PMID: 34920664 DOI: 10.1021/acs.jproteome.1c00670] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/30/2022]
Abstract
The goal of proteomics is to identify and quantify the complete set of proteins in a biological sample. Single-cell proteomics specializes in the identification and quantitation of proteins for individual cells, often used to elucidate cellular heterogeneity. The significant reduction in ions introduced into the mass spectrometer for single-cell samples could impact the features of MS2 fragmentation spectra. As all peptide identification software tools have been developed on spectra from bulk samples and the associated ion-rich spectra, the potential for spectral features to change is of great interest. We characterize the differences between single-cell spectra and bulk spectra by examining three fundamental spectral features that are likely to affect peptide identification performance. All features show significant changes in single-cell spectra, including the loss of annotated fragment ions, blurring signal and background peaks due to diminishing ion intensity, and distinct fragmentation pattern, compared to bulk spectra. As each of these features is a foundational part of peptide identification algorithms, it is critical to adjust algorithms to compensate for these losses.
|
|
4 |
21 |
11
|
Bai J, Bandla C, Guo J, Alvarez RV, Bai M, Vizcaíno JA, Moreno P, Grüning B, Sallou O, Perez-Riverol Y. BioContainers Registry: Searching Bioinformatics and Proteomics Tools, Packages, and Containers. J Proteome Res 2021; 20:2056-2061. [PMID: 33625229 PMCID: PMC7611561 DOI: 10.1021/acs.jproteome.0c00904] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 02/06/2023]
Abstract
BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers including the metadata, versions, licenses, and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. BioContainers provides over 9000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here we introduce the BioContainers Registry and RESTful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.
|
Research Support, N.I.H., Intramural |
4 |
17 |
12
|
Perez‐Riverol Y, Vizcaíno JA, Griss J. Future Prospects of Spectral Clustering Approaches in Proteomics. Proteomics 2018; 18:e1700454. [PMID: 29882266 PMCID: PMC6099476 DOI: 10.1002/pmic.201700454] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/21/2018] [Revised: 05/23/2018] [Indexed: 12/14/2022]
Abstract
In this article, current and future applications of spectral clustering are discussed in the context of mass spectrometry-based proteomics approaches. First, the main algorithms and tools that can currently be used to perform spectral clustering are introduced. In addition, its main applications and their use in current computational proteomics workflows are explained, including the generation of spectral libraries and spectral archives. Finally, possible future directions for spectral clustering are outlined, including its potential use to achieve deeper coverage of the proteome and to discover novel post-translational modifications and single amino acid variants.
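Spectral clustering groups tandem mass spectra by pairwise similarity. As a minimal sketch of one widely used similarity measure, the code below computes the cosine (normalized dot product) of binned peak intensities. The 1.0 m/z bin width, the function names, and the example peaks are assumptions made for illustration and are not drawn from the article or from any specific clustering tool.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A spectrum is a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a: Spectrum, b: Spectrum, bin_width: float = 1.0) -> float:
    """Cosine of the angle between two binned spectra (0 = no shared bins)."""
    va = bin_spectrum(a, bin_width)
    vb = bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical spectra, for illustration only.
spec_a: Spectrum = [(100.2, 10.0), (200.5, 20.0)]
spec_b: Spectrum = [(100.4, 10.0), (200.1, 20.0)]  # same bins as spec_a
spec_c: Spectrum = [(300.0, 5.0)]                  # no bins shared with spec_a

if __name__ == "__main__":
    print(cosine_similarity(spec_a, spec_b))
    print(cosine_similarity(spec_a, spec_c))
```

A clustering tool would then merge spectra whose similarity exceeds a chosen cutoff; the cutoff and the merging strategy differ between tools and are not specified here.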
|
research-article |
7 |
13 |
13
|
Vizcaíno JA, Walzer M, Jiménez RC, Bittremieux W, Bouyssié D, Carapito C, Corrales F, Ferro M, Heck AJ, Horvatovich P, Hubalek M, Lane L, Laukens K, Levander F, Lisacek F, Novak P, Palmblad M, Piovesan D, Pühler A, Schwämmle V, Valkenborg D, van Rijswijk M, Vondrasek J, Eisenacher M, Martens L, Kohlbacher O. A community proposal to integrate proteomics activities in ELIXIR. F1000Res 2017; 6:ELIXIR-875. [PMID: 28713550 PMCID: PMC5499783 DOI: 10.12688/f1000research.11751.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 06/06/2017] [Indexed: 11/20/2022] Open
Abstract
Computational approaches have been major drivers behind the progress of proteomics in recent years. The aim of this white paper is to provide a framework for integrating computational proteomics into ELIXIR in the near future, and thus to broaden the portfolio of omics technologies supported by this European distributed infrastructure. This white paper is the direct result of a strategy meeting on 'The Future of Proteomics in ELIXIR' that took place in March 2017 in Tübingen (Germany), and involved representatives of eleven ELIXIR nodes. These discussions led to a list of priority areas in computational proteomics that would complement existing activities and close gaps in the portfolio of tools and services offered by ELIXIR so far. We provide some suggestions on how these activities could be integrated into ELIXIR's existing platforms, and how it could lead to a new ELIXIR use case in proteomics. We also highlight connections to the related field of metabolomics, where similar activities are ongoing. This white paper could thus serve as a starting point for the integration of computational proteomics into ELIXIR. Over the next few months we will be working closely with all stakeholders involved, and in particular with other representatives of the proteomics community, to further refine this paper.
|
discussion |
8 |
12 |
14
|
Schallert K, Verschaffelt P, Mesuere B, Benndorf D, Martens L, Van Den Bossche T. Pout2Prot: An Efficient Tool to Create Protein (Sub)groups from Percolator Output Files. J Proteome Res 2022; 21:1175-1180. [PMID: 35143215 DOI: 10.1021/acs.jproteome.1c00685] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
In metaproteomics, the study of the collective proteome of microbial communities, the protein inference problem is more challenging than in single-species proteomics. Indeed, a peptide sequence can be present not only in multiple proteins or protein isoforms of the same species, but also in homologous proteins from closely related species. To assign the taxonomy and functions of the microbial species, specialized tools have been developed, such as Prophane. This tool, however, is not directly compatible with post-processing tools such as Percolator. In this manuscript we therefore present Pout2Prot, which takes Percolator Output (.pout) files from multiple experiments and creates protein group and protein subgroup output files (.tsv) that can be used directly with Prophane. We investigated different grouping strategies and compared existing protein grouping tools to develop an advanced protein grouping algorithm that offers a variety of different approaches, allows grouping for multiple files, and uses a weighted spectral count for protein (sub)groups to reflect abundance. Pout2Prot is available as a web application at https://pout2prot.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the Apache License 2.0 and is available at https://github.com/compomics/pout2prot.
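The idea of a weighted spectral count for protein groups can be sketched as follows. This is an illustrative assumption, not Pout2Prot's actual algorithm: each PSM contributes one count, split equally among all protein groups that its peptide maps to. The function name and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One entry per PSM: (peptide sequence, protein groups the peptide maps to).
PsmAssignment = Tuple[str, List[str]]


def weighted_spectral_counts(psms: Iterable[PsmAssignment]) -> Dict[str, float]:
    """Split each PSM's single count equally across its protein groups."""
    counts: Dict[str, float] = defaultdict(float)
    for _peptide, groups in psms:
        if not groups:
            continue  # a PSM with no group assignment contributes nothing
        share = 1.0 / len(groups)
        for group in groups:
            counts[group] += share
    return dict(counts)


# Hypothetical PSMs, for illustration only.
example_assignments: List[PsmAssignment] = [
    ("PEPTIDEA", ["G1"]),
    ("PEPTIDEA", ["G1"]),
    ("PEPTIDEB", ["G1", "G2"]),  # shared peptide: half a count to each group
    ("PEPTIDEC", ["G2"]),
]

if __name__ == "__main__":
    print(weighted_spectral_counts(example_assignments))
```

Splitting shared peptides this way keeps the total count equal to the number of PSMs, so homologous proteins from related species do not inflate the apparent abundance.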
|
|
3 |
9 |
15
|
Stolte C, Sabir KS, Heinrich J, Hammang CJ, Schafferhans A, O'Donoghue SI. Integrated visual analysis of protein structures, sequences, and feature data. BMC Bioinformatics 2015; 16 Suppl 11:S7. [PMID: 26329268 PMCID: PMC4547178 DOI: 10.1186/1471-2105-16-s11-s7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.
|
research-article |
10 |
7 |
16
|
Verschaffelt P, Van Den Bossche T, Gabriel W, Burdukiewicz M, Soggiu A, Martens L, Renard BY, Schiebenhoefer H, Mesuere B. MegaGO: A Fast Yet Powerful Approach to Assess Functional Gene Ontology Similarity across Meta-Omics Data Sets. J Proteome Res 2021; 20:2083-2088. [PMID: 33661648 DOI: 10.1021/acs.jproteome.0c00926] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/30/2023]
Abstract
The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.
|
Research Support, Non-U.S. Gov't |
4 |
6 |
17
|
Postic G, Marcoux J, Reys V, Andreani J, Vandenbrouck Y, Bousquet MP, Mouton-Barbosa E, Cianférani S, Burlet-Schiltz O, Guerois R, Labesse G, Tufféry P. Probing Protein Interaction Networks by Combining MS-Based Proteomics and Structural Data Integration. J Proteome Res 2020; 19:2807-2820. [PMID: 32338910 DOI: 10.1021/acs.jproteome.0c00066] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Protein-protein interactions play a major role in the molecular machinery of life, and various techniques such as AP-MS are dedicated to their identification. However, those techniques return lists of proteins devoid of organizational structure, not detailing which proteins interact with which others. Proposing a hierarchical view of the interactions between the members of the flat list becomes highly tedious for large data sets when done by hand. To help hierarchize this data, we introduce a new bioinformatics protocol that integrates information of the multimeric protein 3D structures available in the Protein Data Bank using remote homology detection, as well as information related to Short Linear Motifs and interaction data from the BioGRID. We illustrate on two unrelated use-cases of different complexity how our approach can be useful to decipher the network of interactions hidden in the list of input proteins, and how it provides added value compared to state-of-the-art resources such as Interactome3D or STRING. Particularly, we show the added value of using homology detection to distinguish between orthologs and paralogs, and to distinguish between core obligate and more facultative interactions. We also demonstrate the potential of considering interactions occurring through Short Linear Motifs.
|
Research Support, Non-U.S. Gov't |
5 |
5 |
18
|
Sharman K, Patterson NH, Weiss A, Neumann EK, Guiberson ER, Ryan DJ, Gutierrez DB, Spraggins JM, Van de Plas R, Skaar EP, Caprioli RM. Rapid Multivariate Analysis Approach to Explore Differential Spatial Protein Profiles in Tissue. J Proteome Res 2023; 22:1394-1405. [PMID: 35849531 PMCID: PMC9845430 DOI: 10.1021/acs.jproteome.2c00206] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
Spatially targeted proteomics analyzes the proteome of specific cell types and functional regions within tissue. While spatial context is often essential to understanding biological processes, interpreting sub-region-specific protein profiles can pose a challenge due to the high-dimensional nature of the data. Here, we develop a multivariate approach for rapid exploration of differential protein profiles acquired from distinct tissue regions and apply it to analyze a published spatially targeted proteomics data set collected from Staphylococcus aureus-infected murine kidney, 4 and 10 days postinfection. The data analysis process rapidly filters high-dimensional proteomic data to reveal relevant differentiating species among hundreds to thousands of measured molecules. We employ principal component analysis (PCA) for dimensionality reduction of protein profiles measured by microliquid extraction surface analysis mass spectrometry. Subsequently, k-means clustering of the PCA-processed data groups samples by chemical similarity. Cluster center interpretation revealed a subset of proteins that differentiate between spatial regions of infection over two time points. These proteins appear involved in tricarboxylic acid metabolomic pathways, calcium-dependent processes, and cytoskeletal organization. Gene ontology analysis further uncovered relationships to tissue damage/repair and calcium-related defense mechanisms. Applying our analysis in infectious disease highlighted differential proteomic changes across abscess regions over time, reflecting the dynamic nature of host-pathogen interactions.
|
Research Support, N.I.H., Extramural |
2 |
4 |
19
|
Roy A, Kalita B, Jayaprakash A, Kumar A, Lakshmi PTV. Computational identification and characterization of vascular wilt pathogen (Fusarium oxysporum f. sp. lycopersici) CAZymes in tomato xylem sap. J Biomol Struct Dyn 2022:1-17. [PMID: 35470778 DOI: 10.1080/07391102.2022.2067236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
Fusarium oxysporum f. sp. lycopersici is a devastating plant pathogenic fungus that causes wilt disease in the tomato plant and secretes cell wall degrading enzymes. These enzymes are collectively known as carbohydrate-active enzymes (CAZymes), crucial for growth, colonization and pathogenesis. Therefore, the present study aimed to identify and annotate pathogen CAZymes in the xylem sap of a susceptible tomato variety using downstream proteomics and meta servers. Further, structural elucidation and conformational stability analysis of the selected CAZyme families were done through homology modeling and molecular dynamics simulation. Among all the fungal proteins identified, the carbohydrate metabolic process was found to be enriched. Most of the annotated CAZymes belonged to the hydrolase and oxidoreductase families, and 90% were soluble and extracellular. Moreover, using a publicly available interactome database, interactions were observed between the families acting on chitin, hemicellulose and pectin. Subsequently, important catalytic residues were identified in the candidate CAZymes belonging to carbohydrate esterase (CE8) and glycosyl hydrolase (GH18 and GH28). Further, essential dynamics after molecular simulation of 100 ns revealed the overall behavior of these CAZymes with distinct global minima and transition states in CE8. Thus, our study identified some of the CAZyme families that assist in pathogenesis and growth through host cell wall deconstruction with further structural insight into the selected CAZyme families. Communicated by Ramaswamy H. Sarma.
|
|
3 |
1 |
20
|
Kalogeropoulos K, Moldt Haack A, Madzharova E, Di Lorenzo A, Hanna R, Schoof EM, Auf dem Keller U. CLIPPER 2.0: Peptide-Level Annotation and Data Analysis for Positional Proteomics. Mol Cell Proteomics 2024; 23:100781. [PMID: 38703894 PMCID: PMC11192779 DOI: 10.1016/j.mcpro.2024.100781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 04/11/2024] [Accepted: 05/01/2024] [Indexed: 05/06/2024] Open
Abstract
Positional proteomics methodologies have transformed protease research, and have brought mass spectrometry (MS)-based degradomics studies to the forefront of protease characterization and system-wide interrogation of protease signaling. Considerable advancements in both sensitivity and throughput of liquid chromatography (LC)-MS/MS instrumentation enable the generation of enormous positional proteomics datasets of natural and protein termini and neo-termini of cleaved protease substrates. However, concomitant progress has not been observed to the same extent in data analysis and post-processing steps, arguably constituting the largest bottleneck in positional proteomics workflows. Here, we present a computational tool, CLIPPER 2.0, that builds on prior algorithms developed for MS-based protein termini analysis, facilitating peptide-level annotation and data analysis. CLIPPER 2.0 can be used with several sample preparation workflows and proteomics search algorithms and enables fast and automated database information retrieval, statistical and network analysis, as well as visualization of terminomic datasets. We demonstrate the applicability of our tool by analyzing GluC and MMP9 cleavages in HeLa lysates. CLIPPER 2.0 is available at https://github.com/UadKLab/CLIPPER-2.0.
|
research-article |
1 |
|
21
|
To PKP, Wu L, Chan CM, Hoque A, Lam H. ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics. J Proteome Res 2021; 20:5359-5367. [PMID: 34734728 DOI: 10.1021/acs.jproteome.1c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which is used to form biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
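Illustrative note: the idea of comparing every pair of spectra that falls within a precursor m/z tolerance can be sketched on the CPU in a few lines of Python. This is a toy sketch, not ClusterSheep's implementation. The binned-dictionary spectrum representation, the cosine similarity measure, the single-linkage grouping via union-find, and the names `cosine`, `cluster`, `mz_tol` and `min_sim` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: (precursor m/z, {peak bin index: intensity})
Spectrum = Tuple[float, Dict[int, float]]


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(spectra: List[Spectrum], mz_tol: float, min_sim: float) -> List[List[int]]:
    """Compare every pair within the precursor m/z tolerance and
    merge pairs whose similarity reaches min_sim (single linkage, assumed)."""
    parent = list(range(len(spectra)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Sorting by precursor m/z lets the inner loop stop at the tolerance edge.
    order = sorted(range(len(spectra)), key=lambda i: spectra[i][0])
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if spectra[j][0] - spectra[i][0] > mz_tol:
                break  # no later spectrum can be within tolerance of i
            if cosine(spectra[i][1], spectra[j][1]) >= min_sim:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With four toy spectra, two of which share both precursor m/z and peak pattern, `cluster(spectra, mz_tol=0.05, min_sim=0.8)` groups those two together and leaves the others as singletons. The cost of the all-pairs inner loop on real datasets is what motivates the GPU acceleration described in the abstract.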
|
|
4 |
|
22
|
Zhang Y, Shu K, Chang C. [Advances of peptide-centric data-independent acquisition analysis algorithms and software tools]. SHENG WU GONG CHENG XUE BAO = CHINESE JOURNAL OF BIOTECHNOLOGY 2023; 39:3579-3593. [PMID: 37805839 DOI: 10.13345/j.cjb.230079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2023]
Abstract
Data-independent acquisition (DIA) is a high-throughput, unbiased mass spectrometry data acquisition method that offers good quantitative reproducibility and is well suited to low-abundance proteins. In recent years it has become the preferred choice for clinical proteomic studies, especially large cohort studies. The tandem mass spectrometry (MS/MS) spectra generated by DIA are usually heavily mixed, containing fragment ion information from multiple peptides, which makes protein identification and quantification more difficult. Currently, DIA data analysis methods fall into two main categories, namely peptide-centric and spectrum-centric. The peptide-centric strategy is more sensitive for identification and more accurate for quantification. Thus, it has become the mainstream strategy for DIA data analysis, which includes four key steps: building a spectral library, extracting ion chromatograms, feature scoring and statistical quality control. This work reviews the peptide-centric DIA data analysis procedure, introduces the corresponding algorithms and software tools, and summarizes the improvements to the existing algorithms. Finally, future development directions are discussed.
|
English Abstract |
2 |
|
23
|
Hoopmann MR, Shteynberg DD, Zelter A, Riffle M, Lyon AS, Agard DA, Luan Q, Nolen BJ, MacCoss MJ, Davis TN, Moritz RL. Improved Analysis of Cross-Linking Mass Spectrometry Data with Kojak 2.0, Advanced by Integration into the Trans-Proteomic Pipeline. J Proteome Res 2023; 22:647-655. [PMID: 36629399 PMCID: PMC10234491 DOI: 10.1021/acs.jproteome.2c00670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
Abstract
Fragmentation ion spectral analysis of chemically cross-linked proteins is an established technology in the proteomics research repertoire for determining protein interactions, spatial orientation, and structure. Here we present Kojak version 2.0, a major update to the original Kojak algorithm, which was developed to identify cross-linked peptides from fragment ion spectra using a database search approach. A substantially improved algorithm with updated scoring metrics, support for cleavable cross-linkers, and identification of cross-links between 15N-labeled homomultimers are among the newest features of Kojak 2.0 presented here. Kojak 2.0 is now integrated into the Trans-Proteomic Pipeline, enabling access to dozens of additional tools within that suite. In particular, the PeptideProphet and iProphet tools for validation of cross-links improve the sensitivity and accuracy of correct cross-link identifications at user-defined thresholds. These new features improve the versatility of the algorithm, enabling its use in a wider range of experimental designs and analysis pipelines. Kojak 2.0 remains open-source and multiplatform.
|
Research Support, N.I.H., Extramural |
2 |
|
24
|
Abdul-Khalek N, Picciani M, Shouman O, Wimmer R, Overgaard MT, Wilhelm M, Gregersen Echers S. To Fly, or Not to Fly, That Is the Question: A Deep Learning Model for Peptide Detectability Prediction in Mass Spectrometry. J Proteome Res 2025. [PMID: 40344201 DOI: 10.1021/acs.jproteome.4c00973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2025]
Abstract
Identifying detectable peptides, known as flyers, is key in mass spectrometry-based proteomics. Peptide detectability is strongly related to peptide sequences and their resulting physicochemical properties. Moreover, the high variability in MS data challenges the development of a generic model for detectability prediction, underlining the need for customizable tools. We present Pfly, a deep learning model developed to predict peptide detectability based solely on peptide sequence. Pfly is a versatile and reliable state-of-the-art tool, offering high performance, accessibility, and easy customizability for end-users. This adaptability allows researchers to tailor Pfly to specific experimental conditions, improving accuracy and expanding applicability across various research fields. Pfly is an encoder-decoder with an attention mechanism, classifying peptides as flyers or non-flyers, and providing both binary and categorical probabilities for four distinct classes defined in this study. The model was initially trained on a synthetic peptide library and subsequently fine-tuned with a biological dataset to mitigate bias toward synthesizability, improving predictive capacity and outperforming state-of-the-art predictors in benchmark comparisons across different human and cross-species datasets. The study further investigates the influence of protein abundance and rescoring, illustrating the negative impact on peptide identification due to misclassification. Pfly has been integrated into the DLOmix framework and is accessible on GitHub at https://github.com/wilhelm-lab/dlomix.
|
|
1 |
|
25
|
Datta S, Nabeel Asim M, Dengel A, Ahmed S. NTpred: a robust and precise machine learning framework for in silico identification of Tyrosine nitration sites in protein sequences. Brief Funct Genomics 2024; 23:163-179. [PMID: 37248673 DOI: 10.1093/bfgp/elad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 04/12/2023] [Accepted: 05/02/2023] [Indexed: 05/31/2023] Open
Abstract
Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes, or degrade its activity, which leads toward failure of intracellular processes. Tyrosine nitration (NT) modification degrades a protein's activity, which initiates and propagates various diseases including neurodegenerative, cardiovascular and autoimmune diseases and carcinogenesis. Identification of NT modification supports development of novel therapies and drug discoveries for associated diseases. Identification of NT modification in biochemical labs is expensive, time-consuming and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to precisely identify NT modification, due to the extraction of irrelevant, redundant and less discriminative features from protein sequences. This paper presents the NTpred framework that is competent in extracting comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of different encoders, it generates four additional feature spaces by fusing different combinations of individual encodings. Furthermore, it eradicates irrelevant and redundant features from eight different feature spaces through a Recursive Feature Elimination process. Selected features of four individual encodings and four feature fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are utilized to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross validation and independent test evaluation with combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.
|
|
1 |
|