1
Ananth V, Sanders J, Yilmaz M, Wen B, Oh S, Noble WS. A learned score function improves the power of mass spectrometry database search. Bioinformatics 2024; 40:i410-i417. [PMID: 38940129 PMCID: PMC11211853 DOI: 10.1093/bioinformatics/btae218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools.
RESULTS: To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.
Affiliation(s)
- Varun Ananth
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Justin Sanders
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Melih Yilmaz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bo Wen
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
2
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification. Mol Cell Proteomics 2024; 23:100798. [PMID: 38871251 DOI: 10.1016/j.mcpro.2024.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024] Open
Abstract
Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
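As a rough illustration of the kind of feature a data-driven rescoring pipeline might compute, the following Python sketch compares observed and predicted fragment ion intensities using a normalized spectral angle and reports the absolute retention time error. The function names, the choice of spectral angle as the similarity measure, and the dictionary keys are illustrative assumptions, not the API or method of any particular tool covered by the review.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle in [0, 1]; 1 means identical intensity patterns.

    Assumes both lists hold non-negative intensities for the same fragment ions,
    in the same order (an assumption of this sketch).
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def rescoring_features(obs_intensities, pred_intensities, obs_rt, pred_rt):
    """Bundle two hypothetical rescoring features for one peptide-spectrum match."""
    return {
        "spectral_angle": spectral_angle(obs_intensities, pred_intensities),
        "abs_rt_error": abs(obs_rt - pred_rt),
    }
```

Features of this kind would typically be passed, alongside the original search engine score, to a post-processor that learns to separate correct from incorrect matches.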
Affiliation(s)
- Mostafa Kalhor
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Joel Lapin
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany
3
Wen B, Freestone J, Riffle M, MacCoss MJ, Noble WS, Keich U. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. bioRxiv 2024:2024.06.01.596967. [PMID: 38895431 PMCID: PMC11185562 DOI: 10.1101/2024.06.01.596967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
A pressing statistical challenge in the field of mass spectrometry proteomics is how to assess whether a given software tool provides accurate error control. Each software tool for searching such data uses its own internally implemented methodology for reporting and controlling the error. Many of these software tools are closed source, with incompletely documented methodology, and the strategies for validating the error are inconsistent across tools. In this work, we identify three different methods for validating false discovery rate (FDR) control in use in the field, one of which is invalid, one of which can only provide a lower bound rather than an upper bound, and one of which is valid but under-powered. The result is that the field has a very poor understanding of how well we are doing with respect to FDR control, particularly for the analysis of data-independent acquisition (DIA) data. We therefore propose a new, more powerful method for evaluating FDR control in this setting, and we then employ that method, along with an existing lower bounding technique, to characterize a variety of popular search tools. We find that the search tools for analysis of data-dependent acquisition (DDA) data generally seem to control the FDR at the peptide level, whereas none of the DIA search tools consistently controls the FDR at the peptide level across all the datasets we investigated. Furthermore, this problem becomes much worse when the latter tools are evaluated at the protein level. These results may have significant implications for various downstream analyses, since proper FDR control has the potential to reduce noise in discovery lists and thereby boost statistical power.
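The lower-bounding idea can be sketched in a few lines of Python. Assuming the search is run against the original targets plus entrapment sequences that cannot be present in the sample, every accepted entrapment hit is a known false discovery, so the fraction of entrapment hits among all accepted discoveries bounds the false discovery proportion from below. The function name and the counts are hypothetical, and this sketch is not the new, more powerful method proposed in the preprint.

```python
def entrapment_lower_bound(n_target_hits, n_entrapment_hits):
    """Lower bound on the false discovery proportion among accepted discoveries.

    Entrapment hits are false by construction; accepted hits to the original
    targets may also be false, which is why this is only a lower bound.
    """
    total = n_target_hits + n_entrapment_hits
    if total == 0:
        return 0.0
    return n_entrapment_hits / total
```

For example, 10 entrapment hits among 1,000 accepted discoveries imply a false discovery proportion of at least 1%; a tool reporting 1% FDR could still be failing to control the error, since false hits to the original targets are not counted.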
Affiliation(s)
- Bo Wen
  - Department of Genome Sciences, University of Washington
- Jack Freestone
  - School of Mathematics and Statistics, University of Sydney
- William S Noble
  - Department of Genome Sciences, University of Washington
  - Paul G. Allen School of Computer Science and Engineering, University of Washington
- Uri Keich
  - School of Mathematics and Statistics, University of Sydney
4
Freestone J, Noble WS, Keich U. Reinvestigating the Correctness of Decoy-Based False Discovery Rate Control in Proteomics Tandem Mass Spectrometry. J Proteome Res 2024. [PMID: 38687997 DOI: 10.1021/acs.jproteome.3c00902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2024]
Abstract
Traditional database search methods for the analysis of bottom-up proteomics tandem mass spectrometry (MS/MS) data are limited in their ability to detect peptides with post-translational modifications (PTMs). Recently, "open modification" database search strategies, in which the requirement that the mass of the database peptide closely matches the observed precursor mass is relaxed, have become popular as ways to find a wider variety of types of PTMs. Indeed, in one study, Kong et al. reported that the open modification search tool MSFragger can achieve higher statistical power to detect peptides than a traditional "narrow window" database search. We investigated this claim empirically and, in the process, uncovered a potential general problem with false discovery rate (FDR) control in the machine learning postprocessors Percolator and PeptideProphet. This problem might have contributed to Kong et al.'s report that their empirical results suggest that FDR control in the narrow window setting might generally be compromised. Indeed, reanalyzing the same data while using a more standard form of target-decoy competition-based FDR control, we found that, after accounting for chimeric spectra as well as for the inherent difference in the number of candidates in open and narrow searches, the data does not provide sufficient evidence that FDR control in proteomics MS/MS database search is inherently problematic.
Affiliation(s)
- Jack Freestone
  - School of Mathematics and Statistics F07, University of Sydney, New South Wales 2006, Australia
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics and Statistics F07, University of Sydney, New South Wales 2006, Australia
5
Lin A, See D, Fondrie WE, Keich U, Noble WS. Target-decoy false discovery rate estimation using Crema. Proteomics 2024; 24:e2300084. [PMID: 38380501 DOI: 10.1002/pmic.202300084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 01/06/2024] [Accepted: 01/16/2024] [Indexed: 02/22/2024]
Abstract
Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.
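A minimal Python sketch of spectrum-level target-decoy competition is given here to make the procedure concrete. It assumes each spectrum has one best target score and one best decoy score, with higher scores being better; the function names, the tie-breaking rule, and the "+1" correction in the numerator are assumptions of this sketch. It does not use or reproduce Crema's API.

```python
def compete(target_scores, decoy_scores):
    """For each spectrum, keep the better of its best target and best decoy match.

    Returns a list of (score, is_decoy) pairs. Ties go to the decoy, which is a
    conservative choice made for this sketch.
    """
    winners = []
    for t, d in zip(target_scores, decoy_scores):
        winners.append((t, False) if t > d else (d, True))
    return winners


def estimate_fdr(winners, threshold):
    """Estimate the FDR among target winners scoring at or above the threshold.

    The number of decoy winners above the threshold estimates the number of
    false target winners; the +1 is a commonly used conservative correction.
    """
    targets = sum(1 for s, is_decoy in winners if not is_decoy and s >= threshold)
    decoys = sum(1 for s, is_decoy in winners if is_decoy and s >= threshold)
    # With no accepted targets the ratio is undefined; max(..., 1) makes the
    # estimate default to the cap of 1.0 in that case.
    return min(1.0, (decoys + 1) / max(targets, 1))
```

With six spectra in which one decoy wins the competition, lowering the threshold to admit that decoy raises the numerator, which shows how the estimate responds to the score cutoff.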
Affiliation(s)
- Andy Lin
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington, USA
- Donavan See
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
- Uri Keich
  - School of Mathematics and Statistics, University of Sydney, Sydney, Australia
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
6
Madej D, Lam H. On the use of tandem mass spectra acquired from samples of evolutionarily distant organisms to validate methods for false discovery rate estimation. Proteomics 2024:e2300398. [PMID: 38491400 DOI: 10.1002/pmic.202300398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/18/2024]
Abstract
Estimating the false discovery rate (FDR) of peptide identifications is a key step in proteomics data analysis, and many methods have been proposed for this purpose. Recently, an entrapment-inspired protocol to validate methods for FDR estimation appeared in articles showcasing new spectral library search tools. That validation approach involves generating incorrect spectral matches by searching spectra from evolutionarily distant organisms (entrapment queries) against the original target search space. Although this approach may appear similar to the solutions using entrapment databases, it represents a distinct conceptual framework whose correctness has not been verified yet. In this viewpoint, we first discussed the background of the entrapment-based validation protocols and then conducted a few simple computational experiments to verify the assumptions behind them. The results reveal that entrapment databases may, in some implementations, be a reasonable choice for validation, while the assumptions underpinning validation protocols based on entrapment queries are likely to be violated in practice. This article also highlights the need for well-designed frameworks for validating FDR estimation methods in proteomics.
Affiliation(s)
- Dominik Madej
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
7
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Wenqing Shui
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
8
Holstein T, Muth T. Bioinformatic Workflows for Metaproteomics. Methods Mol Biol 2024; 2820:187-213. [PMID: 38941024 DOI: 10.1007/978-1-0716-3910-8_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
The strong influence of microbiomes on areas such as ecology and human health has become widely recognized in the past years. Accordingly, various techniques for the investigation of the composition and function of microbial community samples have been developed. Metaproteomics, the comprehensive analysis of the proteins from microbial communities, allows for the investigation of not only the taxonomy but also the functional and quantitative composition of microbiome samples. Due to the complexity of the investigated communities, methods developed for single organism proteomics cannot be readily applied to metaproteomic samples. For this purpose, methods specifically tailored to metaproteomics are required. In this work, a detailed overview of current bioinformatic solutions and protocols in metaproteomics is given. After an introduction to the proteomic database search, the metaproteomic post-processing steps are explained in detail. Ten specific bioinformatic software solutions are focused on, covering various steps including database-driven identification and quantification as well as taxonomic and functional assignment.
Affiliation(s)
- Tanja Holstein
  - Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
  - VIB-UGent Center for Medical Biotechnology, VIB and Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  - Data Competence Center, Robert Koch Institute, Berlin, Germany
- Thilo Muth
  - Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
  - Data Competence Center, Robert Koch Institute, Berlin, Germany
9
Pandeswari PB, Isaac AE, Sabareesh V. Database Creator for Mass Analysis of Peptides and Proteins, DC-MAPP: A Standalone Tool for Simplifying Manual Analysis of Mass Spectral Data to Identify Peptide/Protein Sequences. J Am Soc Mass Spectrom 2023; 34:1962-1969. [PMID: 37526995 DOI: 10.1021/jasms.3c00030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Proteomic studies typically involve the use of different types of software for annotating experimental tandem mass spectrometric (MS/MS) data and thereby simplifying the process of peptide and protein identification. For such annotations, these software tools calculate the m/z values of the peptide/protein precursor and fragment ions, for which a database of protein sequences must be provided as an input file. The calculated m/z values are stored as another database, which the user usually cannot view. Database Creator for Mass Analysis of Peptides and Proteins (DC-MAPP) is a novel standalone software tool that can create custom databases for "viewing" the calculated m/z values of precursor and fragment ions, prior to the database search. It contains three modules. Peptide/protein sequences of the user's choice can be entered as input to the first module for creating a custom database. In the second module, m/z values are entered as queries and searched against the custom database to identify protein/peptide sequences. The third module is suited for peptide mass fingerprinting, which can be used to analyze both ESI and MALDI mass spectral data. The feature of "viewing" the custom database can be helpful not only for better understanding the search engine processes, but also for designing multiple reaction monitoring (MRM) methods. Post-translational modifications and protein isoforms can also be analyzed. Because DC-MAPP relies on the protein/peptide "sequences" for creating custom databases, it may not be applicable for searches involving spectral libraries. The tool was implemented in Python, and the graphical user interface was built with Page/Tcl, making this tool more user-friendly. It is freely available at https://vit.ac.in/DC-MAPP/.
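To make the m/z calculation concrete, the Python sketch below computes the precursor m/z and the singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The function names and the restriction to unmodified peptides and singly charged fragments are assumptions of this sketch; it is not DC-MAPP's code and does not describe how DC-MAPP stores its databases.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def precursor_mz(peptide, charge=1):
    """m/z of the protonated, unmodified peptide at the given charge state."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values, each ordered by fragment length."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b ions hold the first i residues; y ions hold the last i residues plus water.
        b_ions.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

A table of such values, generated for every peptide in a sequence file, is the kind of lookup structure against which observed m/z values could be queried.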
Affiliation(s)
- Pandi Boomathi Pandeswari
  - Centre for Bio-Separation Technology (CBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu - 632014, India
- Arnold Emerson Isaac
  - Bioinformatics Programming Laboratory, School of Bio Sciences & Technology (SBST), VIT, Vellore, Tamil Nadu - 632014, India
- Varatharajan Sabareesh
  - Centre for Bio-Separation Technology (CBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu - 632014, India
10
Chaiyadet S, Sotillo J, Smout M, Cooper M, Doolan DL, Waardenberg A, Eichenberger RM, Field M, Brindley PJ, Laha T, Loukas A. Small extracellular vesicles but not microvesicles from Opisthorchis viverrini promote cell proliferation in human cholangiocytes. bioRxiv 2023:2023.05.22.540805. [PMID: 37292777 PMCID: PMC10245807 DOI: 10.1101/2023.05.22.540805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Chronic infection with O. viverrini has been linked to the development of cholangiocarcinoma (CCA), which is a major public health burden in the Lower Mekong River Basin countries, including Thailand, Lao PDR, Vietnam and Cambodia. Despite its importance, the exact mechanisms by which O. viverrini promotes CCA are largely unknown. In this study, we characterized different extracellular vesicle populations released by O. viverrini (OvEVs) using proteomic and transcriptomic analyses and investigated their potential role in host-parasite interactions. While 120k OvEVs promoted cell proliferation in H69 cells at different concentrations, 15k OvEVs did not produce any effect compared to controls. The proteomic analysis of both populations showed differences in their composition that could contribute to this differential effect. Furthermore, the miRNAs present in 120k EVs were analysed and their potential interactions with human host genes were explored by computational target prediction. Different pathways involved in inflammation, immune response and apoptosis were identified as potentially targeted by the miRNAs present in this population of EVs. This is the first study showing specific roles for different EV populations in the pathogenesis of a parasitic helminth and, more importantly, represents an advance towards deciphering the mechanisms used in the establishment of opisthorchiasis and liver fluke infection-associated malignancy.
Affiliation(s)
- Sujittra Chaiyadet
  - Department of Tropical Medicine, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
- Javier Sotillo
  - Parasitology Reference and Research Laboratory, Centro Nacional de Microbiologia, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Michael Smout
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Martha Cooper
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Denise L. Doolan
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Ashley Waardenberg
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
  - Current affiliation: i-Synapse, Cairns, QLD, Australia
- Ramon M. Eichenberger
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
- Matt Field
  - Centre for Tropical Bioinformatics and Molecular Biology, College of Public Health, Medical and Veterinary Science, James Cook University, Cairns, Australia
  - Immunogenomics Lab, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Paul J. Brindley
  - Department of Microbiology, Immunology and Tropical Medicine, School of Medicine & Health Sciences, George Washington University, Washington, DC, USA
- Thewarach Laha
  - Department of Parasitology, Faculty of Medicine, Khon Kaen University, Thailand
- Alex Loukas
  - Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia
11
Zhang Q. Mzion enables deep and precise identification of peptides in data-dependent acquisition proteomics. Sci Rep 2023; 13:7056. [PMID: 37120666 PMCID: PMC10148867 DOI: 10.1038/s41598-023-34323-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 04/27/2023] [Indexed: 05/01/2023] Open
Abstract
Sensitive and reliable identification of proteins and peptides forms the basis of proteomics. We introduce Mzion, a new database search tool for data-dependent acquisition (DDA) proteomics. Our tool utilizes an intensity tally strategy and generally achieves higher performance in terms of depth and precision across 20 datasets, ranging from large-scale to single-cell proteomics. Compared to several other search engines, Mzion matches on average 20% more peptide spectra at tryptic enzymatic specificity and 80% more at no enzymatic specificity from six large-scale, global datasets. Mzion also identifies more phosphopeptide spectra that can be explained by fewer proteins, demonstrated by six large-scale, local datasets corresponding to the global data. Our findings highlight the potential of Mzion for improving proteomic analysis and advancing our understanding of protein biology.
Affiliation(s)
- Qiang Zhang
  - Division of Endocrinology, Metabolism and Lipid Research, Washington University School of Medicine, St. Louis, MO, USA
12
Higgins L, Gerdes H, Cutillas PR. Principles of phosphoproteomics and applications in cancer research. Biochem J 2023; 480:403-420. [PMID: 36961757 PMCID: PMC10212522 DOI: 10.1042/bcj20220220] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/13/2022] [Revised: 02/24/2023] [Accepted: 02/28/2023] [Indexed: 03/25/2023]
Abstract
Phosphorylation constitutes the most common and best-studied regulatory post-translational modification in biological systems and archetypal signalling pathways driven by protein and lipid kinases are disrupted in essentially all cancer types. Thus, the study of the phosphoproteome stands to provide unique biological information on signalling pathway activity and on kinase network circuitry that is not captured by genetic or transcriptomic technologies. Here, we discuss the methods and tools used in phosphoproteomics and highlight how this technique has been used, and can be used in the future, for cancer research. Challenges still exist in mass spectrometry phosphoproteomics and in the software required to provide biological information from these datasets. Nevertheless, improvements in mass spectrometers with enhanced scan rates, separation capabilities and sensitivity, in biochemical methods for sample preparation and in computational pipelines are enabling an increasingly deep analysis of the phosphoproteome, where previous bottlenecks in data acquisition, processing and interpretation are being relieved. These powerful hardware and algorithmic innovations are not only providing exciting new mechanistic insights into tumour biology, from where new drug targets may be derived, but are also leading to the discovery of phosphoproteins as mediators of drug sensitivity and resistance and as classifiers of disease subtypes. These studies are, therefore, uncovering phosphoproteins as a new generation of disruptive biomarkers to improve personalised anti-cancer therapies.
Affiliation(s)
- Luke Higgins
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Henry Gerdes
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Pedro R. Cutillas
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
  - Alan Turing Institute, The British Library, London, U.K
  - Digital Environment Research Institute, Queen Mary University of London, London, U.K
13
Hellinger R, Sigurdsson A, Wu W, Romanova EV, Li L, Sweedler JV, Süssmuth RD, Gruber CW. Peptidomics. Nat Rev Methods Primers 2023; 3:25. [PMID: 37250919 PMCID: PMC7614574 DOI: 10.1038/s43586-023-00205-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 02/09/2023] [Indexed: 05/31/2023]
Abstract
Peptides are biopolymers, typically consisting of 2-50 amino acids. They are biologically produced by the cellular ribosomal machinery or by non-ribosomal enzymes and, sometimes, other dedicated ligases. Peptides are arranged as linear chains or cycles, and include post-translational modifications, unusual amino acids and stabilizing motifs. Their structure and molecular size render them a unique chemical space, between small molecules and larger proteins. Peptides have important physiological functions as intrinsic signalling molecules, such as neuropeptides and peptide hormones, for cellular or interspecies communication, as toxins to catch prey or as defence molecules to fend off enemies and microorganisms. Clinically, they are gaining popularity as biomarkers or innovative therapeutics; to date there are more than 60 peptide drugs approved and more than 150 in clinical development. The emerging field of peptidomics comprises the comprehensive qualitative and quantitative analysis of the suite of peptides in a biological sample (endogenously produced, or exogenously administered as drugs). Peptidomics employs techniques of genomics, modern proteomics, state-of-the-art analytical chemistry and innovative computational biology, with a specialized set of tools. The complex biological matrices and often low abundance of analytes typically examined in peptidomics experiments require optimized sample preparation and isolation, including in silico analysis. This Primer covers the combination of techniques and workflows needed for peptide discovery and characterization and provides an overview of various biological and clinical applications of peptidomics.
Affiliation(s)
- Roland Hellinger
  - Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
- Arnar Sigurdsson
  - Institut für Chemie, Technische Universität Berlin, Berlin, Germany
- Wenxin Wu
  - School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Elena V Romanova
  - Department of Chemistry, University of Illinois, Urbana, IL, USA
- Lingjun Li
  - School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Christian W Gruber
  - Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
|
14
|
Madej D, Lam H. Modeling Lower-Order Statistics to Enable Decoy-Free FDR Estimation in Proteomics. J Proteome Res 2023; 22:1159-1171. [PMID: 36962508 DOI: 10.1021/acs.jproteome.2c00604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2023]
Abstract
One of the chief objectives in mass spectrometry-based peptide identification in proteomics is the statistical validation of top-scoring peptide-spectrum matches (PSMs) in the form of false discovery rate (FDR) estimation. Existing methods construct a null model that captures the characteristics of incorrect target PSMs to estimate the FDR, most often with the help of decoys. Decoy-based methods, however, increase the computational cost and rely on the difficult-to-verify assumption that decoy PSMs constitute a sufficient and representative sample of the population of possible incorrect target PSMs. On the other hand, the possibility of FDR estimation assisted by the plentiful non-top-scoring PSMs, which are almost always incorrect, has been scarcely explored. In this work, we propose a novel decoy-free procedure for developing null models for top-scoring PSMs using the transformed e-value (TEV) score and the distributions of non-top-scoring target PSMs. The method relies on a theoretically derivable relationship between the parameters of the distributions of lower-order statistics of the TEV score and a necessary empirical optimization to fit a single parameter to actual data. The framework was tested on multiple different data sets and two search engines. We present evidence that our method is comparable to and occasionally outperforms popular decoy-free and decoy-based methods in FDR estimation.
Affiliation(s)
- Dominik Madej
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
|
15
|
Joh Y, Lee K, Kim H, Park H. Progressive search in tandem mass spectrometry. BMC Bioinformatics 2023; 24:94. [PMID: 36918816 PMCID: PMC10015927 DOI: 10.1186/s12859-023-05222-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 03/03/2023] [Indexed: 03/16/2023] Open
Abstract
BACKGROUND High-throughput proteomics has been accelerated by (tandem) mass spectrometry. However, the slow speed of mass spectra analysis prevents the analysis results from being up to date. Tandem mass spectrometry database search requires O(|S||D|) time where S is the set of spectra and D is the set of peptides in a database. With usual values of |S| and |D|, database search is quite time-consuming. Meanwhile, the database for search is usually updated every month, with 0.5-2% changes. Although the change in the database is usually very small, it may cause extensive changes in the overall analysis results because individual PSM scores such as deltaCn and E-value depend on the entire search results. Therefore, to keep the search results up to date, one needs to perform database search from scratch every time the database is updated, which is very inefficient. RESULTS Thus, we present a very efficient method to keep the search results up to date where the results are the same as those achieved by the normal search from scratch. This method, called progressive search, runs in O(|S||ΔD|) time on average where ΔD is the difference between the old and the new databases. The experimental results show that the progressive search is up to 53.9 times faster for PSM update only and up to 16.5 times faster for both PSM and E-value update. CONCLUSIONS Progressive search is a novel approach to efficiently obtain analysis results for an updated database in tandem mass spectrometry. Compared to performing a normal search from scratch, progressive search achieves the same results much faster. Progressive search is freely available at: https://isa.hanyang.ac.kr/ProgSearch.html.
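The idea of updating results from the database difference alone can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(score, peptide)` bookkeeping and the fallback rescan for spectra whose top match was removed are all assumptions made for illustration, and the sketch tracks only the top-scoring match per spectrum, not deltaCn or E-values.

```python
def full_search(spectra, database, score):
    """Baseline search: every spectrum against every peptide, O(|S||D|)."""
    return {
        sid: max(((score(spec, pep), pep) for pep in database), default=None)
        for sid, spec in spectra.items()
    }


def progressive_update(spectra, best, new_db, added, removed, score):
    """Update top matches using mostly the database difference (hypothetical helper).

    best    : {spectrum_id: (score, peptide)} from the previous search
    new_db  : the updated peptide set
    added   : peptides present only in the new database
    removed : peptides present only in the old database
    """
    updated = {}
    for sid, spec in spectra.items():
        current = best[sid]
        if current is None or current[1] in removed:
            # The previous top match is gone, so this spectrum needs a full rescan.
            current = max(((score(spec, p), p) for p in new_db), default=None)
        else:
            # Otherwise only the newly added peptides can displace the old winner.
            for pep in added:
                candidate = (score(spec, pep), pep)
                if candidate > current:
                    current = candidate
        updated[sid] = current
    return updated
```

When few top matches are removed, the work per update is dominated by scoring the spectra against the added peptides, which is the source of the O(|S||ΔD|) behaviour described in the abstract. Ties are broken by comparing the `(score, peptide)` tuples, so the update and a from-scratch search pick the same winner.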
Affiliation(s)
- Yoonsung Joh
  - Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
- Kangbae Lee
  - Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
- Hyunwoo Kim
  - Biomedical Informatics Team, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
- Heejin Park
  - Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
|
16
|
Kertesz-Farkas A, Nii Adoquaye Acquaye FL, Bhimani K, Eng JK, Fondrie WE, Grant C, Hoopmann MR, Lin A, Lu YY, Moritz RL, MacCoss MJ, Noble WS. The Crux Toolkit for Analysis of Bottom-Up Tandem Mass Spectrometry Proteomics Data. J Proteome Res 2023; 22:561-569. [PMID: 36598107 DOI: 10.1021/acs.jproteome.2c00615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]
Abstract
The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments since 2014. We begin with empirical results demonstrating our recently implemented speedups to the Tide search engine. Other new features include a new score function in Tide, two new confidence estimation procedures, as well as three new tools: Param-medic for estimating search parameters directly from mass spectrometry data, Kojak for searching cross-linked mass spectra, and DIAmeter for searching data independent acquisition data against a sequence database.
Affiliation(s)
- Attila Kertesz-Farkas
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Frank Lawrence Nii Adoquaye Acquaye
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Kishankumar Bhimani
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Jimmy K Eng
  - Proteomics Resource, University of Washington, 850 Republican Street, Seattle, Washington 98109-4725, United States
- William E Fondrie
  - Talus Bioscience, 550 17th Avenue, Seattle, Washington 98122, United States
- Charles Grant
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Michael R Hoopmann
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
- Andy Lin
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Yang Y Lu
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Robert L Moritz
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
- Michael J MacCoss
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, Washington 98195-2350, United States
|
17
|
Nii Adoquaye Acquaye FL, Kertesz-Farkas A, Noble WS. Efficient Indexing of Peptides for Database Search Using Tide. J Proteome Res 2023; 22:577-584. [PMID: 36633229 DOI: 10.1021/acs.jproteome.2c00617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
The first step in the analysis of protein tandem mass spectrometry data typically involves searching the observed spectra against a protein database. During database search, the search engine must digest the proteins in the database into peptides, subject to digestion rules that are under user control. The choice of these digestion parameters, as well as selection of post-translational modifications (PTMs), can dramatically affect the size of the search space and hence the statistical power of the search. The Tide search engine separates the creation of the peptide index from the database search step, thereby saving time by allowing a peptide index to be reused in multiple searches. Here we describe an improved implementation of the indexing component of Tide that consumes around four times less resources (CPU and RAM) than the previous version and can generate arbitrarily large peptide databases, limited by only the amount of available disk space. We use this improved implementation to explore the relationship between database size and the parameters controlling digestion and PTMs, as well as database size and statistical power. Our results can help guide practitioners in proper selection of these important parameters.
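To make the link between digestion parameters and search-space size concrete, here is a minimal Python sketch of an in-silico tryptic digest. It is not Tide's indexing code: the function name, the default length limits and the simple "cleave after K or R unless followed by P" rule are assumptions chosen for illustration.

```python
import re


def tryptic_digest(protein, missed_cleavages=0, min_len=6, max_len=50):
    """Return the set of peptides from a simplified in-silico tryptic digest."""
    # Assumed rule: cleave after K or R unless the next residue is P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    # Fragment boundaries: protein start, internal cleavage sites, protein end.
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` extra adjacent fragments.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = protein[bounds[i]:bounds[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides
```

Calling the function on the same protein with `missed_cleavages` set to 0, 1 and 2 shows how quickly the candidate set grows, which is the database-size effect the abstract refers to. Variable modifications would enlarge it further but are left out of this sketch.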
Affiliation(s)
- Frank Lawrence Nii Adoquaye Acquaye
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, Moscow 109028, Russia
- Attila Kertesz-Farkas
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, Moscow 109028, Russia
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
|
18
|
The M, Käll L. Integrating Identification and Quantification Uncertainty for Differential Protein Abundance Analysis with Triqler. Methods Mol Biol 2023; 2426:91-117. [PMID: 36308686 DOI: 10.1007/978-1-0716-1967-4_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein quantification for shotgun proteomics is a complicated process where errors can be introduced in each of the steps. Triqler is a Python package that estimates and integrates errors of the different parts of the label-free protein quantification pipeline into a single Bayesian model. Specifically, it weighs the quantitative values by the confidence we have in the correctness of the corresponding PSM. Furthermore, it treats missing values in a way that reflects their uncertainty relative to observed values. Finally, it combines these error estimates in a single differential abundance FDR that not only reflects the errors and uncertainties in quantification but also in identification. In this tutorial, we show how to (1) generate input data for Triqler from quantification packages such as MaxQuant and Quandenser, (2) run Triqler and what the different options are, (3) interpret the results, (4) investigate the posterior distributions of a protein of interest in detail, and (5) verify that the hyperparameter estimations are sensible.
Affiliation(s)
- Matthew The
  - Chair of Proteomics and Bioanalytics, Technische Universität München, Freising, Germany
- Lukas Käll
  - Science for Life Laboratory, KTH Royal Institute of Technology, Solna, Sweden
|
19
|
Vašíček J, Skiadopoulou D, Kuznetsova KG, Wen B, Johansson S, Njølstad PR, Bruckner S, Käll L, Vaudel M. Finding haplotypic signatures in proteins. Gigascience 2023; 12:giad093. [PMID: 37919975 PMCID: PMC10622322 DOI: 10.1093/gigascience/giad093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 09/24/2023] [Accepted: 10/08/2023] [Indexed: 11/04/2023] Open
Abstract
BACKGROUND The nonrandom distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. Consequently, the impact of haplotypic variation on the results of proteomic searches and the discoverability of peptides specific to haplotypes remain unknown. FINDINGS Here, we study how common genetic haplotypes influence the proteomic search space and investigate the possibility to match peptides containing multiple amino acid substitutions to a publicly available data set of mass spectra. We found that for 12.42% of the discoverable amino acid substitutions encoded by common haplotypes, 2 or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 352 spectra that matched to such multivariant peptides, and out of the 4,582 amino acid substitutions identified, 6.37% were covered by multivariant peptides. However, the evaluation of the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches. CONCLUSIONS As these procedures become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation, across tissues and time.
Affiliation(s)
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Ksenia G Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Bo Wen
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Stefan Johansson
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Department of Medical Genetics, Haukeland University Hospital, Bergen 5021, Norway
- Pål R Njølstad
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Children and Youth Clinic, Haukeland University Hospital, Bergen 5021, Norway
- Stefan Bruckner
  - Chair of Visual Analytics, Institute for Visual and Analytic Computing, University of Rostock, Rostock 18051, Germany
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH–Royal Institute of Technology, Solna 17121, Sweden
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
  - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo 0473, Norway
|
20
|
Wang S, Feng S, Pan C, Guo X. FineFDR: Fine-grained Taxonomy-specific False Discovery Rates Control in Metaproteomics. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2022; 2022:287-292. [PMID: 36910011 PMCID: PMC9998077 DOI: 10.1109/bibm55620.2022.9995401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Microbial community proteomics, also termed metaproteomics, investigates all proteins expressed by a microbiota. Tandem mass spectrometry (MS/MS) is the typical method for identifying proteins in metaproteomics, which involves searching the mass spectra against a protein sequence database. A major post-analysis step is controlling the false discovery rate (FDR), i.e., the ratio of false positives to the total number of annotations. The current popular target-decoy FDR estimation method treats all the peptides and proteins equally and overlooks that they could have varied probabilities of being identified. In this study, we report FineFDR, a framework for FDR assessment at fine-grained levels with taxonomy information considered. FineFDR groups the identified peptide-spectrum matches, peptides, and proteins from different taxonomic units and estimates the FDR in each group separately. Empirical experiments on the simulated and real-world data sets demonstrate that our FineFDR achieved higher precision and more peptide and protein identifications when compared to the state-of-the-art methods, such as Comet, Percolator, TIDD, and Tailor. FineFDR is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/FDR.
Affiliation(s)
- Shengze Wang
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
- Shichao Feng
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
- Chongle Pan
  - School of Computer Science and Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States
- Xuan Guo
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
|
21
|
Lin A, Short T, Noble WS, Keich U. Improving Peptide-Level Mass Spectrometry Analysis via Double Competition. J Proteome Res 2022; 21:2412-2420. [PMID: 36166314 PMCID: PMC10108709 DOI: 10.1021/acs.jproteome.2c00282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs) and/or of peptides. The canonical approach for generating these discovery lists is by controlling the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum's best-scoring target (real) peptide match with its best match against a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here, we first highlight and empirically augment a little known previous work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We thus propose that researchers instead focus on peptide-level analysis. We then investigate three ways to carry out peptide-level TDC and show that the most common method ("PSM-only") offers the lowest statistical power in practice. An alternative approach that carries out a double competition, first at the PSM and then at the peptide level ("PSM-and-peptide"), is the most powerful method, yielding an average increase of 17% more discovered peptides at 1% FDR threshold relative to the PSM-only method.
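The double competition described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' code: the function names, the data layout, the `decoy_of` target-to-decoy pairing and the choice to resolve ties in favour of the decoy are all assumptions. The FDR estimate uses the common (decoys + 1) / targets form.

```python
def psm_competition(target_best, decoy_best):
    """Per spectrum, keep whichever of the best target or best decoy match scores higher."""
    winners = []
    for sid, (t_pep, t_score) in target_best.items():
        d_pep, d_score = decoy_best[sid]
        if t_score > d_score:
            winners.append((t_pep, t_score, False))
        else:  # assumption: ties go to the decoy, the conservative choice
            winners.append((d_pep, d_score, True))
    return winners


def best_score_per_peptide(winners):
    """Collapse PSM-level winners to the top score observed for each peptide."""
    best = {}
    for pep, score, _ in winners:
        best[pep] = max(score, best.get(pep, float("-inf")))
    return best


def peptide_competition(best, decoy_of):
    """Second competition: each target peptide against its paired decoy peptide."""
    out = []
    for t_pep, d_pep in decoy_of.items():
        t, d = best.get(t_pep), best.get(d_pep)
        if t is None and d is None:
            continue  # neither member of the pair won any spectrum
        if d is None or (t is not None and t > d):
            out.append((t, False))
        else:
            out.append((d, True))
    return out


def count_accepted(scored, alpha):
    """Largest number of targets whose estimated FDR, (decoys + 1) / targets, is <= alpha."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    targets = decoys = accepted = 0
    for _, is_decoy in ranked:
        decoys += is_decoy
        targets += not is_decoy
        if targets and (decoys + 1) / targets <= alpha:
            accepted = targets
    return accepted
```

Chaining `psm_competition`, `best_score_per_peptide`, `peptide_competition` and `count_accepted` gives a "PSM-and-peptide" style analysis. Skipping the `peptide_competition` step and thresholding the per-peptide best scores directly would correspond more closely to the "PSM-only" variant.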
Affiliation(s)
- Andy Lin
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington 98109, United States
- Temana Short
  - School of Mathematics & Statistics, University of Sydney, New South Wales, 2006, Australia
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics & Statistics, University of Sydney, New South Wales, 2006, Australia
|
22
|
Heil LR, Fondrie WE, McGann CD, Federation AJ, Noble WS, MacCoss MJ, Keich U. Building Spectral Libraries from Narrow-Window Data-Independent Acquisition Mass Spectrometry Data. J Proteome Res 2022; 21:1382-1391. [PMID: 35549345 DOI: 10.1021/acs.jproteome.1c00895] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Advances in library-based methods for peptide detection from data-independent acquisition (DIA) mass spectrometry have made it possible to detect and quantify tens of thousands of peptides in a single mass spectrometry run. However, many of these methods rely on a comprehensive, high-quality spectral library containing information about the expected retention time and fragmentation patterns of peptides in the sample. Empirical spectral libraries are often generated through data-dependent acquisition and may suffer from biases as a result. Spectral libraries can be generated in silico, but these models are not trained to handle all possible post-translational modifications. Here, we propose a false discovery rate-controlled spectrum-centric search workflow to generate spectral libraries directly from gas-phase fractionated DIA tandem mass spectrometry data. We demonstrate that this strategy is able to detect phosphorylated peptides and can be used to generate a spectral library for accurate peptide detection and quantitation in wide-window DIA data. We compare the results of this search workflow to other library-free approaches and demonstrate that our search is competitive in terms of accuracy and sensitivity. These results demonstrate that the proposed workflow has the capacity to generate spectral libraries while avoiding the limitations of other methods.
Affiliation(s)
- Lilian R Heil
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- William E Fondrie
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- Christopher D McGann
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- Alexander J Federation
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
  - Paul G. Allen School for Computer Science and Engineering, University of Washington, Seattle, Washington 98105, United States
- Michael J MacCoss
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- Uri Keich
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
|
23
|
Zananiri R, Mangapuram Venkata S, Gaydar V, Yahalom D, Malik O, Rudnizky S, Kleifeld O, Kaplan A, Henn A. Auxiliary ATP binding sites support DNA unwinding by RecBCD. Nat Commun 2022; 13:1806. [PMID: 35379800 PMCID: PMC8980037 DOI: 10.1038/s41467-022-29387-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2017] [Accepted: 03/13/2022] [Indexed: 12/22/2022] Open
Abstract
The RecBCD helicase initiates double-stranded break repair in bacteria by processively unwinding DNA with a rate approaching ∼1,600 bp·s−1, but the mechanism enabling such a fast rate is unknown. Employing a wide range of methodologies (including equilibrium and time-resolved binding experiments, ensemble and single-molecule unwinding assays, and crosslinking followed by mass spectrometry), we reveal the existence of auxiliary binding sites in the RecC subunit, where ATP binds with lower affinity and distinct chemical interactions as compared to the known catalytic sites. The essentiality and functionality of these sites are demonstrated by their impact on the survival of E. coli after exposure to damage-inducing radiation. We propose a model by which RecBCD achieves its optimized unwinding rate, even when ATP is scarce, by using the auxiliary binding sites to increase the flux of ATP to its catalytic sites. RecBCD is a remarkably fast DNA helicase. Using a battery of biophysical methods, Zananiri et al. reveal additional, non-catalytic ATP binding sites that increase the ATP flux to the catalytic sites, allowing fast unwinding when ATP is scarce.
|
24
|
Saeed F, Haseeb M, Iyengar SS. Communication Lower-Bounds for Distributed-Memory Computations for Mass Spectrometry based Omics Data. Journal of Parallel and Distributed Computing 2022; 161:37-47. [PMID: 34898836 PMCID: PMC8658624 DOI: 10.1016/j.jpdc.2021.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound that is reached by the existing parallel algorithms is $\Omega(mn + 2r\frac{q}{p})$, where $m$ and $n$ are the dimensions of the theoretical database matrix, $q$ and $r$ are dimensions of spectra, and $p$ is the number of processors. We further prove that a communication-optimal strategy with fast memory $M = mn + \frac{2qr}{p}$ can achieve $\Omega(\frac{2mnq}{p})$, but this is not achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis of published parallel algorithms and their performance results. We show that sub-optimal speedup with an increasing number of processors is a direct consequence of not achieving the communication lower bounds. We further validate our claim by performing experiments which demonstrate the communication bounds that are proved in this paper. Consequently, we assert that a next generation of provable and demonstrably superior parallel algorithms is urgently needed for MS-based large systems-biology studies, especially for meta-proteomics, proteogenomics, microbiome, and proteomics for non-model organisms. Our hope is that this paper will excite the parallel computing community to further investigate parallel algorithms for highly influential MS-based omics problems.
|
25
|
Chaiyadet S, Sotillo J, Krueajampa W, Thongsen S, Smout M, Brindley PJ, Laha T, Loukas A. Silencing of Opisthorchis viverrini Tetraspanin Gene Expression Results in Reduced Secretion of Extracellular Vesicles. Front Cell Infect Microbiol 2022; 12:827521. [PMID: 35223551 PMCID: PMC8875506 DOI: 10.3389/fcimb.2022.827521] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/02/2021] [Accepted: 01/19/2022] [Indexed: 12/12/2022] Open
Abstract
Inter-phylum transfer of molecular information is exquisitely exemplified in the uptake of parasite extracellular vesicles (EVs) by their target mammalian host tissues. The oriental liver fluke, Opisthorchis viverrini is the major cause of bile duct cancer in people in Southeast Asia. A major mechanism by which O. viverrini promotes cancer is through the secretion of excretory/secretory products which contain extracellular vesicles (OvEVs). OvEVs contain microRNAs that are predicted to impact various mammalian cell proliferation pathways, and are internalized by cholangiocytes that line the bile ducts. Upon uptake, OvEVs drive relentless proliferation of cholangiocytes and promote a tumorigenic environment, but the underlying mechanisms of this process are unknown. Moreover, purification and characterization methods for helminth EVs in general are ill defined. We therefore compared different purification methods for OvEVs and characterized the sub-vesicular compartment proteomes. Two CD63-like tetraspanins (Ov-TSP-2 and TSP-3) are abundant on the surface of OvEVs, and could serve as biomarkers for these parasite vesicles. Anti-TSP-2 and -TSP-3 IgG, as well as different endocytosis pathway inhibitors significantly reduced OvEV uptake and subsequent proliferation of cholangiocytes in vitro. Silencing of Ov-tsp-2 and tsp-3 gene expression in adult flukes using RNA interference resulted in substantial reductions in OvEV secretion, and those vesicles that were secreted were deficient in their respective TSP proteins. Our findings shed light on the importance of tetraspanins in fluke EV biogenesis and/or stability, and provide a conceivable mechanism for the efficacy of anti-tetraspanin subunit vaccines against a range of parasitic helminth infections.
Collapse
Affiliation(s)
- Sujittra Chaiyadet
- Tropical Medicine Graduate Program, Academic Affairs, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Javier Sotillo
- Parasitology Reference and Research Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
| | - Watchara Krueajampa
- Department of Parasitology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Sophita Thongsen
- Department of Parasitology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Michael Smout
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Paul J. Brindley
- Department of Microbiology, Immunology and Tropical Medicine, and Research Center for Neglected Diseases of Poverty, George Washington University, Washington, DC, United States
| | - Thewarach Laha
- Department of Parasitology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
- *Correspondence: Alex Loukas, ; Thewarach Laha,
| | - Alex Loukas
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- *Correspondence: Alex Loukas; Thewarach Laha
| |
|
26
|
Madej D, Wu L, Lam H. Common Decoy Distributions Simplify False Discovery Rate Estimation in Shotgun Proteomics. J Proteome Res 2022; 21:339-348. [PMID: 34989576 DOI: 10.1021/acs.jproteome.1c00600] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/19/2023]
Abstract
In shotgun proteomics, false discovery rate (FDR) estimation is a necessary step to ensure the quality of accepted peptide-spectrum matches (PSMs) from a database search. Popular statistical validation tools for FDR control tend to rely on target-decoy searching to build empirical, dataset-specific models, which often leads to inaccurate FDR estimates. In this paper, we propose a new approach named common decoy distribution (CDD) to FDR estimation using the idea of a fixed empirical null score distribution derived from millions of peptide tandem mass spectra. To demonstrate the viability of CDD, its stability with respect to noise and the presence of unexpected peptide modifications was evaluated. PeptideProphet-based implementation of CDD was benchmarked against decoy-based PeptideProphet, and both methods exhibited similar accuracy of FDR estimates and retrieval of correct PSMs. The finding of this study calls for a re-evaluation of the necessity of dataset-specific target-decoy searches and illustrates the potential of Big Data approaches for statistical analysis in proteomics.
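To make the contrast in this abstract concrete, the following minimal sketch compares a dataset-specific target-decoy FDR estimate with an estimate based on a fixed, precomputed null score sample. It is an illustration only and not the authors' PeptideProphet-based implementation: the function names, the toy scores, and the conservative assumption that every target PSM could be incorrect are all assumptions made here.

```python
from bisect import bisect_left


def decoy_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy estimate: (#decoys >= t) / (#targets >= t)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def fixed_null_fdr(target_scores, null_scores_sorted, threshold):
    """Estimate FDR from a fixed null score sample (sorted ascending).

    The null tail probability P(score >= t), multiplied by the total
    number of target PSMs, approximates the expected number of false
    matches above the threshold. Treating all targets as potentially
    incorrect makes this estimate conservative (hypothetical choice).
    """
    n_accepted = sum(s >= threshold for s in target_scores)
    if n_accepted == 0:
        return 0.0
    first_at_or_above = bisect_left(null_scores_sorted, threshold)
    tail = (len(null_scores_sorted) - first_at_or_above) / len(null_scores_sorted)
    expected_false = tail * len(target_scores)
    return min(1.0, expected_false / n_accepted)


# Toy scores, invented purely for illustration.
targets = [5.0, 4.0, 3.0, 2.0, 1.0]
decoys = [3.5, 1.5, 0.5, 0.2, 0.1]
shared_null = sorted(decoys)  # stands in for a large precomputed null sample

print(decoy_fdr(targets, decoys, 3.0))
print(fixed_null_fdr(targets, shared_null, 3.0))
```

In the fixed-null variant the null sample is computed once and reused across datasets, which is the property the abstract highlights; no per-dataset decoy search is needed at estimation time.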
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 999077, Hong Kong, China
| | - Long Wu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 999077, Hong Kong, China
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon 999077, Hong Kong, China
| |
|
27
|
Trahan C, Oeffinger M. Single-Step Affinity Purification (ssAP) and Mass Spectrometry of Macromolecular Complexes in the Yeast S. cerevisiae. Methods Mol Biol 2022; 2477:195-223. [PMID: 35524119 DOI: 10.1007/978-1-0716-2257-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Cellular functions are mostly defined by the dynamic interactions of proteins within macromolecular networks. Deciphering the composition of macromolecular complexes and their dynamic rearrangements is the key to get a comprehensive picture of cellular behavior and to understand biological systems. In the past two decades, affinity purification coupled to mass spectrometry has become a powerful tool to comprehensively study interaction networks and their assemblies. To overcome initial limitations of the approach, in particular, the effect of protein and RNA degradation, loss of transient interactors, and poor overall yield of intact complexes from cell lysates, various modifications to affinity purification protocols have been devised over the years. In this chapter, we describe a rapid single-step affinity purification method for the efficient isolation of dynamic macromolecular complexes. The technique employs cell lysis by cryo-milling, which ensures nondegraded starting material in the submicron range, and magnetic beads, which allow for dense antibody-conjugation and thus rapid complex isolation, while avoiding loss of transient interactions. The method is epitope tag-independent, and overcomes many of the previous limitations to produce large interactomes with almost no contamination. The protocol as described here has been optimized for the yeast S. cerevisiae.
Affiliation(s)
- Christian Trahan
- RNP Biochemistry Laboratory, Center for Genetic and Neurological Diseases, Institut de recherches cliniques de Montréal, Montréal, QC, Canada
| | - Marlene Oeffinger
- RNP Biochemistry Laboratory, Center for Genetic and Neurological Diseases, Institut de recherches cliniques de Montréal, Montréal, QC, Canada.
- Département de biochimie et médicine moléculaire, Faculté de médecine, Université de Montréal, Montréal, QC, Canada.
- Division of Experimental Medicine, Faculty of Medicine, McGill University, Montréal, QC, Canada.
| |
|
28
|
Symonds P, Marcu A, Cook KW, Metheringham RL, Durrant LG, Brentville VA. Citrullinated Epitopes Identified on Tumour MHC Class II by Peptide Elution Stimulate Both Regulatory and Th1 Responses and Require Careful Selection for Optimal Anti-Tumour Responses. Front Immunol 2021; 12:764462. [PMID: 34858415 PMCID: PMC8630742 DOI: 10.3389/fimmu.2021.764462] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/25/2021] [Accepted: 10/22/2021] [Indexed: 11/13/2022] Open
Abstract
Background Somatic mutations or post-translational modifications of proteins result in changes that enable immune recognition. One such post-translational modification is citrullination, the conversion of arginine residues to citrulline. Citrullinated peptides are presented on MHC class II (MHCII) via autophagy which is upregulated by cellular stresses such as tumourigenesis. Methods Peptides were eluted from B16 melanoma expressing HLA-DP4 and analysed by mass spectrometry to profile the presented citrullinated repertoire. Initially, seven of the identified citrullinated peptides were used in combination to vaccinate HLA-DP4 transgenic mice. Immune responses were characterised from the combination and individual vaccines by ex vivo cytokine ELISpot assay and assessed for tumour therapy. Results The combination vaccine induced only weak anti-tumour therapy in the B16cDP4 melanoma model. Immune phenotyping revealed a dominant IFNγ response to citrullinated matrix metalloproteinase-21 peptide (citMMP21) and an IL-10 response to cytochrome p450 peptide (citCp450). Exclusion of the IL-10 inducing citCp450 peptide from the combined vaccine failed to recover a strong anti-tumour response. Single peptide immunisation confirmed the IFNγ response from citMMP21 and the IL-10 response from citCp450 but also showed that citrullinated Glutamate receptor ionotropic (citGRI) peptide stimulated a low avidity IFNγ response. Interestingly, both citMMP21 and citGRI peptides individually, stimulated strong anti-tumour responses that were significantly better than the combined vaccine. In line with the citGRI T cell avidity, it required high dose immunisation to induce an anti-tumour response. This suggests that as the peptides within the combined vaccine had similar binding affinities to MHC-II the combination vaccine may have resulted in lower presentation of each epitope and weak anti-tumour immunity. 
Conclusion We demonstrate that tumours present citrullinated peptides that can stimulate Th1 and regulatory responses and that competition likely exists between similar affinity peptides. Characterisation of responses from epitopes identified by peptide elution are necessary to optimise selection for tumour therapy.
Affiliation(s)
- Peter Symonds
- Scancell Limited, Biodiscovery Institute, University of Nottingham, Nottingham, United Kingdom
| | - Ana Marcu
- Department of Immunology, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany.,Cluster of Excellence iFIT (EXC 2180) "Image-Guided and Functionally Instructed Tumour Therapies", University of Tübingen, Tübingen, Germany
| | - Katherine W Cook
- Scancell Limited, Biodiscovery Institute, University of Nottingham, Nottingham, United Kingdom
| | - Rachael L Metheringham
- Scancell Limited, Biodiscovery Institute, University of Nottingham, Nottingham, United Kingdom
| | - Lindy G Durrant
- Scancell Limited, Biodiscovery Institute, University of Nottingham, Nottingham, United Kingdom.,Biodiscovery Institute, Division of Cancer and Stem Cells, University of Nottingham, Nottingham, United Kingdom
| | - Victoria A Brentville
- Scancell Limited, Biodiscovery Institute, University of Nottingham, Nottingham, United Kingdom
| |
|
29
|
Farag YM, Horro C, Vaudel M, Barsnes H. PeptideShaker Online: A User-Friendly Web-Based Framework for the Identification of Mass Spectrometry-Based Proteomics Data. J Proteome Res 2021; 20:5419-5423. [PMID: 34709836 PMCID: PMC8650087 DOI: 10.1021/acs.jproteome.1c00678] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Mass spectrometry-based proteomics is a high-throughput technology generating ever-larger amounts of data per project. However, storing, processing, and interpreting these data can be a challenge. A key element in simplifying this process is the development of interactive frameworks focusing on visualization that can greatly simplify both the interpretation of data and the generation of new knowledge. Here we present PeptideShaker Online, a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the resulting data. Storage and processing of the data are performed via the versatile Galaxy platform (through SearchGUI, PeptideShaker, and moFF), while the interaction with the results happens via a locally installed web server, thus enabling researchers to process and interpret their own data without requiring advanced bioinformatics skills or direct access to compute-intensive infrastructures. The source code, additional documentation, and a fully functional demo is available at https://github.com/barsnes-group/peptide-shaker-online.
Affiliation(s)
- Yehia Mokhtar Farag
- Proteomics Unit, Department of Biomedicine, University of Bergen, 5020 Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, 5008 Bergen, Norway
| | - Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, 5020 Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, 5008 Bergen, Norway
| | - Marc Vaudel
- Department of Clinical Sciences, University of Bergen, 5020 Bergen, Norway
| | - Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, 5020 Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, 5008 Bergen, Norway
| |
|
30
|
Tariq MU, Saeed F. SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions. PLoS One 2021; 16:e0259349. [PMID: 34714871 PMCID: PMC8555789 DOI: 10.1371/journal.pone.0259349] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/10/2021] [Accepted: 10/18/2021] [Indexed: 11/19/2022] Open
Abstract
Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. 
SpeCollate is available at https://deepspecs.github.io/.
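The core idea described in this abstract, scoring a spectrum against candidate peptides by distance in a shared Euclidean embedding space, can be sketched as follows. This is a hypothetical illustration, not SpeCollate's code: the embedding vectors are hard-coded placeholders standing in for the outputs of learned spectrum and peptide encoders, and the function name and peptide labels are invented here.

```python
import math


def rank_candidates(spectrum_embedding, peptide_embeddings):
    """Rank candidate peptides by Euclidean distance to a spectrum embedding.

    spectrum_embedding: sequence of floats (placeholder for an encoder output).
    peptide_embeddings: dict mapping a peptide label to its embedding.
    Returns a list of (peptide, distance) pairs, closest first.
    """
    scored = [
        (peptide, math.dist(spectrum_embedding, embedding))
        for peptide, embedding in peptide_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Placeholder embeddings; a trained network would produce these vectors.
spectrum = [0.0, 0.0]
candidates = {
    "PEPTIDE_A": [3.0, 4.0],
    "PEPTIDE_B": [1.0, 0.0],
}

for peptide, distance in rank_candidates(spectrum, candidates):
    print(peptide, distance)
```

Because both modalities share one fixed-size space, a database search reduces to a nearest-neighbour lookup over precomputed peptide embeddings.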
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
| | - Fahad Saeed
- School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
| |
|
31
|
Kudriavtseva P, Kashkinov M, Kertész-Farkas A. Deep Convolutional Neural Networks Help Scoring Tandem Mass Spectrometry Data in Database-Searching Approaches. J Proteome Res 2021; 20:4708-4717. [PMID: 34449232 DOI: 10.1021/acs.jproteome.1c00315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
Spectrum annotation is a challenging task due to the presence of unexpected peptide fragmentation ions as well as the inaccuracy of the detectors of the spectrometers. We present a deep convolutional neural network, called Slider, which learns an optimal feature extraction in its kernels for scoring mass spectrometry (MS)/MS spectra to increase the number of spectrum annotations with high confidence. Experimental results using publicly available data sets show that Slider can annotate slightly more spectra than the state-of-the-art methods (BoltzMatch, Res-EV, Prosit), albeit 2-10 times faster. More interestingly, Slider provides only 2-4% fewer spectrum annotations with low-resolution fragmentation information than other methods with high-resolution information. This means that Slider can exploit nearly as much information from the context of low-resolution spectrum peaks as the high-resolution fragmentation information can provide for other scoring methods. Thus, Slider can be an optimal choice for practitioners using old spectrometers with low-resolution detectors.
Affiliation(s)
- Polina Kudriavtseva
- Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 11 Pokrovsky Bvld., Moscow 109028, Russian Federation
| | - Matvey Kashkinov
- Faculty of Computer Science, HSE University, 11 Pokrovsky Bvld., Moscow 109028, Russian Federation
| | - Attila Kertész-Farkas
- Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 11 Pokrovsky Bvld., Moscow 109028, Russian Federation
| |
|
32
|
Haseeb M, Saeed F. High Performance Computing Framework for Tera-Scale Database Search of Mass Spectrometry Data. Nat Comput Sci 2021; 1:550-561. [PMID: 34723198 PMCID: PMC8554525 DOI: 10.1038/s43588-021-00113-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2020] [Accepted: 07/16/2021] [Indexed: 05/09/2023]
Abstract
Database peptide search algorithms deduce peptides from mass spectrometry (MS) data. There has been substantial effort in improving their computational efficiency to achieve larger and more complex systems biology studies. However, modern serial and high-performance computing (HPC) algorithms exhibit sub-optimal performance mainly due to their ineffective parallel designs (low resource utilization), and high overhead costs. We present an HPC framework, called HiCOPS, for efficient acceleration of the database peptide search algorithms on distributed-memory supercomputers. HiCOPS provides, on average, more than 10-fold improvement in speed, and superior parallel performance over several existing HPC database search software. We also formulate a mathematical model for performance analysis and optimization, and report near-optimal results for several key metrics including strong-scale efficiency, hardware utilization, load-balance, inter-process communication and I/O overheads. The core parallel design, techniques, and optimizations presented in HiCOPS are search-algorithm independent and can be extended to efficiently accelerate the existing and future algorithms and software.
Affiliation(s)
- Muhammad Haseeb
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, USA
| | - Fahad Saeed
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, USA
- Biomolecular Sciences Institute (BSI), Florida International University, Miami, FL, USA
- Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA
| |
|
33
|
Lu YY, Bilmes J, Rodriguez-Mias RA, Villén J, Noble WS. DIAmeter: matching peptides to data-independent acquisition mass spectrometry data. Bioinformatics 2021; 37:i434-i442. [PMID: 34252924 PMCID: PMC8686675 DOI: 10.1093/bioinformatics/btab284] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Tandem mass spectrometry data acquired using data independent acquisition (DIA) is challenging to interpret because the data exhibits complex structure along both the mass-to-charge (m/z) and time axes. The most common approach to analyzing this type of data makes use of a library of previously observed DIA data patterns (a 'spectral library'), but this approach is expensive because the libraries do not typically generalize well across laboratories. RESULTS Here, we propose DIAmeter, a search engine that detects peptides in DIA data using only a peptide sequence database. Although some existing library-free DIA analysis methods (i) support data generated using both wide and narrow isolation windows, (ii) detect peptides containing post-translational modifications, (iii) analyze data from a variety of instrument platforms and (iv) are capable of detecting peptides even in the absence of detectable signal in the survey (MS1) scan, DIAmeter is the only method that offers all four capabilities in a single tool. AVAILABILITY AND IMPLEMENTATION The open source, Apache licensed source code is available as part of the Crux mass spectrometry analysis toolkit (http://crux.ms). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Young Lu
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jeff Bilmes
- Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA.,Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| | | | - Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.,Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| |
|
34
|
Smythers AL, Hicks LM. Mapping the plant proteome: tools for surveying coordinating pathways. Emerg Top Life Sci 2021; 5:203-220. [PMID: 33620075 PMCID: PMC8166341 DOI: 10.1042/etls20200270] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/04/2021] [Revised: 02/07/2021] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Plants rapidly respond to environmental fluctuations through coordinated, multi-scalar regulation, enabling complex reactions despite their inherently sessile nature. In particular, protein post-translational signaling and protein-protein interactions combine to manipulate cellular responses and regulate plant homeostasis with precise temporal and spatial control. Understanding these proteomic networks is essential to addressing ongoing global crises, including those of food security, rising global temperatures, and the need for renewable materials and fuels. Technological advances in mass spectrometry-based proteomics are enabling investigations of unprecedented depth, and are increasingly being optimized for and applied to plant systems. This review highlights recent advances in plant proteomics, with an emphasis on spatially and temporally resolved analysis of post-translational modifications and protein interactions. It also details the necessity for generation of a comprehensive plant cell atlas while highlighting recent accomplishments within the field.
Affiliation(s)
- Amanda L Smythers
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A
| | - Leslie M Hicks
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A
| |
|
35
|
Poudel S, Cope AL, O'Dell KB, Guss AM, Seo H, Trinh CT, Hettich RL. Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets. Biotechnol Biofuels 2021; 14:116. [PMID: 33971924 PMCID: PMC8112048 DOI: 10.1186/s13068-021-01964-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/29/2020] [Accepted: 04/26/2021] [Indexed: 05/13/2023]
Abstract
BACKGROUND Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research. RESULTS We optimized and employed a pipeline integrating various "guilt-by-association" (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions. CONCLUSIONS This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
Affiliation(s)
- Suresh Poudel
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
| | - Alexander L Cope
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
| | - Kaela B O'Dell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- The Bredesen Center, University of Tennessee, Knoxville, TN, USA
| | - Adam M Guss
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- The Bredesen Center, University of Tennessee, Knoxville, TN, USA
| | - Hyeongmin Seo
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA
| | - Cong T Trinh
- The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA
- The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- The Bredesen Center, University of Tennessee, Knoxville, TN, USA
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA
| | - Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
| |
|
36
|
Abstract
Proteomics studies rely on the accurate assignment of peptides to the acquired tandem mass spectra-a task where machine learning algorithms have proven invaluable. We describe mokapot, which provides a flexible semisupervised learning algorithm that allows for highly customized analyses. We demonstrate some of the unique features of mokapot by improving the detection of RNA-cross-linked peptides from an analysis of RNA-binding proteins and increasing the consistency of peptide detection in a single-cell proteomics study.
Affiliation(s)
- William E. Fondrie
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| |
|
37
|
Benchmarking mass spectrometry based proteomics algorithms using a simulated database. Network Modeling Analysis in Health Informatics and Bioinformatics 2021; 10. [PMID: 34012763 DOI: 10.1007/s13721-021-00298-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Protein sequencing algorithms process data from a variety of instruments that have been generated under diverse experimental conditions. Currently there is no way to predict the accuracy of an algorithm for a given data set. Most of the published algorithms and associated software have been evaluated on a limited number of experimental data sets. However, these performance evaluations do not cover the complete search space that the algorithm and the software might encounter in the real world. To this end, we present a database of simulated spectra that can be used to benchmark any spectra-to-peptide search engine. We demonstrate the usability of this database by benchmarking two popular peptide sequencing engines. We show wide variation in the accuracy of peptide deductions, and that a complete quality profile of a given algorithm can be useful for practitioners and algorithm developers. All benchmarking data is available at https://users.cs.fiu.edu/~fsaeed/Benchmark.html.
|
38
|
Notonier S, Werner AZ, Kuatsjah E, Dumalo L, Abraham PE, Hatmaker EA, Hoyt CB, Amore A, Ramirez KJ, Woodworth SP, Klingeman DM, Giannone RJ, Guss AM, Hettich RL, Eltis LD, Johnson CW, Beckham GT. Metabolism of syringyl lignin-derived compounds in Pseudomonas putida enables convergent production of 2-pyrone-4,6-dicarboxylic acid. Metab Eng 2021; 65:111-122. [PMID: 33741529 DOI: 10.1016/j.ymben.2021.02.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 11/15/2020] [Revised: 02/14/2021] [Accepted: 02/22/2021] [Indexed: 12/15/2022]
Abstract
Valorization of lignin, an abundant component of plant cell walls, is critical to enabling the lignocellulosic bioeconomy. Biological funneling using microbial biocatalysts has emerged as an attractive approach to convert complex mixtures of lignin depolymerization products to value-added compounds. Ideally, biocatalysts would convert aromatic compounds derived from the three canonical types of lignin: syringyl (S), guaiacyl (G), and p-hydroxyphenyl (H). Pseudomonas putida KT2440 (hereafter KT2440) has been developed as a biocatalyst owing in part to its native catabolic capabilities but is not known to catabolize S-type lignin-derived compounds. Here, we demonstrate that syringate, a common S-type lignin-derived compound, is utilized by KT2440 only in the presence of another energy source or when vanAB was overexpressed, as syringate was found to be O-demethylated to gallate by VanAB, a two-component monooxygenase, and further catabolized via extradiol cleavage. Unexpectedly, the specificity (kcat/KM) of VanAB for syringate was within 25% that for vanillate and O-demethylation of both substrates was well-coupled to O2 consumption. However, the native KT2440 gallate-cleaving dioxygenase, GalA, was potently inactivated by 3-O-methylgallate. To engineer a biocatalyst to simultaneously convert S-, G-, and H-type monomers, we therefore employed VanAB from Pseudomonas sp. HR199, which has lower activity for 3MGA, and LigAB, an extradiol dioxygenase able to cleave protocatechuate and 3-O-methylgallate. This strain converted 93% of a mixture of lignin monomers to 2-pyrone-4,6-dicarboxylate, a promising bio-based chemical. Overall, this study elucidates a native pathway in KT2440 for catabolizing S-type lignin-derived compounds and demonstrates the potential of this robust chassis for lignin valorization.
Affiliation(s)
- Sandra Notonier
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA; Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
| | - Allison Z Werner
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA; Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
| | - Eugene Kuatsjah
- Department of Microbiology and Immunology, BioProducts Institute, and the Life Sciences Institute, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| | - Linda Dumalo
- Department of Microbiology and Immunology, BioProducts Institute, and the Life Sciences Institute, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| | - Paul E Abraham
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA; Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, USA
| | - E Anne Hatmaker
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA; Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, USA
| | - Caroline B Hoyt
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
| | - Antonella Amore
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
| | - Kelsey J Ramirez
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
| | - Sean P Woodworth
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA
| | - Dawn M Klingeman
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA; Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, USA
| | - Richard J Giannone
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA; Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, USA
| | - Adam M Guss
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA; Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, USA
| | - Robert L Hettich
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA; Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN, 37830, USA
| | - Lindsay D Eltis
- Department of Microbiology and Immunology, BioProducts Institute, and the Life Sciences Institute, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada.
| | - Christopher W Johnson
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA.
| | - Gregg T Beckham
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, 80401, USA; Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA.
| |
|
39
|
Ivanov MV, Bubis JA, Gorshkov V, Abdrakhimov DA, Kjeldsen F, Gorshkov MV. Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient. J Proteome Res 2021; 20:1864-1873. [PMID: 33720732 DOI: 10.1021/acs.jproteome.0c00863] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 01/04/2023]
Abstract
Proteome-wide analyses rely on tandem mass spectrometry and the extensive separation of proteolytic mixtures. This imposes considerable instrumental time consumption, which is one of the main obstacles in the broader acceptance of proteomics in biomedical and clinical research. Recently, we presented a fast proteomic method termed DirectMS1 based on ultrashort LC gradients as well as MS1-only mass spectra acquisition and data processing. The method allows significant reduction of the proteome-wide analysis time to a few minutes at the depth of quantitative proteome coverage of 1000 proteins at 1% false discovery rate (FDR). In this work, to further increase the capabilities of the DirectMS1 method, we explored the opportunities presented by the recent progress in the machine-learning area and applied the LightGBM decision tree boosting algorithm to the scoring of peptide feature matches when processing MS1 spectra. Furthermore, we integrated the peptide feature identification algorithm of DirectMS1 with the recently introduced peptide retention time prediction utility, DeepLC. Additional approaches to improve the performance of the DirectMS1 method are discussed and demonstrated, such as using FAIMS for gas-phase ion separation. As a result of all improvements to DirectMS1, we succeeded in identifying more than 2000 proteins at 1% FDR from the HeLa cell line in a 5 min gradient LC-FAIMS/MS1 analysis. The data sets generated and analyzed during the current study have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD023977.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| | - Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| | - Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Daniil A Abdrakhimov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia; Moscow Institute of Physics and Technology, Institutsky lane 9, Dolgoprudny, Moscow Region 141700, Russia
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| |
|
40
|
Manda SS, Noor Z, Hains PG, Zhong Q. PIONEER: Pipeline for Generating High-Quality Spectral Libraries for DIA-MS Data. Curr Protoc 2021; 1:e69. [PMID: 33656278 DOI: 10.1002/cpz1.69] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Abstract
Data-independent-acquisition mass spectrometry (DIA-MS) is a state-of-the-art proteomic technique for high-throughput identification and quantification of peptides and proteins. Interpretation of DIA-MS data relies on the use of a spectral library, which is optimally created from data acquired from the same samples in data-dependent acquisition (DDA) mode. As DIA-MS quantification relies on the spectral libraries, having a high-quality, non-redundant, and comprehensive spectral library is essential. This article describes the major steps for creating a high-quality spectral library using a combination of multiple complementary search engines. We discuss appropriate strategies to control the false discovery rate for the final spectral library as a result of merging multiple searches. © 2021 The Authors. Current Protocols © 2021 Wiley Periodicals LLC. Basic Protocol 1: Searching DDA-MS files with multiple search engines. Basic Protocol 2: Merging results from multiple search engines. Basic Protocol 3: Creating spectral libraries from merged results. Alternate Protocol: Using CLI for automating tasks. Support Protocol: Creating concatenated FASTA files.
Affiliation(s)
- Srikanth S Manda
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| | - Zainab Noor
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| | - Peter G Hains
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| | - Qing Zhong
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| |
|
41
|
Chen CT, Wang JH, Cheng CW, Hsu WC, Ko CL, Choong WK, Sung TY. Multi-Q 2 software facilitates isobaric labeling quantitation analysis with improved accuracy and coverage. Sci Rep 2021; 11:2233. [PMID: 33500498 PMCID: PMC7838301 DOI: 10.1038/s41598-021-81740-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
Mass spectrometry-based proteomics using isobaric labeling for multiplex quantitation has become a popular approach for proteomic studies. We present Multi-Q 2, an isobaric-labeling quantitation tool which can yield the largest quantitation coverage and improved quantitation accuracy compared to three state-of-the-art methods. Multi-Q 2 supports identification results from several popular proteomic data analysis platforms for quantitation, offering up to 12% improvement in quantitation coverage for accepting identification results from multiple search engines when compared with MaxQuant and PatternLab. It is equipped with various quantitation algorithms, including a ratio compression correction algorithm, and results in up to 336 algorithmic combinations. Systematic evaluation shows different algorithmic combinations have different strengths and are suitable for different situations. We also demonstrate that the flexibility of Multi-Q 2 in customizing algorithmic combination can lead to improved quantitation accuracy over existing tools. Moreover, the use of complementary algorithmic combinations can be an effective strategy to enhance sensitivity when searching for biomarkers from differentially expressed proteins in proteomic experiments. Multi-Q 2 provides interactive graphical interfaces to process quantitation and to display ratios at protein, peptide, and spectrum levels. It also supports a heatmap module, enabling users to cluster proteins based on their abundance ratios and to visualize the clustering results. Multi-Q 2 executable files, sample data sets, and user manual are freely available at http://ms.iis.sinica.edu.tw/COmics/Software_Multi-Q2.html.
Affiliation(s)
- Ching-Tai Chen
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan.
| | - Jen-Hung Wang
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, 115, Taiwan; Institute of Biomedical Informatics, National Yang-Ming University, Taipei, 112, Taiwan
| | - Cheng-Wei Cheng
- Genomics Research Center, Academia Sinica, Taipei, 115, Taiwan
| | - Wei-Che Hsu
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan
| | - Chu-Ling Ko
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Wai-Kok Choong
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan
| | - Ting-Yi Sung
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan.
| |
|
42
|
Beri D, Herring CD, Blahova S, Poudel S, Giannone RJ, Hettich RL, Lynd LR. Coculture with hemicellulose-fermenting microbes reverses inhibition of corn fiber solubilization by Clostridium thermocellum at elevated solids loadings. BIOTECHNOLOGY FOR BIOFUELS 2021; 14:24. [PMID: 33461608 PMCID: PMC7814735 DOI: 10.1186/s13068-020-01867-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/04/2020] [Accepted: 12/24/2020] [Indexed: 05/10/2023]
Abstract
BACKGROUND The cellulolytic thermophile Clostridium thermocellum is an important biocatalyst due to its ability to solubilize lignocellulosic feedstocks without the need for pretreatment or exogenous enzyme addition. At low concentrations of substrate, C. thermocellum can solubilize corn fiber > 95% in 5 days, but solubilization declines markedly at substrate concentrations higher than 20 g/L. This differs for model cellulose like Avicel, on which the maximum solubilization rate increases in proportion to substrate concentration. The goal of this study was to examine fermentation at increasing corn fiber concentrations and investigate possible reasons for declining performance. RESULTS The rate of growth of C. thermocellum on corn fiber, inferred from CipA scaffoldin levels measured by LC-MS/MS, showed very little increase with increasing solids loading. To test for inhibition, we evaluated the effects of spent broth on growth and cellulase activity. The liquids remaining after corn fiber fermentation were found to be strongly inhibitory to growth on cellobiose, a substrate that does not require cellulose hydrolysis. Additionally, the hydrolytic activity of C. thermocellum cellulase was reduced to less than half by adding spent broth. Noting that > 15 g/L hemicellulose oligosaccharides accumulated in the spent broth of a 40 g/L corn fiber fermentation, we tested the effect of various model carbohydrates on growth on cellobiose and Avicel. Some compounds like xylooligosaccharides caused a decline in cellulolytic activity and a reduction in the maximum solubilization rate on Avicel. However, there were no relevant model compounds that could replicate the strong inhibition by spent broth on C. thermocellum growth on cellobiose. Cocultures of C. thermocellum with hemicellulose-consuming partners (Herbinix spp. strain LL1355 and Thermoanaerobacterium thermosaccharolyticum) exhibited lower levels of unfermented hemicellulose hydrolysis products, a doubling of the maximum solubilization rate, and an increase in final solubilization from 67 to 93%. CONCLUSIONS This study documents inhibition of C. thermocellum with increasing corn fiber concentration and demonstrates inhibition of cellulase activity by xylooligosaccharides, but further work is needed to understand why growth on cellobiose was inhibited by corn fiber fermentation broth. Our results support the importance of hemicellulose-utilizing coculture partners to augment C. thermocellum in the fermentation of lignocellulosic feedstocks at high solids loading.
Affiliation(s)
- Dhananjay Beri
- Thayer School of Engineering, Dartmouth College, Hanover, NH, 03755, USA
- Centre for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
| | - Christopher D Herring
- Thayer School of Engineering, Dartmouth College, Hanover, NH, 03755, USA.
- Centre for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA.
- Enchi Corporation, Lebanon, NH, 03766, USA.
| | - Sofie Blahova
- Thayer School of Engineering, Dartmouth College, Hanover, NH, 03755, USA
| | - Suresh Poudel
- Centre for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
| | - Richard J Giannone
- Centre for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
| | - Robert L Hettich
- Centre for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
| | - Lee R Lynd
- Thayer School of Engineering, Dartmouth College, Hanover, NH, 03755, USA
- Centre for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
- Enchi Corporation, Lebanon, NH, 03766, USA
| |
|
43
|
FPTMS: Frequency-based approach to identify the peptide from the low-energy collision-induced dissociation tandem mass spectra. J Proteomics 2021; 235:104116. [PMID: 33453436 DOI: 10.1016/j.jprot.2021.104116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2020] [Revised: 12/30/2020] [Accepted: 01/05/2021] [Indexed: 11/20/2022]
Abstract
Database search is a widely accepted method for assigning peptides to tandem mass spectra. In this study, a new flexible method, FPTMS, is introduced to interpret tandem mass spectra using the known peptide sequences in a protein database. Here, the frequency of occurrence of fragment ion peaks extracted from an extensive spectral library is used to predict the theoretical tandem mass spectra of the peptides. Dot product and windowed-xcorr scoring methods were implemented to score the experimental spectrum against the theoretical peptide spectra. Windowed-xcorr is introduced to account for mass errors and the cleavage position in the fragmentation process. The new method with windowed-xcorr shows an improved identification rate compared to the existing search engines Crux-Tide and X!Tandem at 1% false discovery rate (FDR) for the dataset considered in this study. SIGNIFICANCE: Identifying or sequencing peptides from tandem mass spectra is an important application in mass spectrometry-based proteomics. Collision-induced dissociation (CID) fragmentation spectra have been widely used to develop peptide identification algorithms based on the database search strategy. CID fragmentation is a complex process that depends on peptide sequence, charge state, and residue content. Including more features of peptide fragmentation behavior and an adaptable scoring algorithm improves the efficiency of the peptide identification algorithm.
|
44
|
Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach to measure and compare peptides and proteins in a large variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, provide large amounts of data output. This is the case for shotgun/bottom-up proteomics, which consists of the enzymatic digestion of proteins into peptides that are then measured by MS instruments in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to face the increasing size and complexity of raw MS data that need to be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. A general description is provided of the data preprocessing options and the different search engines available, including practical suggestions on how to optimize the parameters for peptide search, based on hands-on experience.
Affiliation(s)
- Avinash Yadav
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Federica Marini
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Alessandro Cuomo
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Tiziana Bonaldi
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy.
| |
|
45
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues that are inherent in proteogenomic data analysis, where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| |
|
46
|
Vijaya Kumar S, Abraham PE, Hurst GB, Chourey K, Bible AN, Hettich RL, Doktycz MJ, Morrell-Falvey JL. A carotenoid-deficient mutant of the plant-associated microbe Pantoea sp. YR343 displays an altered membrane proteome. Sci Rep 2020; 10:14985. [PMID: 32917935 PMCID: PMC7486946 DOI: 10.1038/s41598-020-71672-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/12/2020] [Accepted: 08/05/2020] [Indexed: 01/08/2023] Open
Abstract
Membrane organization plays an important role in signaling, transport, and defense. In eukaryotes, the stability, organization, and function of membrane proteins are influenced by certain lipids and sterols, such as cholesterol. Bacteria lack cholesterol, but carotenoids and hopanoids are predicted to play a similar role in modulating membrane properties. We have previously shown that the loss of carotenoids in the plant-associated bacterium Pantoea sp. YR343 results in changes to membrane biophysical properties and leads to physiological changes, including increased sensitivity to reactive oxygen species, reduced indole-3-acetic acid secretion, reduced biofilm and pellicle formation, and reduced plant colonization. Here, using whole cell and membrane proteomics, we show that the deletion of carotenoid production in Pantoea sp. YR343 results in altered membrane protein distribution and abundance. Moreover, we observe significant differences in the protein composition of detergent-resistant membrane fractions from wildtype and mutant cells, consistent with the prediction that carotenoids play a role in organizing membrane microdomains. These data provide new insights into the function of carotenoids in bacterial membrane organization and identify cellular functions that are affected by the loss of carotenoids.
Affiliation(s)
- Sushmitha Vijaya Kumar
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
| | - Paul E Abraham
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Gregory B Hurst
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Karuna Chourey
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Amber N Bible
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
| | - Robert L Hettich
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Mitchel J Doktycz
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA; Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Jennifer L Morrell-Falvey
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, USA; Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
| |
|
47
|
Mekonnen GG, Tedla BA, Pickering D, Becker L, Wang L, Zhan B, Bottazzi ME, Loukas A, Sotillo J, Pearson MS. Schistosoma haematobium Extracellular Vesicle Proteins Confer Protection in a Heterologous Model of Schistosomiasis. Vaccines (Basel) 2020; 8:E416. [PMID: 32722279 PMCID: PMC7563238 DOI: 10.3390/vaccines8030416] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/26/2020] [Revised: 07/19/2020] [Accepted: 07/22/2020] [Indexed: 01/16/2023] Open
Abstract
Helminth parasites release extracellular vesicles which interact with the surrounding host tissues, mediating host-parasite communication and other fundamental processes of parasitism. As such, vesicle proteins present attractive targets for the development of novel intervention strategies to control these parasites and the diseases they cause. Herein, we describe the first proteomic analysis by LC-MS/MS of two types of extracellular vesicles (exosome-like, 120 k pellet vesicles and microvesicle-like, 15 k pellet vesicles) from adult Schistosoma haematobium worms. A total of 57 and 330 proteins were identified in the 120 k pellet vesicles and larger 15 k pellet vesicles, respectively, and some of the most abundant molecules included homologues of known helminth vaccine and diagnostic candidates such as Sm-TSP2, Sm23, glutathione S-transferase, saponins and aminopeptidases. Tetraspanins were highly represented in the analysis and found in both vesicle types. Vaccination of mice with recombinant versions of three of these tetraspanins induced protection in a heterologous challenge (S. mansoni) model of infection, resulting in significant reductions (averaged across two independent trials) in liver (47%, 38% and 41%) and intestinal (47%, 45% and 41%) egg burdens. These findings offer insight into the mechanisms by which anti-tetraspanin antibodies confer protection and highlight the potential that extracellular vesicle surface proteins offer as anti-helminth vaccines.
Affiliation(s)
- Gebeyaw G. Mekonnen
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
- Department of Medical Parasitology, School of Biomedical and Laboratory Sciences, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
| | - Bemnet A. Tedla
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
| | - Darren Pickering
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
| | - Luke Becker
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
| | - Lei Wang
- Texas Children’s Hospital Center for Vaccine Development, Department of Pediatrics and National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bin Zhan
- Texas Children’s Hospital Center for Vaccine Development, Department of Pediatrics and National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
| | - Maria Elena Bottazzi
- Texas Children’s Hospital Center for Vaccine Development, Department of Pediatrics and National School of Tropical Medicine, Baylor College of Medicine, Houston, TX 77030, USA
| | - Alex Loukas
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
| | - Javier Sotillo
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
- Parasitology Reference and Research Laboratory, Centro Nacional de Microbiología, Instituto de Salud Carlos III, Majadahonda, 28220 Madrid, Spain
| | - Mark S. Pearson
- Centre for Molecular Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Queensland, Australia
| |
|
48
|
Marion S, Desharnais L, Studer N, Dong Y, Notter MD, Poudel S, Menin L, Janowczyk A, Hettich RL, Hapfelmeier S, Bernier-Latmani R. Biogeography of microbial bile acid transformations along the murine gut. J Lipid Res 2020; 61:1450-1463. [PMID: 32661017 DOI: 10.1194/jlr.ra120001021] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 12/12/2022] Open
Abstract
Bile acids, which are synthesized from cholesterol by the liver, are chemically transformed along the intestinal tract by the gut microbiota, and the products of these transformations signal through host receptors, affecting overall host health. These transformations include bile acid deconjugation, oxidation, and 7α-dehydroxylation. An understanding of the biogeography of bile acid transformations in the gut is critical because deconjugation is a prerequisite for 7α-dehydroxylation and because most gut microorganisms harbor bile acid transformation capacity. Here, we used a coupled metabolomic and metaproteomic approach to probe in vivo activity of the gut microbial community in a gnotobiotic mouse model. Results revealed the involvement of Clostridium scindens in 7α-dehydroxylation, of the genera Muribaculum and Bacteroides in deconjugation, and of six additional organisms in oxidation (the genera Clostridium, Muribaculum, Bacteroides, Bifidobacterium, Acutalibacter, and Akkermansia). Furthermore, the bile acid profile in mice with a more complex microbiota, a dysbiosed microbiota, or no microbiota was considered. For instance, conventional mice harbor a large diversity of bile acids, but treatment with an antibiotic such as clindamycin results in the complete inhibition of 7α-dehydroxylation, underscoring the strong inhibition of organisms that are capable of carrying out this process by this compound. Finally, a comparison of the hepatic bile acid pool size as a function of microbiota revealed that a reduced microbiota affects host signaling but not necessarily bile acid synthesis. In this study, bile acid transformations were mapped to the associated active microorganisms, offering a systematic characterization of the relationship between microbiota and bile acid composition.
Affiliation(s)
- Solenne Marion
  - Environmental Microbiology Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Lyne Desharnais
  - Environmental Microbiology Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Nicolas Studer
  - Institute for Infectious Diseases, University of Bern, Bern, Switzerland
- Yuan Dong
  - Institute for Infectious Diseases, University of Bern, Bern, Switzerland
- Matheus D Notter
  - Institute for Infectious Diseases, University of Bern, Bern, Switzerland
- Suresh Poudel
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Laure Menin
  - Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Andrew Janowczyk
  - Bioinformatics Core Facility, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Robert L Hettich
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Rizlan Bernier-Latmani
  - Environmental Microbiology Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
49
|
Abraham PE, Hurtado Castano N, Cowan-Turner D, Barnes J, Poudel S, Hettich R, Flütsch S, Santelia D, Borland AM. Peeling back the layers of crassulacean acid metabolism: functional differentiation between Kalanchoë fedtschenkoi epidermis and mesophyll proteomes. Plant J 2020; 103:869-888. [PMID: 32314451 DOI: 10.1111/tpj.14757] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/21/2019] [Revised: 03/18/2020] [Accepted: 03/23/2020] [Indexed: 06/11/2023]
Abstract
Crassulacean acid metabolism (CAM) is a specialized mode of photosynthesis that offers the potential to engineer improved water-use efficiency (WUE) and drought resilience in C3 plants while sustaining productivity in the hotter and drier climates that are predicted for much of the world. CAM species show an inverted pattern of stomatal opening and closing across the diel cycle, which conserves water and provides a means of maintaining growth in hot, water-limited environments. Recent genome sequencing of the constitutive model CAM species Kalanchoë fedtschenkoi provides a platform for elucidating the ensemble of proteins that link photosynthetic metabolism with stomatal movement, and that protect CAM plants from harsh environmental conditions. We describe a large-scale proteomics analysis to characterize and compare proteins, as well as diel changes in their abundance in guard cell-enriched epidermis and mesophyll cells from leaves of K. fedtschenkoi. Proteins implicated in processes that encompass respiration, the transport of water and CO2, stomatal regulation, and CAM biochemistry are highlighted and discussed. Diel rescheduling of guard cell starch turnover in K. fedtschenkoi compared with that observed in Arabidopsis is reported and tissue-specific localization in the epidermis and mesophyll of isozymes implicated in starch and malate turnover are discussed in line with the contrasting roles for these metabolites within the CAM mesophyll and stomatal complex. These data reveal the proteins and the biological processes enriched in each layer and provide key information for studies aiming to adapt plants to hot and dry environments by modifying leaf physiology for improved plant sustainability.
Affiliation(s)
- Paul E Abraham
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Natalia Hurtado Castano
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
  - Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, S10 2TN, UK
- Daniel Cowan-Turner
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Jeremy Barnes
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Suresh Poudel
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - Department of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996, USA
- Robert Hettich
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Diana Santelia
  - Institute of Integrative Biology, ETH, Zürich, Switzerland
- Anne M Borland
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
|
50
|
The M, Käll L. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat Commun 2020; 11:3234. [PMID: 32591519 PMCID: PMC7319958 DOI: 10.1038/s41467-020-17037-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/30/2020] [Accepted: 06/08/2020] [Indexed: 02/02/2023] Open
Abstract
In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This reduces search time due to the data reduction. We can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.
Affiliation(s)
- Matthew The
  - Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
- Lukas Käll
  - Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
|