1
Peng Y, Jain S, Radivojac P. An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics. Bioinformatics 2024; 40:i428-i436. [PMID: 38940171 DOI: 10.1093/bioinformatics/btae233] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or from physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, it is essential that pairs of chemically linked peptides be accurately identified, a process that requires: (i) database search, that creates a ranked list of candidate peptide pairs for each experimental spectrum and (ii) false discovery rate (FDR) estimation, that determines the probability of a false match in a group of top-ranked peptide pairs with scores above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that impact the estimation accuracy and increase run time over potential decoy-free approaches (DFAs). RESULTS We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time. AVAILABILITY AND IMPLEMENTATION https://github.com/shawn-peng/xlms.
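As a rough illustration of how a fitted score mixture can yield an FDR estimate, the sketch below computes the expected fraction of false matches above a score threshold from three components (correct, partially incorrect, incorrect). It is not the authors' implementation: it assumes plain normal components in place of the skew normal ones named in the abstract, it takes already-fitted parameters as given instead of running the constrained expectation-maximization step, and every name and number in it (`normal_sf`, `mixture_fdr`, the example parameters) is hypothetical.

```python
import math


def normal_sf(x, mean, sd):
    """Survival function P(X >= x) of a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def mixture_fdr(threshold, components):
    """Estimate the FDR among matches scoring at or above `threshold`.

    `components` is a list of (weight, mean, sd, is_false) tuples.
    Components flagged `is_false` count as false discoveries; here that
    means both the partially incorrect and the incorrect peptide pairs.
    """
    false_mass = 0.0
    total_mass = 0.0
    for weight, mean, sd, is_false in components:
        tail = weight * normal_sf(threshold, mean, sd)
        total_mass += tail
        if is_false:
            false_mass += tail
    return false_mass / total_mass if total_mass > 0 else 0.0


# Hypothetical fitted parameters (assumed for illustration only).
example_components = [
    (0.5, 10.0, 2.0, False),  # correct peptide pairs
    (0.2, 5.0, 2.0, True),    # partially incorrect peptide pairs
    (0.3, 0.0, 2.0, True),    # incorrect peptide pairs
]

if __name__ == "__main__":
    for t in (4.0, 8.0, 12.0):
        print(t, mixture_fdr(t, example_components))
```

With a very low threshold the estimate approaches the combined weight of the false components, and it shrinks as the threshold moves into the region dominated by the correct component.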
Affiliation(s)
- Yisu Peng
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Shantanu Jain
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
  - The Institute for Experiential AI, Northeastern University, Boston, MA 02115, United States
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
2
Ananth V, Sanders J, Yilmaz M, Wen B, Oh S, Noble WS. A learned score function improves the power of mass spectrometry database search. Bioinformatics 2024; 40:i410-i417. [PMID: 38940129 PMCID: PMC11211853 DOI: 10.1093/bioinformatics/btae218] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools. RESULTS To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.
Affiliation(s)
- Varun Ananth
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Justin Sanders
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Melih Yilmaz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bo Wen
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
3
Lai S, Zhao P, Zhou C, Li N, Yu W. PIPI2: Sensitive Tag-Based Database Search to Identify Peptides with Multiple Post-translational Modifications. J Proteome Res 2024. [PMID: 38770571 DOI: 10.1021/acs.jproteome.3c00819] [Indexed: 05/22/2024]
Abstract
Peptide identification is important in bottom-up proteomics. Post-translational modifications (PTMs) are crucial in regulating cellular activities. Many database search methods have been developed to identify peptides with PTMs and characterize the PTM patterns. However, the PTMs on peptides hinder the peptide identification rate and the PTM characterization precision, especially for peptides with multiple PTMs. To address this issue, we present a sensitive open search engine, PIPI2, with much better performance on peptides with multiple PTMs than other methods. With a greedy approach, we simplify the PTM characterization problem into a linear one, which enables characterizing multiple PTMs on one peptide. On the simulation data sets with up to four PTMs per peptide, PIPI2 identified over 90% of the spectra, at least 56% more than five other competitors. PIPI2 also characterized these PTM patterns with the highest precision of 77%, demonstrating a significant advantage in handling peptides with multiple PTMs. In the real applications, PIPI2 identified 30% to 88% more peptides with PTMs than its competitors.
Affiliation(s)
- Shengzhi Lai
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong, China
- Peize Zhao
  - Interdisciplinary Programs Office, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong, China
- Chen Zhou
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong, China
- Ning Li
  - Shenzhen-Hong Kong Collaborative Innovation Research Institute, HKUST, Futian, Shenzhen 518000, China
  - Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong, China
- Weichuan Yu
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong, China
  - Shenzhen-Hong Kong Collaborative Innovation Research Institute, HKUST, Futian, Shenzhen 518000, China
4
Adams C, Gabriel W, Laukens K, Picciani M, Wilhelm M, Bittremieux W, Boonen K. Fragment ion intensity prediction improves the identification rate of non-tryptic peptides in timsTOF. Nat Commun 2024; 15:3956. [PMID: 38730277 PMCID: PMC11087512 DOI: 10.1038/s41467-024-48322-0] [Received: 07/17/2023] [Accepted: 04/29/2024] [Indexed: 05/12/2024] Open
Abstract
Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, rather than being able to use known digestion patterns, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, that is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.
Affiliation(s)
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Kris Laukens
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Mario Picciani
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - Sustainable Health Department, Flemish Institute for Technological Research (VITO), Antwerp, Belgium
5
Freestone J, Noble WS, Keich U. Reinvestigating the Correctness of Decoy-Based False Discovery Rate Control in Proteomics Tandem Mass Spectrometry. J Proteome Res 2024. [PMID: 38687997 DOI: 10.1021/acs.jproteome.3c00902] [Indexed: 05/02/2024]
Abstract
Traditional database search methods for the analysis of bottom-up proteomics tandem mass spectrometry (MS/MS) data are limited in their ability to detect peptides with post-translational modifications (PTMs). Recently, "open modification" database search strategies, in which the requirement that the mass of the database peptide closely matches the observed precursor mass is relaxed, have become popular as ways to find a wider variety of types of PTMs. Indeed, in one study, Kong et al. reported that the open modification search tool MSFragger can achieve higher statistical power to detect peptides than a traditional "narrow window" database search. We investigated this claim empirically and, in the process, uncovered a potential general problem with false discovery rate (FDR) control in the machine learning postprocessors Percolator and PeptideProphet. This problem might have contributed to Kong et al.'s report that their empirical results suggest that false discovery rate (FDR) control in the narrow window setting might generally be compromised. Indeed, reanalyzing the same data while using a more standard form of target-decoy competition-based FDR control, we found that, after accounting for chimeric spectra as well as for the inherent difference in the number of candidates in open and narrow searches, the data does not provide sufficient evidence that FDR control in proteomics MS/MS database search is inherently problematic.
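For readers unfamiliar with target-decoy competition, the following minimal sketch shows one common form of the estimate. It is a generic illustration and not the procedure evaluated in the paper: it assumes each spectrum contributes one best target score and one best decoy score, that ties go to the target (real tools typically break ties at random), and that the estimate uses the widely used "+1" correction on the decoy count. The function name `tdc_fdr` and the example scores are hypothetical.

```python
def tdc_fdr(psms, threshold):
    """Estimate FDR among target wins scoring at or above `threshold`.

    `psms` is an iterable of (best_target_score, best_decoy_score) pairs,
    one pair per spectrum. For each spectrum only the higher-scoring match
    survives the competition; surviving decoys above the threshold stand in
    for the number of false target wins.
    """
    targets = 0
    decoys = 0
    for target_score, decoy_score in psms:
        if target_score >= decoy_score:  # tie goes to target (simplification)
            if target_score >= threshold:
                targets += 1
        elif decoy_score >= threshold:
            decoys += 1
    # "+1" correction on the decoy count, capped at 1.0
    return min(1.0, (decoys + 1) / max(targets, 1))


# Hypothetical scores for five spectra (assumed for illustration only).
example_psms = [(10.0, 2.0), (8.0, 3.0), (4.0, 9.0), (7.0, 1.0), (2.0, 5.0)]

if __name__ == "__main__":
    print(tdc_fdr(example_psms, 6.0))
```

In practice the threshold is chosen as the lowest score at which this estimate stays below the desired FDR level.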
Affiliation(s)
- Jack Freestone
  - School of Mathematics and Statistics F07, University of Sydney, New South Wales 2006, Australia
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics and Statistics F07, University of Sydney, New South Wales 2006, Australia
6
Basharat AR, Xiong X, Xu T, Zang Y, Sun L, Liu X. TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics. bioRxiv 2024:2024.04.05.588302. [PMID: 38645171 PMCID: PMC11030422 DOI: 10.1101/2024.04.05.588302] [Indexed: 04/23/2024]
Abstract
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the last decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.
Affiliation(s)
- Abdul Rehman Basharat
  - Department of BioHealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
- Xingzhao Xiong
  - Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, 70112, USA
- Tian Xu
  - Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
- Yong Zang
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Liangliang Sun
  - Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
- Xiaowen Liu
  - Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, 70112, USA
7
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium
8
Strauss MT, Bludau I, Zeng WF, Voytik E, Ammar C, Schessner JP, Ilango R, Gill M, Meier F, Willems S, Mann M. AlphaPept: a modern and open framework for MS-based proteomics. Nat Commun 2024; 15:2168. [PMID: 38461149 PMCID: PMC10924963 DOI: 10.1038/s41467-024-46485-4] [Received: 12/20/2022] [Accepted: 02/20/2024] [Indexed: 03/11/2024] Open
Abstract
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, during the last years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and via an open GitHub repository for developers.
Affiliation(s)
- Maximilian T Strauss
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Isabell Bludau
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Wen-Feng Zeng
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Eugenia Voytik
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Constantin Ammar
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Julia P Schessner
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Florian Meier
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - Functional Proteomics, Jena University Hospital, Jena, Germany
- Sander Willems
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
9
Cooper B, Yang R. An assessment of AcquireX and Compound Discoverer software 3.3 for non-targeted metabolomics. Sci Rep 2024; 14:4841. [PMID: 38418855 PMCID: PMC10902394 DOI: 10.1038/s41598-024-55356-3] [Received: 10/20/2023] [Accepted: 02/22/2024] [Indexed: 03/02/2024] Open
Abstract
We used the Exploris 240 mass spectrometer for non-targeted metabolomics on Saccharomyces cerevisiae strain BY4741 and tested AcquireX software for increasing the number of detectable compounds and Compound Discoverer 3.3 software for identifying compounds by MS2 spectral library matching. AcquireX increased the number of potentially identifiable compounds by 50% through six iterations of MS2 acquisition. On the basis of high-scoring MS2 matches made by Compound Discoverer, there were 483 compounds putatively identified from nearly 8000 candidate spectra. Comparisons to 20 amino acid standards, however, revealed instances whereby compound matches could be incorrect despite strong scores. Situations included the candidate with the top score not being the correct compound, matching the same compound at two different chromatographic peaks, assigning the highest score to a library compound much heavier than the mass for the parent ion, and grouping MS2 isomers to a single parent ion. Because the software does not calculate false positive and false discovery rates at these multiple levels where such errors can propagate, we conclude that manual examination of findings will be required post software analysis. These results will interest scientists who may use this platform for metabolomics research in diverse disciplines including medical science, environmental science, and agriculture.
Affiliation(s)
- Bret Cooper
  - Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, 20705, USA
- Ronghui Yang
  - Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, 20705, USA
10
Vasilogianni AM, Alrubia S, El-Khateeb E, Al-Majdoub ZM, Couto N, Achour B, Rostami-Hodjegan A, Barber J. Complementarity of two proteomic data analysis tools in the identification of drug-metabolising enzymes and transporters in human liver. Mol Omics 2024; 20:115-127. [PMID: 37975521 DOI: 10.1039/d3mo00144j] [Indexed: 11/19/2023]
Abstract
Several software packages are available for the analysis of proteomic LC-MS/MS data, including commercial (e.g. Mascot/Progenesis LC-MS) and open access software (e.g. MaxQuant). In this study, Progenesis and MaxQuant were used to analyse the same data set from human liver microsomes (n = 23). Comparison focussed on the total number of peptides and proteins identified by the two packages. For the peptides exclusively identified by each software package, distribution of peptide length, hydrophobicity, molecular weight, isoelectric point and score were compared. Using standard cut-off peptide scores, we found an average of only 65% overlap in detected peptides, with surprisingly little consistency in the characteristics of peptides exclusively detected by each package. Generally, MaxQuant detected more peptides than Progenesis, and the additional peptides were longer and had relatively lower scores. Progenesis-specific peptides tended to be more hydrophilic and basic relative to peptides detected only by MaxQuant. At the protein level, we focussed on drug-metabolising enzymes (DMEs) and transporters, by comparing the number of unique peptides detected by the two packages for these specific proteins of interest, and their abundance. The abundance of DMEs and SLC transporters showed good correlation between the two software tools, but ABC showed less consistency. In conclusion, in order to maximise the use of MS datasets, we recommend processing with more than one software package. Together, Progenesis and MaxQuant provided excellent coverage, with a core of common peptides identified in a very robust way.
Affiliation(s)
- Areti-Maria Vasilogianni
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
  - DMPK, Oncology R&D, AstraZeneca, Cambridge, UK
- Sarah Alrubia
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
  - Pharmaceutical Chemistry Department, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Eman El-Khateeb
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
  - Clinical Pharmacy Department, Faculty of Pharmacy, Tanta University, Tanta, Egypt
  - Certara Inc (Simcyp Division), 1 Concourse Way, Sheffield, UK
- Zubida M Al-Majdoub
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
- Narciso Couto
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
- Brahim Achour
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
  - Department of Biomedical and Pharmaceutical Sciences, College of Pharmacy, University of Rhode Island, Kingston, Rhode Island, USA
- Amin Rostami-Hodjegan
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
  - Certara Inc (Simcyp Division), 1 Concourse Way, Sheffield, UK
- Jill Barber
  - Centre for Applied Pharmacokinetic Research, School of Health Sciences, University of Manchester, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
11
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Wenqing Shui
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
12
Santos LGC, Parreira VDSC, da Silva EMG, Santos MDM, Fernandes ADF, Neves-Ferreira AGDC, Carvalho PC, Freitas FCDP, Passetti F. SpliceProt 2.0: A Sequence Repository of Human, Mouse, and Rat Proteoforms. Int J Mol Sci 2024; 25:1183. [PMID: 38256255 PMCID: PMC10816255 DOI: 10.3390/ijms25021183] [Received: 10/31/2023] [Revised: 12/15/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024] Open
Abstract
SpliceProt 2.0 is a public proteogenomics database that aims to list the sequence of known proteins and potential new proteoforms in human, mouse, and rat proteomes. This updated repository provides an even broader range of computationally translated proteins and serves, for example, to aid with proteomic validation of splice variants absent from the reference UniProtKB/SwissProt database. We demonstrate the value of SpliceProt 2.0 to predict orthologous proteins between humans and murines based on transcript reconstruction, sequence annotation and detection at the transcriptome and proteome levels. In this release, the annotation data used in the reconstruction of transcripts based on the methodology of ternary matrices were acquired from new databases such as Ensembl, UniProt, and APPRIS. Another innovation implemented in the pipeline is the exclusion of transcripts predicted to be susceptible to degradation through the NMD pathway. Taken together, our repository and its applications represent a valuable resource for the proteogenomics community.
Affiliation(s)
- Letícia Graziela Costa Santos
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
| | - Vinícius da Silva Coutinho Parreira
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
| | - Esdras Matheus Gomes da Silva
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Av. Brazil 4036, Campus Maré, Rio de Janeiro 21040-361, RJ, Brazil
| | - Marlon Dias Mariano Santos
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
| | - Alexander da Franca Fernandes
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
| | - Ana Gisele da Costa Neves-Ferreira
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Av. Brazil 4036, Campus Maré, Rio de Janeiro 21040-361, RJ, Brazil
| | - Paulo Costa Carvalho
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
| | - Flávia Cristina de Paula Freitas
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
- Departamento de Genética e Evolução, Universidade Federal de São Carlos (UFSCar), Rodovia Washington Luis, Km 235, São Carlos 13565-905, SP, Brazil
| | - Fabio Passetti
- Instituto Carlos Chagas, Fundação Oswaldo Cruz (FIOCRUZ), Rua Professor Algacyr Munhoz Mader 3775, Cidade Industrial De Curitiba, Curitiba 81310-020, PR, Brazil
| |
|
13
|
Holstein T, Muth T. Bioinformatic Workflows for Metaproteomics. Methods Mol Biol 2024; 2820:187-213. [PMID: 38941024 DOI: 10.1007/978-1-0716-3910-8_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
The strong influence of microbiomes on areas such as ecology and human health has become widely recognized in the past years. Accordingly, various techniques for the investigation of the composition and function of microbial community samples have been developed. Metaproteomics, the comprehensive analysis of the proteins from microbial communities, allows for the investigation of not only the taxonomy but also the functional and quantitative composition of microbiome samples. Due to the complexity of the investigated communities, methods developed for single organism proteomics cannot be readily applied to metaproteomic samples. For this purpose, methods specifically tailored to metaproteomics are required. In this work, a detailed overview of current bioinformatic solutions and protocols in metaproteomics is given. After an introduction to the proteomic database search, the metaproteomic post-processing steps are explained in detail. Ten specific bioinformatic software solutions are focused on, covering various steps including database-driven identification and quantification as well as taxonomic and functional assignment.
Affiliation(s)
- Tanja Holstein
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
- VIB-UGent Center for Medical Biotechnology, VIB and Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Data Competence Center, Robert Koch Institute, Berlin, Germany
| | - Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany.
- Data Competence Center, Robert Koch Institute, Berlin, Germany.
| |
|
14
|
Genth J, Schäfer K, Cassidy L, Graspeuntner S, Rupp J, Tholey A. Identification of proteoforms of short open reading frame-encoded peptides in Blautia producta under different cultivation conditions. Microbiol Spectr 2023; 11:e0252823. [PMID: 37782090 PMCID: PMC10715070 DOI: 10.1128/spectrum.02528-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 08/14/2023] [Indexed: 10/03/2023] Open
Abstract
IMPORTANCE The identification of short open reading frame-encoded peptides (SEP) and different proteoforms in single cultures of gut microbes offers new insights into a largely neglected part of the microbial proteome landscape. This is of particular importance as SEP provide various predicted functions, such as acting as antimicrobial peptides, maintaining cell homeostasis under stress conditions, or even contributing to the virulence pattern. They thus play a poorly understood role in the structure and function of microbial networks in the human body. A better understanding of SEP in the context of human health requires a precise understanding of the abundance of SEP in both commensal microbes and pathogens. For the gut-beneficial B. producta, we demonstrate the importance of specific environmental conditions for biosynthesis of SEP, expanding previous findings about their role in microbial interactions.
Affiliation(s)
- Jerome Genth
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Kathrin Schäfer
- Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Simon Graspeuntner
- Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
- German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Lübeck, Germany
| | - Jan Rupp
- Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
- German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Lübeck, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| |
|
15
|
Luo D, Ebadi A, Emery K, He Y, Noble WS, Keich U. Competition-based control of the false discovery proportion. Biometrics 2023; 79:3472-3484. [PMID: 36652258 DOI: 10.1111/biom.13830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 10/12/2022] [Accepted: 01/02/2023] [Indexed: 01/19/2023]
Abstract
Recently, Barber and Candès laid the theoretical foundation for a general framework for false discovery rate (FDR) control based on the notion of "knockoffs." A closely related FDR control methodology has long been employed in the analysis of mass spectrometry data, referred to there as "target-decoy competition" (TDC). However, any approach that aims to control the FDR, which is defined as the expected value of the false discovery proportion (FDP), suffers from a problem. Specifically, even when successfully controlling the FDR at level α, the FDP in the list of discoveries can significantly exceed α. We offer FDP-SD, a new procedure that rigorously controls the FDP in the knockoff/TDC competition setup by guaranteeing that the FDP is bounded by α at a desired confidence level. Compared with the recently published framework of Katsevich and Ramdas, FDP-SD generally delivers more power and often substantially so in simulated and real data.
Affiliation(s)
- Dong Luo
- School of Mathematics and Statistics, University of Sydney, New South Wales, Australia
| | - Arya Ebadi
- School of Mathematics and Statistics, University of Sydney, New South Wales, Australia
| | - Kristen Emery
- School of Mathematics and Statistics, University of Sydney, New South Wales, Australia
| | - Yilun He
- School of Mathematics and Statistics, University of Sydney, New South Wales, Australia
| | | | - Uri Keich
- School of Mathematics and Statistics, University of Sydney, New South Wales, Australia
| |
|
16
|
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
| | - Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
| |
|
17
|
Haseeb M, Saeed F. GPU-acceleration of the distributed-memory database peptide search of mass spectrometry data. Sci Rep 2023; 13:18713. [PMID: 37907498 PMCID: PMC10618243 DOI: 10.1038/s41598-023-43033-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 09/18/2023] [Indexed: 11/02/2023] Open
Abstract
Database peptide search is the primary computational technique for identifying peptides from mass spectrometry (MS) data. Graphics processing unit (GPU) computing is now ubiquitous in the current generation of high-performance computing (HPC) systems, yet its application in the database peptide search domain remains limited. Part of the reason is the use of sub-optimal algorithms in the existing GPU-accelerated methods, resulting in significantly inefficient hardware utilization. In this paper, we design and implement a new-age CPU-GPU HPC framework, called GiCOPS, for efficient and complete GPU-acceleration of the modern database peptide search algorithms on supercomputers. Our experimentation shows that GiCOPS exhibits a 1.2× to 5× speed improvement over its CPU-only predecessor, HiCOPS, and over 10× improvement over several existing GPU-based database search algorithms for sufficiently large experiment sizes. We further assess and optimize the performance of our framework using the Roofline Model and report near-optimal results for several metrics including computations per second, occupancy rate, memory workload, branch efficiency and shared memory performance. Finally, the CPU-GPU methods and optimizations proposed in our work for complex integer- and memory-bounded algorithmic pipelines can also be extended to accelerate the existing and future peptide identification algorithms. GiCOPS is now integrated with our umbrella HPC framework HiCOPS and is available at: https://github.com/pcdslab/gicops.
Affiliation(s)
- Muhammad Haseeb
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
| | - Fahad Saeed
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA.
- Biomolecular Sciences Institute (BSI), Miami, FL, USA.
- Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA.
| |
|
18
|
Seregin AA, Smirnova LP, Dmitrieva EM, Zavialova MG, Simutkin GG, Ivanova SA. Differential Expression of Proteins Associated with Bipolar Disorder as Identified Using the PeptideShaker Software. Int J Mol Sci 2023; 24:15250. [PMID: 37894929 PMCID: PMC10607299 DOI: 10.3390/ijms242015250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/29/2023] [Accepted: 10/06/2023] [Indexed: 10/29/2023] Open
Abstract
The prevalence of bipolar disorder (BD) in modern society is growing rapidly, but due to the lack of paraclinical criteria, its differential diagnosis with other mental disorders is somewhat challenging. In this regard, the relevance of proteomic studies is increasing due to the development of methods for processing large data arrays; this contributes to the discovery of protein patterns of pathological processes and the creation of new methods of diagnosis and treatment. It seems promising to search for proteins involved in the pathogenesis of BD in an easily accessible material: blood serum. Sera from BD patients and healthy individuals were purified via affinity chromatography to isolate 14 major proteins and separated using 1D SDS-PAGE. After trypsinolysis, the proteins in the samples were identified via HPLC/mass spectrometry. Mass spectrometric data were processed using the OMSSA and X!Tandem search algorithms using the UniProtKB database, and the results were analyzed using PeptideShaker. Differences in proteomes were assessed via an unlabeled NSAF-based analysis using a two-tailed Bonferroni-adjusted t-test. When comparing the blood serum proteomes of BD patients and healthy individuals, 10 proteins showed significant differences in NSAF values. Of these, four proteins were predominantly present in BD patients with the maximum NSAF value: 14-3-3 protein zeta/delta; ectonucleoside triphosphate diphosphohydrolase 7; transforming growth factor-beta-induced protein ig-h3; and B-cell CLL/lymphoma 9 protein. Further exploration of the role of these proteins in BD is warranted; conducting such studies will help develop new paraclinical criteria and discover new targets for BD drug therapy.
Affiliation(s)
- Alexander A. Seregin
- Mental Health Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk 634014, Russia
| | - Liudmila P. Smirnova
- Mental Health Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk 634014, Russia
| | - Elena M. Dmitrieva
- Mental Health Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk 634014, Russia
| | | | - German G. Simutkin
- Mental Health Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk 634014, Russia
| | - Svetlana A. Ivanova
- Mental Health Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk 634014, Russia
| |
|
19
|
Skiadopoulou D, Vašíček J, Kuznetsova K, Bouyssié D, Käll L, Vaudel M. Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides. J Proteome Res 2023; 22:3190-3199. [PMID: 37656829 PMCID: PMC10563157 DOI: 10.1021/acs.jproteome.3c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Indexed: 09/03/2023]
Abstract
Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.
Affiliation(s)
- Dafni Skiadopoulou
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
| | - Jakub Vašíček
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
| | - Ksenia Kuznetsova
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
| | - David Bouyssié
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), 31000 Toulouse, France
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
| | - Marc Vaudel
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, N-0213 Oslo, Norway
| |
|
20
|
Yang KL, Yu F, Teo GC, Li K, Demichev V, Ralser M, Nesvizhskii AI. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023; 14:4539. [PMID: 37500632 PMCID: PMC10374903 DOI: 10.1038/s41467-023-40129-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/07/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
Peptide identification in liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments relies on computational algorithms for matching acquired MS/MS spectra against sequences of candidate peptides using database search tools, such as MSFragger. Here, we present a new tool, MSBooster, for rescoring peptide-to-spectrum matches using additional features incorporating deep learning-based predictions of peptide properties, such as LC retention time, ion mobility, and MS/MS spectra. We demonstrate the utility of MSBooster, in tandem with MSFragger and Percolator, in several different workflows, including nonspecific searches (immunopeptidomics), direct identification of peptides from data independent acquisition data, single-cell proteomics, and data generated on an ion mobility separation-enabled timsTOF MS platform. MSBooster is fast, robust, and fully integrated into the widely used FragPipe computational platform.
Affiliation(s)
- Kevin L Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
| | - Guo Ci Teo
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
| | - Kai Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Vadim Demichev
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Markus Ralser
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Nuffield Department of Medicine, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
| |
|
21
|
Carrascal M, Sánchez-Jiménez E, Fang J, Pérez-López C, Ginebreda A, Barceló D, Abian J. Sewage Protein Information Mining: Discovery of Large Biomolecules as Biomarkers of Population and Industrial Activities. Environ Sci Technol 2023. [PMID: 37463250 PMCID: PMC10399289 DOI: 10.1021/acs.est.3c00535] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/20/2023]
Abstract
Wastewater-based epidemiology has been revealed as a powerful approach for surveying the health and lifestyle of a population. In this context, proteins have been proposed as potential biomarkers that complement the information provided by currently available methods. However, little is known about the range of molecular species and dynamics of proteins in wastewater and the information hidden in these protein profiles is still to be uncovered. In this study, we investigated the protein composition of wastewater from 10 municipalities in Catalonia with diverse populations and industrial activities at three different times of the year. The soluble fraction of this material was analyzed using liquid chromatography high-resolution tandem mass spectrometry using a shotgun proteomics approach. The complete proteomic profile, distribution among different organisms, and semiquantitative analysis of the main constituents are described. Excreta (urine and feces) from humans, and blood and other residues from livestock were identified as the two main protein sources. Our findings provide new insights into the characterization of wastewater proteomics that allow for the proposal of specific bioindicators for wastewater-based environmental monitoring. This includes human and animal population monitoring, most notably for rodent pest control (immunoglobulins (Igs) and amylases) and livestock processing industry monitoring (albumins).
Affiliation(s)
- Montserrat Carrascal
- Biological and Environmental Proteomics Group, Institute of Biomedical Research of Barcelona, Spanish National Research Council (IIBB-CSIC/IDIBAPS), Rosellón 161, E-08036 Barcelona, Spain
| | - Ester Sánchez-Jiménez
- Biological and Environmental Proteomics Group, Institute of Biomedical Research of Barcelona, Spanish National Research Council (IIBB-CSIC/IDIBAPS), Rosellón 161, E-08036 Barcelona, Spain
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Studies (IDAEA-CSIC), Jordi Girona 18-26, 08034 Barcelona, Spain
| | - Jie Fang
- Biological and Environmental Proteomics Group, Institute of Biomedical Research of Barcelona, Spanish National Research Council (IIBB-CSIC/IDIBAPS), Rosellón 161, E-08036 Barcelona, Spain
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Studies (IDAEA-CSIC), Jordi Girona 18-26, 08034 Barcelona, Spain
| | - Carlos Pérez-López
- Biological and Environmental Proteomics Group, Institute of Biomedical Research of Barcelona, Spanish National Research Council (IIBB-CSIC/IDIBAPS), Rosellón 161, E-08036 Barcelona, Spain
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Studies (IDAEA-CSIC), Jordi Girona 18-26, 08034 Barcelona, Spain
| | - Antoni Ginebreda
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Studies (IDAEA-CSIC), Jordi Girona 18-26, 08034 Barcelona, Spain
| | - Damià Barceló
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Studies (IDAEA-CSIC), Jordi Girona 18-26, 08034 Barcelona, Spain
| | - Joaquin Abian
- Biological and Environmental Proteomics Group, Institute of Biomedical Research of Barcelona, Spanish National Research Council (IIBB-CSIC/IDIBAPS), Rosellón 161, E-08036 Barcelona, Spain
| |
|
22
|
Yu F, Teo GC, Kong AT, Fröhlich K, Li GX, Demichev V, Nesvizhskii AI. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nat Commun 2023; 14:4154. [PMID: 37438352 PMCID: PMC10338508 DOI: 10.1038/s41467-023-39869-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 11/07/2022] [Accepted: 06/28/2023] [Indexed: 07/14/2023] Open
Abstract
Liquid chromatography (LC) coupled with data-independent acquisition (DIA) mass spectrometry (MS) has been increasingly used in quantitative proteomics studies. Here, we present a fast and sensitive approach for direct peptide identification from DIA data, MSFragger-DIA, which leverages the unmatched speed of the fragment ion indexing-based search engine MSFragger. Different from most existing methods, MSFragger-DIA conducts a database search of the DIA tandem mass (MS/MS) spectra prior to spectral feature detection and peak tracing across the LC dimension. To streamline the analysis of DIA data and enable easy reproducibility, we integrate MSFragger-DIA into the FragPipe computational platform for seamless support of peptide identification and spectral library building from DIA, data-dependent acquisition (DDA), or both data types combined. We compare MSFragger-DIA with other DIA tools, such as the DIA-Umpire-based workflow in FragPipe, Spectronaut, DIA-NN library-free, and MaxDIA. We demonstrate the fast, sensitive, and accurate performance of MSFragger-DIA across a variety of sample types and data acquisition schemes, including single-cell proteomics, phosphoproteomics, and large-scale tumor proteome profiling studies.
Affiliation(s)
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
| | - Guo Ci Teo
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
| | - Andy T Kong
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Klemens Fröhlich
- Proteomics Core Facility, Biozentrum, University of Basel, Basel, Switzerland
| | - Ginny Xiaohe Li
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
| | - Vadim Demichev
- Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Alexey I Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
| |
|
23
|
Révész Á, Hevér H, Steckel A, Schlosser G, Szabó D, Vékey K, Drahos L. Collision energies: Optimization strategies for bottom-up proteomics. Mass Spectrom Rev 2023; 42:1261-1299. [PMID: 34859467 DOI: 10.1002/mas.21763] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 06/11/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 06/07/2023]
Abstract
Mass spectrometry coupled to liquid chromatography is an indispensable tool in the field of proteomics. In the last decades, more and more complex and diverse biochemical and biomedical questions have arisen. Problems to be solved involve protein identification, quantitative analysis, screening of low-abundance modifications, handling matrix effects, and concentrations differing by orders of magnitude. This led to the development of more tailored protocols and problem-centered proteomics workflows, including advanced choice of experimental parameters. In the most widespread bottom-up approach, the choice of collision energy in tandem mass spectrometric experiments has an outstanding role. This review presents the collision energy optimization strategies in the field of proteomics which can help fully exploit the potential of MS-based proteomics techniques. A systematic collection of use case studies is then presented to serve as a starting point for related further scientific work. Finally, this article discusses the issue of comparing results from different studies or obtained on different instruments, and it gives some hints on methodology transfer between laboratories based on measurement of reference species.
Affiliation(s)
- Ágnes Révész
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| | - Helga Hevér
- Chemical Works of Gedeon Richter Plc, Budapest, Hungary
| | - Arnold Steckel
- Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Gitta Schlosser
- Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Dániel Szabó
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| | - Károly Vékey
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| | - László Drahos
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| |
|
24
|
Nowatzky Y, Benner P, Reinert K, Muth T. Mistle: bringing spectral library predictions to metaproteomics with an efficient search index. Bioinformatics 2023; 39:btad376. [PMID: 37294786 PMCID: PMC10313348 DOI: 10.1093/bioinformatics/btad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 05/11/2023] [Accepted: 06/08/2023] [Indexed: 06/11/2023] Open
Abstract
MOTIVATION Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics and authentic prediction for peptide fragmentation is more feasible than ever. Still, at this point spectral prediction is mainly used to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to large search space problems that often occur in metaproteomics or proteogenomics. RESULTS In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes. AVAILABILITY AND IMPLEMENTATION Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.
Affiliation(s)
- Yannek Nowatzky
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
| | - Philipp Benner
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
| | - Knut Reinert
- Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
| | - Thilo Muth
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
| |
|
25
|
Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981 DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient and their targeting requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and discuss considerations when selecting tumor antigens to target in the clinic.
Affiliation(s)
- Daniel Oreper
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
| | - Susan Klaeger
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
| | | | | |
|
26
|
Zhang Q. Mzion enables deep and precise identification of peptides in data-dependent acquisition proteomics. Sci Rep 2023; 13:7056. [PMID: 37120666 PMCID: PMC10148867 DOI: 10.1038/s41598-023-34323-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 04/27/2023] [Indexed: 05/01/2023] Open
Abstract
Sensitive and reliable identification of proteins and peptides forms the basis of proteomics. We introduce Mzion, a new database search tool for data-dependent acquisition (DDA) proteomics. Our tool utilizes an intensity tally strategy and generally achieves higher performance in terms of depth and precision across 20 datasets, ranging from large-scale to single-cell proteomics. Compared to several other search engines, Mzion matches on average 20% more peptide spectra at tryptic enzymatic specificity and 80% more at no enzymatic specificity from six large-scale, global datasets. Mzion also identifies more phosphopeptide spectra that can be explained by fewer proteins, demonstrated by six large-scale, local datasets corresponding to the global data. Our findings highlight the potential of Mzion for improving proteomic analysis and advancing our understanding of protein biology.
Affiliation(s)
- Qiang Zhang
- Division of Endocrinology, Metabolism and Lipid Research, Washington University School of Medicine, St. Louis, MO, USA.
| |
|
27
|
Boekweg H, Payne SH. Challenges and opportunities for single cell computational proteomics. Mol Cell Proteomics 2023; 22:100518. [PMID: 36828128 PMCID: PMC10060113 DOI: 10.1016/j.mcpro.2023.100518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Single-cell proteomics is growing rapidly and has made several technological advancements. As most research has been focused on improving instrumentation and sample preparation methods, very little attention has been given to algorithms responsible for identifying and quantifying proteins. Given the inherent difference between bulk data and single-cell data, it's necessary to realize that current algorithms being employed on single-cell data were designed for bulk data, and have underlying assumptions that may not hold true for single-cell data. In order to develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data, and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We will give a review of how each type of algorithm works, assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could be used to address the differences in single-cell data.
Affiliation(s)
- Hannah Boekweg
- Biology Department, Brigham Young University, Provo, Utah, USA
| | - Samuel H Payne
- Biology Department, Brigham Young University, Provo, Utah, USA.
| |
|
28
|
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | | | | | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
|
29
|
Polasky DA, Nesvizhskii AI. Recent advances in computational algorithms and software for large-scale glycoproteomics. Curr Opin Chem Biol 2023; 72:102238. [PMID: 36525809 DOI: 10.1016/j.cbpa.2022.102238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 11/12/2022] [Accepted: 11/14/2022] [Indexed: 12/15/2022]
Abstract
Glycoproteomics, or characterizing glycosylation events at a proteome scale, has seen rapid advances in methods for analyzing glycopeptides by tandem mass spectrometry in recent years. These advances have enabled acquisition of far more comprehensive and large-scale datasets, precipitating an urgent need for improved informatics methods to analyze the resulting data. A new generation of glycoproteomics search methods has recently emerged, using glycan fragmentation to split the identification of a glycopeptide into peptide and glycan components and solve each component separately. In this review, we discuss these new methods and their implications for large-scale glycoproteomics, as well as several outstanding challenges in glycoproteomics data analysis, including validation of glycan assignments and quantitation. Finally, we provide an outlook on the future of glycoproteomics from an informatics perspective, noting the key challenges to achieving widespread and reproducible glycopeptide annotation and quantitation.
Affiliation(s)
- Daniel A Polasky
- University of Michigan Department of Pathology, Ann Arbor, MI, USA.
| | - Alexey I Nesvizhskii
- University of Michigan Department of Pathology, Ann Arbor, MI, USA; University of Michigan Department of Computational Medicine and Bioinformatics, Ann Arbor, MI, USA.
| |
|
30
|
Poudel S, Vanderwall D, Yuan ZF, Wu Z, Peng J, Li Y. JUMPptm: Integrated software for sensitive identification of post-translational modifications and its application in Alzheimer's disease study. Proteomics 2023; 23:e2100369. [PMID: 36094355 PMCID: PMC9957936 DOI: 10.1002/pmic.202100369] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/16/2022] [Revised: 07/29/2022] [Accepted: 08/23/2022] [Indexed: 01/10/2023]
Abstract
BACKGROUND Mass spectrometry (MS)-based proteomic analysis of posttranslational modifications (PTMs) usually requires the pre-enrichment of modified proteins or peptides. However, recent ultra-deep whole proteome profiling generates millions of spectra in a single experiment, leaving many unassigned spectra, some of which may be derived from PTM peptides. METHODS Here we present JUMPptm, an integrative computational pipeline, to extract PTMs from unenriched whole proteome. JUMPptm combines the advantages of JUMP, MSFragger and Comet search engines, and includes de novo tags, customized database search and peptide filtering, which iteratively analyzes each PTM by a multi-stage strategy to improve sensitivity and specificity. RESULTS We applied JUMPptm to the deep brain proteome of Alzheimer's disease (AD), and identified 34,954 unique peptides with phosphorylation, methylation, acetylation, ubiquitination, and others. The phosphorylated peptides were validated by enriched phosphoproteome from the same sample. TMT-based quantification revealed 482 PTM peptides dysregulated at different stages during AD progression. For example, the acetylation of numerous mitochondrial proteins is significantly decreased in AD. A total of 60 PTM sites are found in the pan-PTM map of the Tau protein. CONCLUSION The JUMPptm program is an effective tool for pan-PTM analysis and the resulting AD pan-PTM profile serves as a valuable resource for AD research.
Affiliation(s)
- Suresh Poudel
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - David Vanderwall
- Departments of Structural Biology and Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Zuo-Fei Yuan
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Zhiping Wu
- Departments of Structural Biology and Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Junmin Peng
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Departments of Structural Biology and Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Corresponding author.
| | - Yuxin Li
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Corresponding author.
| |
|
31
|
McDonnell K, Howley E, Abram F. Critical evaluation of the use of artificial data for machine learning based de novo peptide identification. Comput Struct Biotechnol J 2023; 21:2732-2743. [PMID: 37168871 PMCID: PMC10165132 DOI: 10.1016/j.csbj.2023.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/16/2023] [Indexed: 05/13/2023] Open
Abstract
Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide reaching applications. Peptide identification in proteomics typically relies on matching high resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in depth analysis of de novo algorithms as well as alleviating the reliance on database searches for training data.
Affiliation(s)
- Kevin McDonnell
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- School of Computer Science, University of Galway, Ireland
- Corresponding author at: Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland.
| | - Enda Howley
- School of Computer Science, University of Galway, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- Corresponding author.
| |
|
32
|
Miller RM, Millikin RJ, Rolfs Z, Shortreed MR, Smith LM. Enhanced Proteomic Data Analysis with MetaMorpheus. Methods Mol Biol 2023; 2426:35-66. [PMID: 36308684 PMCID: PMC9623450 DOI: 10.1007/978-1-0716-1967-4_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2023]
Abstract
MetaMorpheus is a free and open-source software program dedicated to the comprehensive analysis of proteomic data. In bottom-up proteomics, protein samples are digested into peptides prior to chromatographic separation and tandem mass spectrometric analysis. The resulting fragmentation spectra are subsequently analyzed with search software programs to obtain peptide identifications and infer the presence of proteins in the samples. MetaMorpheus seeks to maximize the information gleaned from proteomic data through the use of (a) mass calibration, (b) post-translational modification discovery, (c) multiple search algorithms, which aid in the analysis of data from traditional, crosslinking, and glycoproteomic experiments, (d) isotope-based or label-free quantification, (e) multi-protease protein inference, and (f) spectral annotation and data visualization capabilities. This protocol provides detailed descriptions of how to use MetaMorpheus and how to customize data analysis workflows using MetaMorpheus tasks to meet the specific needs of the user.
Affiliation(s)
- Rachel M Miller
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | - Robert J Millikin
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | - Zach Rolfs
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | | | - Lloyd M Smith
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA.
| |
|
33
|
Proteomics: Application of next-generation proteomics in cancer research. Proteomics 2023. [DOI: 10.1016/b978-0-323-95072-5.00016-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
34
|
McDonnell K, Abram F, Howley E. Application of a Novel Hybrid CNN-GNN for Peptide Ion Encoding. J Proteome Res 2022; 22:323-333. [PMID: 36534699 PMCID: PMC9903319 DOI: 10.1021/acs.jproteome.2c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research we extracted and evaluated the encoding modules from 3 state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
Affiliation(s)
- Kevin McDonnell
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Enda Howley
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
| |
|
35
|
Wang S, Feng S, Pan C, Guo X. FineFDR: Fine-grained Taxonomy-specific False Discovery Rates Control in Metaproteomics. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2022; 2022:287-292. [PMID: 36910011 PMCID: PMC9998077 DOI: 10.1109/bibm55620.2022.9995401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Microbial community proteomics, also termed metaproteomics, investigates all proteins expressed by a microbiota. Tandem mass spectrometry (MS/MS) is the typical method for identifying proteins in metaproteomics, which involves searching the mass spectra against a protein sequence database. A major post-analysis step is controlling the false discovery rate (FDR), i.e., the ratio of false positives to the total number of annotations. The current popular target-decoy FDR estimation method treats all the peptides and proteins equally and overlooks that they could have varied probabilities of being identified. In this study, we report FineFDR, a framework for FDR assessment at fine-grained levels with taxonomy information considered. FineFDR groups the identified peptide-spectrum matches, peptides, and proteins from different taxonomic units and estimates the FDR in each group separately. Empirical experiments on the simulated and real-world data sets demonstrate that our FineFDR achieved higher precision and more peptide and protein identifications when compared to the state-of-the-art methods, such as Comet, Percolator, TIDD, and Tailor. FineFDR is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/FDR.
Affiliation(s)
- Shengze Wang
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
| | - Shichao Feng
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
| | - Chongle Pan
- School of Computer Science; Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States
| | - Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
| |
|
36
|
Lee S, Park H, Kim H. False discovery rate estimation using candidate peptides for each spectrum. BMC Bioinformatics 2022; 23:454. [PMID: 36319948 PMCID: PMC9623924 DOI: 10.1186/s12859-022-05002-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 10/25/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, estimates the FDR under the assumption that when spectra are identified incorrectly, the probabilities of the spectra matching the target or decoy peptides are identical. However, in practice the probabilities of a spectrum matching a target or a decoy peptide are not identical. We propose cTDS (target-decoy strategy with candidate peptides) for accurate estimation of the FDR using the probability that the spectrum is identified incorrectly as a target or decoy peptide. RESULTS For most spectra, the probability of being identified incorrectly as a target or decoy peptide is close to 0.5, but only about 1.14-4.85% of the total spectra have an exact probability of 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1-10%), the false match rate (FMR) in cTDS is closer to the threshold than the FMR in TDS. We compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold with the HEK293 dataset. In the first and third replications, the number of PSMs obtained with cTDS for the reverse, pseudo-reverse, shuffle, and de Bruijn databases exceeded those obtained with TDS (about 0.001-0.132%), while the pseudo-shuffle database yielded fewer PSMs than TDS (about 0.05-0.126%). In the second replication, the number of PSMs obtained with cTDS for all databases exceeds that obtained with TDS (about 0.013-0.274%). CONCLUSIONS When spectra are actually identified incorrectly, most probabilities of the spectra matching a target or decoy peptide are not identical. Therefore, we propose cTDS, which estimates the FDR more accurately using the probability of the spectrum being identified incorrectly as a target or decoy peptide.
Affiliation(s)
- Sangjeong Lee
- Department of Computer Science, Hanyang University, Seoul, 06978 Republic of Korea (GRID: grid.49606.3d; ISNI: 0000 0001 1364 9317)
| | - Heejin Park
- Department of Computer Science, Hanyang University, Seoul, 06978 Republic of Korea (GRID: grid.49606.3d; ISNI: 0000 0001 1364 9317)
| | - Hyunwoo Kim
- Biomedical Informatics Team, Korea Institute of Science and Technology Information, Daejeon, 34141 Republic of Korea (GRID: grid.249964.4; ISNI: 0000 0001 0523 5253)
| |
|
37
|
Freestone J, Short T, Noble WS, Keich U. Group-walk: a rigorous approach to group-wise false discovery rate analysis by target-decoy competition. Bioinformatics 2022; 38:ii82-ii88. [PMID: 36124786 DOI: 10.1093/bioinformatics/btac471] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Target-decoy competition (TDC) is a commonly used method for false discovery rate (FDR) control in the analysis of tandem mass spectrometry data. This type of competition-based FDR control has recently gained significant popularity in other fields after Barber and Candès laid its theoretical foundation in a more general setting that included the feature selection problem. In both cases, the competition is based on a head-to-head comparison between an (observed) target score and a corresponding decoy (knockoff) score. However, the effectiveness of TDC depends on whether the data are homogeneous, which is often not the case: in many settings, the data consist of groups with different score profiles or different proportions of true nulls. In such cases, applying TDC while ignoring the group structure often yields imbalanced lists of discoveries, where some groups might include relatively many false discoveries and other groups include relatively very few. On the other hand, as we show, the alternative approach of applying TDC separately to each group does not rigorously control the FDR. RESULTS We developed Group-walk, a procedure that controls the FDR in the target-decoy/knockoff setting while taking into account a given group structure. Group-walk is derived from the recently developed AdaPT, a general framework for controlling the FDR with side-information. We show using simulated and real datasets that when the data naturally divide into groups with different characteristics, Group-walk can deliver consistent power gains that in some cases are substantial. These groupings include the precursor charge state (4% more discovered peptides at 1% FDR threshold), the peptide length (3.6% increase) and the mass difference due to modifications (26% increase). AVAILABILITY AND IMPLEMENTATION Group-walk is available at https://cran.r-project.org/web/packages/groupwalk/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
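The head-to-head competition described in this abstract can be illustrated with a minimal sketch of plain, ungrouped TDC. This is not Group-walk and not the authors' code: the function name `tdc_fdr_threshold`, the tie-breaking rule, and the commonly used (decoys + 1) / targets estimate are assumptions chosen for illustration, and the sketch assumes all winning scores are distinct.

```python
from typing import Optional, Sequence


def tdc_fdr_threshold(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    alpha: float,
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= alpha.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    scores for spectrum i (hypothetical inputs; higher is better).
    Returns None when no cutoff meets the requested level.
    """
    # Competition step: each spectrum keeps only its higher-scoring match.
    # Ties are assigned to the decoy, which is the conservative choice.
    winners = []
    for t, d in zip(target_scores, decoy_scores):
        if t > d:
            winners.append((t, True))
        else:
            winners.append((d, False))

    # Walk down the ranked list of winners, tracking target and decoy wins.
    winners.sort(key=lambda w: w[0], reverse=True)
    n_target = 0
    n_decoy = 0
    best_cutoff = None
    for score, is_target in winners:
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        # Decoy wins above the cutoff stand in for false target wins;
        # the +1 keeps the estimate conservative.
        estimated_fdr = (n_decoy + 1) / max(n_target, 1)
        if estimated_fdr <= alpha:
            best_cutoff = score
    return best_cutoff


if __name__ == "__main__":
    targets = [9.0, 8.0, 7.0, 6.0, 2.0]
    decoys = [1.0, 1.5, 0.5, 3.0, 5.0]
    print(tdc_fdr_threshold(targets, decoys, alpha=0.3))
```

Target wins scoring at or above the returned cutoff would be reported as discoveries. As the abstract notes, running such a procedure separately within each group does not by itself give rigorous FDR control, which is the gap Group-walk addresses.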
Affiliation(s)
- Jack Freestone
- School of Mathematics and Statistics F07, University of Sydney, Sydney 2006, Australia
| | - Temana Short
- School of Mathematics and Statistics F07, University of Sydney, Sydney 2006, Australia
| | | | - Uri Keich
- School of Mathematics and Statistics F07, University of Sydney, Sydney 2006, Australia
| |
|
38
|
Deep coverage proteome analysis of hair shaft for forensic individual identification. Forensic Sci Int Genet 2022; 60:102742. [DOI: 10.1016/j.fsigen.2022.102742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 11/18/2022]
|
39
|
Irajizad E, Fahrmann JF, Long JP, Vykoukal J, Kobayashi M, Capello M, Yu CY, Cai Y, Hsiao FC, Patel N, Park S, Peng Q, Dennison JB, Kato T, Tai MC, Taguchi A, Kadara H, Wistuba II, Katayama H, Do KA, Hanash SM, Ostrin EJ. A Comprehensive Search of Non-Canonical Proteins in Non-Small Cell Lung Cancer and Their Impact on the Immune Response. Int J Mol Sci 2022; 23:ijms23168933. [PMID: 36012199 PMCID: PMC9409146 DOI: 10.3390/ijms23168933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 08/05/2022] [Accepted: 08/08/2022] [Indexed: 12/02/2022] Open
Abstract
There is substantial interest in mining neoantigens for cancer applications. Non-canonical proteins resulting from frameshift mutations have been identified as neoantigens in cancer. We investigated the landscape of non-canonical proteins in non-small cell lung cancer (NSCLC) and their induced immune response in the form of autoantibodies. A database of cryptoproteins was computationally constructed and comprised all alternate open reading frames (altORFs) and ORFs identified in pseudogenes, noncoding RNAs, and untranslated regions of mRNAs that did not align with known canonical proteins. Proteomic profiles of seventeen lung adenocarcinoma (LUAD) cell lines were searched to evaluate the occurrence of cryptoproteins. To assess the immunogenicity, immunoglobulin (Ig)-bound cryptoproteins in plasmas were profiled by mass spectrometry. The specimen set consisted of plasmas from 30 newly diagnosed NSCLC cases, pre-diagnostic plasmas from 51 NSCLC cases, and 102 control plasmas. An analysis of LUAD cell lines identified 420 cryptoproteins. Plasma Ig-bound analyses revealed 90 cryptoproteins uniquely found in cases and 14 cryptoproteins that had a fold-change >2 compared to controls. In pre-diagnostic samples, 17 Ig-bound cryptoproteins yielded an odds ratio ≥2. Eight Ig-bound cryptoproteins were elevated in both pre-diagnostic and newly diagnosed cases compared to controls. Cryptoproteins represent a class of neoantigens that induce an autoantibody response in NSCLC.
Affiliation(s)
- Ehsan Irajizad
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Johannes F. Fahrmann
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- James P. Long
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Jody Vykoukal
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Makoto Kobayashi
  - Department of Basic Pathology, School of Medicine, Fukushima Medical University, Hikarigaoka, Fukushima 960-1247, Japan
- Michela Capello
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Chuan-Yih Yu
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Yining Cai
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Fu Chung Hsiao
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Nikul Patel
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Soyoung Park
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Qian Peng
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Jennifer B. Dennison
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Taketo Kato
  - Department of Thoracic Surgery, Nagoya University, Nagoya 464-8601, Japan
- Mei Chee Tai
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Ayumu Taguchi
  - Division of Molecular Diagnostics, Aichi Cancer Center, Nagoya 464-8601, Japan
  - Division of Advanced Cancer Diagnostics, Nagoya University Graduate School of Medicine, Nagoya 464-8601, Japan
- Humam Kadara
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Ignacio I. Wistuba
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Hiroyuki Katayama
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
- Kim-Anh Do
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
  - Correspondence: (K.-A.D.); (S.M.H.); (E.J.O.); Tel.: +1-713-745-5242 (S.M.H.)
- Samir M. Hanash
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
  - Correspondence: (K.-A.D.); (S.M.H.); (E.J.O.); Tel.: +1-713-745-5242 (S.M.H.)
- Edwin J. Ostrin
  - Departments of General Internal Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
  - Correspondence: (K.-A.D.); (S.M.H.); (E.J.O.); Tel.: +1-713-745-5242 (S.M.H.)
|
40
|
Demichev V, Szyrwiel L, Yu F, Teo GC, Rosenberger G, Niewienda A, Ludwig D, Decker J, Kaspar-Schoenefeld S, Lilley KS, Mülleder M, Nesvizhskii AI, Ralser M. dia-PASEF data analysis using FragPipe and DIA-NN for deep proteomics of low sample amounts. Nat Commun 2022; 13:3944. [PMID: 35803928 PMCID: PMC9270362 DOI: 10.1038/s41467-022-31492-0] [Citation(s) in RCA: 100] [Impact Index Per Article: 50.0] [Received: 05/10/2021] [Accepted: 06/20/2022] [Indexed: 11/28/2022]
Abstract
The dia-PASEF technology uses ion mobility separation to reduce signal interferences and increase sensitivity in proteomic experiments. Here we present a two-dimensional peak-picking algorithm and generation of optimized spectral libraries, as well as take advantage of neural network-based processing of dia-PASEF data. Our computational platform boosts proteomic depth by up to 83% compared to previous work, and is specifically beneficial for fast proteomic experiments and those with low sample amounts. It quantifies over 5300 proteins in single injections recorded at 200 samples per day throughput using Evosep One chromatography system on a timsTOF Pro mass spectrometer and almost 9000 proteins in single injections recorded with a 93-min nanoflow gradient on timsTOF Pro 2, from 200 ng of HeLa peptides. A user-friendly implementation is provided through the incorporation of the algorithms in the DIA-NN software and by the FragPipe workflow for spectral library generation.
Affiliation(s)
- Vadim Demichev
  - Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
  - Department of Biochemistry and Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Lukasz Szyrwiel
  - Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Guo Ci Teo
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Agathe Niewienda
  - Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Daniela Ludwig
  - Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Jens Decker
  - Bruker Daltonics GmbH & Co. KG, Bremen, Germany
- Kathryn S Lilley
  - Department of Biochemistry and Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Michael Mülleder
  - Core Facility High-Throughput Mass Spectrometry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Alexey I Nesvizhskii
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Markus Ralser
  - Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
|
41
|
Chen Y, Yang Z, Zhou X, Jin M, Dai Z, Ming D, Zhang Z, Zhu L, Jiang L. Sequence, structure, and function of the Dps DNA-binding protein from Deinococcus wulumuqiensis R12. Microb Cell Fact 2022; 21:132. [PMID: 35780107 PMCID: PMC9250271 DOI: 10.1186/s12934-022-01857-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/05/2021] [Accepted: 06/21/2022] [Indexed: 11/28/2022]
Abstract
Deinococcus wulumuqiensis R12, which was isolated from arid irradiated soil in Xinjiang province of China, belongs to a genus that is well-known for its extreme resistance to ionizing radiation and oxidative stress. The DNA-binding protein Dps has been studied for its great contribution to oxidative resistance. To explore the role of Dps in D. wulumuqiensis R12, the Dps sequence and homology-modeled structure were analyzed. In addition, the dps gene was knocked out and proteomics was used to verify the functions of Dps in D. wulumuqiensis R12. Docking data and DNA binding experiments in vitro showed that the R12 Dps protein has a better DNA binding ability than the Dps1 protein from D. radiodurans R1. When the dps gene was deleted in D. wulumuqiensis R12, its resistance to H2O2 and UV rays was greatly reduced, and the cell envelope was destroyed by H2O2 treatment. Additionally, the qRT-PCR and proteomics data suggested that when the dps gene was deleted, the catalase gene was significantly down-regulated. The proteomics data indicated that the metabolism, transport and oxidation–reduction processes of D. wulumuqiensis R12 were down-regulated after the deletion of the dps gene. Overall, the data confirmed that the Dps protein plays an important role in D. wulumuqiensis R12.
Affiliation(s)
- Yao Chen
  - College of Food Science and Light Industry, State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, Nanjing, 211816, China
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Zhihan Yang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Xue Zhou
  - College of Food Science and Light Industry, State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Mengmeng Jin
  - College of Food Science and Light Industry, State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Zijie Dai
  - College of Food Science and Light Industry, State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Dengming Ming
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
- Zhidong Zhang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
  - Institute of Applied Microbiology, Xinjiang Academy of Agricultural Sciences/Xinjiang Key Laboratory of Special Environmental Microbiology, Ürümqi, 830091, Xinjiang, China
- Liying Zhu
  - School of Chemistry and Molecular Engineering, Nanjing Tech University, Nanjing, 211816, China
- Ling Jiang
  - College of Food Science and Light Industry, State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, Nanjing, 211816, China
|
42
|
Fancello L, Burger T. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics. Genome Biol 2022; 23:132. [PMID: 35725496 PMCID: PMC9208142 DOI: 10.1186/s13059-022-02701-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 06/09/2022] [Indexed: 12/03/2022]
Abstract
Background Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach. Results We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible. Conclusions In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable to excessively small databases, transcriptome information can reduce the ambiguity of protein identifications. 
Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02701-2.
Affiliation(s)
- Laura Fancello
  - CNRS, CEA, Inserm, BioSanté U1292, Profi FR2048, Université Grenoble Alpes, Grenoble, France
- Thomas Burger
  - CNRS, CEA, Inserm, BioSanté U1292, Profi FR2048, Université Grenoble Alpes, Grenoble, France
|
43
|
Yang KC, Gorski SM. Protocol for analysis of RNA-sequencing and proteome profiling data for subgroup identification and comparison. STAR Protoc 2022; 3:101283. [PMID: 35634361 PMCID: PMC9133752 DOI: 10.1016/j.xpro.2022.101283] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
Abstract
RNA-sequencing and quantitative proteomic profiling simultaneously measure thousands of molecules and provide opportunities to decipher the transcriptomic and proteomic landscapes of cohort specimens for basic and health research. We present a protocol for the analysis of paired transcriptome and proteome data to identify and compare molecular subgroups among cohort specimens. We demonstrate a streamlined analysis workflow, applicable for both transcriptome and proteome data, which allows the comparison of two data types for RNA-protein variations and for derivation of biological implications. For complete details on the use and execution of this protocol, please refer to Yang et al. (2021).
Highlights:
- Identify and compare subgroups using transcriptome and/or proteome data
- Streamlined workflow from data preprocessing to cluster and differential analysis
- Examine mRNA-protein correlation from paired transcriptome and proteome data
- Discover subgroups and identify distinguishing features from any groups of interest
|
44
|
Aggarwal S, Raj A, Kumar D, Dash D, Yadav AK. False discovery rate: the Achilles' heel of proteogenomics. Brief Bioinform 2022; 23:6582880. [PMID: 35534181 DOI: 10.1093/bib/bbac163] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/30/2021] [Revised: 03/14/2022] [Accepted: 04/12/2022] [Indexed: 12/25/2022]
Abstract
Proteogenomics refers to the integrated analysis of the genome and proteome that leverages mass-spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms and find sequence variants to develop novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control the error rates, proteogenomics depends on the target-decoy search strategy, the de-facto method for false discovery rate (FDR) estimation in proteomics. The proteogenomic databases constructed from three- or six-frame nucleotide database translation not only increase the search space and compute-time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies such as two-pass database search or peptide-class-specific FDR can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret the proteogenomics results appropriately and control false positives and negatives in a more informed manner. In this review, first, we briefly discuss the proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.
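As a rough illustration of the target-decoy strategy this abstract refers to, the following Python sketch estimates FDR at a score threshold as the ratio of decoy matches to target matches at or above that threshold. The function names, the `(score, is_decoy)` input layout, and the higher-is-better score convention are assumptions made for this example; they are not taken from the review, and tied scores are not treated specially.

```python
from typing import List, Optional, Tuple


def target_decoy_fdr(psms: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (score_threshold, estimated_fdr) pairs, best score first.

    psms holds hypothetical (score, is_decoy) pairs; a higher score is
    assumed to mean a better match. The FDR at each threshold is
    estimated as (#decoys at or above it) / (#targets at or above it).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    estimates = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # With no targets accepted yet, report the most pessimistic value.
        fdr = decoys / targets if targets else 1.0
        estimates.append((score, min(fdr, 1.0)))
    return estimates


def score_cutoff(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score threshold whose estimated FDR is <= max_fdr, or None."""
    cutoff = None
    for score, fdr in target_decoy_fdr(psms):
        if fdr <= max_fdr:
            cutoff = score
    return cutoff
```

Because the estimate depends on decoy counts behaving like incorrect target counts, a search space in which target and decoy databases are not equivalent (the concern raised in the abstract) would bias the ratio computed here.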
Affiliation(s)
- Suruchi Aggarwal
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
- Anurag Raj
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Dhirendra Kumar
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
- Debasis Dash
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Amit Kumar Yadav
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
|
45
|
IntroSpect: Motif-Guided Immunopeptidome Database Building Tool to Improve the Sensitivity of HLA I Binding Peptide Identification by Mass Spectrometry. Biomolecules 2022; 12:biom12040579. [PMID: 35454168 PMCID: PMC9025654 DOI: 10.3390/biom12040579] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 01/02/2023]
Abstract
Although database search tools originally developed for shotgun proteome have been widely used in immunopeptidomic mass spectrometry identifications, they have been reported to achieve undesirably low sensitivities or high false positive rates as a result of the hugely inflated search space caused by the lack of specific enzymic digestions in immunopeptidome. To overcome such a problem, we developed a motif-guided immunopeptidome database building tool named IntroSpect, which is designed to first learn the peptide motifs from high confidence hits in the initial search, and then build a targeted database for refined search. Evaluated on 18 representative HLA class I datasets, IntroSpect can improve the sensitivity by an average of 76%, compared to conventional searches with unspecific digestions, while maintaining a very high level of accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so that it performs equally well on both well-studied and poorly-studied HLA types, unlike the previously developed method SpectMHC. We have also designed IntroSpect to keep a global FDR that can be conveniently controlled, similar to a conventional database search. Finally, we demonstrate the practical value of IntroSpect by discovering neoepitopes from MS data directly, an important application in cancer immunotherapies. IntroSpect is freely available to download and use.
|
46
|
|
47
|
Desai HS, Yan T, Yu F, Sun AW, Villanueva M, Nesvizhskii AI, Backus KM. SP3-Enabled Rapid and High Coverage Chemoproteomic Identification of Cell-State-Dependent Redox-Sensitive Cysteines. Mol Cell Proteomics 2022; 21:100218. [PMID: 35219905 PMCID: PMC9010637 DOI: 10.1016/j.mcpro.2022.100218] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/23/2021] [Revised: 02/17/2022] [Accepted: 02/22/2022] [Indexed: 02/07/2023]
Abstract
Proteinaceous cysteine residues act as privileged sensors of oxidative stress. As reactive oxygen and nitrogen species have been implicated in numerous pathophysiological processes, deciphering which cysteines are sensitive to oxidative modification and the specific nature of these modifications is essential to understanding protein and cellular function in health and disease. While established mass spectrometry-based proteomic platforms have improved our understanding of the redox proteome, the widespread adoption of these methods is often hindered by complex sample preparation workflows, prohibitive cost of isotopic labeling reagents, and requirements for custom data analysis workflows. Here, we present the SP3-Rox redox proteomics method that combines tailored low cost isotopically labeled capture reagents with SP3 sample cleanup to achieve high throughput and high coverage proteome-wide identification of redox-sensitive cysteines. By implementing a customized workflow in the free FragPipe computational pipeline, we achieve accurate MS1-based quantitation, including for peptides containing multiple cysteine residues. Application of the SP3-Rox method to cellular proteomes identified cysteines sensitive to the oxidative stressor GSNO and cysteine oxidation state changes that occur during T cell activation. High-coverage Cys oxidation state quantification using custom isotopic probes. FragPipe-IonQuant accurately quantifies Cys labeling comparably to Skyline. PTMProphet enables site-of-labeling localization for multi-Cys–containing peptides. SP3-Rox identifies changes in Cys oxidation during T cell activation.
Affiliation(s)
- Heta S Desai
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
  - Molecular Biology Institute, UCLA, Los Angeles, California, USA
- Tianyang Yan
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, California, USA
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
- Alexander W Sun
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
- Miranda Villanueva
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
  - Molecular Biology Institute, UCLA, Los Angeles, California, USA
- Alexey I Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
- Keriann M Backus
  - Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, California, USA
  - Department of Chemistry and Biochemistry, UCLA, Los Angeles, California, USA
  - Molecular Biology Institute, UCLA, Los Angeles, California, USA
  - DOE Institute for Genomics and Proteomics, UCLA, Los Angeles, California, USA
  - Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, California, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, California, USA
|
48
|
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023]
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavages. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically its performance lagged behind database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm's correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations regarding further improvements to the algorithms and offer potential avenues to overcome current inherent data limitations.
|
49
|
Ahrens CH, Wade JT, Champion MM, Langer JD. A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry. J Bacteriol 2022; 204:e0035321. [PMID: 34748388 PMCID: PMC8765459 DOI: 10.1128/jb.00353-21] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 11/30/2022]
Abstract
Small proteins of up to ∼50 amino acids are an abundant class of biomolecules across all domains of life. Yet due to the challenges inherent in their size, they are often missed in genome annotations, and are difficult to identify and characterize using standard experimental approaches. Consequently, we still know few small proteins even in well-studied prokaryotic model organisms. Mass spectrometry (MS) has great potential for the discovery, validation, and functional characterization of small proteins. However, standard MS approaches are poorly suited to the identification of both known and novel small proteins due to limitations at each step of a typical proteomics workflow, i.e., sample preparation, protease digestion, liquid chromatography, MS data acquisition, and data analysis. Here, we outline the major MS-based workflows and bioinformatic pipelines used for small protein discovery and validation. Special emphasis is placed on highlighting the adjustments required to improve detection and data quality for small proteins. We discuss both the unbiased detection of small proteins and the targeted analysis of small proteins of interest. Finally, we provide guidelines to prioritize novel small proteins, and an outlook on methods with particular potential to further improve comprehensive discovery and characterization of small proteins.
Affiliation(s)
- Christian H. Ahrens
  - Agroscope, Method Development and Analytics & SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
  - Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, New York, USA
- Matthew M. Champion
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
- Julian D. Langer
  - Mass Spectrometry and Proteomics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
  - Proteomics, Max Planck Institute for Brain Research, Frankfurt am Main, Germany
|
50
|
Mansouri M, Khakabimamaghani S, Chindelevitch L, Ester M. Aristotle: stratified causal discovery for omics data. BMC Bioinformatics 2022; 23:42. [PMID: 35033007 PMCID: PMC8760642 DOI: 10.1186/s12859-021-04521-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 12/08/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND There has been a simultaneous increase in demand and accessibility across genomics, transcriptomics, proteomics and metabolomics data, known as omics data. This has encouraged widespread application of omics data in life sciences, from personalized medicine to the discovery of underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms. Existing causal analysis methods yield promising results when identifying potential general causes of an observed outcome based on omics data. However, they may fail to discover the causes specific to a particular stratum of individuals and missing from others. METHODS To fill this gap, we introduce the problem of stratified causal discovery and propose a method, Aristotle, for solving it. Aristotle addresses the two challenges intrinsic to omics data: high dimensionality and hidden stratification. It employs existing biological knowledge and a state-of-the-art patient stratification method to tackle the above challenges and applies a quasi-experimental design method to each stratum to find stratum-specific potential causes. RESULTS Evaluation based on synthetic data shows better performance for Aristotle in discovering true causes under different conditions compared to existing causal discovery methods. Experiments on a real dataset on Anthracycline Cardiotoxicity indicate that Aristotle's predictions are consistent with the existing literature. Moreover, Aristotle makes additional predictions that suggest further investigations.
Affiliation(s)
- Mehrdad Mansouri
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, CA USA (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494)
- Sahand Khakabimamaghani
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, CA USA (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494)
- Leonid Chindelevitch
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, CA USA (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494)
- Martin Ester
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, CA USA (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494)
|