1
Wang Y, Wang Y, Zhang Z, Xu K, Fang Q, Wu X, Ma S. Molecular networking: An efficient tool for discovering and identifying natural products. J Pharm Biomed Anal 2025; 259:116741. [PMID: 40014895 DOI: 10.1016/j.jpba.2025.116741] [Received: 09/18/2024] [Revised: 02/06/2025] [Accepted: 02/08/2025] [Indexed: 03/01/2025]
Abstract
Natural products (NPs) play a crucial role in drug development. However, the discovery of NPs is often serendipitous, and conventional identification methods lack accuracy. To overcome these challenges, an increasing number of researchers are turning their attention to molecular networking (MN). MN based on tandem mass spectrometry (MS/MS) has become an important tool for the separation, purification and structural identification of NPs. However, most of the newer MN tools are not well known. This review starts with the most basic MN tool and explains its principle, workflow, and applications. It then introduces the principles and workflows of eight other, newer types of MN tools. The reliability of the various MN tools is verified mainly through their applications in phytochemistry and metabolomics. Subsequently, the principles and applications of 12 structural annotation tools are introduced. For the first time, the scopes of 9 kinds of MN tools are compared side by side, and the 12 structural annotation tools are classified according to the types of compound structures they are suited to identify. The review summarizes the advantages and disadvantages of the various tools and makes suggestions for future application directions and for the development of computational tools. MN tools are expected to improve the efficiency of NP discovery and identification.
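Classical MN links MS/MS spectra whose fragmentation patterns are similar. The sketch below illustrates only that core idea: it scores each pair of spectra with a plain cosine over fixed-width m/z bins and draws an edge when both the score and the number of shared peaks pass thresholds. The bin width, the 0.7 cosine cut-off, the six-peak minimum and the function names are illustrative assumptions; GNPS-style networking typically uses a modified cosine that also matches fragments shifted by the precursor mass difference, which this sketch does not implement.

```python
import math
from itertools import combinations


def binned(spectrum, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins (a simplified peak matcher)."""
    bins = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine(spec_a, spec_b, bin_width=0.01):
    """Return (cosine score, number of shared bins) for two MS/MS peak lists."""
    a, b = binned(spec_a, bin_width), binned(spec_b, bin_width)
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return (dot / norm if norm else 0.0), len(shared)


def build_network(spectra, min_cosine=0.7, min_matched=6):
    """Return edges (id_a, id_b, score) between sufficiently similar spectra."""
    edges = []
    for (id_a, spec_a), (id_b, spec_b) in combinations(spectra.items(), 2):
        score, matched = cosine(spec_a, spec_b)
        if score >= min_cosine and matched >= min_matched:
            edges.append((id_a, id_b, round(score, 3)))
    return edges
```

Connected components of the resulting edge list correspond to the molecular families that MN tools visualize; real tools add further pruning (for example, limiting the number of neighbours per node).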
Affiliation(s)
- Yongjian Wang
- National Institutes for Food and Drug Control, Beijing 102629, China; Hebei University of Chinese Medicine, Shijiazhuang 050091, China
- Yadan Wang
- National Institutes for Food and Drug Control, Beijing 102629, China; State Key Laboratory of Drug Regulatory Science, Beijing 100050, China
- Zhongmou Zhang
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 102488, China
- Kailing Xu
- National Institutes for Food and Drug Control, Beijing 102629, China
- Qiufang Fang
- Shenyang Pharmaceutical University, Shenyang 110179, China
- Xianfu Wu
- National Institutes for Food and Drug Control, Beijing 102629, China
- Shuangcheng Ma
- State Key Laboratory of Drug Regulatory Science, Beijing 100050, China; Chinese Pharmacopoeia Commission, Beijing 100061, China
2
Deutsch EW, Mendoza L, Moritz RL. Quetzal: Comprehensive Peptide Fragmentation Annotation and Visualization. J Proteome Res 2025. [PMID: 40111914 DOI: 10.1021/acs.jproteome.5c00092] [Indexed: 03/22/2025]
Abstract
Proteomics data-dependent acquisition data sets collected with high-resolution mass spectrometry (MS) can achieve very high-quality results, but nearly every analysis yields results that are thresholded at some accepted false discovery rate, meaning that a substantial number of results are incorrect. For study conclusions that rely on a small number of peptide-spectrum matches being correct, it is thus important to examine at least some crucial spectra to ensure that they are not among the incorrect identifications. We present Quetzal, a peptide fragment ion spectrum annotation tool to assist researchers in annotating and examining such spectra to ensure that they correctly support study conclusions. We describe how Quetzal annotates spectra using the new Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) mzPAF standard for fragment ion peak annotation, and present its Python-based code, a web-service endpoint that provides annotation services, and a web-based application for annotating spectra and producing publication-quality figures. We illustrate its functionality with several annotated spectra of varying complexity. Quetzal provides easily accessible functionality that can help ensure and demonstrate that crucial spectra support study conclusions. Quetzal is publicly available at https://proteomecentral.proteomexchange.org/quetzal/.
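Spectrum annotation of the kind Quetzal performs starts from theoretical fragment masses. As a minimal sketch, assuming an unmodified peptide, singly charged b and y ions only, and a 10 ppm tolerance (all chosen for illustration), observed peaks can be labelled as follows. The plain labels such as "b2" are not mzPAF strings, and the function names are hypothetical, not part of Quetzal's code or web service.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Theoretical singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON  # N-terminal fragment
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment
    return ions


def annotate(peaks, peptide, tol_ppm=10.0):
    """Label each observed m/z with the closest theoretical ion within tol_ppm, else None."""
    ions = fragment_ions(peptide)
    labelled = []
    for mz in peaks:
        label, best_ppm = None, tol_ppm
        for name, theoretical in ions.items():
            ppm = abs(mz - theoretical) / theoretical * 1e6
            if ppm <= best_ppm:
                label, best_ppm = name, ppm
        labelled.append((mz, label))
    return labelled
```

For example, `annotate([227.1026, 148.0604, 500.0], "PEPTIDE")` labels the first two peaks b2 and y1 and leaves the third unannotated.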
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
3
Madej D, Lam H. PyViscount: Validating False Discovery Rate Estimation Methods via Random Search Space Partition. J Proteome Res 2025; 24:1118-1134. [PMID: 39905949 PMCID: PMC11894659 DOI: 10.1021/acs.jproteome.4c00743] [Received: 08/28/2024] [Revised: 01/20/2025] [Accepted: 01/28/2025] [Indexed: 02/06/2025]
Abstract
Validating false discovery rate (FDR) estimation is an essential but surprisingly understudied aspect of method development in shotgun proteomics. Currently available validation protocols mostly rely on ground truth data sets, which typically involve manipulating the properties of the search space or of the query spectra used. As a result, comparisons of estimated FDR with ground truth-based false discovery proportion values may not be representative of the scenarios involving natural data sets encountered in practice. In this study, we introduce PyViscount, a Python tool implementing a novel validation protocol based on random search space partition, which enables the generation of a quasi-ground truth using unaltered search spaces of unique candidate peptides and generic data sets of experimental query spectra. Validation of existing FDR estimation methods by PyViscount is consistent with alternative validation protocols. This novel approach to validation, free from the need for synthetic data sets or dubious manipulation of the data, may be an attractive alternative for proteomics practitioners, allowing them to obtain deeper insights into the performance of existing and new FDR estimation methods.
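Validation protocols of this kind ultimately compare an estimated FDR with a false discovery proportion (FDP) computed from ground-truth labels. The hypothetical helper below shows that comparison for target-decoy PSMs at a single score threshold, assuming the common decoy-count estimate (accepted decoys divided by accepted targets); it does not implement PyViscount's random search space partition.

```python
def fdr_vs_fdp(psms, threshold):
    """Compare a target-decoy FDR estimate with the true false discovery proportion.

    psms: list of (score, is_decoy, is_correct); is_correct is the ground-truth
    label and is only consulted for target PSMs.
    """
    accepted = [p for p in psms if p[0] >= threshold]
    targets = [p for p in accepted if not p[1]]
    decoys = [p for p in accepted if p[1]]
    if not targets:
        return 0.0, 0.0
    estimated_fdr = len(decoys) / len(targets)  # decoys stand in for false targets
    fdp = sum(1 for p in targets if not p[2]) / len(targets)  # known from ground truth
    return estimated_fdr, fdp
```

Sweeping the threshold and plotting one quantity against the other shows whether an estimator is conservative or liberal.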
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
4
Ranff T, Dennison M, Bédorf J, Schulze S, Zinn N, Bantscheff M, van Heugten JJRM, Fufezan C. PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification. J Proteome Res 2025; 24:929-939. [PMID: 39840643 DOI: 10.1021/acs.jproteome.4c00686] [Indexed: 01/23/2025]
Abstract
The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, yielding peptide-spectrum matches (PSMs). In recent years, novel algorithms have pushed this assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses, and choosing the appropriate algorithm poses a challenge for the user. To alleviate that issue, we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier. PeptideForest increases the number of PSMs with a q-value below 1% by 25.2 ± 1.6% compared with MS-GF+ on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality, which is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. We could thereby show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality; PeptideForest thus offers a way to gain deeper insights in bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
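The "q-value below 1%" criterion is usually derived from target-decoy competition. The sketch below, assuming the simple decoys/targets FDR estimate and ignoring score ties, converts a ranked PSM list into q-values and counts the target PSMs accepted at 1%. It is a generic illustration, not PeptideForest's random forest rescoring or Ursgal's implementation.

```python
def q_values(psms):
    """psms: list of (score, is_decoy). Returns (score, is_decoy, q) sorted by score, best first."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR when accepting down to here
    # q-value: the lowest FDR at which this PSM would still be accepted.
    q = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


def count_accepted(psms, max_q=0.01):
    """Number of target PSMs with q-value at or below max_q."""
    return sum(1 for _, is_decoy, q in q_values(psms) if not is_decoy and q <= max_q)
```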
Affiliation(s)
- Tristan Ranff
- Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany
- Cellzome, A GSK Company, Heidelberg 69117, Germany
- GSK/RDDT/QEL/DE-Data Streams and Operation, Heidelberg 69117, Germany
- Jeroen Bédorf
- Minds.ai, Santa Cruz, California 95060, United States
- Stefan Schulze
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, New York 14608, United States
- Nico Zinn
- Cellzome, A GSK Company, Heidelberg 69117, Germany
- Christian Fufezan
- Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany
- Cellzome, A GSK Company, Heidelberg 69117, Germany
- GSK/RDDT/QEL/DE-Data Streams and Operation, Heidelberg 69117, Germany
5
Poudel S, Yuan ZF, Fu Y, Wu L, Shrestha H, High AA, Peng J, Wang X. JUMPlib: Integrative Search Tool Combining Fragment Ion Indexing with Comprehensive TMT Spectral Libraries. J Proteome Res 2025; 24:410-418. [PMID: 39715016 DOI: 10.1021/acs.jproteome.4c00410] [Indexed: 12/25/2024]
Abstract
The identification of peptides is a cornerstone of mass spectrometry-based proteomics. Spectral library-based algorithms are well-established methods for enhancing the efficiency of peptide identification during database searches in proteomics. However, these algorithms are not specifically tailored to tandem mass tag (TMT)-based proteomics because high-quality TMT spectral libraries are lacking. Here, we introduce JUMPlib, a computational tool for TMT-based spectral library searching. JUMPlib comprises components for generating spectral libraries, conducting library searches, filtering peptide identifications, and quantifying peptides and proteins. Fragment ion indexing of the libraries increases search speed, and using the experimental retention time of precursor ions improves peptide identification. We found that methionine oxidation is a major factor contributing to large shifts in peptide retention time. To test JUMPlib, we curated two comprehensive human libraries for TMT6/10/11 and TMT16/18 labeling reagents, containing ∼286,000 and ∼304,000 precursor ions, respectively. Our analyses demonstrate that JUMPlib, employing the fragment ion index strategy, enhances search speed and exhibits high sensitivity and specificity, yielding approximately 25% more peptide-spectrum matches than other search tools. Overall, JUMPlib serves as a streamlined computational platform designed to enhance peptide identification in TMT-based proteomics. Both the JUMPlib source code and libraries are publicly available.
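A fragment ion index inverts the library: instead of comparing a query with every entry, each binned fragment m/z points to the library entries that contain it, so only entries sharing fragments are ever scored. The following is a minimal sketch of that idea, assuming fixed-width bins (with neighbouring bins also checked, which widens the effective tolerance), a precursor m/z pre-filter, and a shared-fragment count as the score. The data layout and function names are hypothetical and do not reflect JUMPlib's code, which also uses retention time.

```python
from collections import Counter, defaultdict


def bin_of(mz, width=0.02):
    """Map an m/z value to an integer bin."""
    return int(mz / width)


def build_fragment_index(library, width=0.02):
    """library: {entry_id: (precursor_mz, [fragment m/z, ...])} -> {bin: {entry_ids}}."""
    index = defaultdict(set)
    for entry_id, (_, fragments) in library.items():
        for mz in fragments:
            index[bin_of(mz, width)].add(entry_id)
    return index


def search(query_precursor, query_fragments, library, index, precursor_tol=0.01, width=0.02):
    """Rank library entries by the number of query fragments they share."""
    hits = Counter()
    for mz in query_fragments:
        b = bin_of(mz, width)
        matched = set()
        for neighbour in (b - 1, b, b + 1):  # tolerate bin-boundary effects
            matched |= index.get(neighbour, set())
        for entry_id in matched:
            # Only entries whose precursor m/z is compatible with the query are scored.
            if abs(library[entry_id][0] - query_precursor) <= precursor_tol:
                hits[entry_id] += 1
    return hits.most_common()
```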
Affiliation(s)
- Suresh Poudel
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Zuo-Fei Yuan
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Yingxue Fu
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Long Wu
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Him Shrestha
- Department of Structural Biology, and Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Anthony A High
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Junmin Peng
- Department of Structural Biology, and Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Xusheng Wang
- Department of Neurology, University of Tennessee Health Science Center, Memphis, Tennessee 38103, United States
- Department of Genetics, Genomics & Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38103, United States
6
Camacho OM, Ramsbottom KA, Prakash A, Sun Z, Perez Riverol Y, Bowler-Barnett E, Martin M, Fan J, Deutsch EW, Vizcaíno JA, Jones AR. Phosphorylation in the Plasmodium falciparum Proteome: A Meta-Analysis of Publicly Available Data Sets. J Proteome Res 2024; 23:5326-5341. [PMID: 39475123 PMCID: PMC11629380 DOI: 10.1021/acs.jproteome.4c00418] [Received: 05/13/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 12/07/2024]
Abstract
Malaria is a deadly disease caused by apicomplexan parasites of the genus Plasmodium. Several Plasmodium species are known to infect humans, of which P. falciparum is the most virulent. Post-translational modifications (PTMs) of proteins coordinate cell signaling and hence regulate many biological processes in P. falciparum homeostasis and host infection; the most highly studied of these PTMs is phosphorylation. Phosphosites on proteins can be identified by tandem mass spectrometry (MS) performed on enriched samples (phosphoproteomics), followed by downstream computational analyses. We have performed a large-scale meta-analysis of 11 publicly available phosphoproteomics data sets to build a comprehensive atlas of phosphosites in the P. falciparum proteome, using robust pipelines aimed at strict control of false identifications. We identified a total of 26,609 phosphorylated sites on P. falciparum proteins, split across three categories of data reliability (gold/silver/bronze). We identified significant sequence motifs, likely indicative of different groups of kinases responsible for different groups of phosphosites. Conservation analysis identified clusters of phosphoproteins that are highly conserved and others that are evolving faster within the genus Plasmodium, implicated in different pathways. Based on orthologue mapping, we were also able to identify over 180,000 phosphosites in Plasmodium species other than P. falciparum. We also explored the structural context of phosphosites, identifying a strong enrichment of phosphosites in fast-evolving (low-conservation) intrinsically disordered regions (IDRs) of proteins. In other species, IDRs have been shown to play an important role in modulating protein-protein interactions, particularly in signaling, and thus warrant further study for their roles in host-pathogen interactions.
All data have been made available via UniProtKB, PRIDE, and PeptideAtlas, with visualization interfaces for exploring phosphosites in the context of other data on Plasmodium proteins.
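Motif analyses of phosphosites generally operate on fixed-length sequence windows centred on each modified residue. A small sketch, assuming ±7 flanking residues and "_" padding at protein termini (common conventions, not necessarily those used in this study), with a tally of the residue at position +1 as a crude check for proline-directed (S/T-P) sites:

```python
from collections import Counter


def site_window(sequence, position, flank=7, pad="_"):
    """Return the window of residues centred on a 1-based modified position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    idx = position - 1
    left = sequence[max(0, idx - flank):idx]
    right = sequence[idx + 1:idx + 1 + flank]
    # Pad so that every window has length 2 * flank + 1 with the site in the middle.
    return pad * (flank - len(left)) + left + sequence[idx] + right + pad * (flank - len(right))


def plus_one_residues(windows, flank=7):
    """Count the residue immediately C-terminal to each site (position +1)."""
    return Counter(window[flank + 1] for window in windows)
```

Real motif-enrichment tools compare such windows against a background set of unmodified S/T/Y windows rather than simply counting residues.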
Affiliation(s)
- Oscar J. M. Camacho
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, United Kingdom
- Kerry A. Ramsbottom
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, United Kingdom
- Ananth Prakash
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Yasset Perez Riverol
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Emily Bowler-Barnett
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Maria Martin
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Jun Fan
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7BE, United Kingdom
7
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024; 96:18491-18501. [PMID: 39514576 PMCID: PMC11579979 DOI: 10.1021/acs.analchem.4c04091] [Received: 08/03/2024] [Revised: 10/16/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Helge Hecht
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Benjamin A. Neely
- National Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Tomi Suomi
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Thomas Payne
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Douwe Schulte
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Steffen Neumann
- Computational Plant Biochemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
- Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
8
He Q, Guo H, Li Y, He G, Li X, Shuai J. SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics. Interdiscip Sci 2024; 16:579-592. [PMID: 38472692 DOI: 10.1007/s12539-024-00611-4] [Received: 07/17/2023] [Revised: 01/12/2024] [Accepted: 01/21/2024] [Indexed: 03/14/2024]
Abstract
Mass spectrometry is crucial in proteomics analysis, particularly data-independent acquisition (DIA), which provides reliable and reproducible data acquisition with broad mass-to-charge ratio coverage and high throughput. DIA-NN, a prominent deep learning-based software for DIA proteome analysis, generates peptide results that may include low-confidence peptides. Conventionally, biologists have to manually screen the extracted ion chromatogram (XIC) peaks of peptide fragment ions to identify high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm that aims to automate the identification of high-confidence peptides. Leveraging squeeze-and-excitation and residual network models, SeFilter-DIA extracts XIC features and effectively discriminates between high- and low-confidence peptides. Evaluation on benchmark datasets shows that SeFilter-DIA achieves an AUC of 99.6% on the test set and 97% on other performance indicators. Furthermore, SeFilter-DIA is applicable to screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach to high-confidence peptide identification while mitigating the limitations of manual screening.
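A squeeze-and-excitation (SE) block reweights feature channels: it averages each channel over its length (squeeze), passes the resulting vector through a small bottleneck network ending in a sigmoid (excitation), and multiplies each channel by its gate. The pure-Python forward pass below illustrates the standard SE block on an input of shape channels × time points, where the channels are assumed, for illustration, to be fragment-ion XICs. The weights are toy values rather than trained parameters, and SeFilter-DIA's actual layer sizes and residual wiring are not reproduced here.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def squeeze_excite(x, w1, w2):
    """x: C channels x T time points; w1: (C/r) x C; w2: C x (C/r). Returns (output, gates)."""
    # Squeeze: global average pooling over the time axis gives one value per channel.
    z = [sum(channel) / len(channel) for channel in x]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]
    # Scale: reweight each channel by its gate.
    return [[value * gate for value in channel] for channel, gate in zip(x, gates)], gates
```

In a trained network the gates let informative XIC channels dominate the features passed on to later layers.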
Affiliation(s)
- Qingzu He
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Huan Guo
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Yulin Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Guoqiang He
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Xiang Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Jianwei Shuai
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325001, China
9
Sőregi P, Zwillinger M, Vágó L, Csékei M, Kotschy A. High density information storage through isotope ratio encoding. Chem Sci 2024:d4sc03519d. [PMID: 39246345 PMCID: PMC11376023 DOI: 10.1039/d4sc03519d] [Received: 05/29/2024] [Accepted: 08/19/2024] [Indexed: 09/10/2024]
Abstract
The need for reliable information storage is rising steeply. Sequence-defined polymers, particularly oligonucleotides, are already in use in several areas, while compound mixtures also offer a simple way of storing information. We investigated the use of a set of isotopologues for information storage by mixing, where the information is stored in the form of a mass spectrometric (MS) fingerprint of the mixture. A small molecule with 24 non-labile, replaceable hydrogen atoms was selected as a model, and a set of components covering the D0-D24 deuteration range was synthesized. Theoretical analysis predicted that by mixing up to 10 of the prepared components, one can encode over 130 million different combinations and distinguish their MS fingerprints. As a proof of principle, several mixtures predicted to have similar fingerprints were prepared and their MS fingerprints were recorded. From each measured MS fingerprint, we were able to unambiguously identify the actual composition of the mixture. It was also demonstrated that the MS fingerprint of a given mixture can be made unique, making counterfeiting of the stored information very difficult. Finally, the utility of isotope ratio encoding in covalent tagging was also demonstrated.
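Decoding a mixture from its MS fingerprint amounts to unmixing overlapping isotope envelopes. The toy model below assumes unit mass resolution, noise-free intensities, and that every D_k component shares one relative isotope pattern shifted by k mass units (the pattern values are hypothetical). Under those assumptions the fingerprint is a convolution, and the amounts can be recovered exactly by forward substitution from the lowest mass upwards; real data would need a least-squares fit and a treatment of incomplete deuteration. This is not the authors' analysis.

```python
def fingerprint(amounts, pattern):
    """amounts[k]: relative amount of the D_k isotopologue; returns intensities by mass offset."""
    spectrum = [0.0] * (len(amounts) + len(pattern) - 1)
    for k, amount in enumerate(amounts):
        for j, rel in enumerate(pattern):
            spectrum[k + j] += amount * rel  # component k's envelope starts at offset k
    return spectrum


def decode(spectrum, pattern, n_components):
    """Recover component amounts by peeling off overlaps from lower-mass components."""
    amounts = []
    for m in range(n_components):
        overlap = sum(amounts[m - j] * pattern[j] for j in range(1, len(pattern)) if m - j >= 0)
        amounts.append((spectrum[m] - overlap) / pattern[0])
    return amounts
```

For 25 components (D0 to D24), `decode(fingerprint(amounts, pattern), pattern, 25)` returns the original amounts up to floating-point rounding.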
Affiliation(s)
- Petra Sőregi
- Servier Research Institute of Medicinal Chemistry, Záhony utca 7, 1031 Budapest, Hungary
- Hevesy György PhD School of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, 1117 Budapest, Hungary
- Márton Zwillinger
- Servier Research Institute of Medicinal Chemistry, Záhony utca 7, 1031 Budapest, Hungary
- Lajos Vágó
- Kastély u. 49/A, 2045 Törökbálint, Hungary
- Márton Csékei
- Servier Research Institute of Medicinal Chemistry, Záhony utca 7, 1031 Budapest, Hungary
- Andras Kotschy
- Servier Research Institute of Medicinal Chemistry, Záhony utca 7, 1031 Budapest, Hungary
10
Fields L, Ma M, DeLaney K, Phetsanthad A, Li L. A crustacean neuropeptide spectral library for data-independent acquisition (DIA) mass spectrometry applications. Proteomics 2024; 24:e2300285. [PMID: 38171828 PMCID: PMC11219527 DOI: 10.1002/pmic.202300285] [Received: 07/21/2023] [Revised: 11/06/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
Neuropeptides have tremendous potential for application in modern medicine, including utility as biomarkers and therapeutics. To overcome the inherent challenges associated with neuropeptide identification and characterization, data-independent acquisition (DIA) is a fitting mass spectrometry (MS) method of choice for sensitive and accurate analysis. It is advantageous to conduct preliminary neuropeptidomic studies in less complex organisms, and crustacean models are a popular choice owing to their relatively simple nervous systems. With spectral libraries serving as a means to interpret DIA-MS spectra, and Cancer borealis as a model of choice for neuropeptide analysis, we performed the first spectral library mapping of crustacean neuropeptides. Leveraging pre-existing data-dependent acquisition (DDA) spectra, we built a spectral library using PEAKS Online. The library comprises 333 unique neuropeptides. The identification results obtained with this spectral library were compared with those of library-free analysis of crustacean brain, pericardial organ (PO), and thoracic ganglia (TG) tissues. A statistically significant increase (Student's t-test, P < 0.05) in the number of identifications from the TG data was observed with the spectral library. Furthermore, in each tissue, the library search yielded a distinctly different set of identifications from the library-free search. This work highlights the need for spectral libraries in neuropeptide analysis, illustrating their advantage for interpreting DIA spectra reproducibly and with greater neuropeptidomic depth.
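The significance test cited for the TG data compares mean identification counts between two groups. A minimal sketch of Student's two-sample t statistic with pooled variance, using only the standard library; a p-value would then come from the t distribution with n_a + n_b - 2 degrees of freedom (for example via `scipy.stats.ttest_ind`, whose default assumes equal variances). The group contents are whatever replicate counts one has; none are taken from the study.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2
```

Usage, with hypothetical variable names: `t, df = pooled_t(library_counts, library_free_counts)`.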
Affiliation(s)
- Lauren Fields
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Min Ma
- School of Pharmacy, University of Wisconsin-Madison, Madison, WI, 53705, United States
- Kellen DeLaney
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Ashley Phetsanthad
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, United States
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, United States
- School of Pharmacy, University of Wisconsin-Madison, Madison, WI, 53705, United States
11
Zakopcanik M, Kavan D, Kukacka Z, Novak P, Loginov DS. Data-Independent Acquisition Represents a Promising Alternative for Fast Photochemical Oxidation of Proteins (FPOP) Samples Analysis. Anal Chem 2024; 96:11273-11279. [PMID: 38967040 PMCID: PMC11256011 DOI: 10.1021/acs.analchem.4c01084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 06/27/2024] [Accepted: 06/28/2024] [Indexed: 07/06/2024]
Abstract
Fast Photochemical Oxidation of Proteins (FPOP) is a protein footprinting method that uses hydroxyl radicals to provide valuable information on solvent-accessible surface area. The large number of oxidative modifications created by FPOP is both advantageous, yielding high spatial resolution, and challenging, increasing the complexity of data processing. Precise localization of each modification, together with adequate reproducibility, is crucial for obtaining relevant structural information. In this paper, we propose a novel approach that combines validated spectral libraries with DIA data. First, DDA data searched with FragPipe are validated in Skyline to form a spectral library. This library is then matched against the DIA data to filter out nonrepresentative identifications. Compared with FPOP data processing that uses only a search engine followed by commonly applied filtering steps, the manually validated spectral library offers higher confidence in identifications and increased spatial resolution. Furthermore, the reproducibility of quantification was compared for the DIA, DDA, and MS-only acquisition modes on a timsTOF SCP. Comparison of coefficients of variation (CV) showed that the DIA and MS-only modes exhibit significantly better quantitative reproducibility (CV medians 0.1233 and 0.1494, respectively) than the DDA mode (CV median 0.2104).
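The reproducibility comparison above rests on per-feature coefficients of variation (standard deviation divided by mean), summarized by their median. The following is a minimal sketch, assuming replicate intensities are already grouped per feature. The dictionary layout and the example numbers are hypothetical, and the abstract does not describe the normalization or missing-value handling the authors used.

```python
import statistics


def median_cv(intensities_by_feature):
    """Median coefficient of variation (sample stdev / mean) across features.

    intensities_by_feature maps a feature ID (e.g. an oxidized peptide) to its
    replicate intensities. Features with fewer than two values or a non-positive
    mean are skipped. Raises statistics.StatisticsError if nothing remains.
    """
    cvs = []
    for values in intensities_by_feature.values():
        if len(values) < 2:
            continue
        mean = statistics.mean(values)
        if mean > 0:
            cvs.append(statistics.stdev(values) / mean)
    return statistics.median(cvs)


# Hypothetical triplicate intensities for three features in two acquisition modes.
dia = {"pep1": [100.0, 110.0, 90.0], "pep2": [50.0, 55.0, 45.0], "pep3": [200.0, 180.0, 220.0]}
dda = {"pep1": [100.0, 140.0, 60.0], "pep2": [50.0, 70.0, 30.0], "pep3": [200.0, 280.0, 120.0]}
print(round(median_cv(dia), 3), round(median_cv(dda), 3))
```

The median rather than the mean keeps a handful of poorly quantified features from dominating the summary.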
Affiliation(s)
- Marek Zakopcanik
- Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Department of Biochemistry, Faculty of Science, Charles University, 12820 Prague, Czech Republic
- Daniel Kavan
- Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Zdenek Kukacka
- Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Petr Novak
- Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Dmitry S. Loginov
- Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
12
Espadas G, Llovera L, Ollivier A, Tuorto F, Novoa EM, Sabidó E. Spectral libraries from nucleobases and deoxyribonucleosides facilitate the identification of ribonucleosides by nano-flow liquid chromatography-tandem mass spectrometry. Rapid Commun Mass Spectrom 2024; 38:e9759. [PMID: 38680121 DOI: 10.1002/rcm.9759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/05/2024] [Accepted: 03/31/2024] [Indexed: 05/01/2024]
Abstract
RATIONALE: The study addresses the challenge of identifying RNA post-transcriptional modifications when commercial standards are not available to generate reference spectral libraries. It proposes using homologous nucleobases and deoxyribonucleosides as alternative reference spectral libraries to identify modified ribonucleosides and distinguish them from their positional isomers when standards are unavailable.
METHODS: Complete sets of ribonucleoside, deoxyribonucleoside, and nucleobase standards were analyzed by high-performance nano-flow liquid chromatography coupled to an Orbitrap Eclipse Tribrid mass spectrometer. Spectral libraries were constructed from homologous nucleobases and deoxyribonucleosides using targeted MS2 and neutral-loss-triggered MS3 methods, with optimized collision energies. The feasibility of using these libraries to identify modified ribonucleosides and their positional isomers was assessed by comparing spectral fragmentation patterns.
RESULTS: Both MS2 and neutral-loss-triggered MS3 methods yielded rich spectra with similar fragmentation patterns across ribonucleosides, deoxyribonucleosides, and nucleobases. Moreover, spectra from nucleobases and deoxyribonucleosides generated at optimized collision energies were sufficiently similar to those of modified ribonucleosides to serve as reference spectra for accurate identification of positional isomers within ribonucleoside families.
CONCLUSIONS: The study demonstrates the efficacy of using homologous nucleobases and deoxyribonucleosides as interchangeable reference spectral libraries for identifying modified ribonucleosides and their positional isomers. This approach helps overcome the unavailability of commercial standards and enhances the analysis of RNA post-transcriptional modifications by mass spectrometry.
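A neutral-loss trigger fires MS3 when an MS2 fragment sits exactly one sugar moiety below the precursor. Ribonucleosides lose ribose (C5H8O4, 132.0423 Da) and deoxyribonucleosides lose deoxyribose (C5H8O3, 116.0473 Da). Both leave the same protonated nucleobase: for example, adenosine ([M+H]+ 268.1040) and 2'-deoxyadenosine ([M+H]+ 252.1091) both give m/z 136.0618. This shared base ion is one plausible reason base and deoxy spectra can stand in for ribonucleoside spectra. The sketch below shows only the detection logic. `sugar_loss_type` and its tolerance are hypothetical, and this is not the instrument's trigger implementation.

```python
# Monoisotopic neutral losses of the sugar moiety on glycosidic bond cleavage (Da).
RIBOSE_LOSS = 132.04226       # C5H8O4, lost by ribonucleosides
DEOXYRIBOSE_LOSS = 116.04734  # C5H8O3, lost by deoxyribonucleosides


def sugar_loss_type(precursor_mz, fragment_mzs, ppm_tol=10.0, charge=1):
    """Return 'ribose', 'deoxyribose', or None depending on which sugar neutral loss
    (within ppm_tol of the expected fragment m/z) appears among fragment_mzs."""
    for label, loss in (("ribose", RIBOSE_LOSS), ("deoxyribose", DEOXYRIBOSE_LOSS)):
        expected = precursor_mz - loss / charge  # a neutral loss shifts m/z by loss / z
        for mz in fragment_mzs:
            if abs(mz - expected) / expected * 1e6 <= ppm_tol:
                return label
    return None


print(sugar_loss_type(268.10403, [136.06177, 250.0]))  # adenosine-like precursor
print(sugar_loss_type(252.10912, [136.06177]))         # 2'-deoxyadenosine-like precursor
```

In an acquisition method, a non-None result is the condition under which the base ion would be isolated for MS3.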
Affiliation(s)
- Guadalupe Espadas
- Center for Genomics Regulation, The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Laia Llovera
- Center for Genomics Regulation, The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Alexane Ollivier
- Center for Genomics Regulation, The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Francesca Tuorto
- Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Eva Maria Novoa
- Center for Genomics Regulation, The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Eduard Sabidó
- Center for Genomics Regulation, The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
13
Chen YE, Ge X, Woyshner K, McDermott M, Manousopoulou A, Ficarro SB, Marto JA, Li K, Wang LD, Li JJ. APIR: Aggregating Universal Proteomics Database Search Algorithms for Peptide Identification with FDR Control. Genomics Proteomics Bioinformatics 2024; 22:qzae042. [PMID: 39198030 DOI: 10.1093/gpbjnl/qzae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 02/26/2024] [Accepted: 03/11/2024] [Indexed: 09/01/2024]
Abstract
Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which map mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms with both a guaranteed increase in the number of identified peptides and control of the false discovery rate (FDR). To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under a given FDR threshold, APIR is guaranteed to identify at least as many peptides as each individual database search algorithm. Evaluation on a complex proteomics standard dataset showed that APIR is more powerful than individual database search algorithms and empirically controls the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis of RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.
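For context on what "under an FDR threshold" means for a single search engine, here is the generic target-decoy estimate. PSMs are ranked by score; at each cutoff the FDR is estimated as decoys divided by targets; and the q-value of a PSM is the smallest estimated FDR at that or any lower-score cutoff. This is a common single-engine procedure shown as an assumption for illustration. It is not APIR's aggregation method, which the abstract does not detail. The function names and example scores are hypothetical.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy) with higher scores better.
    Returns (score, is_decoy, q_value) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this score cutoff
    # q-value: minimum estimated FDR over this and all lower-score cutoffs.
    qvals = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


def accepted_targets(psms, fdr_threshold=0.01):
    """Scores of target PSMs whose q-value is at or below fdr_threshold."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= fdr_threshold]


psms = [(10, False), (9, False), (8, True), (7, False), (6, True), (5, False)]
print(accepted_targets(psms, 0.01), accepted_targets(psms, 0.4))
```

Ties in score are not treated specially in this sketch.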
Affiliation(s)
- Yiling Elaine Chen
- Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Xinzhou Ge
- Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Kyla Woyshner
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
- MeiLu McDermott
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Antigoni Manousopoulou
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
- Scott B Ficarro
- Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02215, USA
- Jarrod A Marto
- Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02215, USA
- Kexin Li
- Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Leo David Wang
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Pediatrics, City of Hope National Medical Center, Duarte, CA 91010, USA
- Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA
- Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
14
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification. Mol Cell Proteomics 2024; 23:100798. [PMID: 38871251 PMCID: PMC11269915 DOI: 10.1016/j.mcpro.2024.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024]
Abstract
Rescoring of peptide spectrum matches from database search engines, enabled by peptide property predictors, now exceeds the performance of peptide identification by traditional database search engines. Whereas traditional database search engines compute their own peptide spectrum match scores, rescoring generates scores by comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These new scores discriminate more effectively between correct and incorrect peptide spectrum matches. This approach has been shown to substantially increase the number of confidently identified peptides, facilitating the analysis of challenging datasets in fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight the limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
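The comparison of observed and predicted properties described above can be sketched as a per-PSM feature vector. The intensity feature below is the normalized spectral angle popularized by Prosit: 1 - 2·arccos(cos θ)/π on intensity vectors, giving 1 for identical spectra and 0 for orthogonal ones. Retention time agreement enters as an absolute error. The feature names and the PSM dictionary layout are hypothetical. A rescoring pipeline would pass such features to a learner (Percolator is a common choice) rather than use them directly.

```python
import math


def normalized_spectral_angle(observed, predicted):
    """1 - 2*arccos(cosine)/pi for two intensity vectors listing the same fragment
    ions in the same order; 1 means identical, 0 means orthogonal."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(p * p for p in predicted))
    if norm == 0:
        return 0.0
    cos_sim = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi


def rescoring_features(psm):
    """Hypothetical feature set for one PSM: the search engine's own score plus
    agreement with predicted fragment intensities and retention time."""
    return {
        "engine_score": psm["engine_score"],
        "spectral_angle": normalized_spectral_angle(psm["observed_intensities"], psm["predicted_intensities"]),
        "abs_rt_error": abs(psm["observed_rt"] - psm["predicted_rt"]),
    }


psm = {
    "engine_score": 42.0,
    "observed_intensities": [0.1, 1.0, 0.4],
    "predicted_intensities": [0.2, 0.9, 0.5],
    "observed_rt": 35.2,
    "predicted_rt": 34.0,
}
print(rescoring_features(psm))
```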
Affiliation(s)
- Mostafa Kalhor
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Joel Lapin
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany
15
Liu K, Tao C, Ye Y, Tang H. SpecEncoder: deep metric learning for accurate peptide identification in proteomics. Bioinformatics 2024; 40:i257-i265. [PMID: 38940141 PMCID: PMC11211836 DOI: 10.1093/bioinformatics/btae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
MOTIVATION: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used to identify peptides from MS/MS spectra, but both can be hampered by experimental variation between replicate spectra and by similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid approach that combines spectral library and protein database searches for peptide identification.
RESULTS: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%-2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%-15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%-12% additional unique peptides when using a combined library of experimental and predicted spectra. SpecEncoder also identifies more peptides than a deep learning-enhanced method (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder's potential to enhance peptide identification in proteomic data analyses.
AVAILABILITY AND IMPLEMENTATION: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.
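Once spectra live in a latent space, identification reduces to a nearest-neighbour search: embed the query spectrum and rank reference embeddings (from experimental or predicted spectra) by similarity. The sketch below makes several assumptions. It uses cosine similarity, which the abstract does not specify, and made-up three-dimensional vectors in place of embeddings that a trained SpecEncoder network would produce. It also uses an exhaustive scan where a large library would need an approximate nearest-neighbour index. `nearest_peptides` is a hypothetical helper.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_peptides(query_embedding, reference_embeddings, k=2):
    """Rank reference peptides by cosine similarity to the query embedding.
    reference_embeddings maps peptide -> embedding vector."""
    scored = [(pep, cosine(query_embedding, emb)) for pep, emb in reference_embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up embeddings standing in for encoder outputs.
references = {
    "PEPTIDEA": [0.9, 0.1, 0.0],
    "PEPTIDEB": [0.0, 1.0, 0.0],
    "PEPTIDEC": [0.1, 0.0, 0.9],
}
print(nearest_peptides([0.8, 0.2, 0.1], references))
```

Because experimental and predicted spectra end up in the same space, a single index can hold both, which is the idea behind the hybrid search described above.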
Affiliation(s)
- Kaiyuan Liu
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
- Chenghua Tao
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
- Yuzhen Ye
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
- Haixu Tang
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
16
Hamaneh M, Ogurtsov AY, Obolensky OI, Yu YK. Systematic Assessment of Deep Learning-Based Predictors of Fragmentation Intensity Profiles. J Proteome Res 2024; 23:1983-1999. [PMID: 38728051 PMCID: PMC11165591 DOI: 10.1021/acs.jproteome.3c00857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 03/05/2024] [Accepted: 04/16/2024] [Indexed: 06/13/2024]
Abstract
In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.
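Before any similarity metric can be computed, predicted and experimental b/y intensities must be aligned on the same fragment ions. Below is a sketch of that alignment followed by Pearson's correlation. It assumes ions are keyed by labels such as "b2" or "y3" and that an ion missing on one side counts as zero intensity. That is one common convention; excluding unmatched ions is another, and the choice changes the score. The helper names and values are hypothetical.

```python
import math


def align(predicted, observed):
    """Align two {ion_label: intensity} dicts on the union of labels; missing ions count as 0."""
    labels = sorted(set(predicted) | set(observed))
    return [predicted.get(l, 0.0) for l in labels], [observed.get(l, 0.0) for l in labels]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


pred, obs = align({"b2": 0.2, "y3": 1.0, "y4": 0.5}, {"b2": 0.25, "y3": 0.9, "y5": 0.1})
print(pearson(pred, obs))
```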
Affiliation(s)
- Mehdi B. Hamaneh
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Aleksey Y. Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
17
Picciani M, Gabriel W, Giurcoiu VG, Shouman O, Hamood F, Lautenbacher L, Jensen CB, Müller J, Kalhor M, Soleymaniniya A, Kuster B, The M, Wilhelm M. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 2024; 24:e2300112. [PMID: 37672792 DOI: 10.1002/pmic.202300112] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.
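Generating an in silico library generally starts from a list of candidate peptides, which a predictor such as Prosit then annotates with fragment intensities and retention times. A common way to obtain that list is an in silico tryptic digest of protein sequences. The standalone sketch below applies the classic trypsin rule (cleave after K or R unless the next residue is P) with optional missed cleavages. Oktoberfest's own digestion settings and API are not reproduced here. `tryptic_peptides` and its length limits are hypothetical.

```python
import re


def tryptic_peptides(protein, missed_cleavages=0, min_len=7, max_len=30):
    """In silico trypsin digest: cut C-terminal to K/R unless followed by P, then
    join up to `missed_cleavages` adjacent fragments; return sorted unique peptides."""
    # Zero-width split: lookbehind for K/R, negative lookahead for P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            pep = "".join(fragments[i:j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return sorted(peptides)


print(tryptic_peptides("PEPTIDEKAAAKPLLLRGGG", missed_cleavages=1))
```

The "KP" bond in the example stays intact, and the three-residue C-terminal fragment is dropped by the length filter unless it is joined through a missed cleavage.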
Affiliation(s)
- Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Wassim Gabriel
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Victor-George Giurcoiu
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Omar Shouman
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Firas Hamood
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Ludwig Lautenbacher
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Cecilia Bang Jensen
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Julian Müller
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mostafa Kalhor
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Armin Soleymaniniya
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
18
Joyce AW, Searle BC. Computational approaches to identify sites of phosphorylation. Proteomics 2024; 24:e2300088. [PMID: 37897210 DOI: 10.1002/pmic.202300088] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2023] [Revised: 10/07/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
Due to their oftentimes ambiguous nature, phosphopeptide positional isomers can present challenges in bottom-up mass spectrometry-based workflows as search engine scores alone are often not enough to confidently distinguish them. Additional scoring algorithms can remedy this by providing confidence metrics in addition to these search results, reducing ambiguity. Here we describe challenges to interpreting phosphoproteomics data and review several different approaches to determine sites of phosphorylation for both data-dependent and data-independent acquisition-based workflows. Finally, we discuss open questions regarding neutral losses, gas-phase rearrangement, and false localization rate estimation experienced by both types of acquisition workflows and best practices for managing ambiguity in phosphosite determination.
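The ambiguity arises because positional isomers share most fragment ions. Only ions that differ between isoforms ("site-determining" ions) carry localization information. The counting sketch below enumerates every S/T/Y placement of a single phosphate and computes singly charged b ions for each. It keeps the ions no other isoform shares, then counts how many are observed. Several simplifications are deliberate assumptions: it ignores y ions, higher charge states, and the phosphoric acid neutral loss mentioned above. Its counts are not a probabilistic localization score from any published tool. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO = 79.96633  # HPO3 added by phosphorylation
PROTON = 1.007276


def b_ions(peptide, phospho_index):
    """Singly charged b1..b(n-1) m/z values with the phosphate on phospho_index."""
    mzs, mass = [], 0.0
    for i, aa in enumerate(peptide[:-1]):
        mass += RESIDUE[aa] + (PHOSPHO if i == phospho_index else 0.0)
        mzs.append(mass + PROTON)
    return mzs


def site_determining_ions(peptide, tol=0.02):
    """Map each candidate S/T/Y index to b ions that no other isoform shares within tol."""
    ladders = {i: b_ions(peptide, i) for i, aa in enumerate(peptide) if aa in "STY"}
    unique = {}
    for site, ions in ladders.items():
        others = [mz for other, ladder in ladders.items() if other != site for mz in ladder]
        unique[site] = [mz for mz in ions if all(abs(mz - o) > tol for o in others)]
    return unique


def rank_sites(peptide, observed_mzs, tol=0.02):
    """Count observed peaks matching each isoform's site-determining ions, best first."""
    counts = {
        site: sum(any(abs(mz - obs) <= tol for obs in observed_mzs) for mz in ions)
        for site, ions in site_determining_ions(peptide, tol).items()
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


print(rank_sites("SAGTK", [168.006, 239.043, 397.112]))
```

In the example, the b4 ion near m/z 397.11 is identical for both isoforms, so it is excluded from the count even though it is observed.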
Affiliation(s)
- Alex W Joyce
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Brian C Searle
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA
19
Palstrøm NB, Campbell AJ, Lindegaard CA, Cakar S, Matthiesen R, Beck HC. Spectral library search for improved TMTpro labelled peptide assignment in human plasma proteomics. Proteomics 2024; 24:e2300236. [PMID: 37706597 DOI: 10.1002/pmic.202300236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 09/15/2023]
Abstract
Clinical biomarker discovery is often based on the analysis of human plasma samples. However, the high dynamic range and complexity of plasma pose significant challenges to mass spectrometry-based proteomics. Current methods for improving protein identifications require laborious pre-analytical sample preparation. In this study, we developed and evaluated a TMTpro-specific spectral library for improved protein identification in human plasma proteomics. The library was constructed by LC-MS/MS analysis of highly fractionated TMTpro-tagged human plasma, human cell lysates, and relevant arterial tissues. The library was curated using several quality filters to ensure reliable peptide identifications. Our results show that spectral library searching using the TMTpro spectral library improves the identification of proteins in plasma samples compared to conventional sequence database searching. Protein identifications made by the spectral library search engine demonstrated a high degree of complementarity with the sequence database search engine, indicating the feasibility of increasing the number of protein identifications without additional pre-analytical sample preparation. The TMTpro-specific spectral library provides a resource for future plasma proteomics research and optimization of search algorithms for greater accuracy and speed in protein identifications in human plasma proteomics, and is made publicly available to the research community via ProteomeXchange with identifier PXD042546.
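In a TMTpro experiment, every library entry describes the labelled peptide. Library precursor masses, and fragment masses containing a labelled site, are therefore shifted by the TMTpro delta mass of 304.2071 Da per label. Below is a minimal sketch of the precursor calculation. It assumes labelling at the peptide N-terminus and on every lysine, and ignores all other modifications such as carbamidomethylated cysteine. The function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
TMTPRO = 304.207146  # TMTpro monoisotopic delta mass per label


def tmtpro_precursor_mz(peptide, charge):
    """Precursor m/z for a peptide labelled with TMTpro at the N-terminus and on every K."""
    n_labels = 1 + peptide.count("K")
    neutral = sum(RESIDUE[aa] for aa in peptide) + WATER + n_labels * TMTPRO
    return (neutral + charge * PROTON) / charge


print(round(tmtpro_precursor_mz("PEPTIDEK", 2), 4))
```

A library built this way only matches spectra acquired with the same labelling chemistry, which is one reason a TMTpro-specific library is needed rather than reusing an unlabelled one.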
Affiliation(s)
- Nicolai B Palstrøm
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Amanda J Campbell
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Samir Cakar
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Rune Matthiesen
- Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal
- Hans C Beck
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
20
Jia Z, Zhu X, Zhou Y, Wu J, Cao M, Hu C, Yu L, Xu R, Chen Z. Polypeptides from traditional Chinese medicine: Comprehensive review of perspective towards cancer management. Int J Biol Macromol 2024; 260:129423. [PMID: 38232868 DOI: 10.1016/j.ijbiomac.2024.129423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/26/2023] [Accepted: 01/09/2024] [Indexed: 01/19/2024]
Abstract
Cancer has long been a focus of global attention, and the difficulty of treatment and poor prognosis continue to plague humanity. Conventional chemotherapeutics and other synthetic drugs can cause adverse side effects and drug resistance. Therefore, a safe, valid, and clinically effective drug is needed. Some natural compounds have been shown to have anticancer potential, and polypeptides obtained from traditional Chinese medicine are promising anticancer ingredients whose activity has been demonstrated in vivo and in vitro. However, most functional studies of traditional Chinese medicine polypeptides remain at the stage of basic experimental research, and few have progressed to clinical trials. Hence, this review discusses the chemical structure, extraction, separation and purification methods, anticancer mechanisms, and structure-activity relationships of traditional Chinese medicine polypeptides. It provides theoretical support for strengthening rapid separation and purification, for understanding overall efficacy and mechanism of action, and for the industrialization and clinical application of traditional Chinese medicine polypeptides.
Affiliation(s)
- Zhuolin Jia
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Xiaoli Zhu
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Ye Zhou
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Jie Wu
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Mayijie Cao
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Changjiang Hu
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Lingying Yu
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Runchun Xu
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Zhimin Chen
- State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
21
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024]
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies is reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
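The isolation-window designs listed above differ mainly in window width and in whether neighbouring windows overlap. The sketch below covers only fixed-width schemes, with and without overlap. Variable-width, scanning-quadrupole, and PASEF-based schemes are not modelled. `isolation_windows` and the example m/z range are hypothetical.

```python
def isolation_windows(mz_start, mz_end, width, overlap=0.0):
    """Fixed-width DIA isolation windows tiling [mz_start, mz_end]; consecutive
    windows share `overlap` m/z units and the last window is clipped at mz_end."""
    if width <= overlap:
        raise ValueError("width must exceed overlap")
    windows, lower = [], mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        if upper >= mz_end:
            break
        lower = upper - overlap  # step back by the overlap before the next window
    return windows


wide = isolation_windows(400, 1000, 25)
overlapping = isolation_windows(400, 1000, 25, overlap=1)
print(len(wide), len(overlapping), overlapping[:2])
```

Narrower or overlapping windows raise the number of windows per cycle. On a given instrument, that trades cycle time (and thus points across a chromatographic peak) for less precursor co-isolation.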
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
22
Liu K, Ye Y, Li S, Tang H. Accurate de novo peptide sequencing using fully convolutional neural networks. Nat Commun 2023; 14:7974. [PMID: 38042873 PMCID: PMC10693636 DOI: 10.1038/s41467-023-43010-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 10/29/2023] [Indexed: 12/04/2023]
Abstract
De novo peptide sequencing, which does not rely on a comprehensive target sequence database, provides us with a way to identify novel peptides from tandem mass spectra. However, current de novo sequencing algorithms suffer from low accuracy and coverage, which hinders their application in proteomics. In this paper, we present PepNet, a fully convolutional neural network for high accuracy de novo peptide sequencing. PepNet takes an MS/MS spectrum (represented as a high-dimensional vector) as input, and outputs the optimal peptide sequence along with its confidence score. The PepNet model is trained using a total of 3 million high-energy collisional dissociation MS/MS spectra from multiple human peptide spectral libraries. Evaluation results show that PepNet significantly outperforms current best-performing de novo sequencing algorithms (e.g. PointNovo and DeepNovo) in both peptide-level accuracy and positional-level accuracy. PepNet can sequence a large fraction of spectra that were not identified by database search engines, and thus could be used as a complementary tool to database search engines for peptide identification in proteomics. In addition, PepNet runs around 3x and 7x faster than PointNovo and DeepNovo on GPUs, respectively, thus being more suitable for the analysis of large-scale proteomics data.
Affiliation(s)
- Kaiyuan Liu
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
- Yuzhen Ye
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
- Sujun Li
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
- Dengding BioAI Co., Ltd., Bloomington, USA
- Haixu Tang
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA.
23
Chan CMJ, Lam H. Merging Full-Spectrum and Fragment Ion Intensity Predictions from Deep Learning for High-Quality Spectral Libraries. J Proteome Res 2023; 22:3692-3702. [PMID: 37910637 DOI: 10.1021/acs.jproteome.3c00180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2023]
Abstract
Spectral libraries are useful resources in proteomic data analysis. Recent advances in deep learning allow tandem mass spectra of peptides to be predicted from their amino acid sequences. This enables predicted spectral libraries to be compiled, and searching against such libraries has been shown to improve the sensitivity in peptide identification over conventional sequence database searching. However, current prediction models lack support for longer peptides, and thus far, predicted library searching has only been demonstrated for backbone ion-only spectrum prediction methods. Here, we propose a deep learning-based full-spectrum prediction method to generate predicted spectral libraries for peptide identification. We demonstrated the superiority of using full-spectrum libraries over backbone ion-only prediction approaches in spectral library searching. Furthermore, merging spectra from different prediction models, as a form of ensemble learning, can produce improved spectral libraries, in terms of identification sensitivity. We also show that a hybrid library combining predicted and experimental spectra can lead to 20% more confident identifications over experimental library searching or sequence database searching.
Affiliation(s)
- Chak Ming Jerry Chan
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
24
McGann CD, Barshop W, Canterbury J, Lin C, Gabriel W, Huang J, Bergen D, Zubraskov V, Melani R, Wilhelm M, McAlister G, Schweppe DK. Real-Time Spectral Library Matching for Sample Multiplexed Quantitative Proteomics. J Proteome Res 2023; 22:2836-2846. [PMID: 37557900 PMCID: PMC11554524 DOI: 10.1021/acs.jproteome.3c00085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]
Abstract
Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.
Affiliation(s)
- Will Barshop
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Jesse Canterbury
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Chuwei Lin
- University of Washington, Seattle, WA 98105
- Jingjing Huang
- Thermo Fisher Scientific, San Jose, California 95134, United States
- David Bergen
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Vlad Zubraskov
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Rafael Melani
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Graeme McAlister
- Thermo Fisher Scientific, San Jose, California 95134, United States
25
Sun Z, Ning Z, Cheng K, Duan H, Wu Q, Mayne J, Figeys D. MetaPep: A core peptide database for faster human gut metaproteomics database searches. Comput Struct Biotechnol J 2023; 21:4228-4237. [PMID: 37692080 PMCID: PMC10491838 DOI: 10.1016/j.csbj.2023.08.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 09/12/2023]
Abstract
Metaproteomics has increasingly been applied to study functional changes in the human gut microbiome. Peptide identification is an important step in metaproteomics research, with sequence database search (SDS) and spectral library search (SLS) as the two main methods to identify peptides. However, the large search space in metaproteomics studies causes significant challenges for both identification methods. Moreover, with the development of mass spectrometry, it is now feasible to perform metaproteomic projects involving 100-1000 individual microbiomes. These large-scale projects create a conundrum for searching large databases. In this study, we constructed MetaPep, a core peptide database (including both collections of peptide sequences and tandem MS spectra) greatly accelerating the peptide identifications. Raw files from fifteen metaproteomics projects were re-analyzed and the identified peptide-spectrum matches (PSMs) were used to construct the MetaPep database. The constructed MetaPep database achieved rapid and accurate identification of peptides for human gut metaproteomics. MetaPep has a large collection of peptides and spectra that have been identified in published human gut metaproteomics datasets. MetaPep database can be used as an important resource in the current stage of human gut metaproteomics research. This study showed the possibility of applying a core peptide database as a generic metaproteomics workflow. MetaPep could also be an important resource for future human gut metaproteomics research, such as DIA (data-independent acquisition) analysis.
Affiliation(s)
- Zhongzhi Sun
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Kai Cheng
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Haonan Duan
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Qing Wu
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Janice Mayne
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
26
Kuo TY, Wang JH, Huang YW, Sung TY, Chen CT. Improving quantitation accuracy in isobaric-labeling mass spectrometry experiments with spectral library searching and feature-based peptide-spectrum match filter. Sci Rep 2023; 13:14119. [PMID: 37644119 PMCID: PMC10465558 DOI: 10.1038/s41598-023-41124-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023]
Abstract
Isobaric labeling relative quantitation is one of the dominating proteomic quantitation technologies. Traditional quantitation pipelines for isobaric-labeled mass spectrometry data are based on sequence database searching. In this study, we present a novel quantitation pipeline that integrates sequence database searching, spectral library searching, and a feature-based peptide-spectrum-match (PSM) filter using various spectral features for filtering. The combined database and spectral library searching results in larger quantitation coverage, and the filter removes PSMs with larger quantitation errors, retaining those with higher quantitation accuracy. Quantitation results show that the proposed pipeline can improve the overall quantitation accuracy at the PSM and protein levels. To our knowledge, this is the first study that utilizes spectral library searching to improve isobaric labeling-based quantitation. For users to conveniently perform the proposed pipeline, we have implemented the feature-based filter being executable on both Windows and Linux platforms; its executable files, user manual, and sample data sets are freely available at https://ms.iis.sinica.edu.tw/comics/Software_FPF.html . Furthermore, with the developed filter, the proposed pipeline is fully compatible with the Trans-Proteomic Pipeline.
Affiliation(s)
- Tzu-Yun Kuo
- Department of Biochemical Science and Technology, National Taiwan University, Taipei, 10617, Taiwan
- Jen-Hung Wang
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan
- Yung-Wen Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 10617, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan.
- Ching-Tai Chen
- Department of Bioinformatics and Biomedical Engineering, Asia University, Taichung, 41354, Taiwan.
- Center for Precision Health Research, Asia University, Taichung, 41354, Taiwan.
27
Hao C, Elias JE, Lee PKH, Lam H. metaSpectraST: an unsupervised and database-independent analysis workflow for metaproteomic MS/MS data using spectrum clustering. Microbiome 2023; 11:176. [PMID: 37550758 PMCID: PMC10405559 DOI: 10.1186/s40168-023-01602-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 06/18/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND The high diversity and complexity of the microbial community make it a formidable challenge to identify and quantify the large number of proteins expressed in the community. Conventional metaproteomics approaches largely rely on accurate identification of the MS/MS spectra to their corresponding short peptides in the digested samples, followed by protein inference and subsequent taxonomic and functional analysis of the detected proteins. These approaches are dependent on the availability of protein sequence databases derived either from sample-specific metagenomic data or from public repositories. Due to the incompleteness and imperfections of these protein sequence databases, and the preponderance of homologous proteins expressed by different bacterial species in the community, this computational process of peptide identification and protein inference is challenging and error-prone, which hinders the comparison of metaproteomes across multiple samples. RESULTS We developed metaSpectraST, an unsupervised and database-independent metaproteomics workflow, which quantitatively profiles and compares metaproteomics samples by clustering experimentally observed MS/MS spectra based on their spectral similarity. We applied metaSpectraST to fecal samples collected from littermates of two different mother mice right after weaning. Quantitative proteome profiles of the microbial communities of different mice were obtained without any peptide-spectrum identification and used to evaluate the overall similarity between samples and highlight any differentiating markers. Compared to the conventional database-dependent metaproteomics analysis, metaSpectraST is more successful in classifying the samples and detecting the subtle microbiome changes of mouse gut microbiomes post-weaning. metaSpectraST could also be used as a tool to select the suitable biological replicates from samples with wide inter-individual variation. 
CONCLUSIONS metaSpectraST enables rapid profiling of metaproteomic samples quantitatively, without the need for constructing the protein sequence database or identification of the MS/MS spectra. It maximally preserves information contained in the experimental MS/MS spectra by clustering all of them first and thus is able to better profile the complex microbial communities and highlight their functional changes, as compared with conventional approaches. Video Abstract.
Affiliation(s)
- Chunlin Hao
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- Patrick K. H. Lee
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory of Marine Pollution, City University of Hong Kong, Hong Kong SAR, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
28
Geer LY, Lapin J, Slotta DJ, Mak TD, Stein SE. AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence. J Proteome Res 2023; 22:2246-2255. [PMID: 37232537 PMCID: PMC10542943 DOI: 10.1021/acs.jproteome.2c00807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Affiliation(s)
- Lewis Y. Geer
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Joel Lapin
- Department of Physics, Georgetown University, Washington, DC 20057, United States
- Associate, Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Douglas J. Slotta
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
29
Nowatzky Y, Benner P, Reinert K, Muth T. Mistle: bringing spectral library predictions to metaproteomics with an efficient search index. Bioinformatics 2023; 39:btad376. [PMID: 37294786 PMCID: PMC10313348 DOI: 10.1093/bioinformatics/btad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 05/11/2023] [Accepted: 06/08/2023] [Indexed: 06/11/2023]
Abstract
MOTIVATION Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics and authentic prediction for peptide fragmentation is more feasible than ever. Still, at this point spectral prediction is mainly used to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to large search space problems that often occur in metaproteomics or proteogenomics. RESULTS In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes. AVAILABILITY AND IMPLEMENTATION Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.
Affiliation(s)
- Yannek Nowatzky
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
- Philipp Benner
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Thilo Muth
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
30
Orsburn BC. Time-of-Flight Fragmentation Spectra Generated by the Proteomic Analysis of Single Human Cells Do Not Exhibit Atypical Fragmentation Patterns. J Proteome Res 2023; 22:1003-1008. [PMID: 36700448 PMCID: PMC10502792 DOI: 10.1021/acs.jproteome.2c00715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2022] [Indexed: 01/27/2023]
Abstract
Recent work detailed the unique characteristics of fragmentation spectra derived from peptides from single human cells. This valuable report utilized an ultrahigh-field Orbitrap and directly compared the spectra obtained from high-concentration bulk cell HeLa lysates to those obtained from nanogram dilutions of the same and from nanowell-processed single HeLa cells. The analysis demonstrated marked differences between the fragmentation spectra generated at high and single-cell loads, most strikingly, the loss of high-mass y-series fragment ions. As significant differences exist in the physics of Orbitrap and time-of-flight mass analyzers, a comparison appeared warranted. A similar analysis was performed using isolated single pancreatic cancer cells compared to pools consisting of 100 cells. While a reanalysis of the prior Orbitrap data supports the author's original findings, the same trends are not observed in time-of-flight mass spectra of peptides from single human cells. The results are particularly striking when directly comparing the matched intensity fragment values between bulk and single-cell data generated on the same mass analyzers. Instrument acquisition files, processed data, and spectrum libraries are publicly available on MASSIVE via accession MSV000090635.
Affiliation(s)
- Benjamin C. Orsburn
- The Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, United States
31
Ahn R, Cui Y, White FM. Antigen discovery for the development of cancer immunotherapy. Semin Immunol 2023; 66:101733. [PMID: 36841147 DOI: 10.1016/j.smim.2023.101733] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 02/25/2023]
Abstract
Central to successful cancer immunotherapy is effective T cell antitumor immunity. Multiple targeted immunotherapies engineered to invigorate T cell-driven antitumor immunity rely on identifying the repertoire of T cell antigens expressed on the tumor cell surface. Mass spectrometry-based survey of such antigens ("immunopeptidomics") combined with other omics platforms and computational algorithms has been instrumental in identifying and quantifying tumor-derived T cell antigens. In this review, we discuss the types of tumor antigens that have emerged for targeted cancer immunotherapy and the immunopeptidomics methods that are central in MHC peptide identification and quantification. We provide an overview of the strength and limitations of mass spectrometry-driven approaches and how they have been integrated with other technologies to discover targetable T cell antigens for cancer immunotherapy. We highlight some of the emerging cancer immunotherapies that successfully capitalized on immunopeptidomics, their challenges, and mass spectrometry-based strategies that can support their development.
Affiliation(s)
- Ryuhjin Ahn
- David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yufei Cui
- David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Forest M White
- David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
32
Li S, Zhu J, Lubman DM, Zhou H, Tang H. GlycoSLASH: Concurrent Glycopeptide Identification from Multiple Related LC-MS/MS Data Sets by Using Spectral Clustering and Library Searching. J Proteome Res 2023; 22:1501-1509. [PMID: 36802412 PMCID: PMC10164058 DOI: 10.1021/acs.jproteome.3c00066] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/23/2023]
Abstract
Liquid chromatography coupled with tandem mass spectrometry is commonly adopted in large-scale glycoproteomic studies involving hundreds of disease and control samples. The software for glycopeptide identification in such data (e.g., the commercial software Byonic) analyzes the individual data set and does not exploit the redundant spectra of glycopeptides presented in the related data sets. Herein, we present a novel concurrent approach for glycopeptide identification in multiple related glycoproteomic data sets by using spectral clustering and spectral library searching. The evaluation on two large-scale glycoproteomic data sets showed that the concurrent approach can identify 105%-224% more spectra as glycopeptides compared to the glycopeptide identification on individual data sets using Byonic alone. The improvement of glycopeptide identification also enabled the discovery of several potential biomarkers of protein glycosylations in hepatocellular carcinoma patients.
Affiliation(s)
- Sujun Li
- Department of Blood Transfusion, The First Affiliated Hospital of Nanchang University, Nanchang 330000, China; JiangXi Key Laboratory of Transfusion Medicine, Nanchang 330000, China; Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, United States
- Jianhui Zhu
- Department of Surgery, University of Michigan, Medical Center, Ann Arbor, Michigan 48109, United States
- David M Lubman
- Department of Surgery, University of Michigan, Medical Center, Ann Arbor, Michigan 48109, United States
- He Zhou
- Shenzhen Dengding Biopharma Co. Ltd., Shenzhen 518000, China
- Haixu Tang
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, United States
33
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
- Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
34
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Proteome Software Inc., Portland, Oregon 97219, United States
- Ariana E Shannon
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Damien Beau Wilburn
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
35
Deutsch EW, Vizcaíno JA, Jones AR, Binz PA, Lam H, Klein J, Bittremieux W, Perez-Riverol Y, Tabb DL, Walzer M, Ricard-Blum S, Hermjakob H, Neumann S, Mak TD, Kawano S, Mendoza L, Van Den Bossche T, Gabriels R, Bandeira N, Carver J, Pullman B, Sun Z, Hoffmann N, Shofstahl J, Zhu Y, Licata L, Quaglia F, Tosatto SCE, Orchard SE. Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work. J Proteome Res 2023; 22:287-301. [PMID: 36626722 PMCID: PMC9903322 DOI: 10.1021/acs.jproteome.2c00637] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 10/07/2022] [Indexed: 01/11/2023]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Pierre-Alain Binz
- Clinical Chemistry Service, Lausanne University Hospital, 1011 976 Lausanne, Switzerland
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, P. R. China
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- David L. Tabb
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7602, South Africa
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sylvie Ricard-Blum
- Univ. Lyon, Université Lyon 1, ICBMS, UMR 5246, 69622 Villeurbanne, France
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Steffen Neumann
- Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), 04103 Halle-Jena-Leipzig, Germany
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Benjamin Pullman
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Luana Licata
- Fondazione Human Technopole, 20157 Milan, Italy
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), 70126 Bari, Italy
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Sandra E. Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
36
Arab I, Fondrie WE, Laukens K, Bittremieux W. Semisupervised Machine Learning for Sensitive Open Modification Spectral Library Searching. J Proteome Res 2023; 22:585-593. [PMID: 36688569 DOI: 10.1021/acs.jproteome.2c00616] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023]
Abstract
A key analysis task in mass spectrometry proteomics is matching the acquired tandem mass spectra to their originating peptides by sequence database searching or spectral library searching. Machine learning is an increasingly popular postprocessing approach to maximize the number of confident spectrum identifications that can be obtained at a given false discovery rate threshold. Here, we have integrated semisupervised machine learning in the ANN-SoLo tool, an efficient spectral library search engine that is optimized for open modification searching to identify peptides with any type of post-translational modification. We show that machine learning rescoring boosts the number of spectra that can be identified for both standard searching and open searching, and we provide insights into relevant spectrum characteristics harnessed by the machine learning model. The semisupervised machine learning functionality has now been fully integrated into ANN-SoLo, which is available as open source under the permissive Apache 2.0 license on GitHub at https://github.com/bittremieux/ANN-SoLo.
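Open modification searching lets a query spectrum match a library spectrum whose precursor mass differs by an unknown modification mass. A simplified sketch of a shifted dot product is shown below, assuming singly charged fragments and a fixed absolute tolerance: each library peak may pair with a query peak at the same m/z or at an m/z shifted by the precursor mass difference. The function names, the 0.05 Da tolerance, and the greedy pairing are illustrative assumptions rather than ANN-SoLo's implementation, and the semisupervised rescoring step described in the abstract is not shown.

```python
import math

# Simplified shifted dot product for open modification searching (illustrative only).


def _normalize(peaks):
    """Scale intensities to unit Euclidean norm; an empty spectrum stays empty."""
    norm = math.sqrt(sum(i * i for _, i in peaks))
    return [(mz, i / norm) for mz, i in peaks] if norm else []


def shifted_dot(query, library, precursor_delta, tol=0.05):
    """Score a query against a library spectrum while allowing a precursor mass shift.

    query, library: lists of (m/z, intensity), assumed to be singly charged fragments.
    precursor_delta: query precursor mass minus library precursor mass (Da).
    A library peak pairs with a query peak at the same m/z or at m/z + precursor_delta;
    each peak is used at most once, choosing pairs greedily by intensity product.
    """
    q, lib = _normalize(query), _normalize(library)
    candidates = []
    for qi, (q_mz, q_int) in enumerate(q):
        for li, (l_mz, l_int) in enumerate(lib):
            unshifted = abs(q_mz - l_mz) <= tol
            shifted = abs(q_mz - (l_mz + precursor_delta)) <= tol
            if unshifted or shifted:
                candidates.append((q_int * l_int, qi, li))
    used_q, used_lib, score = set(), set(), 0.0
    for product, qi, li in sorted(candidates, reverse=True):
        if qi not in used_q and li not in used_lib:
            used_q.add(qi)
            used_lib.add(li)
            score += product
    return score
```

For a library spectrum `[(100.0, 1.0), (200.0, 1.0)]` and a query `[(100.0, 1.0), (280.0, 1.0)]`, only the 100 m/z pair matches when `precursor_delta` is 0 (score about 0.5), while a delta of 80 also pairs the query peak at 280 with the shifted library peak at 200 (score about 1.0).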
Affiliation(s)
- Issar Arab
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Kris Laukens
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
37
Vu NQ, Yen HC, Fields L, Cao W, Li L. HyPep: An Open-Source Software for Identification and Discovery of Neuropeptides Using Sequence Homology Search. J Proteome Res 2023; 22:420-431. [PMID: 36696582 PMCID: PMC10160011 DOI: 10.1021/acs.jproteome.2c00597] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023]
Abstract
Neuropeptides are a class of endogenous peptides that have key regulatory roles in biochemical, physiological, and behavioral processes. Mass spectrometry analyses of neuropeptides often rely on protein informatics tools for database searching and peptide identification. As neuropeptide databases are typically experimentally built and comprised of short sequences with high sequence similarity to each other, we developed a novel database searching tool, HyPep, which utilizes sequence homology searching for peptide identification. HyPep aligns de novo sequenced peptides, generated through PEAKS software, with neuropeptide database sequences and identifies neuropeptides based on the alignment score. HyPep performance was optimized using LC-MS/MS measurements of peptide extracts from various Callinectes sapidus neuronal tissue types and compared with a commercial database searching software, PEAKS DB. HyPep identified more neuropeptides from each tissue type than PEAKS DB at 1% false discovery rate, and the false match rate from both programs was 2%. In addition to identification, this report describes how HyPep can aid in the discovery of novel neuropeptides.
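HyPep identifies neuropeptides from the alignment score between de novo sequences and database entries. As a generic illustration only, the sketch below computes a Smith-Waterman local alignment score with hypothetical match, mismatch, and linear gap values, and treats leucine and isoleucine as identical because they have the same mass. HyPep's actual scoring scheme and thresholds are not given in the abstract, so the function names, parameters, and example sequences are assumptions.

```python
# Generic Smith-Waterman scoring sketch; parameters are illustrative, not HyPep's.


def _same_residue(a, b):
    """Compare residues, treating isobaric I and L as equal."""
    iso = {"I": "L"}
    return iso.get(a, a) == iso.get(b, b)


def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score using a linear gap penalty and two DP rows."""
    prev = [0] * (len(target) + 1)
    best = 0
    for a in query:
        curr = [0] * (len(target) + 1)
        for j, b in enumerate(target, start=1):
            diag = prev[j - 1] + (match if _same_residue(a, b) else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(de_novo, database):
    """Return (name, score) pairs for database sequences, best alignment first."""
    scores = [(name, local_alignment_score(de_novo, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A local rather than global alignment fits the short, partially correct sequences that de novo sequencing tends to produce, since unmatched flanking residues do not drag the score below zero.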
Affiliation(s)
- Nhu Q Vu
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Hsu-Ching Yen
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, Wisconsin 53706, United States
- Lauren Fields
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Weifeng Cao
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
38
Yu Q, Liu X, Keller MP, Navarrete-Perea J, Zhang T, Fu S, Vaites LP, Shuken SR, Schmid E, Keele GR, Li J, Huttlin EL, Rashan EH, Simcox J, Churchill GA, Schweppe DK, Attie AD, Paulo JA, Gygi SP. Sample multiplexing-based targeted pathway proteomics with real-time analytics reveals the impact of genetic variation on protein expression. Nat Commun 2023; 14:555. [PMID: 36732331 PMCID: PMC9894840 DOI: 10.1038/s41467-023-36269-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/12/2022] [Accepted: 01/20/2023] [Indexed: 02/04/2023]
Abstract
Targeted proteomics enables hypothesis-driven research by measuring the cellular expression of protein cohorts related by function, disease, or class after perturbation. Here, we present a pathway-centric approach and an assay builder resource for targeting entire pathways of up to 200 proteins selected from >10,000 expressed proteins to directly measure their abundances, exploiting sample multiplexing to increase throughput by 16-fold. The strategy, termed GoDig, requires only a single-shot LC-MS analysis, ~1 µg combined peptide material, a list of up to 200 proteins, and real-time analytics to trigger simultaneous quantification of up to 16 samples for hundreds of analytes. We apply GoDig to quantify the impact of genetic variation on protein expression in mice fed a high-fat diet. We create several GoDig assays to quantify the expression of multiple protein families (kinases, lipid metabolism- and lipid droplet-associated proteins) across 480 fully-genotyped Diversity Outbred mice, revealing protein quantitative trait loci and establishing potential linkages between specific proteins and lipid homeostasis.
Affiliation(s)
- Qing Yu
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Xinyue Liu
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Mark P Keller
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Tian Zhang
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Sipei Fu
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Laura P Vaites
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Steven R Shuken
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Ernst Schmid
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Jiaming Li
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Edward L Huttlin
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Edrees H Rashan
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Judith Simcox
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Devin K Schweppe
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
- Alan D Attie
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Joao A Paulo
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Steven P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
39
Wiebach V. "What I wish I had known before starting my PhD". ANALYTICAL SCIENCE ADVANCES 2023; 4:6-12. [PMID: 38715583 PMCID: PMC10989638 DOI: 10.1002/ansa.202200044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 11/21/2022] [Indexed: 11/17/2024]
Abstract
As a rather recent PhD graduate and still an "early career researcher", the author wondered what to write about that would be interesting for a young scientist. The answer came while overhearing students in the break room stating, "I wish I had known all that before starting my PhD that would have made everything easier!" - An experience many researchers are very familiar with. From simple tricks for laboratory work to choosing the right software or planning the next career steps, this was a reoccurring theme during the career of the author, who will try to give a short personal overview for young researchers, especially in the analytics and/or natural products field. These topics and lists represent a personal opinion and are neither meant to be all-encompassing nor of course might differ from the experiences of other researchers.
Affiliation(s)
- Vincent Wiebach
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
40
Dorl S, Winkler S, Mechtler K, Dorfer V. MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search. J Proteome Res 2023; 22:462-470. [PMID: 36688604 PMCID: PMC9903325 DOI: 10.1021/acs.jproteome.2c00658] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/24/2023]
Abstract
Spectral library search can enable more sensitive peptide identification in tandem mass spectrometry experiments. However, its drawbacks are the limited availability of high-quality libraries and the added difficulty of creating decoy spectra for result validation. We describe MS Ana, a new spectral library search engine that enables high sensitivity peptide identification using either curated or predicted spectral libraries as well as robust false discovery control through its own decoy library generation algorithm. MS Ana identifies on average 36% more spectrum matches and 4% more proteins than database search in a benchmark test on single-shot human cell-line data. Further, we demonstrate the quality of the result validation with tests on synthetic peptide pools and show the importance of library selection through a comparison of library search performance with different configurations of publicly available human spectral libraries.
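Decoy spectra are used to estimate the false discovery rate of accepted matches. Assuming target and decoy matches have already been scored (MS Ana's own decoy library generation algorithm is not described here), a minimal target-decoy sketch estimates the FDR at each score threshold as decoys divided by targets and converts it to monotone q-values. The decoys/targets estimator, the 1% threshold, and the function names are assumptions, and tied scores are not treated specially.

```python
# Minimal target-decoy q-value sketch; estimator and names are illustrative assumptions.


def q_values(psms):
    """Attach q-values to (score, is_decoy) matches, sorted by descending score.

    FDR at each threshold is estimated as (decoys above) / (targets above);
    the q-value is the minimum FDR at that threshold or any lower one.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    qs = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):  # sweep from the worst score upward
        running = min(running, fdrs[i])
        qs[i] = running
    return list(zip(ranked, qs))


def accepted_targets(psms, threshold=0.01):
    """Return target matches whose q-value is at or below the threshold."""
    return [p for p, q in q_values(psms) if not p[1] and q <= threshold]
```

Taking the running minimum from the bottom makes the q-value monotone in score, so lowering the acceptance threshold never accepts a worse match while rejecting a better one.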
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria. Phone: +43 (0) 50804 27145
- Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria. Phone: +43 (0) 50804 22740
41
Hadjeras L, Bartel J, Maier LK, Maaß S, Vogel V, Svensson SL, Eggenhofer F, Gelhausen R, Müller T, Alkhnbashi OS, Backofen R, Becher D, Sharma CM, Marchfelder A. Revealing the small proteome of Haloferax volcanii by combining ribosome profiling and small-protein optimized mass spectrometry. MICROLIFE 2023; 4:uqad001. [PMID: 37223747 PMCID: PMC10117724 DOI: 10.1093/femsml/uqad001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/01/2022] [Revised: 11/29/2022] [Accepted: 01/13/2023] [Indexed: 05/25/2023]
Abstract
In contrast to extensively studied prokaryotic 'small' transcriptomes (encompassing all small noncoding RNAs), small proteomes (here defined as including proteins ≤70 aa) are only now entering the limelight. The absence of a complete small protein catalogue in most prokaryotes precludes our understanding of how these molecules affect physiology. So far, archaeal genomes have not yet been analyzed broadly with a dedicated focus on small proteins. Here, we present a combinatorial approach, integrating experimental data from small protein-optimized mass spectrometry (MS) and ribosome profiling (Ribo-seq), to generate a high confidence inventory of small proteins in the model archaeon Haloferax volcanii. We demonstrate by MS and Ribo-seq that 67% of the 317 annotated small open reading frames (sORFs) are translated under standard growth conditions. Furthermore, annotation-independent analysis of Ribo-seq data showed ribosomal engagement for 47 novel sORFs in intergenic regions. A total of seven of these were also detected by proteomics, in addition to an eighth novel small protein solely identified by MS. We also provide independent experimental evidence in vivo for the translation of 12 sORFs (annotated and novel) using epitope tagging and western blotting, underlining the validity of our identification scheme. Several novel sORFs are conserved in Haloferax species and might have important functions. Based on our findings, we conclude that the small proteome of H. volcanii is larger than previously appreciated, and that combining MS with Ribo-seq is a powerful approach for the discovery of novel small protein coding genes in archaea.
Affiliation(s)
- Lydia Hadjeras
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
- Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Verena Vogel
- Biology II, Ulm University, Albert-Einstein-Allee 11, 89081 Ulm, Germany
- Sarah L Svensson
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
- Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Teresa Müller
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Omer S Alkhnbashi
- Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
- Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Cynthia M Sharma
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
- Anita Marchfelder
- Biology II, Ulm University, Albert-Einstein-Allee 11, 89081 Ulm, Germany
42
Matlock AD, Vaibhav V, Holewinski R, Venkatraman V, Dardov V, Manalo DM, Shelley B, Ornelas L, Banuelos M, Mandefro B, Escalante-Chong R, Li J, Finkbeiner S, Fraenkel E, Rothstein J, Thompson L, Sareen D, Svendsen CN, Van Eyk JE. NeuroLINCS Proteomics: Defining human-derived iPSC proteomes and protein signatures of pluripotency. Sci Data 2023; 10:24. [PMID: 36631473 PMCID: PMC9834231 DOI: 10.1038/s41597-022-01687-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/02/2021] [Accepted: 09/07/2022] [Indexed: 01/13/2023]
Abstract
The National Institute of Health (NIH) Library of integrated network-based cellular signatures (LINCS) program is premised on the generation of a publicly available data resource of cell-based biochemical responses or "signatures" to genetic or environmental perturbations. NeuroLINCS uses human inducible pluripotent stem cells (hiPSCs), derived from patients and healthy controls, and differentiated into motor neuron cell cultures. This multi-laboratory effort strives to establish i) robust multi-omic workflows for hiPSC and differentiated neuronal cultures, ii) public annotated data sets and iii) relevant and targetable biological pathways of spinal muscular atrophy (SMA) and amyotrophic lateral sclerosis (ALS). Here, we focus on the proteomics and the quality of the developed workflow of hiPSC lines from 6 individuals, though epigenomics and transcriptomics data are also publicly available. Known and commonly used markers representing 73 proteins were reproducibly quantified with consistent expression levels across all hiPSC lines. Data quality assessments, data levels and metadata of all 6 genetically diverse human iPSCs analysed by DIA-MS are parsable and available as a high-quality resource to the public.
Affiliation(s)
- Andrea D Matlock
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Vineet Vaibhav
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Ronald Holewinski
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Vidya Venkatraman
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Victoria Dardov
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Danica-Mae Manalo
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Brandon Shelley
- NeuroLINCS, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Loren Ornelas
- NeuroLINCS, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Maria Banuelos
- NeuroLINCS, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Berhan Mandefro
- NeuroLINCS, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Jonathan Li
- NeuroLINCS, Department of Biological Engineering, MIT, Cambridge, MA, 02142, USA
- Steve Finkbeiner
- NeuroLINCS, Gladstone Institute of Neurological Disease and the Departments of Neurology and Physiology, University of California San Francisco, San Francisco, CA, 94158, USA
- Ernest Fraenkel
- NeuroLINCS, Department of Biological Engineering, MIT, Cambridge, MA, 02142, USA
- Jeffrey Rothstein
- NeuroLINCS, Department of Neuroscience, Johns Hopkins University, Baltimore, MD, 21205, USA
- Leslie Thompson
- NeuroLINCS, Departments of Psychiatry and Human Behaviour, Neurobiology and Behaviour and UCI MIND, University of California Irvine, Irvine, CA, 92697, USA
- Dhruv Sareen
- NeuroLINCS, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Clive N Svendsen
- NeuroLINCS, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Jennifer E Van Eyk
- NeuroLINCS, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
43
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
44
Bonacina F, Moregola A, Svecla M, Coe D, Uboldi P, Fraire S, Beretta S, Beretta G, Pellegatta F, Catapano AL, Marelli-Berg FM, Norata GD. The low-density lipoprotein receptor-mTORC1 axis coordinates CD8+ T cell activation. J Cell Biol 2022; 221:213488. [PMID: 36129440 PMCID: PMC9499829 DOI: 10.1083/jcb.202202011] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/05/2022] [Revised: 06/10/2022] [Accepted: 08/08/2022] [Indexed: 11/22/2022]
Abstract
Activation of T cells relies on the availability of intracellular cholesterol for an effective response after stimulation. We investigated the contribution of cholesterol derived from extracellular uptake by the low-density lipoprotein (LDL) receptor in the immunometabolic response of T cells. By combining proteomics, gene expression profiling, and immunophenotyping, we described a unique role for cholesterol provided by the LDLR pathway in CD8+ T cell activation. mRNA and protein expression of LDLR was significantly increased in activated CD8+ compared to CD4+ WT T cells, and this resulted in a significant reduction of proliferation and cytokine production (IFNγ, Granzyme B, and Perforin) of CD8+ but not CD4+ T cells from Ldlr -/- mice after in vitro and in vivo stimulation. This effect was the consequence of altered cholesterol routing to the lysosome resulting in a lower mTORC1 activation. Similarly, CD8+ T cells from humans affected by familial hypercholesterolemia (FH) carrying a mutation on the LDLR gene showed reduced activation after an immune challenge.
Affiliation(s)
- Fabrizia Bonacina
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Annalisa Moregola
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Monika Svecla
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- David Coe
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Centre for Inflammation and Therapeutic Innovation, Queen Mary University of London, Charterhouse Square, London, UK
- Patrizia Uboldi
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Sara Fraire
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Simona Beretta
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Giangiacomo Beretta
- Department of Environmental Science and Policy, Università degli Studi di Milano, Milan, Italy
- Fabio Pellegatta
- Istituti di Ricovero e Cura a Carattere Scientifico Multimedica, Milan, Italy
- Alberico Luigi Catapano
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Istituti di Ricovero e Cura a Carattere Scientifico Multimedica, Milan, Italy
- Federica M Marelli-Berg
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Centre for Inflammation and Therapeutic Innovation, Queen Mary University of London, Charterhouse Square, London, UK
- Giuseppe Danilo Norata
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Centro SISA per lo Studio dell'Aterosclerosi, Ospedale Bassini, Cinisello Balsamo, Italy
45
Adams C, Boonen K, Laukens K, Bittremieux W. Open Modification Searching of SARS-CoV-2-Human Protein Interaction Data Reveals Novel Viral Modification Sites. Mol Cell Proteomics 2022; 21:100425. [PMID: 36241021 PMCID: PMC9554009 DOI: 10.1016/j.mcpro.2022.100425] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/16/2022] [Revised: 09/18/2022] [Accepted: 10/09/2022] [Indexed: 01/18/2023] Open
Abstract
The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), has led to an ongoing global pandemic since 2019. Mass spectrometry can help elucidate the molecular mechanisms of SARS-CoV-2 infection, for example, by determining the virus-host protein-protein interactions through which SARS-CoV-2 hijacks its human host during infection, and by studying the role of post-translational modifications. We reanalyzed public affinity purification-mass spectrometry data using open modification searching to investigate the presence of post-translational modifications in the context of the SARS-CoV-2 virus-host protein-protein interaction network. With a more than twofold increase in identified spectra, the detected protein interactions show high overlap with independent mass spectrometry-based SARS-CoV-2 studies and with virus-host interactions for other viruses, and they also include previously unknown protein interactions. In addition, we identified several novel modification sites on SARS-CoV-2 proteins and investigated them in relation to their interactions with host proteins. A detailed analysis of relevant modifications, including phosphorylation, ubiquitination, and S-nitrosylation, provides important hypotheses about the functional role of these modifications during SARS-CoV-2 infection.
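In open modification searching, spectra are matched to candidate peptides with a wide precursor mass window, and the remaining precursor mass difference hints at which modification is present. The minimal Python sketch below illustrates only that interpretation step. The `KNOWN_SHIFTS` table, the 0.01 Da tolerance, and the function names are illustrative assumptions, not the pipeline used in the study; the listed shifts are the monoisotopic masses commonly used for phosphorylation, the Gly-Gly remnant left on lysine by tryptic digestion of ubiquitinated proteins, and S-nitrosylation.

```python
PROTON = 1.007276  # proton mass (Da)

# Hypothetical lookup table of monoisotopic mass shifts (Da) for the
# modification types named in the abstract.
KNOWN_SHIFTS = {
    "Phospho": 79.966331,
    "GlyGly (ubiquitination remnant)": 114.042927,
    "Nitrosyl (S-nitrosylation)": 28.990164,
}


def precursor_neutral_mass(mz: float, charge: int) -> float:
    """Convert an observed [M+zH]z+ precursor m/z to a neutral mass."""
    return (mz - PROTON) * charge


def annotate_mass_shift(mz: float, charge: int, peptide_mass: float,
                        tol: float = 0.01) -> tuple[float, list[str]]:
    """Return the precursor mass difference and any known shifts it matches.

    `peptide_mass` is the neutral monoisotopic mass of the unmodified peptide.
    """
    delta = precursor_neutral_mass(mz, charge) - peptide_mass
    matches = [name for name, shift in KNOWN_SHIFTS.items()
               if abs(delta - shift) <= tol]
    return delta, matches
```

A real search would also have to localize the modification on a specific residue and handle shifts that match nothing in the table; this sketch only labels the mass difference.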
Affiliation(s)
- Charlotte Adams
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Kurt Boonen
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Sustainable Health Department, Flemish Institute for Technological Research (VITO), Antwerp, Belgium
- Kris Laukens
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, USA
- For correspondence: Wout Bittremieux
46
Zwillinger M, Fischer L, Sályi G, Szabó S, Csékei M, Huc I, Kotschy A. Isotope Ratio Encoding of Sequence-Defined Oligomers. J Am Chem Soc 2022; 144:19078-19088. [PMID: 36206533 DOI: 10.1021/jacs.2c08135] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Information storage at the molecular level commonly entails encoding information as ordered sequences of different monomers, followed by fragmentation and tandem mass spectrometry analysis to read it back. Recent approaches also use mixtures of distinct molecules noncovalently bonded to one another. Here, we present an alternative isotope ratio encoding approach that uses deuterium-labeled monomers to produce hundreds of oligomers endowed with unique isotope distribution patterns. Mass spectrometric recognition of these patterns then allowed us to read out the encoded information directly and with high fidelity. Specifically, we show that all 256 tetramers composed of four different monomers of identical constitution can be distinguished by their mass fingerprint using mono-, di-, tri-, and tetradeuterated building blocks. The method is robust to experimental errors and does not require the most sophisticated mass spectrometry instrumentation. Such isotope ratio-encoded oligomers may serve as tags that carry information, but, more importantly, the method makes it possible to write information, for example about molecular identity, directly into a pure compound via its isotopologue distribution, obviating the need for additional tagging and avoiding the use of mixtures of different molecules.
Affiliation(s)
- Márton Zwillinger
- Servier Research Institute of Medicinal Chemistry, H-1031 Budapest, Hungary
- Hevesy György PhD School of Chemistry, Eötvös Loránd University, H-1053 Budapest, Hungary
- Lucile Fischer
- CBMN UMR5248, University of Bordeaux-CNRS-IPB, F-33600 Pessac, France
- Gergő Sályi
- Servier Research Institute of Medicinal Chemistry, H-1031 Budapest, Hungary
- Soma Szabó
- Servier Research Institute of Medicinal Chemistry, H-1031 Budapest, Hungary
- Márton Csékei
- Servier Research Institute of Medicinal Chemistry, H-1031 Budapest, Hungary
- Ivan Huc
- Department of Pharmacy and Center for Integrated Protein Science, Ludwig-Maximilians-University, D-81377 Munich, Germany
- András Kotschy
- Servier Research Institute of Medicinal Chemistry, H-1031 Budapest, Hungary
47
Bittremieux W, May DH, Bilmes J, Noble WS. A learned embedding for efficient joint analysis of millions of mass spectra. Nat Methods 2022; 19:675-678. [DOI: 10.1038/s41592-022-01496-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/29/2018] [Accepted: 04/14/2022] [Indexed: 11/09/2022]
48
Luo X, Bittremieux W, Griss J, Deutsch EW, Sachsenberg T, Levitsky LI, Ivanov MV, Bubis JA, Gabriels R, Webel H, Sanchez A, Bai M, Käll L, Perez-Riverol Y. A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics. J Proteome Res 2022; 21:1566-1574. [PMID: 35549218 PMCID: PMC9171829 DOI: 10.1021/acs.jproteome.2c00069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The best-identified spectrum (BEST) and spectrum binning (BIN) methods were found to be the most reliable for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
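Two of the consensus strategies named above, spectrum binning and the most similar spectrum, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's implementation (that code lives in the linked repository): the fixed 0.02 m/z bin width, the rule that a bin must occur in at least half of the cluster members (`min_fraction`), and binned cosine similarity as the similarity measure are all hypothetical choices. Averaging and best-identified selection are not shown, since the latter needs identification scores.

```python
import math
from collections import defaultdict

Spectrum = list[tuple[float, float]]  # list of (m/z, intensity) peaks


def binned_consensus(spectra: list[Spectrum], bin_width: float = 0.02,
                     min_fraction: float = 0.5) -> Spectrum:
    """Pool peaks into fixed-width m/z bins; keep bins present in at least
    `min_fraction` of the cluster members."""
    bins = defaultdict(lambda: {"mz": [], "intensity": 0.0, "members": set()})
    for idx, spectrum in enumerate(spectra):
        for mz, intensity in spectrum:
            b = bins[int(mz // bin_width)]
            b["mz"].append(mz)
            b["intensity"] += intensity
            b["members"].add(idx)
    consensus = []
    for key in sorted(bins):
        b = bins[key]
        if len(b["members"]) / len(spectra) >= min_fraction:
            mean_mz = sum(b["mz"]) / len(b["mz"])
            # Intensity averaged over all members (absent peaks count as 0).
            consensus.append((mean_mz, b["intensity"] / len(spectra)))
    return consensus


def _vectorize(spectrum: Spectrum, bin_width: float) -> dict[int, float]:
    vec: dict[int, float] = defaultdict(float)
    for mz, intensity in spectrum:
        vec[int(mz // bin_width)] += intensity
    return vec


def cosine(a: Spectrum, b: Spectrum, bin_width: float = 0.02) -> float:
    """Cosine similarity of two spectra after binning."""
    va, vb = _vectorize(a, bin_width), _vectorize(b, bin_width)
    dot = sum(v * vb[k] for k, v in va.items() if k in vb)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def most_similar_spectrum(spectra: list[Spectrum],
                          bin_width: float = 0.02) -> Spectrum:
    """Pick the member with the highest summed similarity to all others."""
    def total_similarity(i: int) -> float:
        return sum(cosine(spectra[i], spectra[j], bin_width)
                   for j in range(len(spectra)) if j != i)
    return spectra[max(range(len(spectra)), key=total_similarity)]
```

The design difference is visible in the output: binning synthesizes a new peak list that suppresses peaks seen in only a few members, whereas the most similar spectrum returns one real measured spectrum unchanged.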
Affiliation(s)
- Xiyang Luo
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China
- Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K.
- Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Timo Sachsenberg
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
- Lev I Levitsky
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
- Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
- Julia A Bubis
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, B-9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, B-9000 Ghent, Belgium
- Henry Webel
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen DK-2200, Denmark
- Aniel Sanchez
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, 20502 Malmö, Sweden
- Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, Royal Institute of Technology - KTH, Box 1031, 17121 Solna, Sweden
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K.
49
Shiferaw GA, Gabriels R, Bouwmeester R, Van Den Bossche T, Vandermarliere E, Martens L, Volders PJ. Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator. J Proteome Res 2022; 21:1365-1370. [PMID: 35446579 DOI: 10.1021/acs.jproteome.2c00075] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we used 40 data sets from 2 different projects and searched them against the NIST and MassIVE spectral libraries. Searches were carried out with 2 spectral library search tools, COSS and MSPepSearch, each with and without Percolator postprocessing, and with the sequence database search engine MS-GF+ as a baseline comparator. Adding the Percolator rescoring step to COSS is effective and substantially improves both the sensitivity and the specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license; binaries and source code are available at https://github.com/compomics/COSS.
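Percolator's gains in sensitivity and specificity are conventionally reported as the number of target identifications accepted at a fixed q-value (false discovery rate) threshold estimated from target and decoy matches. The minimal Python sketch below shows only that q-value calculation, assuming a target-decoy setup, a simple decoys/targets FDR estimate, and higher-is-better scores; it does not implement Percolator's learned rescoring, and the function names are hypothetical.

```python
def target_decoy_qvalues(psms: list[tuple[float, bool]]) -> list[float]:
    """Return a q-value per PSM (input order) for (score, is_decoy) pairs.

    Higher scores are better; tied scores are not treated specially here.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr_at_rank = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # FDR estimate at this score threshold: decoys / targets.
        fdr_at_rank.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the lowest score upwards.
    q_by_rank = [0.0] * len(order)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[rank])
        q_by_rank[rank] = running_min
    q_values = [0.0] * len(psms)
    for rank, i in enumerate(order):
        q_values[i] = q_by_rank[rank]
    return q_values


def accepted_targets(psms: list[tuple[float, bool]],
                     threshold: float = 0.01) -> int:
    """Count target PSMs with a q-value at or below the threshold."""
    q = target_decoy_qvalues(psms)
    return sum(1 for (_, is_decoy), qv in zip(psms, q)
               if not is_decoy and qv <= threshold)
```

A rescoring step such as Percolator changes the scores fed into this calculation; if it separates targets from decoys better, more targets fall below the same q-value threshold.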
Affiliation(s)
- Genet Abay Shiferaw
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Elien Vandermarliere
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium
50