1
Wei Q, Li J, Ma J, He QY, Zhang G. DeepMS: super-fast peptide identification using end-to-end deep learning method. J Mol Biol 2025:169237. [PMID: 40449612 DOI: 10.1016/j.jmb.2025.169237] [Received: 11/28/2024] [Revised: 05/05/2025] [Accepted: 05/26/2025] [Indexed: 06/03/2025]
Abstract
Mass spectrometry (MS) has emerged as a powerful omics analysis technique, particularly in proteomics, where the initial step involves identifying MS spectra as peptide sequences. However, this process often requires substantial computational resources and expertise, taking hours or even days to complete, thereby limiting the widespread adoption of MS-based omics technologies. To overcome this challenge, we have developed DeepMS, a deep learning-based spectra identification algorithm that overcomes the speed limitations of traditional spectra identification methods. We conducted comprehensive benchmark tests, comparing six deep learning algorithms. Based on the results, we selected the VGG16 algorithm as the core model for DeepMS. This algorithm enables super-fast, end-to-end identification of peptide sequences from MS spectra with high accuracy. DeepMS is adaptable to post-translational modifications, enhancing its versatility. In fact, its identification speed surpasses the generation rate of MS spectra, enabling super-fast identification. Furthermore, we demonstrate the practical application of DeepMS in microorganism detection, highlighting its utility in clinical testing. Through the implementation of DeepMS, our aim is to revolutionize the field of MS-based proteomics and facilitate the broader application of omics technologies, opening new avenues for rapid and efficient analysis in various research and clinical domains.
Affiliation(s)
- Qianzhou Wei
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, 510632, China.
- Jiamin Li
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, 510632, China.
- Jin Ma
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, 510632, China.
- Qing-Yu He
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, 510632, China.
- Gong Zhang
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, 510632, China.
2
Krishnamurthy S, Gunasegaran B, Paul-Heng M, Mohamedali A, P Klare W, Pang CNI, Gluch L, Shin JS, Chan C, Baker MS, Ahn SB, Heng B. Recombinant Protein Spectral Library (rPSL) DIA-MS method improves identification and quantification of low-abundance cancer-associated and kynurenine pathway proteins. Commun Chem 2025; 8:141. [PMID: 40348885 PMCID: PMC12065878 DOI: 10.1038/s42004-025-01531-0] [Received: 11/04/2024] [Accepted: 04/22/2025] [Indexed: 05/14/2025] Open
Abstract
Data-independent acquisition mass spectrometry (DIA-MS) is a powerful tool for quantitative proteomics, but a well-constructed reference spectral library is crucial to optimize DIA analysis, particularly for low-abundance proteins. In this study, we evaluate the efficacy of a recombinant protein spectral library (rPSL), generated from tryptic digestion of 42 human recombinant proteins, in enhancing the detection and quantification of lower-abundance cancer-associated proteins. Additionally, we generated a combined sample-specific biological-rPSL by integrating the rPSL with a spectral library derived from pooled biological samples. We compared the performance of these libraries for DIA data extraction with standard methods, including sample-specific biological spectral library and library-free DIA methods. Our specific focus was on quantifying cancer-associated proteins, including key enzymes involved in kynurenine pathway, across patient-derived tissues and cell lines. Both rPSL and biological-rPSL-DIA approaches provided significantly improved coverage of lower-abundance proteins, enhancing sensitivity and more consistent protein quantification across matched tumour and adjacent noncancerous tissues from breast and colorectal cancer patients and in cancer cell lines. Overall, our study demonstrates that rPSL and biological-rPSL coupled with DIA-MS workflows, can address the limitations of both biological library-based and library-free DIA methods, offering a robust approach for quantifying low-abundance cancer-associated proteins in complex biological samples.
Affiliation(s)
- Shivani Krishnamurthy
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Bavani Gunasegaran
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Moumita Paul-Heng
  - Transplantation Immunobiology Research Group, Charles Perkins Centre, The University of Sydney, Sydney, Australia
- Abidali Mohamedali
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
  - Faculty of Science and Engineering, School of Natural Sciences, Macquarie University, Sydney, Australia
- William P Klare
  - Australian Proteome Analysis Facility, Macquarie University, Sydney, Australia
- C N Ignatius Pang
  - Australian Proteome Analysis Facility, Macquarie University, Sydney, Australia
- Laurence Gluch
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
  - The Strathfield Breast and Thyroid Centre, Strathfield, Sydney, Australia
- Joo-Shik Shin
  - Department of Tissue Pathology and Diagnostic Oncology, Royal Prince Alfred Hospital, Camperdown, Sydney, Australia
  - Central Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Charles Chan
  - Department of Anatomical Pathology, NSW Health Pathology, Concord Hospital, Sydney, NSW, Australia
  - Concord Institute of Academic Surgery, Concord Clinical School, Faculty of Medicine and Health, Concord Hospital, The University of Sydney, Sydney, Australia
- Mark S Baker
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Seong Beom Ahn
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia.
- Benjamin Heng
  - Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia.
3
Chan CMJ, Madej D, Chung CKJ, Lam H. Deep Learning-Based Prediction of Decoy Spectra for False Discovery Rate Estimation in Spectral Library Searching. J Proteome Res 2025; 24:2235-2242. [PMID: 40252226 DOI: 10.1021/acs.jproteome.4c00304] [Indexed: 04/21/2025]
Abstract
With the advantage of extensive coverage, predicted spectral libraries are becoming an attractive alternative in proteomic data analysis. As a popular false discovery rate estimation method, target decoy search has been adopted in library search workflows. While existing decoy methods for curated experimental libraries have been tested, their performance in predicted library scenarios remains unknown. Current methods rely on perturbing real spectra templates, limiting the diversity and number of decoy spectra that can be generated for a given library. In this study, we explore the shuffle-and-predict decoy library generation approach, which can generate decoy spectra without the need for template spectra. Our experiments shed light on decoy method performance for predicted library scenarios and demonstrate the quality of predicted decoys in FDR estimation.
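As a rough illustration of the two ideas named in this abstract, the sketch below shuffles a target peptide sequence to obtain a decoy sequence and estimates FDR from target and decoy matches above a score threshold. It is not the authors' implementation: the function names, the choice to keep the C-terminal residue fixed while shuffling, and the simple decoys/targets estimator are assumptions made for illustration, and the "predict" step (feeding the shuffled sequence to a spectrum predictor) is omitted.

```python
import random

def shuffle_decoy(peptide: str, rng: random.Random) -> str:
    """Return a decoy sequence: shuffle every residue except the last one.

    Keeping the C-terminal residue fixed is an assumed convention here,
    not a detail taken from the abstract.
    """
    if len(peptide) < 3:
        return peptide  # too short to shuffle meaningfully
    body = list(peptide[:-1])
    rng.shuffle(body)
    return "".join(body) + peptide[-1]

def estimate_fdr(matches, threshold: float) -> float:
    """Estimate FDR as (#decoy matches) / (#target matches) at score >= threshold.

    `matches` is an iterable of (score, is_decoy) pairs (hypothetical format).
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

# Toy data: three target matches and one decoy match score at least 0.6.
example_matches = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
example_fdr = estimate_fdr(example_matches, 0.6)  # 1 decoy / 3 targets
example_decoy = shuffle_decoy("PEPTIDEK", random.Random(0))
```

In a shuffle-and-predict workflow as described above, each shuffled sequence would then be passed to a spectrum prediction model to produce the decoy spectrum, so no template spectrum is needed.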
Affiliation(s)
- Chak Ming Jerry Chan
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China 999077
- Dominik Madej
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China 999077
- Chun Kit Jason Chung
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China 999077
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China 999077
4
Schneider M, Zolg DP, Samaras P, Ben Fredj S, Bold D, Guevende A, Hogrebe A, Berger MT, Graber M, Sukumar V, Mamisashvili L, Bronsthein I, Eljagh L, Gessulat S, Seefried F, Schmidt T, Frejno M. A Scalable, Web-Based Platform for Proteomics Data Processing, Result Storage and Analysis. J Proteome Res 2025; 24:1241-1249. [PMID: 39982847 PMCID: PMC11894649 DOI: 10.1021/acs.jproteome.4c00871] [Received: 09/30/2024] [Revised: 12/20/2024] [Accepted: 01/23/2025] [Indexed: 02/23/2025]
Abstract
The exponential increase in proteomics data presents critical challenges for conventional processing workflows. These pipelines often consist of fragmented software packages, glued together using complex in-house scripts or error-prone manual workflows running on local hardware, which are costly to maintain and scale. The MSAID Platform offers a fully automated, managed proteomics data pipeline, consolidating formerly disjointed functions into unified, API-driven services that cover the entire process from raw data to biological insights. Backed by the cloud-native search algorithm CHIMERYS, as well as scalable cloud compute instances and data lakes, the platform facilitates efficient processing of large data sets, automation of processing via the command line, systematic result storage, analysis, and visualization. The data lake supports elastically growing storage and unified query capabilities, facilitating large-scale analyses and efficient reuse of previously processed data, such as aggregating longitudinally acquired studies. Users interact with the platform via a web interface, CLI client, or API, providing flexible, automated access. Readily available tools for accessing result data include browser-based interrogation and one-click visualizations for statistical analysis. The platform streamlines research processes, making advanced and automated proteomic workflows accessible to a broader range of scientists. The MSAID Platform is globally available via https://platform.msaid.io.
5
Chu F, Lin A. Detecting Human Contaminant Genetically Variant Peptides in Nonhuman Samples. J Proteome Res 2025; 24:579-588. [PMID: 39705712 DOI: 10.1021/acs.jproteome.4c00718] [Indexed: 12/22/2024]
Abstract
During proteomics data analysis, experimental spectra are searched against a user-defined protein database consisting of proteins that are reasonably expected to be present in the sample. Typically, this database contains the proteome of the organism under study concatenated with expected contaminants, such as trypsin and human keratins. However, there are additional contaminants that are not commonly added to the database. In this study, we describe a new set of protein contaminants and provide evidence that they can be detected in mass spectrometry-based proteomics data. Specifically, we provide evidence that human genetically variant peptides (GVPs) can be detected in nonhuman samples. GVPs are peptides that contain single amino acid polymorphisms that result from nonsynonymous single nucleotide polymorphisms in protein-coding regions of DNA. We reanalyzed previously collected nonhuman data-dependent acquisition (DDA) and data-independent acquisition (DIA) data sets and detected between 0 and 135 GVPs per data set. In addition, we show that GVPs are unlikely to originate from nonhuman sources and that a subset of eight GVPs are commonly detected across data sets.
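To make the definition of a GVP concrete, the sketch below applies a single amino acid substitution to a protein sequence, digests both the reference and the variant in silico, and keeps the peptides that occur only in the variant. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy sequence, and the simple trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) are all hypothetical choices.

```python
import re

def tryptic_digest(protein: str) -> list[str]:
    """Split after K or R, except when the next residue is P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]

def apply_substitution(protein: str, position: int, ref: str, alt: str) -> str:
    """Replace the residue at a 1-based position, checking the reference residue first."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {protein[position - 1]}")
    return protein[: position - 1] + alt + protein[position:]

def variant_peptides(protein: str, position: int, ref: str, alt: str) -> set[str]:
    """Peptides present in the variant digest but absent from the reference digest."""
    reference = set(tryptic_digest(protein))
    variant = set(tryptic_digest(apply_substitution(protein, position, ref, alt)))
    return variant - reference

# Toy sequence (hypothetical): A -> V at position 4 changes one tryptic peptide.
toy_protein = "MKTAYIAKQRPLS"
toy_gvps = variant_peptides(toy_protein, 4, "A", "V")
```

Taking the set difference rather than editing one peptide in place also covers substitutions that create or remove a cleavage site, since the whole variant sequence is digested again.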
Affiliation(s)
- Fanny Chu
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington 98109, United States
- Andy Lin
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington 98109, United States
6
Balakrishnan A, Winiarek G, Hołówka O, Godlewski J, Bronisz A. Unlocking the secrets of the immunopeptidome: MHC molecules, ncRNA peptides, and vesicles in immune response. Front Immunol 2025; 16:1540431. [PMID: 39944685 PMCID: PMC11814183 DOI: 10.3389/fimmu.2025.1540431] [Received: 12/05/2024] [Accepted: 01/13/2025] [Indexed: 05/09/2025] Open
Abstract
The immunopeptidome, a diverse set of peptides presented by Major Histocompatibility Complex (MHC) molecules, is a critical component of immune recognition and response. This review article delves into the mechanisms of peptide presentation by MHC molecules, particularly emphasizing the roles of ncRNA-derived peptides and extracellular vesicles (EVs) in shaping the immunopeptidome landscape. We explore established and emerging insights into MHC molecule interactions with peptides, including the dynamics of peptide loading, transport, and the influence of cellular and genetic variations. The article highlights novel research on non-coding RNA (ncRNA)-derived peptides, which challenge conventional views of antigen processing and presentation and the role of EVs in transporting these peptides, thereby modulating immune responses at remote body sites. This novel research not only challenges conventional views but also opens up new avenues for understanding immune responses. Furthermore, we discuss the implications of these mechanisms in developing therapeutic strategies, particularly for cancer immunotherapy. By conducting a comprehensive analysis of current literature and advanced methodologies in immunopeptidomics, this review aims to deepen the understanding of the complex interplay between MHC peptide presentation and the immune system, offering new perspectives on potential diagnostic and therapeutic applications. Additionally, the interactions between ncRNA-derived peptides and EVs provide a mechanism for the enhanced surface presentation of these peptides and highlight a novel pathway for their systemic distribution, potentially altering immune surveillance and therapeutic landscapes.
Affiliation(s)
- Arpita Balakrishnan
  - Tumor Microenvironment Laboratory, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
  - Translational Medicine Doctoral School, Centre of Postgraduate Medical Education, Warsaw, Poland
- Gabriela Winiarek
  - Tumor Microenvironment Laboratory, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Olga Hołówka
  - Tumor Microenvironment Laboratory, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Jakub Godlewski
  - Department of NeuroOncology, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Agnieszka Bronisz
  - Tumor Microenvironment Laboratory, Mossakowski Medical Research Institute, Polish Academy of Sciences, Warsaw, Poland
7
Zhang Y, Yang Y, Li K, Chen L, Yang Y, Yang C, Xie Z, Wang H, Zhao Q. Enhanced Discovery of Alternative Proteins (AltProts) in Mouse Cardiac Development Using Data-Independent Acquisition (DIA) Proteomics. Anal Chem 2025; 97:1517-1527. [PMID: 39813267 PMCID: PMC11781309 DOI: 10.1021/acs.analchem.4c02924] [Received: 06/07/2024] [Revised: 11/27/2024] [Accepted: 11/27/2024] [Indexed: 01/18/2025]
Abstract
Alternative proteins (AltProts) are a class of proteins encoded by DNA sequences previously classified as noncoding. Despite their historically being overlooked, recent studies have highlighted their widespread presence and distinctive biological roles. So far, direct detection of AltProt has been relying on data-dependent acquisition (DDA) mass spectrometry (MS). However, data-independent acquisition (DIA) MS, a method that is rapidly gaining popularity for the analysis of canonical proteins, has seen limited application in AltProt research, largely due to the complexities involved in constructing DIA libraries. In this study, we present a novel DIA workflow that leverages a fragmentation spectra predictor for the efficient construction of DIA libraries, significantly enhancing the detection of AltProts. Our method achieved a 2-fold increase in the identification of AltProts and a 50% reduction in missing values compared to DDA. We conducted a comprehensive comparison of four AltProt databases, four DIA-library construction strategies, and three analytical software tools to establish an optimal workflow for AltProt analysis. Utilizing this workflow, we investigated the mouse heart development process and identified over 50 AltProts with differential expression between embryonic and adult heart tissues. Over 30 unannotated mouse AltProts were validated, including ASDURF, which played a crucial role in cardiac development. Our findings not only provide a practical workflow for MS-based AltProt analysis but also reveal novel AltProts with potential significance in biological functions.
Affiliation(s)
- Yuanliang Zhang
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Ying Yang
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Kecheng Li
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Lei Chen
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Yang Yang
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Chenxi Yang
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China
- Hongwei Wang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China
- Qian Zhao
  - Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
8
Basharat A, Xiong X, Xu T, Zang Y, Sun L, Liu X. TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics. J Proteome Res 2025; 24:55-64. [PMID: 39641251 PMCID: PMC11705214 DOI: 10.1021/acs.jproteome.4c00293] [Received: 04/10/2024] [Revised: 10/06/2024] [Accepted: 11/27/2024] [Indexed: 12/07/2024]
Abstract
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the past decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.
Affiliation(s)
- Abdul Rehman Basharat
  - Department of BioHealth Informatics, Luddy School of Informatics, Computing and Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Xingzhao Xiong
  - Deming Department of Medicine, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
- Tian Xu
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Yong Zang
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Liangliang Sun
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Xiaowen Liu
  - Deming Department of Medicine, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
9
Rajczewski AT, Blakeley-Ruiz JA, Meyer A, Vintila S, McIlvin MR, Van Den Bossche T, Searle BC, Griffin TJ, Saito MA, Kleiner M, Jagtap PD. Data-Independent Acquisition Mass Spectrometry as a Tool for Metaproteomics: Interlaboratory Comparison Using a Model Microbiome. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.09.18.613707. [PMID: 39345414 PMCID: PMC11430069 DOI: 10.1101/2024.09.18.613707] [Indexed: 10/01/2024]
Abstract
Mass spectrometry (MS)-based metaproteomics is used to identify and quantify proteins in microbiome samples, with the frequently used methodology being Data-Dependent Acquisition mass spectrometry (DDA-MS). However, DDA-MS is limited in its ability to reproducibly identify and quantify lower abundant peptides and proteins. To address DDA-MS deficiencies, proteomics researchers have started using Data-Independent Acquisition Mass Spectrometry (DIA-MS) for reproducible detection and quantification of peptides and proteins. We sought to evaluate the reproducibility and accuracy of DIA-MS metaproteomic measurements relative to DDA-MS using a mock community of known taxonomic composition. Artificial microbial communities of known composition were analyzed independently in three laboratories using DDA- and DIA-MS acquisition methods. DIA-MS yielded more protein and peptide identifications than DDA-MS in each laboratory. In addition, the protein and peptide identifications were more reproducible in all laboratories and provided an accurate quantification of proteins and taxonomic groups in the samples. We also identified some limitations of current DIA tools when applied to metaproteomic data, highlighting specific needs to improve DIA tools enabling analysis of metaproteomic datasets from complex microbiomes. Ultimately, DIA-MS represents a promising strategy for MS-based metaproteomics due to its large number of detected proteins and peptides, reproducibility, deep sequencing capabilities, and accurate quantitation.
Affiliation(s)
- Andrew T. Rajczewski
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis MN USA
- Annaliese Meyer
  - MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Department of Chemistry, Woods Hole Oceanographic Institution, Woods Hole MA USA, Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge MA USA
- Simina Vintila
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh NC USA
- Matthew R. McIlvin
  - Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole MA USA
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent Belgium
- Brian C. Searle
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus OH USA
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis MN USA
- Mak A. Saito
  - Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole MA USA
- Manuel Kleiner
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh NC USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis MN USA
10
Halder A, Dutta S, Srivastava S. A Comprehensive Workflow for Cerebrospinal Fluid Proteomics in the Study of Brain Tumors. Methods Mol Biol 2025; 2914:183-199. [PMID: 40167919 DOI: 10.1007/978-1-0716-4462-1_14] [Indexed: 04/02/2025]
Abstract
The cerebrospinal fluid (CSF) plays a critical role in maintaining homeostasis within the central nervous system (CNS). Detecting protein alterations in CSF associated with brain tumors offers a unique opportunity to enhance diagnosis, prognosis, recurrence monitoring, and treatment response. As an extracellular component of the CNS, it serves as a valuable reservoir of biomarkers that reflect changes associated with brain tumors, biomarkers that are often difficult to detect in the systemic circulation due to the blood-brain barrier. Advances in liquid biopsy techniques for CSF, coupled with high-throughput technologies, have the potential to significantly benefit patients. Here, we provide a comprehensive overview of methodologies for studying the CSF proteome to identify brain tumor-associated alterations.
Affiliation(s)
- Ankit Halder
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
- Suhisna Dutta
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
- Sanjeeva Srivastava
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India.
11
Xu W, Zhang L, Qian X, Sun N, Tu X, Zhou D, Zheng X, Chen J, Xie Z, He T, Qu S, Wang Y, Yang K, Su K, Feng S, Ju B. A deep learning framework for hepatocellular carcinoma diagnosis using MS1 data. Sci Rep 2024; 14:26705. [PMID: 39496730 PMCID: PMC11535524 DOI: 10.1038/s41598-024-77494-4] [Received: 02/06/2024] [Accepted: 10/22/2024] [Indexed: 11/06/2024] Open
Abstract
Clinical proteomics analysis is of great significance for analyzing pathological mechanisms and discovering disease-related biomarkers. Using computational methods to accurately predict disease types can effectively improve patient disease diagnosis and prognosis. However, how to eliminate the errors introduced by peptide precursor identification and protein identification for pathological diagnosis remains a major unresolved issue. Here, we develop a powerful end-to-end deep learning model, termed "MS1Former", that is able to classify hepatocellular carcinoma tumors and adjacent non-tumor (normal) tissues directly using raw MS1 spectra without peptide precursor identification. Our model provides accurate discrimination of subtle m/z differences in MS1 between tumor and adjacent non-tumor tissue, as well as more general performance predictions for data-dependent acquisition, data-independent acquisition, and full-scan data. Our model achieves the best performance on multiple external validation datasets. Additionally, we perform a detailed exploration of the model's interpretability. Prospectively, we expect that the advanced end-to-end framework will be more applicable to the classification of other tumors.
Collapse
Affiliation(s)
- Wei Xu
- College of Basic Medical Science, Zhejiang Chinese Medical University, 548 Binwen Rd, Hangzhou, 310053, China
- Key Laboratory of Chinese Medicine Rheumatology of Zhejiang Province, 548 Binwen Rd, Hangzhou, 310053, China
| | - Liying Zhang
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Xiaoliang Qian
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Nannan Sun
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Xiao Tu
- College of Basic Medical Science, Zhejiang Chinese Medical University, 548 Binwen Rd, Hangzhou, 310053, China
- Key Laboratory of Zhejiang Province, Management of Kidney Disease, Hangzhou, 310000, China
| | - Dengfeng Zhou
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Xiaoping Zheng
- Pathology Department, Shulan (Hangzhou) Hospital, Hangzhou, China
| | - Jia Chen
- School of Life Sciences, Key Laboratory of Structural Biology of Zhejiang Province, Westlake University, Hangzhou, 310024, China
- The Biomedical Research Core Facility, Mass Spectrometry and Metabolomics Core Facility, Westlake University, Hangzhou, 310024, China
| | - Zewen Xie
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Tao He
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Shugang Qu
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China
| | - Yinjia Wang
- The First People's Hospital of Kunming, Intensive Care Unit, Kunming, 650032, China.
| | - Keda Yang
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Shulan International Medical College, Zhejiang Shuren University, Hangzhou, 310015, China.
| | - Kunkai Su
- The First Affiliated Hospital, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Zhejiang University School of Medicine, 79 Qingchun Road, Hangzhou, 310013, China.
| | - Shan Feng
- School of Life Sciences, Key Laboratory of Structural Biology of Zhejiang Province, Westlake University, Hangzhou, 310024, China.
- The Biomedical Research Core Facility, Mass Spectrometry and Metabolomics Core Facility, Westlake University, Hangzhou, 310024, China.
| | - Bin Ju
- SanOmics AI Co., Ltd, Lingping District, Hangzhou, 311103, China.
- Innovative Institute of Basic Medical Sciences, Zhejiang University, Hangzhou, 310022, Zhejiang, China.
| |
|
12
|
Wen B, Hsu C, Zeng WF, Riffle M, Chang A, Mudge M, Nunn B, Berg MD, Villén J, MacCoss MJ, Noble WS. Carafe enables high quality in silico spectral library generation for data-independent acquisition proteomics. bioRxiv 2024:2024.10.15.618504. [PMID: 39463980 PMCID: PMC11507862 DOI: 10.1101/2024.10.15.618504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Data-independent acquisition (DIA)-based mass spectrometry is becoming an increasingly popular mass spectrometry acquisition strategy for carrying out quantitative proteomics experiments. Most of the popular DIA search engines make use of in silico generated spectral libraries. However, the generation of high-quality spectral libraries for DIA data analysis remains a challenge, particularly because most such libraries are generated directly from data-dependent acquisition (DDA) data or are from in silico prediction using models trained on DDA data. In this study, we developed Carafe, a tool that generates high-quality experiment-specific in silico spectral libraries by training deep learning models directly on DIA data. We demonstrate the performance of Carafe on a wide range of DIA datasets, where we observe improved fragment ion intensity prediction and peptide detection relative to existing pretrained DDA models.
Affiliation(s)
- Bo Wen
- Department of Genome Sciences, University of Washington
| | - Chris Hsu
- Department of Genome Sciences, University of Washington
| | - Wen-Feng Zeng
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Germany
| | | | - Alexis Chang
- Department of Genome Sciences, University of Washington
| | - Miranda Mudge
- Department of Genome Sciences, University of Washington
| | - Brook Nunn
- Department of Genome Sciences, University of Washington
| | | | - Judit Villén
- Department of Genome Sciences, University of Washington
| | | | - William S. Noble
- Department of Genome Sciences, University of Washington
- Paul G. Allen School of Computer Science and Engineering, University of Washington
| |
|
13
|
Sze YH, Tse DYY, Zuo B, Li KK, Zhao Q, Jiang X, Kurihara T, Tsubota K, Lam TC. Deep Spectral Library of Mice Retina for Myopia Research: Proteomics Dataset generated by SWATH and DIA-NN. Sci Data 2024; 11:1115. [PMID: 39389962 PMCID: PMC11467338 DOI: 10.1038/s41597-024-03958-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 10/02/2024] [Indexed: 10/12/2024] Open
Abstract
The retina plays a crucial role in processing and decoding visual information, both in normal development and during myopia progression. Recent advancements have introduced a library-independent approach for data-independent acquisition (DIA) analyses. This study demonstrates deep proteome identification and quantification in individual mice retinas during myopia development, with an average of 6,263 ± 86 unique protein groups. We anticipate that the use of a predicted retinal-specific spectral library combined with the robust quantification achieved within this dataset will contribute to a better understanding of the proteome complexity. Furthermore, a comprehensive mice retinal-specific spectral library was generated, encompassing a total identification of 9,401 protein groups, 70,041 peptides, 95,339 precursors, and 761,868 transitions acquired using SWATH-MS acquisition on a ZenoTOF 7600 mass spectrometer. This dataset surpasses the spectral library generated through high-pH reversed-phase fractionation by data-dependent acquisition (DDA). The data is available via ProteomeXchange with the identifier PXD046983. It will also serve as an indispensable reference for investigations in myopia research and other retinal or neurological diseases.
Affiliation(s)
- Ying Hon Sze
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hung Hom, Hong Kong
| | - Dennis Yan Yin Tse
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hung Hom, Hong Kong
- Research Centre for SHARP Vision (RCSV), The Hong Kong Polytechnic University, Hung Hom, Hong Kong
| | - Bing Zuo
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
| | - King Kit Li
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
| | - Xiaoyan Jiang
- Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
| | - Toshihide Kurihara
- Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
| | - Kazuo Tsubota
- Department of Ophthalmology, Keio University School of Medicine, Tokyo, Japan
- Tsubota Laboratory, Inc., Tokyo, Japan
| | - Thomas Cheun Lam
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Hong Kong.
- Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hung Hom, Hong Kong.
- Research Centre for SHARP Vision (RCSV), The Hong Kong Polytechnic University, Hung Hom, Hong Kong.
- Shenzhen Research Institute, The Hong Kong Polytechnic University, Shenzhen, 518052, China.
| |
|
14
|
Dai Y, Yang Y, Wu E, Shen C, Qiao L. Deep Learning Powers Protein Identification from Precursor MS Information. J Proteome Res 2024; 23:3837-3846. [PMID: 39167422 DOI: 10.1021/acs.jproteome.4c00118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2024]
Abstract
Proteome analysis currently heavily relies on tandem mass spectrometry (MS/MS), which does not fully utilize MS1 features, as many precursors remain unselected for MS/MS fragmentation, especially in the cases of low abundance samples and wide abundance dynamic range samples. Therefore, leveraging MS1 features as a complement to MS/MS has become an attractive option to improve the coverage of feature identification. Herein, we propose MonoMS1, an approach combining deep learning-based retention time, ion mobility, detectability prediction, and logistic regression-based scoring for MS1 feature identification. The approach achieved a significant increase in MS1 feature identification based on an E. coli data set. Application of MonoMS1 to data sets with wide dynamic range, such as human serum proteome samples, and with low sample abundance, such as single-cell proteome samples, enabled substantial complementation of MS/MS-based peptide and protein identification. This method opens a new avenue for proteomic analysis and can boost proteomic research on complex samples.
Affiliation(s)
- Yameng Dai
- Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai 200000, China
| | - Yi Yang
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311200, China
| | - Enhui Wu
- Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai 200000, China
| | - Chengpin Shen
- Shanghai Omicsolution Co., Ltd., Shanghai 201100, China
| | - Liang Qiao
- Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai 200000, China
| |
|
15
|
He Q, Guo H, Li Y, He G, Li X, Shuai J. SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics. Interdiscip Sci 2024; 16:579-592. [PMID: 38472692 DOI: 10.1007/s12539-024-00611-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/12/2024] [Accepted: 01/21/2024] [Indexed: 03/14/2024]
Abstract
Mass spectrometry is crucial in proteomics analysis, particularly using Data Independent Acquisition (DIA) for reliable and reproducible mass spectrometry data acquisition, enabling broad mass-to-charge ratio coverage and high throughput. DIA-NN, a prominent deep learning software in DIA proteome analysis, generates peptide results but may include low-confidence peptides. Conventionally, biologists have to manually screen peptide fragment ion chromatogram peaks (XIC) for identifying high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm, aiming at automating the identification of high-confidence peptides. Leveraging compressed excitation neural network and residual network models, SeFilter-DIA extracts XIC features and effectively discerns between high and low-confidence peptides. Evaluation of the benchmark datasets demonstrates SeFilter-DIA achieving 99.6% AUC on the test set and 97% for other performance indicators. Furthermore, SeFilter-DIA is applicable for screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach for high-confidence peptide identification while mitigating associated limitations.
Affiliation(s)
- Qingzu He
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
| | - Huan Guo
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
| | - Yulin Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
| | - Guoqiang He
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
| | - Xiang Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China.
| | - Jianwei Shuai
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China.
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325001, China.
| |
|
16
|
Fedorov II, Protasov SA, Tarasova IA, Gorshkov MV. Ultrafast Proteomics. Biochemistry. Biokhimiia 2024; 89:1349-1361. [PMID: 39245450 DOI: 10.1134/s0006297924080017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 09/10/2024]
Abstract
The current stage of proteomic research in biology, medicine, drug development, population screening, and personalized therapy dictates the need to analyze large sets of samples within a reasonable experimental time. Until recently, mass spectrometry measurements in proteomics were regarded as unique in their ability to identify and quantify cellular protein composition, but as low in throughput, requiring many hours to analyze a single sample. This conflicted with the dynamics of changes in biological systems at the whole-proteome level under the influence of external and internal factors. Thus, the low speed of whole-proteome analysis has become the main factor limiting developments in functional proteomics, where intracellular processes must be annotated not only across a wide range of conditions but also over a long period of time. The enormous heterogeneity of tissue cells or tumors, even of the same type, dictates the need to analyze biological systems at the level of individual cells. These studies involve obtaining molecular characteristics, including whole-proteome profiles, for tens if not hundreds of thousands of individual cells. The development of mass spectrometry technologies providing high resolution and mass measurement accuracy, predictive chromatography, new methods for peptide separation by ion mobility, and processing of proteomic data with artificial intelligence algorithms has opened the way to a significant, if not radical, increase in the throughput of whole-proteome analysis and has led to the novel concept of ultrafast proteomics. Work done in just the last few years has demonstrated proteome-wide analysis throughput of several hundred samples per day at a depth of several thousand proteins, levels unimaginable three or four years ago. The review examines the background of these developments, as well as modern methods and approaches that implement ultrafast analysis of the entire proteome.
Affiliation(s)
- Ivan I Fedorov
- Moscow Institute of Physics and Technology (National University), Dolgoprudny, Moscow Region, 141700, Russia
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Sergey A Protasov
- Moscow Institute of Physics and Technology (National University), Dolgoprudny, Moscow Region, 141700, Russia
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia.
| |
|
17
|
Lapcik P, Synkova K, Janacova L, Bouchalova P, Potesil D, Nenutil R, Bouchal P. A hybrid DDA/DIA-PASEF based assay library for a deep proteotyping of triple-negative breast cancer. Sci Data 2024; 11:794. [PMID: 39025866 PMCID: PMC11258311 DOI: 10.1038/s41597-024-03632-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 07/10/2024] [Indexed: 07/20/2024] Open
Abstract
Triple-negative breast cancer (TNBC) is the most aggressive subtype of breast cancer, and deeper proteome coverage is needed for its molecular characterization. We present a comprehensive library of targeted mass spectrometry assays specific for TNBC and demonstrate its applicability. Proteins were extracted from 105 TNBC tissues and digested. Aliquots were pooled, fractionated using hydrophilic chromatography, and analyzed by LC-MS/MS in data-dependent acquisition (DDA) parallel accumulation-serial fragmentation (PASEF) mode on a timsTOF Pro LC-MS system. Sixteen individual lysates were analyzed in data-independent acquisition (DIA)-PASEF mode. A hybrid library was generated in Spectronaut software and covers 244,464 precursors, 168,006 peptides, and 11,564 protein groups (FDR = 1%). Applying our library to a pilot quantitative analysis of 16 tissues increased identification numbers in Spectronaut 18.5 and DIA-NN 1.8.1 software compared to the library-free setting, with Spectronaut achieving the best results: 190,310 precursors, 140,566 peptides, and 10,463 protein groups. In conclusion, we introduce an assay library that offers the deepest coverage of the TNBC proteome to date. The TNBC library is available via the PRIDE repository (PXD047793).
Grants
- NU22-08-00230 Ministerstvo Zdravotnictví Ceské Republiky (Ministry of Health of the Czech Republic)
- LX22NPO5102 Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
- CZ.02.1.01/0.0/0.0/18_046/0015974 Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
- LM2023033 Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports)
Affiliation(s)
- Petr Lapcik
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Klara Synkova
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Lucia Janacova
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Pavla Bouchalova
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - David Potesil
- Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Rudolf Nenutil
- Department of Oncological Pathology, Masaryk Memorial Cancer Institute, Brno, Czech Republic
| | - Pavel Bouchal
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic.
| |
|
18
|
Wu E, Qiao L. [Microbial metaproteomics--From sample processing to data acquisition and analysis]. Se Pu 2024; 42:658-668. [PMID: 38966974 PMCID: PMC11224941 DOI: 10.3724/sp.j.1123.2024.02009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Indexed: 07/06/2024] Open
Abstract
Microorganisms are closely associated with human diseases and health. Understanding the composition and function of microbial communities requires extensive research. Metaproteomics has recently become an important method for thorough and in-depth study of microorganisms. However, major challenges in terms of sample processing, mass spectrometric data acquisition, and data analysis limit the development of metaproteomics owing to the complexity and high heterogeneity of microbial community samples. In metaproteomic analysis, optimizing the preprocessing method for different types of samples and adopting different microbial isolation, enrichment, extraction, and lysis schemes are often necessary. Similar to those for single-species proteomics, the mass spectrometric data acquisition modes for metaproteomics include data-dependent acquisition (DDA) and data-independent acquisition (DIA). DIA can collect comprehensive peptide information from a sample and holds great potential for future development. However, data analysis for DIA is challenged by the complexity of metaproteome samples, which hinders the deeper coverage of metaproteomes. The most important step in data analysis is the construction of a protein sequence database. The size and completeness of the database strongly influence not only the number of identifications, but also analyses at the species and functional levels. The current gold standard for metaproteome database construction is the metagenomic sequencing-based protein sequence database. A public database-filtering method based on an iterative database search has been proven to have strong practical value. The peptide-centric DIA data analysis method is a mainstream data analysis strategy. The development of deep learning and artificial intelligence will greatly promote the accuracy, coverage, and speed of metaproteomic analysis.
In terms of downstream bioinformatics analysis, a series of annotation tools that can perform species annotation at the protein, peptide, and gene levels has been developed in recent years to determine the composition of microbial communities. The functional analysis of microbial communities is a unique feature of metaproteomics compared with other omics approaches. Metaproteomics has become an important component of the multi-omics analysis of microbial communities, and has great development potential in terms of depth of coverage, sensitivity of detection, and completeness of data analysis.
|
19
|
Wu E, Xu G, Xie D, Qiao L. Data-independent acquisition in metaproteomics. Expert Rev Proteomics 2024; 21:271-280. [PMID: 39152734 DOI: 10.1080/14789450.2024.2394190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 08/12/2024] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
INTRODUCTION: Metaproteomics offers insights into the function of complex microbial communities and can also reveal microbe-microbe and host-microbe interactions. Data-independent acquisition (DIA) mass spectrometry is an emerging technology that holds great potential to achieve deep and accurate metaproteomics with higher reproducibility, but it still faces a series of challenges due to the inherent complexity of metaproteomics and DIA data. AREAS COVERED: This review offers an overview of DIA metaproteomics approaches, covering aspects such as database construction, search strategy, and data analysis tools. Several cases of current DIA metaproteomics studies are presented to illustrate the procedures. Important ongoing challenges are also highlighted. Future perspectives of DIA methods for metaproteomics analysis are further discussed. Cited references were collected from Google Scholar and PubMed. EXPERT OPINION: Considering the inherent complexity of DIA metaproteomics data, data analysis strategies specifically designed for its interpretation are imperative. From this point of view, we anticipate that deep learning methods and de novo sequencing methods will become more prevalent in the future, potentially improving protein coverage in metaproteomics. Moreover, the advancement of metaproteomics also depends on the development of sample preparation methods, data analysis strategies, etc. These factors are key to unlocking the full potential of metaproteomics.
Affiliation(s)
- Enhui Wu
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
- Department of Chemistry, Fudan University, Shanghai, China
| | - Guanyang Xu
- Department of Chemistry, Fudan University, Shanghai, China
| | - Dong Xie
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
| | - Liang Qiao
- Department of Chemistry, Fudan University, Shanghai, China
| |
|
20
|
He G, He Q, Cheng J, Yu R, Shuai J, Cao Y. ProPept-MT: A Multi-Task Learning Model for Peptide Feature Prediction. Int J Mol Sci 2024; 25:7237. [PMID: 39000344 PMCID: PMC11241495 DOI: 10.3390/ijms25137237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 06/26/2024] [Accepted: 06/28/2024] [Indexed: 07/16/2024] Open
Abstract
In the realm of quantitative proteomics, data-independent acquisition (DIA) has emerged as a promising approach, offering enhanced reproducibility and quantitative accuracy compared to traditional data-dependent acquisition (DDA) methods. However, the analysis of DIA data is currently hindered by its reliance on project-specific spectral libraries derived from DDA analyses, which not only limits proteome coverage but also proves to be a time-intensive process. To overcome these challenges, we propose ProPept-MT, a novel deep learning-based multi-task prediction model designed to accurately forecast key features such as retention time (RT), ion intensity, and ion mobility (IM). Leveraging advanced techniques such as multi-head attention and BiLSTM for feature extraction, coupled with Nash-MTL for gradient coordination, ProPept-MT demonstrates superior prediction performance. Integrating ion mobility alongside RT, mass-to-charge ratio (m/z), and ion intensity forms 4D proteomics. Then, we outline a comprehensive workflow tailored for 4D DIA proteomics research, integrating the use of 4D in silico libraries predicted by ProPept-MT. Evaluation on a benchmark dataset showcases ProPept-MT's exceptional predictive capabilities, with impressive results including a 99.9% Pearson correlation coefficient (PCC) for RT prediction, a median dot product (DP) of 96.0% for fragment ion intensity prediction, and a 99.3% PCC for IM prediction on the test set. Notably, ProPept-MT manifests efficacy in predicting both unmodified and phosphorylated peptides, underscoring its potential as a valuable tool for constructing high-quality 4D DIA in silico libraries.
Affiliation(s)
- Guoqiang He
- Postgraduate Training Base Alliance, Wenzhou Medical University, Wenzhou 325000, China
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325000, China
| | - Qingzu He
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Jinyan Cheng
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325000, China
| | - Rongwen Yu
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325000, China
| | - Jianwei Shuai
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325000, China
| | - Yi Cao
- Postgraduate Training Base Alliance, Wenzhou Medical University, Wenzhou 325000, China
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325000, China
| |
|
21
|
Baker C, Bruderer R, Abbott J, Arthur JSC, Brenes AJ. Optimizing Spectronaut Search Parameters to Improve Data Quality with Minimal Proteome Coverage Reductions in DIA Analyses of Heterogeneous Samples. J Proteome Res 2024; 23:1926-1936. [PMID: 38691771 PMCID: PMC11165578 DOI: 10.1021/acs.jproteome.3c00671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 01/18/2024] [Accepted: 04/19/2024] [Indexed: 05/03/2024]
Abstract
Data-independent acquisition has seen breakthroughs that enable comprehensive proteome profiling using short gradients. As the proteome coverage continues to increase, the quality of the data generated becomes much more relevant. Using Spectronaut, we show that the default search parameters can be easily optimized to minimize the occurrence of false positives across different samples. Using an immunological infection model system to demonstrate the impact of adjusting search settings, we analyzed Mus musculus macrophages and compared their proteome to macrophages spiked with Candida albicans. This experimental system enabled the identification of "false positives", as Candida albicans peptides and proteins should not be present in the Mus musculus-only samples. We show that adjusting the search parameters reduced "false positive" identifications by 89% at the peptide and protein level, thereby considerably increasing the quality of the data. We also show that these optimized parameters incurred a moderate cost, only reducing the overall number of "true positive" identifications across each biological replicate by <6.7% at both the peptide and protein level. We believe the value of our updated search parameters extends beyond a two-organism analysis and would be of great value to any DIA experiment analyzing heterogeneous populations of cell types or tissues.
Affiliation(s)
- Christa P. Baker
- Division of Cell Signalling & Immunology, School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | | | - James Abbott
- Data Analysis Group, Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - J. Simon C. Arthur
- Division of Cell Signalling & Immunology, School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - Alejandro J. Brenes
- Division of Cell Signalling & Immunology, School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| |
|
22
|
Hamaneh M, Ogurtsov AY, Obolensky OI, Yu YK. Systematic Assessment of Deep Learning-Based Predictors of Fragmentation Intensity Profiles. J Proteome Res 2024; 23:1983-1999. [PMID: 38728051 PMCID: PMC11165591 DOI: 10.1021/acs.jproteome.3c00857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 03/05/2024] [Accepted: 04/16/2024] [Indexed: 06/13/2024]
Abstract
In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.
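The two similarity measures named in this abstract can be illustrated with a short sketch. The abstract does not give formulas, so the definitions below are assumptions: Pearson's correlation in its standard form, and the "normalized angle" taken to be the commonly used normalized spectral angle, 1 - 2·arccos(cosine similarity)/π. The function names and the example intensity vectors are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    return cov / math.sqrt(var_x * var_y)


def normalized_spectral_angle(x, y):
    """Assumed definition: 1 - 2*arccos(cosine similarity)/pi.

    Gives 1.0 for vectors pointing in the same direction and
    0.0 for orthogonal vectors.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("spectral angle is undefined for all-zero vectors")
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical predicted vs. experimental b/y fragment intensities.
predicted = [0.10, 0.50, 1.00, 0.00]
observed = [0.20, 0.40, 1.00, 0.10]
print(pearson(predicted, observed))
print(normalized_spectral_angle(predicted, observed))
```

Collecting either score over many precursors would give per-method similarity distributions of the kind the abstract describes.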
Affiliation(s)
- Mehdi B. Hamaneh
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Aleksey Y. Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
|
23
|
Staes A, Mendes Maia T, Dufour S, Bouwmeester R, Gabriels R, Martens L, Gevaert K, Impens F, Devos S. Benefit of In Silico Predicted Spectral Libraries in Data-Independent Acquisition Data Analysis Workflows. J Proteome Res 2024; 23:2078-2089. [PMID: 38666436 DOI: 10.1021/acs.jproteome.4c00048] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/20/2025]
Abstract
Data-independent acquisition (DIA) has become a well-established method for MS-based proteomics. However, the list of options to analyze this type of data is quite extensive, and the use of spectral libraries has become an important factor in DIA data analysis. More specifically, the use of in silico predicted libraries is gaining more interest. By working with a differential spike-in of human standard proteins (UPS2) in a constant yeast tryptic digest background, we evaluated the sensitivity, precision, and accuracy of the use of in silico predicted libraries in DIA data analysis workflows compared to more established workflows. Three commonly used DIA software tools, DIA-NN, EncyclopeDIA, and Spectronaut, were each tested in spectral library mode and spectral library-free mode. In spectral library mode, we used independent spectral library prediction tools PROSIT and MS2PIP together with DeepLC, next to classical data-dependent acquisition (DDA)-based spectral libraries. In total, we benchmarked 12 computational workflows for DIA. Our comparison showed that DIA-NN reached the highest sensitivity while maintaining a good compromise on the reproducibility and accuracy levels in either library-free mode or using in silico predicted libraries, pointing to a general benefit in using in silico predicted libraries.
Affiliation(s)
- An Staes
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- VIB Proteomics Core, B9052 Ghent, Belgium
- Teresa Mendes Maia
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- VIB Proteomics Core, B9052 Ghent, Belgium
- Sara Dufour
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- VIB Proteomics Core, B9052 Ghent, Belgium
- Robbin Bouwmeester
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Ralf Gabriels
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Lennart Martens
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Kris Gevaert
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Francis Impens
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- VIB Proteomics Core, B9052 Ghent, Belgium
- Simon Devos
- VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- VIB Proteomics Core, B9052 Ghent, Belgium
|
24
|
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth of spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing the evolving landscape of spectrum-structure correlation, conventional chemometrics has become ill-equipped, and deep learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and fundamental knowledge of deep learning are first introduced. We then summarize how deep learning has helped to establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, together with corresponding potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development of various disciplines.
Affiliation(s)
- Xin-Yu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
- Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
- School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
|
25
|
Lee H, Ozbulak U, Park H, Depuydt S, De Neve W, Vankerschaver J. Assessing the reliability of point mutation as data augmentation for deep learning with genomic data. BMC Bioinformatics 2024; 25:170. [PMID: 38689247 PMCID: PMC11059627 DOI: 10.1186/s12859-024-05787-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data. RESULTS: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection. CONCLUSION: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.
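As a rough illustration of the silent-mutation case described in this abstract, the sketch below swaps single bases within codons only when the encoded amino acid is unchanged. It is not the authors' implementation: the function names, the `rate` parameter, and the assumption that the input is an in-frame coding sequence are all hypothetical choices made for the example. The codon table is the standard genetic code.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def silent_point_mutations(seq, rate, rng):
    """With probability `rate` per codon, apply one single-base change
    that leaves the encoded amino acid (or stop signal) unchanged."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if len(codon) == 3 and rng.random() < rate:
            candidates = []
            for pos in range(3):
                for base in BASES:
                    mutated = codon[:pos] + base + codon[pos + 1:]
                    if base != codon[pos] and CODON_TABLE[mutated] == CODON_TABLE[codon]:
                        candidates.append(mutated)
            if candidates:  # ATG and TGG have no synonymous alternatives
                codon = rng.choice(candidates)
        out.append(codon)
    return "".join(out)

original = "ATGGCTCTGGGATAA"
augmented = silent_point_mutations(original, rate=1.0, rng=random.Random(0))
print(original, augmented, translate(augmented))
```

Missense augmentation would relax the synonymy check, while the abstract reports that nonsense mutations (changes that introduce a stop codon) tend to degrade performance.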
Affiliation(s)
- Utku Ozbulak
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
- Homin Park
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
- IDLab, Department of Electronics and Information Systems, Ghent University, Ghent, Belgium
- Stephen Depuydt
- Erasmus Brussels University of Applied Sciences and Arts, Brussels, Belgium
- Wesley De Neve
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
- IDLab, Department of Electronics and Information Systems, Ghent University, Ghent, Belgium
- Joris Vankerschaver
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea.
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
|
26
|
Basharat AR, Xiong X, Xu T, Zang Y, Sun L, Liu X. TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics. bioRxiv 2024:2024.04.05.588302. [PMID: 38645171 PMCID: PMC11030422 DOI: 10.1101/2024.04.05.588302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the last decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.
Affiliation(s)
- Abdul Rehman Basharat
- Department of BioHealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA
- Xingzhao Xiong
- Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, 70112, USA
- Tian Xu
- Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
- Yong Zang
- Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
- Xiaowen Liu
- Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, 70112, USA
|
27
|
Yang Y, Fang Q. Prediction of glycopeptide fragment mass spectra by deep learning. Nat Commun 2024; 15:2448. [PMID: 38503734 PMCID: PMC10951270 DOI: 10.1038/s41467-024-46771-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/11/2024] [Indexed: 03/21/2024] Open
Abstract
Deep learning has achieved a notable success in mass spectrometry-based proteomics and is now emerging in glycoproteomics. While various deep learning models can predict fragment mass spectra of peptides with good accuracy, they cannot cope with the non-linear glycan structure in an intact glycopeptide. Herein, we present DeepGlyco, a deep learning-based approach for the prediction of fragment spectra of intact glycopeptides. Our model adopts tree-structured long-short term memory networks to process the glycan moiety and a graph neural network architecture to incorporate potential fragmentation pathways of a specific glycan structure. This feature is beneficial to model explainability and differentiation ability of glycan structural isomers. We further demonstrate that predicted spectral libraries can be used for data-independent acquisition glycoproteomics as a supplement for library completeness. We expect that this work will provide a valuable deep learning resource for glycoproteomics.
Affiliation(s)
- Yi Yang
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China.
- Qun Fang
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China.
- Department of Chemistry, Zhejiang University, Hangzhou, 310058, China.
|
28
|
Palstrøm NB, Campbell AJ, Lindegaard CA, Cakar S, Matthiesen R, Beck HC. Spectral library search for improved TMTpro labelled peptide assignment in human plasma proteomics. Proteomics 2024; 24:e2300236. [PMID: 37706597 DOI: 10.1002/pmic.202300236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 09/15/2023]
Abstract
Clinical biomarker discovery is often based on the analysis of human plasma samples. However, the high dynamic range and complexity of plasma pose significant challenges to mass spectrometry-based proteomics. Current methods for improving protein identifications require laborious pre-analytical sample preparation. In this study, we developed and evaluated a TMTpro-specific spectral library for improved protein identification in human plasma proteomics. The library was constructed by LC-MS/MS analysis of highly fractionated TMTpro-tagged human plasma, human cell lysates, and relevant arterial tissues. The library was curated using several quality filters to ensure reliable peptide identifications. Our results show that spectral library searching using the TMTpro spectral library improves the identification of proteins in plasma samples compared to conventional sequence database searching. Protein identifications made by the spectral library search engine demonstrated a high degree of complementarity with the sequence database search engine, indicating the feasibility of increasing the number of protein identifications without additional pre-analytical sample preparation. The TMTpro-specific spectral library provides a resource for future plasma proteomics research and optimization of search algorithms for greater accuracy and speed in protein identifications in human plasma proteomics, and is made publicly available to the research community via ProteomeXchange with identifier PXD042546.
Affiliation(s)
- Nicolai B Palstrøm
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Amanda J Campbell
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Samir Cakar
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Rune Matthiesen
- Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal
- Hans C Beck
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
|
29
|
Lapin J, Yan X, Dong Q. UniSpec: Deep Learning for Predicting the Full Range of Peptide Fragment Ion Series to Enhance the Proteomics Data Analysis Workflow. Anal Chem 2024. [PMID: 38329031 DOI: 10.1021/acs.analchem.3c02321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
We present UniSpec, an attention-driven deep neural network designed to predict comprehensive collision-induced fragmentation spectra, thereby improving peptide identification in shotgun proteomics. Utilizing a training data set of 1.8 million unique high-quality tandem mass spectra (MS2) from 0.8 million unique peptide ions, UniSpec learned with a peptide fragmentation dictionary encompassing 7919 fragment peaks. Among these, 5712 are neutral loss peaks, with 2310 corresponding to modification-specific neutral losses. Remarkably, UniSpec can predict 73%-77% of fragment intensities based on our NIST reference library spectra, a significant leap from the 35%-45% coverage of only b and y ions. Comparative studies with Prosit elucidate that while both models are strong at predicting their respective fragment ion series, UniSpec particularly shines in generating more complex MS2 spectra with diverse ion annotations. The integration of UniSpec's predictions into shotgun proteomics data analysis boosts the identification rate of tryptic peptides by 48% at a 1% false discovery rate (FDR) and 60% at a more confident 0.1% FDR. Using UniSpec's predicted in-silico spectral library, the search results closely matched those from search engines and experimental spectral libraries used in peptide identification, highlighting its potential as a stand-alone identification tool. The source code and Python scripts are available on GitHub (https://github.com/usnistgov/UniSpec) and Zenodo (https://zenodo.org/records/10452792), and all data sets and analysis results generated in this work were deposited in Zenodo (https://zenodo.org/records/10052268).
Affiliation(s)
- Joel Lapin
- Department of Physics, Georgetown University, Washington, D.C. 20057, United States
- Associate, Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Xinjian Yan
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Qian Dong
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
|
30
|
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
|
31
|
Jiao X, Li X, Zhang N, Zhang W, Yan B, Huang J, Zhao J, Zhang H, Chen W, Fan D. Postmortem Muscle Proteome Characteristics of Silver Carp (Hypophthalmichthys molitrix): Insights from Full-Length Transcriptome and Deep 4D Label-Free Proteomic. J Agric Food Chem 2024; 72:1376-1390. [PMID: 38165648 DOI: 10.1021/acs.jafc.3c06902] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/04/2024]
Abstract
The coverage of the protein database directly determines the results of shotgun proteomics. In this study, PacBio single-molecule real-time sequencing technology was performed on postmortem silver carp muscle transcripts. A total of 42.43 Gb clean data, 35,834 nonredundant transcripts, and 15,413 unigenes were obtained. In total, 99.32% of the unigenes were successfully annotated and assigned specific functions. PacBio long-read isoform sequencing (Iso-Seq) analysis can provide more accurate protein information with a higher proportion of complete coding sequences and longer lengths. Subsequently, 2671 proteins were identified in deep 4D proteomics informed by a full-length transcriptomics technique, which has been shown to improve the identification of low-abundance muscle proteins and potential protein isoforms. The feature of the sarcomeric protein profile and information on more than 30 major proteins in the white dorsal muscle of silver carp were reported here for the first time. Overall, this study provides valuable transcriptome data resources and the comprehensive muscle protein information detected to date for further study into the processing characteristic of early postmortem fish muscle, as well as a spectral library for data-independent acquisition and data processing. This batch of muscle-specific dependent acquisition data is available via PRIDE with identifier PXD043702.
Affiliation(s)
- Xidong Jiao
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Xingying Li
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Nana Zhang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Ministry of Agriculture and Rural Affairs, Xiamen 361022, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Wenhai Zhang
- Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Ministry of Agriculture and Rural Affairs, Xiamen 361022, China
- Fujian Provincial Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Xiamen 361022, China
- Anjoy Foods Group Co., Ltd., Xiamen 361022, China
- Bowen Yan
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Ministry of Agriculture and Rural Affairs, Xiamen 361022, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Jianlian Huang
- Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Ministry of Agriculture and Rural Affairs, Xiamen 361022, China
- Fujian Provincial Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Xiamen 361022, China
- Anjoy Foods Group Co., Ltd., Xiamen 361022, China
- Jianxin Zhao
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Hao Zhang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Wei Chen
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Daming Fan
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi 214122, China
- Key Laboratory of Refrigeration and Conditioning Aquatic Products Processing, Ministry of Agriculture and Rural Affairs, Xiamen 361022, China
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
|
32
|
Chan CMJ, Lam H. Merging Full-Spectrum and Fragment Ion Intensity Predictions from Deep Learning for High-Quality Spectral Libraries. J Proteome Res 2023; 22:3692-3702. [PMID: 37910637 DOI: 10.1021/acs.jproteome.3c00180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2023]
Abstract
Spectral libraries are useful resources in proteomic data analysis. Recent advances in deep learning allow tandem mass spectra of peptides to be predicted from their amino acid sequences. This enables predicted spectral libraries to be compiled, and searching against such libraries has been shown to improve the sensitivity in peptide identification over conventional sequence database searching. However, current prediction models lack support for longer peptides, and thus far, predicted library searching has only been demonstrated for backbone ion-only spectrum prediction methods. Here, we propose a deep learning-based full-spectrum prediction method to generate predicted spectral libraries for peptide identification. We demonstrated the superiority of using full-spectrum libraries over backbone ion-only prediction approaches in spectral library searching. Furthermore, merging spectra from different prediction models, as a form of ensemble learning, can produce improved spectral libraries, in terms of identification sensitivity. We also show that a hybrid library combining predicted and experimental spectra can lead to 20% more confident identifications over experimental library searching or sequence database searching.
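The abstract does not say how spectra from different prediction models are merged. Purely as an illustration of the ensemble idea, the sketch below assumes each model returns a mapping from fragment annotation to intensity, scales each spectrum to a base peak of 1, and averages per annotation. The function names, the annotation strings, and the averaging rule are all assumptions, not the authors' method.

```python
def normalize(spectrum):
    """Scale intensities so the most intense peak equals 1.0."""
    top = max(spectrum.values())
    return {ion: intensity / top for ion, intensity in spectrum.items()}

def merge_spectra(spectra):
    """Average base-peak-normalized intensities across models.
    A peak missing from one model's prediction counts as zero for that model."""
    totals = {}
    for spectrum in spectra:
        for ion, intensity in normalize(spectrum).items():
            totals[ion] = totals.get(ion, 0.0) + intensity
    return {ion: total / len(spectra) for ion, total in totals.items()}

# Hypothetical predictions for one peptide from two models:
# a backbone-ion-only model and a full-spectrum model with a neutral-loss peak.
backbone_model = {"b2": 50.0, "y3": 100.0}
full_spectrum_model = {"y3": 1.0, "y3-H2O": 0.5}
print(merge_spectra([backbone_model, full_spectrum_model]))
```

Under this rule, peaks predicted by only one model are retained at reduced intensity, which is one simple way a merged library could combine backbone-ion and full-spectrum information.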
Affiliation(s)
- Chak Ming Jerry Chan
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
|
33
|
Kitata RB, Yang JC, Chen YJ. Advances in data-independent acquisition mass spectrometry towards comprehensive digital proteome landscape. Mass Spectrom Rev 2023; 42:2324-2348. [PMID: 35645145 DOI: 10.1002/mas.21781] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 10/09/2021] [Revised: 12/17/2021] [Accepted: 01/21/2022] [Indexed: 06/15/2023]
Abstract
The data-independent acquisition mass spectrometry (DIA-MS) has rapidly evolved as a powerful alternative for highly reproducible proteome profiling with a unique strength of generating permanent digital maps for retrospective analysis of biological systems. Recent advancements in data analysis software tools for the complex DIA-MS/MS spectra coupled to fast MS scanning speed and high mass accuracy have greatly expanded the sensitivity and coverage of DIA-based proteomics profiling. Here, we review the evolution of the DIA-MS techniques, from earlier proof-of-principle of parallel fragmentation of all-ions or ions in selected m/z range, the sequential window acquisition of all theoretical mass spectra (SWATH-MS) to latest innovations, recent development in computation algorithms for data informatics, and auxiliary tools and advanced instrumentation to enhance the performance of DIA-MS. We further summarize recent applications of DIA-MS and experimentally-derived as well as in silico spectra library resources for large-scale profiling to facilitate biomarker discovery and drug development in human diseases with emphasis on the proteomic profiling coverage. Toward next-generation DIA-MS for clinical proteomics, we outline the challenges in processing multi-dimensional DIA data set and large-scale clinical proteomics, and continuing need in higher profiling coverage and sensitivity.
Affiliation(s)
- Jhih-Ci Yang
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
- Sustainable Chemical Science and Technology, Taiwan International Graduate Program, Academia Sinica and National Yang Ming Chiao Tung University, Taipei, Taiwan
- Department of Applied Chemistry, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
- Sustainable Chemical Science and Technology, Taiwan International Graduate Program, Academia Sinica and National Yang Ming Chiao Tung University, Taipei, Taiwan
- Department of Chemistry, National Taiwan University, Taipei, Taiwan

34
Jin L, Wang F, Wang X, Harvey BP, Bi Y, Hu C, Cui B, Darcy AT, Maull JW, Phillips BR, Kim Y, Jenkins GJ, Sornasse TR, Tian Y. Identification of Plasma Biomarkers from Rheumatoid Arthritis Patients Using an Optimized Sequential Window Acquisition of All THeoretical Mass Spectra (SWATH) Proteomics Workflow. Proteomes 2023; 11:32. [PMID: 37873874 PMCID: PMC10594463 DOI: 10.3390/proteomes11040032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/28/2023] [Accepted: 10/02/2023] [Indexed: 10/25/2023] Open
Abstract
Rheumatoid arthritis (RA) is a systemic autoimmune and inflammatory disease. Plasma biomarkers are critical for understanding disease mechanisms, treatment effects, and diagnosis. Mass spectrometry-based proteomics is a powerful tool for unbiased biomarker discovery. However, plasma proteomics is significantly hampered by signal interference from high-abundance proteins, low overall protein coverage, and high levels of missing data from data-dependent acquisition (DDA). To achieve quantitative proteomics analysis for plasma samples with a balance of throughput, performance, and cost, we developed a workflow incorporating plate-based high abundance protein depletion and sample preparation, comprehensive peptide spectral library building, and data-independent acquisition (DIA) SWATH mass spectrometry-based methodology. In this study, we analyzed plasma samples from both RA patients and healthy donors. The results showed that the new workflow performance exceeded that of the current state-of-the-art depletion-based plasma proteomic platforms in terms of both data quality and proteome coverage. Proteins from biological processes related to the activation of systemic inflammation, suppression of platelet function, and loss of muscle mass were enriched and differentially expressed in RA. Some plasma proteins, particularly acute-phase reactant proteins, showed great power to distinguish between RA patients and healthy donors. Moreover, protein isoforms in the plasma were also analyzed, providing even deeper proteome coverage. This workflow can serve as a basis for further application in discovering plasma biomarkers of other diseases.
Affiliation(s)
- Liang Jin
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Fei Wang
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Xue Wang
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Bohdan P. Harvey
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Yingtao Bi
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Chenqi Hu
- DMPK, Takeda Development Center Americas Inc., Cambridge, MA 02142, USA
- Baoliang Cui
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Anhdao T. Darcy
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- John W. Maull
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Ben R. Phillips
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Youngjae Kim
- DMPK, Takeda Development Center Americas Inc., Cambridge, MA 02142, USA
- Gary J. Jenkins
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Thierry R. Sornasse
- Research & Development, AbbVie, North Chicago, IL 60064, USA
- Yu Tian
- Research & Development, AbbVie, North Chicago, IL 60064, USA

35
Zhang B, Bassani-Sternberg M. Current perspectives on mass spectrometry-based immunopeptidomics: the computational angle to tumor antigen discovery. J Immunother Cancer 2023; 11:e007073. [PMID: 37899131 PMCID: PMC10619091 DOI: 10.1136/jitc-2023-007073] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Accepted: 07/21/2023] [Indexed: 10/31/2023] Open
Abstract
Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means for identifying tumor antigens encoded within the canonical open reading frames of annotated coding genes and non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA and ribosomal sequencing data analysis, careful incorporation of predicted novel protein sequences into reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. The advancements in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
Affiliation(s)
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland

36
Hao Y, Chen M, Huang X, Xu H, Wu P, Chen S. 4D-diaXLMS: Proteome-wide Four-Dimensional Data-Independent Acquisition Workflow for Cross-Linking Mass Spectrometry. Anal Chem 2023; 95:14077-14085. [PMID: 37691250 DOI: 10.1021/acs.analchem.3c02824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Cross-linking mass spectrometry (XL-MS) is a powerful tool for examining protein structures and interactions. Nevertheless, analysis of low-abundance cross-linked peptides is often limited in the data-dependent acquisition (DDA) mode due to its semistochastic nature. To address this issue, we introduced a workflow called 4D-diaXLMS, representing the first-ever application of four-dimensional data-independent acquisition for proteome-wide cross-linking analysis. Cross-linking studies of the HeLa cell proteome were evaluated using the classical cross-linker disuccinimidyl suberate as an example. Compared with the DDA analysis, 4D-diaXLMS exhibited marked improvement in the identification coverage of cross-linked peptides, with a total increase of 36% in single-shot analysis across all 16 SCX fractions. This advantage was further amplified when reducing the fraction number to 8 and 4, resulting in 125 and 149% improvements, respectively. Using 4D-diaXLMS, up to 83% of the cross-linked peptides were repeatedly identified in three replicates, more than twice the 38% in the DDA mode. Furthermore, 4D-diaXLMS showed good performance in the quantitative analysis of yeast cross-linked peptides even in a 15-fold excess amount of HeLa cell matrix, with a low coefficient of variation and high quantitative accuracies in all concentrations. Overall, 4D-diaXLMS was proven to have high coverage, good reproducibility, and accurate quantification for in-depth XL-MS analysis in complex samples, demonstrating its immense potential for advances in the field.
Affiliation(s)
- Yanhong Hao
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
- Moran Chen
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
- Xiao Huang
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
- Hui Xu
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
- Pengfei Wu
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
- Suming Chen
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China

37
McGann CD, Barshop W, Canterbury J, Lin C, Gabriel W, Huang J, Bergen D, Zubraskov V, Melani R, Wilhelm M, McAlister G, Schweppe DK. Real-Time Spectral Library Matching for Sample Multiplexed Quantitative Proteomics. J Proteome Res 2023; 22:2836-2846. [PMID: 37557900 PMCID: PMC11554524 DOI: 10.1021/acs.jproteome.3c00085] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 08/11/2023]
Abstract
Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.
Affiliation(s)
- Will Barshop
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Jesse Canterbury
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Chuwei Lin
- University of Washington, Seattle, WA 98105
- Jingjing Huang
- Thermo Fisher Scientific, San Jose, California 95134, United States
- David Bergen
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Vlad Zubraskov
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Rafael Melani
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Graeme McAlister
- Thermo Fisher Scientific, San Jose, California 95134, United States

38
Sun Z, Ning Z, Cheng K, Duan H, Wu Q, Mayne J, Figeys D. MetaPep: A core peptide database for faster human gut metaproteomics database searches. Comput Struct Biotechnol J 2023; 21:4228-4237. [PMID: 37692080 PMCID: PMC10491838 DOI: 10.1016/j.csbj.2023.08.025] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 09/12/2023] Open
Abstract
Metaproteomics has increasingly been applied to study functional changes in the human gut microbiome. Peptide identification is an important step in metaproteomics research, with sequence database search (SDS) and spectral library search (SLS) as the two main methods to identify peptides. However, the large search space in metaproteomics studies causes significant challenges for both identification methods. Moreover, with the development of mass spectrometry, it is now feasible to perform metaproteomic projects involving 100-1000 individual microbiomes. These large-scale projects create a conundrum for searching large databases. In this study, we constructed MetaPep, a core peptide database (including both collections of peptide sequences and tandem MS spectra) greatly accelerating the peptide identifications. Raw files from fifteen metaproteomics projects were re-analyzed and the identified peptide-spectrum matches (PSMs) were used to construct the MetaPep database. The constructed MetaPep database achieved rapid and accurate identification of peptides for human gut metaproteomics. MetaPep has a large collection of peptides and spectra that have been identified in published human gut metaproteomics datasets. MetaPep database can be used as an important resource in the current stage of human gut metaproteomics research. This study showed the possibility of applying a core peptide database as a generic metaproteomics workflow. MetaPep could also be an important resource for future human gut metaproteomics research, such as DIA (data-independent acquisition) analysis.
Affiliation(s)
- Zhongzhi Sun
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Kai Cheng
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Haonan Duan
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Qing Wu
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Janice Mayne
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada

39
Son J, Na S, Paek E. DbyDeep: Exploration of MS-Detectable Peptides via Deep Learning. Anal Chem 2023; 95:11193-11200. [PMID: 37459568 PMCID: PMC10401496 DOI: 10.1021/acs.analchem.3c00460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 07/05/2023] [Indexed: 08/02/2023]
Abstract
Predicting peptide detectability is useful in a variety of mass spectrometry (MS)-based proteomics applications, particularly targeted proteomics. However, most machine learning-based computational methods have relied solely on information from the peptide itself, such as its amino acid sequences or physicochemical properties, despite the fact that peptides detected by MS are dependent on many factors, including protein sample preparation, digestion, separation, ionization, and precursor selection during MS experiments. DbyDeep (Detectability by Deep learning) is an innovative end-to-end LSTM network model for peptide detectability prediction that incorporates sequence contexts of peptides and their cleavage sites (by protease). Utilizing the cleavage site contexts could improve the performance of prediction, and DbyDeep outperformed existing methods in predicting peptides recognizable from multiple MS/MS data sets with diverse species and MS instruments. We argue for the necessity of a learning model that encompasses several contexts associated with peptide detection, as opposed to depending just on peptide sequences. There is a Python implementation of DbyDeep at https://github.com/BISCodeRepo/DbyDeep.
Affiliation(s)
- Juho Son
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Seungjin Na
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Institute for Artificial Intelligence Research, Hanyang University, Seoul 04763, Republic of Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Institute for Artificial Intelligence Research, Hanyang University, Seoul 04763, Republic of Korea

40
Yang KL, Yu F, Teo GC, Li K, Demichev V, Ralser M, Nesvizhskii AI. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023; 14:4539. [PMID: 37500632 PMCID: PMC10374903 DOI: 10.1038/s41467-023-40129-9] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Received: 11/07/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
Peptide identification in liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments relies on computational algorithms for matching acquired MS/MS spectra against sequences of candidate peptides using database search tools, such as MSFragger. Here, we present a new tool, MSBooster, for rescoring peptide-to-spectrum matches using additional features incorporating deep learning-based predictions of peptide properties, such as LC retention time, ion mobility, and MS/MS spectra. We demonstrate the utility of MSBooster, in tandem with MSFragger and Percolator, in several different workflows, including nonspecific searches (immunopeptidomics), direct identification of peptides from data independent acquisition data, single-cell proteomics, and data generated on an ion mobility separation-enabled timsTOF MS platform. MSBooster is fast, robust, and fully integrated into the widely used FragPipe computational platform.
Affiliation(s)
- Kevin L Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
- Guo Ci Teo
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Kai Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vadim Demichev
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Markus Ralser
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Nuffield Department of Medicine, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.

41
Yu F, Teo GC, Kong AT, Fröhlich K, Li GX, Demichev V, Nesvizhskii AI. Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nat Commun 2023; 14:4154. [PMID: 37438352 PMCID: PMC10338508 DOI: 10.1038/s41467-023-39869-5] [Citation(s) in RCA: 97] [Impact Index Per Article: 48.5] [Received: 11/07/2022] [Accepted: 06/28/2023] [Indexed: 07/14/2023] Open
Abstract
Liquid chromatography (LC) coupled with data-independent acquisition (DIA) mass spectrometry (MS) has been increasingly used in quantitative proteomics studies. Here, we present a fast and sensitive approach for direct peptide identification from DIA data, MSFragger-DIA, which leverages the unmatched speed of the fragment ion indexing-based search engine MSFragger. Different from most existing methods, MSFragger-DIA conducts a database search of the DIA tandem mass (MS/MS) spectra prior to spectral feature detection and peak tracing across the LC dimension. To streamline the analysis of DIA data and enable easy reproducibility, we integrate MSFragger-DIA into the FragPipe computational platform for seamless support of peptide identification and spectral library building from DIA, data-dependent acquisition (DDA), or both data types combined. We compare MSFragger-DIA with other DIA tools, such as DIA-Umpire based workflow in FragPipe, Spectronaut, DIA-NN library-free, and MaxDIA. We demonstrate the fast, sensitive, and accurate performance of MSFragger-DIA across a variety of sample types and data acquisition schemes, including single-cell proteomics, phosphoproteomics, and large-scale tumor proteome profiling studies.
Affiliation(s)
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
- Guo Ci Teo
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Andy T Kong
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Klemens Fröhlich
- Proteomics Core Facility, Biozentrum, University of Basel, Basel, Switzerland
- Ginny Xiaohe Li
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Vadim Demichev
- Department of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Alexey I Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.

42
Geer LY, Lapin J, Slotta DJ, Mak TD, Stein SE. AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence. J Proteome Res 2023; 22:2246-2255. [PMID: 37232537 PMCID: PMC10542943 DOI: 10.1021/acs.jproteome.2c00807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Affiliation(s)
- Lewis Y. Geer
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Joel Lapin
- Department of Physics, Georgetown University, Washington, DC 20057, United States
- Associate, Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Douglas J. Slotta
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States

43
Liu K, Zhang L, Qi Q, Li J, Yan F, Hou J. Growth hormone treatment improves the development of follicles and oocytes in prepubertal lambs. J Ovarian Res 2023; 16:132. [PMID: 37408062 DOI: 10.1186/s13048-023-01209-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/31/2023] [Accepted: 06/17/2023] [Indexed: 07/07/2023] Open
Abstract
BACKGROUND: When prepubertal lambs are superovulated, the ovarian response to gonadotropin stimulation has great individual difference and the collected oocytes have lower developmental ability than that of adult ewes. Over the years, growth hormone (GH) has been used in assisted reproduction because it can improve the reproductive performance in humans and animals. However, the effect of GH on ovaries and oocytes of prepubertal lambs remains unclear. METHODS: Before and during follicle-stimulating hormone (FSH) superovulation of prepubertal lambs (4‒6-week-old), the lambs were treated with high (50 mg) or low dose (25 mg) of ovine GH in a long (5 days) or short (2 days) period. The recovered oocytes were used for in vitro maturation and fertilization, and several parameters of oocyte quality and development capacity were evaluated. The possible underlying mechanisms of GH action were explored by analysis of granulosa cell (GC) transcriptome, ovarian proteome and follicular fluid metabolome. RESULTS: Treatment of lambs with 50 mg GH over 5 days (long treatment) potentially promoted the response of lambs to superovulation and improved the development capacity of retrieved oocytes, consequently increasing the high quality embryo yield from lambs. A number of differently expressed genes or proteins were found in ovaries between GH-treated and untreated lambs. Cellular experiments revealed that GH reduced the oxidative stress of GCs and promoted the GC proliferation probably through activation of the PI3K/Akt signaling pathway. Finally, analysis of follicular fluid metabolome indicated that GH treatment altered the abundance of many metabolites in follicular fluid, such as antioxidants and fatty acids. CONCLUSIONS: GH treatment has a beneficial role on function of lamb ovaries, which supports the development of follicles and oocytes and improves the efficiency of embryo production from prepubertal lambs.
Affiliation(s)
- Kexiong Liu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Yuan-Ming-Yuan West Road, Haidian District, Beijing, 100193, China
- Luyao Zhang
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Yuan-Ming-Yuan West Road, Haidian District, Beijing, 100193, China
- Qi Qi
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Yuan-Ming-Yuan West Road, Haidian District, Beijing, 100193, China
- Junjin Li
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Yuan-Ming-Yuan West Road, Haidian District, Beijing, 100193, China
- Fengxiang Yan
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Yuan-Ming-Yuan West Road, Haidian District, Beijing, 100193, China
- Jian Hou
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Yuan-Ming-Yuan West Road, Haidian District, Beijing, 100193, China.

44
Révész Á, Hevér H, Steckel A, Schlosser G, Szabó D, Vékey K, Drahos L. Collision energies: Optimization strategies for bottom-up proteomics. Mass Spectrom Rev 2023; 42:1261-1299. [PMID: 34859467 DOI: 10.1002/mas.21763] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/11/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 06/07/2023]
Abstract
Mass spectrometry coupled to liquid chromatography is an indispensable tool in the field of proteomics. In the last decades, more and more complex and diverse biochemical and biomedical questions have arisen. Problems to be solved involve protein identification, quantitative analysis, screening of low abundance modifications, handling matrix effect, and concentrations differing by orders of magnitude. This led to the development of more tailored protocols and problem-centered proteomics workflows, including advanced choice of experimental parameters. In the most widespread bottom-up approach, the choice of collision energy in tandem mass spectrometric experiments has an outstanding role. This review presents the collision energy optimization strategies in the field of proteomics which can help fully exploit the potential of MS-based proteomics techniques. A systematic collection of use case studies is then presented to serve as a starting point for related further scientific work. Finally, this article discusses the issue of comparing results from different studies or obtained on different instruments, and it gives some hints on methodology transfer between laboratories based on measurement of reference species.
Collapse
Affiliation(s)
- Ágnes Révész
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| | - Helga Hevér
- Chemical Works of Gedeon Richter Plc, Budapest, Hungary
| | - Arnold Steckel
- Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Gitta Schlosser
- Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Dániel Szabó
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| | - Károly Vékey
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| | - László Drahos
- MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| |
|
45
|
He Q, Zhong CQ, Li X, Guo H, Li Y, Gao M, Yu R, Liu X, Zhang F, Guo D, Ye F, Guo T, Shuai J, Han J. Dear-DIA XMBD: Deep Autoencoder Enables Deconvolution of Data-Independent Acquisition Proteomics. Research (Washington, D.C.) 2023; 6:0179. [PMID: 37377457 PMCID: PMC10292580 DOI: 10.34133/research.0179] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/08/2022] [Accepted: 06/01/2023] [Indexed: 06/29/2023]
Abstract
Data-independent acquisition (DIA) technology for protein identification from mass spectrometry and related algorithms is developing rapidly. The spectrum-centric analysis of DIA data without the use of a spectral library from data-dependent acquisition data represents a promising direction. In this paper, we propose an untargeted analysis method, Dear-DIA XMBD, for direct analysis of DIA data. Dear-DIA XMBD first integrates the deep variational autoencoder and triplet loss to learn the representations of the extracted fragment ion chromatograms, then uses the k-means clustering algorithm to aggregate fragments with similar representations into the same classes, and finally establishes the inverted index tables to determine the precursors of fragment clusters between precursors and peptides and between fragments and peptides. We show that Dear-DIA XMBD achieves superior performance on highly complicated DIA data of different species obtained by different instrument platforms. Dear-DIA XMBD is publicly available at https://github.com/jianweishuai/Dear-DIA-XMBD.
Affiliation(s)
- Qingzu He
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health) and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
| | - Chuan-Qi Zhong
- School of Life Sciences, Xiamen University, Xiamen 361102, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
| | - Xiang Li
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
| | - Huan Guo
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Yiming Li
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Mingxuan Gao
- Department of Computer Science, Xiamen University, Xiamen 361005, China
| | - Rongshan Yu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Xianming Liu
- Bruker (Beijing) Scientific Technology Co. Ltd., Beijing, China
| | - Fangfei Zhang
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, China
| | - Donghui Guo
- Department of Electronic Engineering, Xiamen University, Xiamen 361005, China
| | - Fangfu Ye
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health) and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
| | - Tiannan Guo
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, China
- Westlake Omics Ltd., Yunmeng Road 1, Hangzhou, China
| | - Jianwei Shuai
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health) and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
| | - Jiahuai Han
- School of Life Sciences, Xiamen University, Xiamen 361102, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
- National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
| |
|
46
|
Hamza GM, Miele E, Wojchowski DM, Toran P, Worsfold CR, Anthonymuthu TS, Bergo VB, Zhang AX, Silva JC. Affi-BAMS™: A Robust Targeted Proteomics Microarray Platform to Measure Histone Post-Translational Modifications. Int J Mol Sci 2023; 24:10060. [PMID: 37373206 DOI: 10.3390/ijms241210060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 06/08/2023] [Accepted: 06/11/2023] [Indexed: 06/29/2023] Open
Abstract
For targeted protein panels, the ability to specifically assay post-translational modifications (PTMs) in a quantitative, sensitive, and straightforward manner would substantially advance biological and pharmacological studies. The present study highlights the effectiveness of the Affi-BAMS™ epitope-directed affinity bead capture/MALDI MS platform for quantitatively defining complex PTM marks of H3 and H4 histones. Using H3 and H4 histone peptides and isotopically labelled derivatives, this affinity bead and MALDI MS platform achieves a range of >3 orders of magnitude with a technical precision CV of <5%. Using nuclear cellular lysates, Affi-BAMS PTM-peptide capture resolves heterogeneous histone N-terminal PTMs with as little as 100 µg of starting material. In an HDAC inhibitor and MCF7 cell line model, the ability to monitor dynamic histone H3 acetylation and methylation events is further demonstrated (including SILAC quantification). Affi-BAMS (and its capacity for the multiplexing of samples and target PTM-proteins) thus provides a uniquely efficient and effective approach for analyzing dynamic epigenetic histone marks, which is critical for the regulation of chromatin structure and gene expression.
Affiliation(s)
- Ghaith M Hamza
- Discovery Biology, Discovery Sciences, R&D, AstraZeneca, Boston, MA 02451, USA
- Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
| | - Eric Miele
- Discovery Biology, Discovery Sciences, R&D, AstraZeneca, Boston, MA 02451, USA
| | - Don M Wojchowski
- Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
| | - Paul Toran
- Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
| | | | | | | | - Andrew X Zhang
- Discovery Biology, Discovery Sciences, R&D, AstraZeneca, Boston, MA 02451, USA
| | - Jeffrey C Silva
- Adeptrix Corporation, Beverly, MA 01915, USA
- Cell Signaling Technology, Danvers, MA 01915, USA
| |
|
47
|
Souza Junior DR, Silva ARM, Ronsein GE. Strategies for consistent and automated quantification of HDL proteome using data-independent acquisition (DIA). J Lipid Res 2023:100397. [PMID: 37286042 PMCID: PMC10339053 DOI: 10.1016/j.jlr.2023.100397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 05/11/2023] [Accepted: 05/31/2023] [Indexed: 06/09/2023] Open
Abstract
The introduction of mass spectrometry-based proteomics has revolutionized the HDL field, with the description, characterization and implication of HDL-associated proteins in an array of pathologies. However, acquiring robust, reproducible data is still a challenge in the quantitative assessment of the HDL proteome. Data-independent acquisition (DIA) is a mass spectrometry methodology that allows the acquisition of reproducible data, but data analysis remains a challenge in the field. To date, there is no consensus on how to process DIA-derived data for HDL proteomics. Here, we developed a pipeline aiming to standardize HDL proteome quantification. We optimized instrument parameters, and compared the performance of four freely available, user-friendly software tools (DIA-NN, EncyclopeDIA, MaxDIA and Skyline) in processing DIA data. Importantly, pooled samples were used as quality controls throughout our experimental setup. A careful evaluation of precision, linearity, and detection limits, first using an E. coli background for HDL proteomics, and second using the HDL proteome and synthetic peptides, was undertaken. Finally, as a proof of concept, we employed our optimized and automated pipeline to quantify the proteome of HDL and apolipoprotein B (APOB)-containing lipoproteins. Our results show that determination of precision is key to confidently and consistently quantifying HDL proteins. Taking this precaution, any of the available software tested here would be appropriate for quantification of the HDL proteome, although their performance varied considerably.
Affiliation(s)
| | | | - Graziella Eliza Ronsein
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, Brazil.
| |
|
48
|
Wen C, Wu X, Lin G, Yan W, Gan G, Xu X, Chen XY, Chen X, Liu X, Fu G, Zhong CQ. Evaluation of DDA Library-Free Strategies for Phosphoproteomics and Ubiquitinomics Data-Independent Acquisition Data. J Proteome Res 2023. [PMID: 37256709 DOI: 10.1021/acs.jproteome.2c00735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Phosphoproteomics and ubiquitinomics data-independent acquisition (DIA) mass spectrometry (MS) data is typically analyzed by using a data-dependent acquisition (DDA) spectral library. The performance of various library-free strategies for analyzing phosphoproteomics and ubiquitinomics DIA MS data has not been evaluated. In this study, we systematically compare four commonly used DDA library-free approaches including Spectronaut's directDIA, DIA-Umpire, DIA-MSFragger, and in silico-predicted library for analysis of phosphoproteomics SWATH, DIA, and diaPASEF data as well as ubiquitinomics diaPASEF data. Spectronaut's directDIA shows the highest sensitivity for phosphopeptide detection not only in synthetic phosphopeptide samples but also in phosphoproteomics SWATH-MS and DIA data from real biological samples, when compared to the other three library-free strategies. For phosphoproteomics diaPASEF data, Spectronaut's directDIA and the in silico-predicted library based on DIA-NN identify almost the same number of phosphopeptides as a project-specific DDA spectral library. However, only about 30% of the total phosphopeptides are commonly identified, suggesting that the library-free strategies for phospho-diaPASEF data need further improvement in terms of sensitivity. For ubiquitinomics diaPASEF data, the in silico-predicted library performs the best among the four workflows and detects ∼50% more K-GG peptides than a project-specific DDA spectral library. Our results demonstrate that Spectronaut's directDIA is suitable for the analysis of phosphoproteomics SWATH-MS and DIA MS data, while the in silico-predicted library based on DIA-NN shows substantial advantages for ubiquitinomics diaPASEF MS data.
Affiliation(s)
- Chengwen Wen
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Xiurong Wu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Guanzhong Lin
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Wei Yan
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Guohong Gan
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Xiao Xu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Xiang-Yu Chen
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Xi Chen
- SpecAlly Life Technology Co., Ltd., Wuhan 430074, Hubei, China
| | - Xianming Liu
- Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai 200030, China
| | - Guo Fu
- School of Medicine, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| | - Chuan-Qi Zhong
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361005, Fujian, China
| |
|
49
|
Chen M, Zhu P, Wan Q, Ruan X, Wu P, Hao Y, Zhang Z, Sun J, Nie W, Chen S. High-Coverage Four-Dimensional Data-Independent Acquisition Proteomics and Phosphoproteomics Enabled by Deep Learning-Driven Multidimensional Predictions. Anal Chem 2023; 95:7495-7502. [PMID: 37126374 DOI: 10.1021/acs.analchem.2c05414] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/02/2023]
Abstract
Four-dimensional (4D) data-independent acquisition (DIA)-based proteomics is a promising technology. However, its full performance is restricted by the time-consuming building and limited coverage of a project-specific experimental library. Herein, we developed a versatile multifunctional deep learning model, Deep4D, based on self-attention, which could predict the collisional cross section, retention time, fragment ion intensity, and charge state with high accuracy for both unmodified and phosphorylated peptides, and thus established complete workflows for high-coverage 4D DIA proteomics and phosphoproteomics based on multidimensional predictions. A 4D predicted library containing ∼2 million peptides was established that could realize experimental library-free DIA analysis, and 33% more proteins were identified than when using an experimental library from a single-shot measurement in the example of HeLa cells. These results show the great value of convenient high-coverage 4D DIA proteomics methods.
Affiliation(s)
- Moran Chen
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Pujia Zhu
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Qiongqiong Wan
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Xianqin Ruan
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Pengfei Wu
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Yanhong Hao
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Zhourui Zhang
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Jian Sun
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Wenjing Nie
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| | - Suming Chen
- The Institute for Advanced Studies, Wuhan University, Wuhan, Hubei 430072, China
| |
|
50
|
Zhang Q. Mzion enables deep and precise identification of peptides in data-dependent acquisition proteomics. Sci Rep 2023; 13:7056. [PMID: 37120666 PMCID: PMC10148867 DOI: 10.1038/s41598-023-34323-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 04/27/2023] [Indexed: 05/01/2023] Open
Abstract
Sensitive and reliable identification of proteins and peptides forms the basis of proteomics. We introduce Mzion, a new database search tool for data-dependent acquisition (DDA) proteomics. Our tool utilizes an intensity tally strategy and generally achieves higher performance in terms of depth and precision across 20 datasets, ranging from large-scale to single-cell proteomics. Compared to several other search engines, Mzion matches on average 20% more peptide spectra at tryptic enzymatic specificity and 80% more at no enzymatic specificity from six large-scale, global datasets. Mzion also identifies more phosphopeptide spectra that can be explained by fewer proteins, as demonstrated by six large-scale, local datasets corresponding to the global data. Our findings highlight the potential of Mzion for improving proteomic analysis and advancing our understanding of protein biology.
Affiliation(s)
- Qiang Zhang
- Division of Endocrinology, Metabolism and Lipid Research, Washington University School of Medicine, St. Louis, MO, USA.
| |
|