1. Petrovskiy DV, Nikolsky KS, Kulikova LI, Rudnev VR, Butkova TV, Malsagova KA, Kopylov AT, Kaysheva AL. PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models. Sci Rep 2024; 14:15000. [PMID: 38951578; PMCID: PMC11217302; DOI: 10.1038/s41598-024-65861-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 06/25/2024] [Indexed: 07/03/2024] [Open Access]
Abstract
The primary objective of analyzing data from a mass spectrometry-based proteomic experiment is peptide and protein identification, that is, the correct assignment of each tandem mass spectrum to a single amino acid sequence. Comparing empirical fragment spectra with theoretically predicted ones, or matching them against a library of collected spectra, are the commonly accepted strategies for identifying proteins and determining their amino acid sequences. Although these approaches are widely used and work well for well-characterized model organisms or previously measured proteins, they cannot detect novel peptide sequences that are rare or have not been previously annotated. This study presents PowerNovo, a tool for de novo sequencing of proteins using tandem mass spectra acquired on a variety of mass analyzers and with different fragmentation techniques. PowerNovo uses an ensemble of models for peptide sequencing: a model that detects regularities in tandem mass spectra, precursors, and fragment ions, and a natural language processing model that assesses peptide sequence quality and helps reconstruct noisy sequences. In testing, the performance of PowerNovo was comparable to, and even better than, that of the widely used PointNovo, DeepNovo, Casanovo, and Novor packages. PowerNovo also provides a complete processing pipeline for mass spectrometry data: along with predicting the peptide sequence, it includes peptide assembly and protein inference blocks.
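The "theoretically predicted" fragment spectrum mentioned in this abstract can be illustrated with a minimal sketch. It computes singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The function name `fragment_ions` is hypothetical, and this is a generic textbook calculation, not PowerNovo's method.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    Assumes an unmodified peptide written in one-letter code.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b-ions: N-terminal prefixes (all but the full peptide) plus one proton.
    b_ions = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)

    # y-ions: C-terminal suffixes plus water and one proton.
    y_ions = []
    suffix = 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)

    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
print([round(x, 4) for x in b])
print([round(x, 4) for x in y])
```

A database search engine would compare such predicted m/z values with the observed peaks; a de novo tool instead has to infer the sequence directly from the peaks.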
2. Fijalkowski I, Snauwaert V, Van Damme P. Proteins à la carte: riboproteogenomic exploration of bacterial N-terminal proteoform expression. mBio 2024; 15:e0033324. [PMID: 38511928; PMCID: PMC11005335; DOI: 10.1128/mbio.00333-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024] [Open Access]
Abstract
In recent years, it has become evident that the true complexity of bacterial proteomes remains underestimated. Gene annotation tools are known to propagate biases and overlook certain classes of truly expressed proteins, particularly proteoforms (protein isoforms arising from a single gene). Recent (re-)annotation efforts rely heavily on ribosome profiling, which provides a direct readout of translation, to fully describe bacterial proteomes. In this study, we employ a robust riboproteogenomic pipeline to conduct a systematic census of expressed N-terminal proteoform pairs, representing two isoforms encoded by a single gene and arising from annotated and alternative translation initiation, in Salmonella. Intriguingly, condition-dependent changes in the relative utilization of annotated and alternative translation initiation sites (TIS) were observed in several cases. This suggests that TIS selection is subject to regulatory control, adding yet another layer of complexity to our understanding of bacterial proteomes.
IMPORTANCE: With the emerging theme of genes within genes, comprising the existence of alternative open reading frames (ORFs) generated by translation initiation at in-frame start codons, mechanisms that control the relative utilization of annotated and alternative TIS need to be unraveled and our molecular understanding of the resulting proteoforms broadened. Utilizing complementary ribosome profiling strategies to map ORF boundaries, we uncovered dual-encoding ORFs generated by in-frame TIS usage in Salmonella. Besides demonstrating that alternative TIS usage may generate proteoforms with different characteristics, such as differential localization and specialized function, quantitative aspects of conditional retapamulin-assisted ribosome profiling (Ribo-RET) translation initiation maps offer unprecedented insights into the relative utilization of annotated and alternative TIS, enabling the exploration of gene regulatory mechanisms that control TIS usage and, consequently, the translation of N-terminal proteoform pairs.
Affiliation(s)
- Igor Fijalkowski
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Valdes Snauwaert
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Petra Van Damme
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
3. Williams G, Couchman L, Taylor DR, Sandhu JK, Slingsby OC, Ng LL, Moniz CF, Jones DJL, Maxwell CB. Use of Nonhuman Sera as a Highly Cost-Effective Internal Standard for Quantitation of Multiple Human Proteins Using Species-Specific Tryptic Peptides: Applicability in Clinical LC-MS Analyses. J Proteome Res 2024. [PMID: 38533909; DOI: 10.1021/acs.jproteome.3c00762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
Quantitation of proteins using liquid chromatography-tandem mass spectrometry (LC-MS/MS) is complex, with a multiplicity of options ranging from label-free techniques to chemically and metabolically labeling proteins. Increasingly, for clinically relevant analyses, stable isotope-labeled (SIL) internal standards (ISs) represent the "gold standard" for quantitation due to their similar physicochemical properties to the analyte, wide availability, and ability to multiplex to several peptides. However, the purchase of SIL-ISs is a resource-intensive step in terms of cost and time, particularly for screening putative biomarker panels of hundreds of proteins. We demonstrate an alternative strategy utilizing nonhuman sera as the IS for quantitation of multiple human proteins. We demonstrate the effectiveness of this strategy using two high-abundance, clinically relevant analytes, vitamin D binding protein [Gc globulin] (DBP) and albumin (ALB). We extend this to three putative risk markers for cardiovascular disease: plasma protease C1 inhibitor (SERPING1), annexin A1 (ANXA1), and protein kinase, DNA-activated catalytic subunit (PRKDC). The results show highly specific, reproducible, and linear measurement of the proteins of interest with comparable precision and accuracy to the gold standard SIL-IS technique. This approach may not be applicable to every protein, but for many proteins it can offer a cost-effective solution to LC-MS/MS protein quantitation.
Affiliation(s)
- Geraldine Williams
  - Leicester van Geest MS-OMICS Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
  - Department of Cardiovascular Sciences and NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
- Lewis Couchman
  - Leicester Cancer Research Centre, RKCSB, University of Leicester, Leicester LE2 7LX, United Kingdom
  - Viapath Analytics, King's College Hospital, Denmark Hill, London SE5 9RS, United Kingdom
  - Department of Clinical Biochemistry, King's College Hospital, Denmark Hill, London SE5 9RS, United Kingdom
- David R Taylor
  - Viapath Analytics, King's College Hospital, Denmark Hill, London SE5 9RS, United Kingdom
- Jatinderpal K Sandhu
  - Leicester van Geest MS-OMICS Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
  - Department of Cardiovascular Sciences and NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
- Oliver C Slingsby
  - Leicester van Geest MS-OMICS Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
  - Department of Cardiovascular Sciences and NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
- Leong L Ng
  - Leicester van Geest MS-OMICS Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
  - Department of Cardiovascular Sciences and NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
- Cajetan F Moniz
  - Department of Clinical Biochemistry, King's College Hospital, Denmark Hill, London SE5 9RS, United Kingdom
- Donald J L Jones
  - Leicester van Geest MS-OMICS Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
  - Leicester Cancer Research Centre, RKCSB, University of Leicester, Leicester LE2 7LX, United Kingdom
  - Department of Cardiovascular Sciences and NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
- Colleen B Maxwell
  - Leicester van Geest MS-OMICS Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
  - Department of Cardiovascular Sciences and NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
4. Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810; DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
  - Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
  - Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
  - Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
5. Wacholder A, Carvunis AR. Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry. PLoS Biol 2023; 21:e3002409. [PMID: 38048358; PMCID: PMC10721188; DOI: 10.1371/journal.pbio.3002409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/14/2023] [Accepted: 10/30/2023] [Indexed: 12/06/2023] [Open Access]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry (MS) experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here, we leveraged recent advances in ribosome profiling and MS to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly expressed to be detected by shotgun MS at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for 4 noncanonical proteins in MS data, which were also supported by evolution and translation data. These results illustrate the power of MS to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly expressed proteins.
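The false discovery rate point in this abstract can be made concrete with a minimal sketch of the generic target-decoy estimate: the number of decoy matches above a score threshold divided by the number of target matches above it. The function name `decoy_fdr` and the example scores are hypothetical, and this is not the authors' procedure. The estimate is only reliable if decoys pass the threshold about as often as false targets do, which is the assumption that a decoy bias violates.

```python
from typing import List, Tuple


def decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Each item in `psms` is (score, is_decoy). The estimate is
    decoys / targets among matches that pass the threshold, capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, decoys / targets)


# Hypothetical scores for illustration only.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(decoy_fdr(example, threshold=7.0))  # 1 decoy vs 3 targets pass
print(decoy_fdr(example, threshold=9.0))  # no decoys pass
```

If decoy sequences for a class of proteins are systematically less likely to match than false target sequences, the numerator is too small and the estimated FDR is optimistic.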
Affiliation(s)
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
6. Simoens L, Fijalkowski I, Van Damme P. Exposing the small protein load of bacterial life. FEMS Microbiol Rev 2023; 47:fuad063. [PMID: 38012116; PMCID: PMC10723866; DOI: 10.1093/femsre/fuad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 11/10/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] [Open Access]
Abstract
The ever-growing repertoire of genomic techniques continues to expand our understanding of the true diversity and richness of prokaryotic genomes. Riboproteogenomics laid the foundation for dynamic studies of previously overlooked genomic elements. Most strikingly, bacterial genomes were revealed to harbor robust repertoires of small open reading frames (sORFs) encoding a diverse and broadly expressed range of small proteins, or sORF-encoded polypeptides (SEPs). In recent years, continuous efforts led to great improvements in the annotation and characterization of such proteins, yet many challenges remain to fully comprehend the pervasive nature of small proteins and their impact on bacterial biology. In this work, we review the recent developments in the dynamic field of bacterial genome reannotation, catalog the important biological roles carried out by small proteins and identify challenges obstructing the way to full understanding of these elusive proteins.
Affiliation(s)
- Laure Simoens
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Igor Fijalkowski
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Petra Van Damme
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
7. Son J, Na S, Paek E. DbyDeep: Exploration of MS-Detectable Peptides via Deep Learning. Anal Chem 2023; 95:11193-11200. [PMID: 37459568; PMCID: PMC10401496; DOI: 10.1021/acs.analchem.3c00460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 07/05/2023] [Indexed: 08/02/2023]
Abstract
Predicting peptide detectability is useful in a variety of mass spectrometry (MS)-based proteomics applications, particularly targeted proteomics. However, most machine learning-based computational methods have relied solely on information from the peptide itself, such as its amino acid sequences or physicochemical properties, despite the fact that peptides detected by MS are dependent on many factors, including protein sample preparation, digestion, separation, ionization, and precursor selection during MS experiments. DbyDeep (Detectability by Deep learning) is an innovative end-to-end LSTM network model for peptide detectability prediction that incorporates sequence contexts of peptides and their cleavage sites (by protease). Utilizing the cleavage site contexts could improve the performance of prediction, and DbyDeep outperformed existing methods in predicting peptides recognizable from multiple MS/MS data sets with diverse species and MS instruments. We argue for the necessity of a learning model that encompasses several contexts associated with peptide detection, as opposed to depending just on peptide sequences. There is a Python implementation of DbyDeep at https://github.com/BISCodeRepo/DbyDeep.
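To illustrate what a "cleavage site context" might look like as model input, here is a minimal sketch of a simplified in-silico tryptic digestion that keeps the residues flanking each peptide. It cleaves after every K or R and ignores the proline rule and missed cleavages. The function name `tryptic_peptides_with_context` and the `flank` parameter are hypothetical; this is not DbyDeep's actual preprocessing or encoding.

```python
def tryptic_peptides_with_context(protein, flank=3):
    """Split a protein after each K or R and keep flanking residues.

    Returns a list of (left_context, peptide, right_context) tuples, where
    each context holds up to `flank` residues from the neighbouring sequence.
    Simplification: the proline rule and missed cleavages are ignored.
    """
    results = []
    start = 0
    last_index = len(protein) - 1
    for i, residue in enumerate(protein):
        # Close a peptide at every K/R, and at the protein's C-terminus.
        if residue in "KR" or i == last_index:
            end = i + 1
            peptide = protein[start:end]
            left = protein[max(0, start - flank):start]
            right = protein[end:end + flank]
            results.append((left, peptide, right))
            start = end
    return results


for left, peptide, right in tryptic_peptides_with_context("MAKWTRGGK"):
    print(f"{left or '-'} | {peptide} | {right or '-'}")
```

A sequence model could then be given the peptide together with its left and right contexts, rather than the peptide alone.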
Affiliation(s)
- Juho Son
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Seungjin Na
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
  - Institute for Artificial Intelligence Research, Hanyang University, Seoul 04763, Republic of Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
  - Institute for Artificial Intelligence Research, Hanyang University, Seoul 04763, Republic of Korea
8. Abdul-Khalek N, Wimmer R, Overgaard MT, Gregersen Echers S. Insight on physicochemical properties governing peptide MS1 response in HPLC-ESI-MS/MS: A deep learning approach. Comput Struct Biotechnol J 2023; 21:3715-3727. [PMID: 37560124; PMCID: PMC10407266; DOI: 10.1016/j.csbj.2023.07.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 08/11/2023] [Open Access]
Abstract
Accurate and absolute quantification of peptides in complex mixtures using quantitative mass spectrometry (MS)-based methods requires foreground knowledge and isotopically labeled standards, thereby increasing analytical expenses, time consumption, and labor, thus limiting the number of peptides that can be accurately quantified. This originates from differential ionization efficiency between peptides and thus, understanding the physicochemical properties that influence the ionization and response in MS analysis is essential for developing less restrictive label-free quantitative methods. Here, we used equimolar peptide pool repository data to develop a deep learning model capable of identifying amino acids influencing the MS1 response. By using an encoder-decoder with an attention mechanism and correlating attention weights with amino acid physicochemical properties, we obtain insight on properties governing the peptide-level MS1 response within the datasets. While the problem cannot be described by one single set of amino acids and properties, distinct patterns were reproducibly obtained. Properties are grouped in three main categories related to peptide hydrophobicity, charge, and structural propensities. Moreover, our model can predict MS1 intensity output under defined conditions based solely on peptide sequence input. Using a refined training dataset, the model predicted log-transformed peptide MS1 intensities with an average error of 9.7 ± 0.5% based on 5-fold cross validation, and outperformed random forest and ridge regression models on both log-transformed and real scale data. This work demonstrates how deep learning can facilitate identification of physicochemical properties influencing peptide MS1 responses, but also illustrates how sequence-based response prediction and label-free peptide-level quantification may impact future workflows within quantitative proteomics.
Affiliation(s)
- Naim Abdul-Khalek
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg 9220, Denmark
- Reinhard Wimmer
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg 9220, Denmark
9. Neely BA, Dorfer V, Martens L, Bludau I, Bouwmeester R, Degroeve S, Deutsch EW, Gessulat S, Käll L, Palczynski P, Payne SH, Rehfeldt TG, Schmidt T, Schwämmle V, Uszkoreit J, Vizcaíno JA, Wilhelm M, Palmblad M. Toward an Integrated Machine Learning Model of a Proteomics Experiment. J Proteome Res 2023; 22:681-696. [PMID: 36744821; PMCID: PMC9990124; DOI: 10.1021/acs.jproteome.2c00711] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
Abstract
In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.
Affiliation(s)
- Benjamin A Neely
  - National Institute of Standards and Technology, Charleston, South Carolina 29412, United States
- Viktoria Dorfer
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
- Isabell Bludau
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Lukas Käll
  - Science for Life Laboratory, KTH - Royal Institute of Technology, 171 21 Solna, Sweden
- Pawel Palczynski
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
- Samuel H Payne
  - Department of Biology, Brigham Young University, Provo, Utah 84602, United States
- Tobias Greisager Rehfeldt
  - Institute for Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
- Julian Uszkoreit
  - Medical Proteome Analysis, Center for Protein Diagnostics (ProDi), Ruhr University Bochum, 44801 Bochum, Germany
  - Medizinisches Proteom-Center, Medical Faculty, Ruhr University Bochum, 44801 Bochum, Germany
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich (TUM), 85354 Freising, Germany
- Magnus Palmblad
  - Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
10. Pauletti BA, Granato DC, Carnielli CM, Câmara GA, Normando AGC, Telles GP, Leme AFP. Typic: A Practical and Robust Tool to Rank Proteotypic Peptides for Targeted Proteomics. J Proteome Res 2023; 22:539-545. [PMID: 36480281; DOI: 10.1021/acs.jproteome.2c00585] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
The selection of a suitable proteotypic peptide remains a challenge when designing a targeted quantitative proteomics assay. Although the criteria are well established in the literature, the selection of these peptides is often performed in a subjective and time-consuming manner. Here, we have developed a practical and semiautomated workflow implemented in an open-source program named Typic. Typic is designed to run from a command line or a graphical interface to help select a list of proteotypic peptides for targeted quantitation. The tool combines the input data with additional data downloaded from public repositories to produce one output file per protein. Each output file includes information relevant to the selection of proteotypic peptides organized in a table, a colored ranking of peptides according to their potential value as targets for quantitation, and auxiliary plots to assist users in selecting proteotypic peptides. Taken together, Typic enables practical and straightforward data extraction from multiple data sets, allowing the identification of the most suitable proteotypic peptides based on established criteria, in an unbiased and standardized manner, ultimately leading to a more robust targeted proteomics assay.
Affiliation(s)
- Bianca A Pauletti
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Daniela C Granato
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Carolina M Carnielli
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Guilherme A Câmara
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Ana Gabriela C Normando
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Guilherme P Telles
  - Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, 13083-852 São Paulo, Brazil
- Adriana F Paes Leme
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
11. Rusilowicz M, Newman DW, Creamer DR, Johnson J, Adair K, Harman VM, Grant CM, Beynon RJ, Hubbard SJ. AlacatDesigner: Computational Design of Peptide Concatamers for Protein Quantitation. J Proteome Res 2023; 22:594-604. [PMID: 36688735; PMCID: PMC9903321; DOI: 10.1021/acs.jproteome.2c00608] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/24/2023]
Abstract
Protein quantitation via mass spectrometry relies on peptide proxies for the parent protein from which abundances are estimated. Owing to the variability in signal from individual peptides, accurate absolute quantitation usually relies on the addition of an external standard. Typically, this involves stable isotope-labeled peptides, delivered singly or as a concatenated recombinant protein. Consequently, the selection of the most appropriate surrogate peptides and the attendant design of recombinant proteins, termed QconCATs, are challenges for proteome science. QconCATs can now be built using an "a-la-carte" assembly method based on synthetic biology: ALACATs. To assist their design, we present "AlacatDesigner", a tool that supports peptide selection for recombinant protein standards based on the user's target protein. The user-customizable tool considers existing databases, occurrence in the literature, potential post-translational modifications, predicted miscleavage, predicted divergence of the peptide and protein quantifications, and ionization potential within the mass spectrometer. We show that peptide selections are enriched for good proteotypic and quantotypic candidates compared to empirical data. The software is freely available via the AlacatDesigner web interface, as a downloadable desktop application, or as a Python package that can be used from the command line or imported in scripts.
Collapse
Affiliation(s)
- Martin Rusilowicz
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
| | - David W. Newman
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
| | - Declan R. Creamer
- Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
| | - James Johnson
- GeneMill, Institute of Systems Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Kareena Adair
- Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Victoria M. Harman
- Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Chris M. Grant
- Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
| | - Robert J. Beynon
- Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Simon J. Hubbard
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
| |
|
12
|
Sun B, Smialowski P, Aftab W, Schmidt A, Forne I, Straub T, Imhof A. Improving SWATH-MS analysis by deep-learning. Proteomics 2022; 23:e2200179. [PMID: 36571325 DOI: 10.1002/pmic.202200179] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/05/2022] [Revised: 11/22/2022] [Accepted: 12/21/2022] [Indexed: 12/27/2022]
Abstract
Data-independent acquisition (DIA) of tandem mass spectrometry spectra has emerged as a promising technology to improve coverage and quantification of proteins in complex mixtures. The success of DIA experiments depends on the quality of the spectral libraries used for database searching. Frequently, these libraries need to be generated by labor- and time-intensive data-dependent acquisition (DDA) experiments. Recently, several algorithms have been published that allow the generation of theoretical libraries by efficiently predicting the retention time and intensity of the fragment ions. Sequential windowed acquisition of all theoretical fragment ion spectra mass spectrometry (SWATH-MS) is a DIA method that can be applied at an unprecedented speed, but its fragmentation spectra are of lower quality than data acquired on Orbitrap instruments. To reliably generate theoretical libraries that can be used in SWATH experiments, we developed deep-learning for SWATH analysis (dpSWATH) to improve the sensitivity and specificity of data generated by Q-TOF mass spectrometers. The theoretical library built by dpSWATH allowed us to increase the identification rate of proteins compared to traditional or library-free methods. Based on our analysis, we conclude that dpSWATH is a better prediction framework for SWATH-MS measurements than other algorithms based on Orbitrap data.
Affiliation(s)
- Bo Sun
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Pawel Smialowski
- Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Germany; Faculty of Medicine, Biomedical Center, Computational Biology Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Wasim Aftab
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Andreas Schmidt
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Ignasi Forne
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Tobias Straub
- Faculty of Medicine, Biomedical Center, Computational Biology Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| | - Axel Imhof
- Faculty of Medicine, Biomedical Center, Protein Analysis Unit, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
| |
|
13
|
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability. Int J Mol Sci 2022; 23:ijms232012385. [PMID: 36293242 PMCID: PMC9604182 DOI: 10.3390/ijms232012385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/03/2022] Open
Abstract
Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, which is a key step in protein identification and analysis. Exploring effective methods for predicting peptide detectability is helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single type of information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. Thus, we propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides were combined to construct the multivariate feature space of peptides. We use different deep learning methods to capture high-quality features from the different categories of peptide information and use an average fusion strategy to integrate the predictions of the three models, in order to address the heterogeneity problem and to enhance the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to existing prediction methods; it can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
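The average fusion step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each of the three models outputs one detectability probability per peptide; the function names, the example probabilities, and the 0.5 decision threshold are hypothetical and are not taken from PD-BertEDL itself.

```python
def average_fusion(model_probs):
    """Average per-peptide probabilities across models.

    model_probs: list of lists, one inner list per model, all the same
    length (one probability per peptide). Hypothetical input layout.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_peptides = len(model_probs[0])
    if any(len(p) != n_peptides for p in model_probs):
        raise ValueError("all models must score the same peptides")
    # zip(*...) groups the scores that belong to the same peptide
    return [sum(scores) / len(scores) for scores in zip(*model_probs)]


def to_labels(fused, threshold=0.5):
    """Turn fused probabilities into 1 (detectable) / 0 (not detectable).

    The 0.5 threshold is an assumption made for this illustration.
    """
    return [1 if p >= threshold else 0 for p in fused]


# Made-up scores from three hypothetical models for two peptides
context_model = [0.9, 0.2]
sequence_model = [0.8, 0.4]
physchem_model = [0.7, 0.0]

fused = average_fusion([context_model, sequence_model, physchem_model])
labels = to_labels(fused)
```

Averaging gives every model equal weight, which is the simplest way to combine heterogeneous predictors; weighted or learned fusion would be alternatives that the abstract does not describe.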
|
14
|
Abstract
Paleoproteomics, the study of ancient proteins, is a rapidly growing field at the intersection of molecular biology, paleontology, archaeology, paleoecology, and history. Paleoproteomics research leverages the longevity and diversity of proteins to explore fundamental questions about the past. While its origins predate the characterization of DNA, it was only with the advent of soft ionization mass spectrometry that the study of ancient proteins became truly feasible. Technological gains over the past 20 years have allowed increasing opportunities to better understand preservation, degradation, and recovery of the rich bioarchive of ancient proteins found in the archaeological and paleontological records. Growing from a handful of studies in the 1990s on individual highly abundant ancient proteins, paleoproteomics today is an expanding field with diverse applications ranging from the taxonomic identification of highly fragmented bones and shells and the phylogenetic resolution of extinct species to the exploration of past cuisines from dental calculus and pottery food crusts and the characterization of past diseases. More broadly, these studies have opened new doors in understanding past human-animal interactions, the reconstruction of past environments and environmental changes, the expansion of the hominin fossil record through large scale screening of nondiagnostic bone fragments, and the phylogenetic resolution of the vertebrate fossil record. Even with these advances, much of the ancient proteomic record still remains unexplored. Here we provide an overview of the history of the field, a summary of the major methods and applications currently in use, and a critical evaluation of current challenges. 
We conclude by looking to the future, for which innovative solutions and emerging technology will play an important role in enabling us to access the still unexplored "dark" proteome, allowing for a fuller understanding of the role ancient proteins can play in the interpretation of the past.
Affiliation(s)
- Christina Warinner
- Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
| | - Kristine Korzow Richter
- Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, United States
| | - Matthew J. Collins
- Department of Archaeology, Cambridge University, Cambridge CB2 3DZ, United Kingdom
- Section for Evolutionary Genomics, Globe Institute, University of Copenhagen, Copenhagen 1350, Denmark
| |
|
15
|
Yang Y, Lin L, Qiao L. Deep learning approaches for data-independent acquisition proteomics. Expert Rev Proteomics 2021; 18:1031-1043. [PMID: 34918987 DOI: 10.1080/14789450.2021.2020654] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/30/2023]
Abstract
INTRODUCTION: Data-independent acquisition (DIA) is an emerging technology for large-scale proteomic studies. DIA data analysis methods are evolving rapidly, and deep learning has become prominent in this field. AREAS COVERED: This review provides an overview of the deep learning methods that are used for DIA data analysis, including spectral library prediction, feature scoring, and statistical control in peptide-centric analysis, as well as de novo peptide sequencing. Literature searches were performed for articles, including preprints, up to December 2021 from the PubMed, Scopus, and Web of Science databases. EXPERT OPINION: While spectral library prediction has overcome the limited proteome coverage of experimental libraries, the statistical burden due to the large query space remains a challenge when using proteome-wide predicted libraries. Analysis of post-translational modifications is another promising direction for deep learning-based DIA methods.
Affiliation(s)
- Yi Yang
- Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
| | - Ling Lin
- Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
| | - Liang Qiao
- Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
| |
|
16
|
Masuda K, Kasahara K, Narumi R, Shimojo M, Shimizu Y. Versatile and multiplexed mass spectrometry-based absolute quantification with cell-free-synthesized internal standard peptides. J Proteomics 2021; 251:104393. [PMID: 34678518 DOI: 10.1016/j.jprot.2021.104393] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Revised: 10/04/2021] [Accepted: 10/04/2021] [Indexed: 10/20/2022]
Abstract
Preparation of stable isotope-labeled internal standard peptides is crucial for mass spectrometry (MS)-based targeted proteomics. Herein, we developed a versatile and multiplexed absolute protein quantification method using MS. A previously developed method based on a cell-free peptide synthesis system, termed MS-based quantification by isotope-labeled cell-free products (MS-QBiC), was improved to allow synthesis of multiple peptides in a one-pot reaction. We increased the number of quantification tags used to quantify the synthesized peptides and thus made it possible to use cell-free synthesized isotope-labeled peptides as mixtures for absolute quantification. The improved multiplexed MS-QBiC method was shown to be applicable to determining the stoichiometry of ribosomal proteins in the ribosomal subunit, one of the largest cellular complexes. The study demonstrates that the developed method enables the preparation of several dozen and even several hundred internal standard peptides within a few days for quantification of multiple proteins with only a single run of MS analysis. SIGNIFICANCE: The developed method can be applied to the preparation of internal standard peptides without limiting the number of peptides to be synthesized, which may result in more practical screening of quantitatively reliable peptides, one of the fundamental steps in reliable absolute quantification using MS. Furthermore, the method is highly versatile for proteome analysis of any organism or species without requiring cDNA or SIL peptide libraries. The quantification can be finished in a few days, including design and preparation of appropriate SIL peptides using small-scale batch cell-free reactions, and so has the potential to become part of the standard methodology in the field of quantitative proteomics.
Affiliation(s)
- Keiko Masuda
- Laboratory for Cell-Free Protein Synthesis, RIKEN Center for Biosystems Dynamics Research, Suita, Osaka 565-0874, Japan
| | - Keiko Kasahara
- Department of Surgery, Kyoto University Graduate School of Medicine, Sakyo-ku, Kyoto, Kyoto 606-8501, Japan; Laboratory of Proteome Research, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka 567-0085, Japan
| | - Ryohei Narumi
- Laboratory of Proteome Research, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka 567-0085, Japan
| | - Masaru Shimojo
- Laboratory for Cell-Free Protein Synthesis, RIKEN Center for Biosystems Dynamics Research, Suita, Osaka 565-0874, Japan
| | - Yoshihiro Shimizu
- Laboratory for Cell-Free Protein Synthesis, RIKEN Center for Biosystems Dynamics Research, Suita, Osaka 565-0874, Japan
| |
|
17
|
Sun B, Smialowski P, Straub T, Imhof A. Investigation and Highly Accurate Prediction of Missed Tryptic Cleavages by Deep Learning. J Proteome Res 2021; 20:3749-3757. [PMID: 34137619 DOI: 10.1021/acs.jproteome.1c00346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Trypsin is one of the most important and widely used proteolytic enzymes in mass spectrometry (MS)-based proteomic research. It exclusively cleaves peptide bonds at the C-terminus of lysine and arginine. However, the cleavage is also affected by several factors, including specific surrounding amino acids, resulting in frequent incomplete proteolysis and subsequent issues in peptide identification and quantification. Accurate annotation of missed cleavages is crucial to database searching in MS analysis. Here, we present deep-learning predicting missed cleavages (dpMC), a novel algorithm for the prediction of missed trypsin cleavage sites. This algorithm predicts missed cleavages with very high accuracy, with areas under the curve (AUCs) above 0.99 in both cross-validation and holdout testing, and a mean F1 score and Matthews correlation coefficient (MCC) of 0.9677 and 0.9349, respectively. We tested our algorithm on data sets from different species and different experimental conditions, and it outperforms other currently available prediction methods. In addition, coupled with propensity and motif analysis, the method provides better insight into the detailed rules of trypsin cleavage. Moreover, our method can be integrated into database searching in MS analysis to identify and quantify mass spectra effectively and efficiently.
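To make the notion of a missed cleavage concrete, the following minimal sketch performs an in silico tryptic digest using only the simple rule stated in the abstract (cleave C-terminal to lysine or arginine) and lists peptides with up to a chosen number of missed cleavages. The function name and the example sequence are hypothetical; the sketch does not model the sequence-context effects that dpMC predicts and is not part of that tool.

```python
def tryptic_peptides(protein, max_missed=1):
    """Return (peptide, missed_cleavages) pairs for a simple tryptic digest.

    Rule assumed here: cleave after every K or R, with no exceptions
    (a simplification; real trypsin is influenced by neighbouring residues).
    """
    # Cleavage positions are the indices just after each K or R,
    # excluding the very end of the protein (nothing to cut there).
    sites = [i + 1 for i, aa in enumerate(protein)
             if aa in "KR" and i + 1 < len(protein)]
    bounds = [0] + sites + [len(protein)]

    peptides = []
    for start in range(len(bounds) - 1):
        # Spanning one extra boundary corresponds to one missed cleavage.
        last = min(start + 1 + max_missed, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            seq = protein[bounds[start]:bounds[end]]
            peptides.append((seq, end - start - 1))
    return peptides


# Hypothetical toy sequence with two cleavage sites (after K and after R)
for seq, missed in tryptic_peptides("AKCRDE", max_missed=1):
    print(seq, missed)
```

Search engines typically enumerate such peptides for a fixed missed-cleavage allowance; a predictor of missed cleavage sites could in principle be used to prune or prioritize this list.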
Affiliation(s)
- Bo Sun
- Biomedical Center, Protein Analysis Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Großhaderner Strasse 9, 82152 Planegg-Martinsried, Germany
| | - Pawel Smialowski
- Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Munich, Germany; Biomedical Center, Computational Biology Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Großhaderner Strasse 9, 82152 Planegg-Martinsried, Germany
| | - Tobias Straub
- Biomedical Center, Computational Biology Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Großhaderner Strasse 9, 82152 Planegg-Martinsried, Germany
| | - Axel Imhof
- Biomedical Center, Protein Analysis Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Großhaderner Strasse 9, 82152 Planegg-Martinsried, Germany
| |
|
18
|
Cheng H, Rao B, Liu L, Cui L, Xiao G, Su R, Wei L. PepFormer: End-to-End Transformer-Based Siamese Network to Predict and Enhance Peptide Detectability Based on Sequence Only. Anal Chem 2021; 93:6481-6490. [DOI: 10.1021/acs.analchem.1c00354] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Affiliation(s)
- Hao Cheng
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Bing Rao
- School of Mechanical Electronic & Information Engineering, China University of Mining & Technology, Beijing 221008, China
| | - Lei Liu
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Lizhen Cui
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Guobao Xiao
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou 350000, China
| | - Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin 300384, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou 350000, China
| |
|
19
|
Yang J, Gao Z, Ren X, Sheng J, Xu P, Chang C, Fu Y. DeepDigest: Prediction of Protein Proteolytic Digestion with Deep Learning. Anal Chem 2021; 93:6094-6103. [DOI: 10.1021/acs.analchem.0c04704] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/03/2023]
Affiliation(s)
- Jinghan Yang
- CEMS, NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Zhiqiang Gao
- CEMS, NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Xiuhan Ren
- School of Sciences, China University of Mining & Technology, Beijing 100083, P. R. China
| | - Jie Sheng
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, P. R. China
| | - Ping Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, P. R. China
| | - Cheng Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, P. R. China
| | - Yan Fu
- CEMS, NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| |
|
20
|
Jafarpour A, Gregersen S, Marciel Gomes R, Marcatili P, Hegelund Olsen T, Jacobsen C, Overgaard MT, Sørensen ADM. Biofunctionality of Enzymatically Derived Peptides from Codfish (Gadus morhua) Frame: Bulk In Vitro Properties, Quantitative Proteomics, and Bioinformatic Prediction. Mar Drugs 2020; 18:E599. [PMID: 33260992 PMCID: PMC7759894 DOI: 10.3390/md18120599] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/03/2020] [Revised: 11/20/2020] [Accepted: 11/24/2020] [Indexed: 12/15/2022] Open
Abstract
Protein hydrolysates show great promise as bioactive food and feed ingredients and for valorization of side-streams from, for example, the fish processing industry. We present a novel approach for hydrolysate characterization that utilizes proteomics data for calculation of weighted mean peptide properties (length, molecular weight, and charge) and peptide-level abundance estimation. Using a novel bioinformatic approach for subsequent prediction of biofunctional properties of identified peptides, we are able to provide an unprecedented, in-depth characterization. The study further characterizes bulk emulsifying, foaming, and in vitro antioxidative properties of enzymatic hydrolysates derived from cod frame by application of Alcalase and Neutrase, individually and sequentially, as well as the influence of heat pre-treatment. All hydrolysates displayed comparable or higher emulsifying activity and stability than sodium caseinate. Heat treatment significantly increased stability but had a negative effect on the activity and degree of hydrolysis. Lower degrees of hydrolysis resulted in significantly higher chelating activity, while the opposite was observed for radical scavenging activity. Combining peptide abundance with bioinformatic prediction, we identified several peptides that are likely linked to the observed differences in bulk emulsifying properties. The study highlights the prospects of applying proteomics and bioinformatics for hydrolysate characterization and in food protein science.
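The weighted mean peptide properties mentioned in this abstract can be sketched as an abundance-weighted average. The example below assumes each identified peptide comes with a non-negative abundance (for instance an MS intensity) and covers length and a crude net charge only; the function names, the charge rule (K, R, H counted as +1; D, E as -1; termini ignored), and the example data are hypothetical and are not the authors' implementation.

```python
def net_charge(peptide):
    """Crude net charge: +1 per K/R/H, -1 per D/E (termini and pH ignored)."""
    positive = sum(peptide.count(aa) for aa in "KRH")
    negative = sum(peptide.count(aa) for aa in "DE")
    return positive - negative


def weighted_mean(values, weights):
    """Abundance-weighted mean: sum(v * w) / sum(w)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total


def summarize(peptides):
    """peptides: list of (sequence, abundance) pairs (hypothetical layout)."""
    seqs = [seq for seq, _ in peptides]
    weights = [abundance for _, abundance in peptides]
    return {
        "length": weighted_mean([len(s) for s in seqs], weights),
        "charge": weighted_mean([net_charge(s) for s in seqs], weights),
    }


# Made-up hydrolysate: a short, abundant basic peptide and a rarer acidic one
summary = summarize([("AK", 3.0), ("DEGF", 1.0)])
print(summary)
```

Weighting by abundance means the summary reflects what dominates the hydrolysate rather than treating every identified peptide equally; molecular weight could be added the same way given a residue mass table.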
Affiliation(s)
- Ali Jafarpour
- Research Group for Bioactives-Analysis and Application, Division of Food Technology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| | - Simon Gregersen
- Section for Biotechnology, Department of Chemistry and Bioscience, Aalborg University, 9220 Aalborg, Denmark
| | - Rocio Marciel Gomes
- Research Group for Bioactives-Analysis and Application, Division of Food Technology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| | - Paolo Marcatili
- Department of Health Technology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| | - Tobias Hegelund Olsen
- Department of Health Technology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| | - Charlotte Jacobsen
- Research Group for Bioactives-Analysis and Application, Division of Food Technology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| | - Michael Toft Overgaard
- Section for Biotechnology, Department of Chemistry and Bioscience, Aalborg University, 9220 Aalborg, Denmark
| | - Ann-Dorit Moltke Sørensen
- Research Group for Bioactives-Analysis and Application, Division of Food Technology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| |
|
21
|
Bouwmeester R, Gabriels R, Van Den Bossche T, Martens L, Degroeve S. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows. Proteomics 2020; 20:e1900351. [PMID: 32267083 DOI: 10.1002/pmic.201900351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/26/2020] [Revised: 03/21/2020] [Indexed: 12/30/2022]
Abstract
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| |
|
22
|
Guan S, Taylor PP, Han Z, Moran MF, Ma B. Data Dependent-Independent Acquisition (DDIA) Proteomics. J Proteome Res 2020; 19:3230-3237. [PMID: 32539411 DOI: 10.1021/acs.jproteome.0c00186] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 01/07/2023]
Abstract
Data dependent acquisition (DDA) and data independent acquisition (DIA) are traditionally separate experimental paradigms in bottom-up proteomics. In this work, we developed a strategy combining the two experimental methods into a single LC-MS/MS run. We call the novel strategy data dependent-independent acquisition proteomics, or DDIA for short. Peptides identified from DDA scans by a conventional and robust DDA identification workflow provide useful information for interrogation of DIA scans. Deep learning based LC-MS/MS property prediction tools, developed previously, can be used repeatedly to produce spectral libraries facilitating DIA scan extraction. A complete DDIA data processing pipeline, including the modules for iRT vs RT calibration curve generation, DIA extraction classifier training, and false discovery rate control, has been developed. Compared to another spectral library-free method, DIA-Umpire, the DDIA method produced a similar number of peptide identifications, but nearly twice as many protein group identifications. The primary advantage of the DDIA method is that it requires minimal information for processing its data.
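The iRT vs RT calibration step named in this abstract can be illustrated with a minimal sketch. It assumes a straight-line relationship between indexed retention time (iRT) and observed retention time (RT), fitted by ordinary least squares on peptides identified in the DDA scans; the abstract does not state the form of the calibration curve, so the linear model, the function names, and the anchor values below are all assumptions made for illustration.

```python
def fit_irt_to_rt(irt_values, rt_values):
    """Least-squares line rt = slope * irt + intercept (assumed linear model)."""
    n = len(irt_values)
    if n < 2 or n != len(rt_values):
        raise ValueError("need at least two matched (iRT, RT) pairs")
    mean_x = sum(irt_values) / n
    mean_y = sum(rt_values) / n
    sxx = sum((x - mean_x) ** 2 for x in irt_values)
    if sxx == 0:
        raise ValueError("iRT values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(irt_values, rt_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rt(irt, slope, intercept):
    """Map a library iRT onto the run's retention-time axis."""
    return slope * irt + intercept


# Made-up anchors: iRT of DDA-identified peptides and their observed RT (minutes)
slope, intercept = fit_irt_to_rt([0.0, 50.0, 100.0], [10.0, 35.0, 60.0])
expected_rt = predict_rt(80.0, slope, intercept)
```

Once fitted, such a mapping lets predicted library retention times define the window in which DIA signal for a peptide is extracted; a nonlinear or piecewise fit may be preferable for real gradients.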
Affiliation(s)
- Shenheng Guan
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
| | - Paul P Taylor
- Rapid Novor Inc., Unit 450, 137 Glasgow Street, Kitchener, Ontario N2G 4X8, Canada
| | - Ziwei Han
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Michael F Moran
- Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
| | - Bin Ma
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| |
|