1
Popova L, Carr RA, Carabetta VJ. Recent Contributions of Proteomics to Our Understanding of Reversible Nε-Lysine Acylation in Bacteria. J Proteome Res 2024; 23:2733-2749. [PMID: 38442041 PMCID: PMC11296938 DOI: 10.1021/acs.jproteome.3c00912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Post-translational modifications (PTMs) have been extensively studied in both eukaryotes and prokaryotes. Lysine acetylation, originally thought to be a rare occurrence in bacteria, is now recognized as a prevalent and important PTM in more than 50 species. This expansion in interest in bacterial PTMs became possible with the advancement of mass spectrometry technology and improved reagents such as acyl-modification specific antibodies. In this Review, we discuss how mass spectrometry-based proteomic studies of lysine acetylation and other acyl modifications have contributed to our understanding of bacterial physiology, focusing on recently published studies from 2018 to 2023. We begin with a discussion of approaches used to study bacterial PTMs. Next, we discuss newly characterized acylomes, including acetylomes, succinylomes, and malonylomes, in different bacterial species. In addition, we examine proteomic contributions to our understanding of bacterial virulence and biofilm formation. Finally, we discuss the contributions of mass spectrometry to our understanding of the mechanisms of acetylation, both enzymatic and nonenzymatic. We end with a discussion of the current state of the field and possible future research avenues to explore.
Affiliation(s)
- Liya Popova
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, New Jersey 08103, United States
- Rachel A Carr
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, New Jersey 08103, United States
- Valerie J Carabetta
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, New Jersey 08103, United States

2
Roberts DS, Loo JA, Tsybin YO, Liu X, Wu S, Chamot-Rooke J, Agar JN, Paša-Tolić L, Smith LM, Ge Y. Top-down proteomics. Nat Rev Methods Primers 2024; 4:38. [PMID: 39006170 PMCID: PMC11242913 DOI: 10.1038/s43586-024-00318-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 04/24/2024] [Indexed: 07/16/2024]
Abstract
Proteoforms, which arise from post-translational modifications, genetic polymorphisms and RNA splice variants, play a pivotal role as drivers in biology. Understanding proteoforms is essential to unravel the intricacies of biological systems and bridge the gap between genotypes and phenotypes. By analysing whole proteins without digestion, top-down proteomics (TDP) provides a holistic view of the proteome and can decipher protein function, uncover disease mechanisms and advance precision medicine. This Primer explores TDP, including the underlying principles, recent advances and an outlook on the future. The experimental section discusses instrumentation, sample preparation, intact protein separation, tandem mass spectrometry techniques and data collection. The results section looks at how to decipher raw data, visualize intact protein spectra and unravel data analysis. Additionally, proteoform identification, characterization and quantification are summarized, alongside approaches for statistical analysis. Various applications are described, including the human proteoform project and biomedical, biopharmaceutical and clinical sciences. These are complemented by discussions on measurement reproducibility, limitations and a forward-looking perspective that outlines areas where the field can advance, including potential future applications.
Affiliation(s)
- David S Roberts
- Department of Chemistry, Stanford University, Stanford, CA, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Joseph A Loo
- Department of Chemistry and Biochemistry, Department of Biological Chemistry, University of California - Los Angeles, Los Angeles, CA, USA
- Xiaowen Liu
- Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA
- Si Wu
- Department of Chemistry and Biochemistry, The University of Alabama, Tuscaloosa, AL, USA
- Jeffrey N Agar
- Departments of Chemistry and Chemical Biology and Pharmaceutical Sciences, Northeastern University, Boston, MA, USA
- Ljiljana Paša-Tolić
- Environmental and Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, WI, USA
- Ying Ge
- Department of Chemistry, University of Wisconsin, Madison, WI, USA
- Department of Cell and Regenerative Biology, Human Proteomics Program, University of Wisconsin - Madison, Madison, WI, USA

3
Zhan Z, Wang L. Fast peak error correction algorithms for proteoform identification using top-down tandem mass spectra. Bioinformatics 2024; 40:btae149. [PMID: 38498847 PMCID: PMC11212493 DOI: 10.1093/bioinformatics/btae149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Revised: 03/05/2024] [Accepted: 03/15/2024] [Indexed: 03/20/2024] Open
Abstract
MOTIVATION Proteoform identification is an important problem in proteomics. The main task is to find a modified protein that best fits the input spectrum. To overcome the combinatorial explosion of possible proteoforms, the proteoform mass graph and spectrum mass graph are used to represent the protein database and the spectrum, respectively. The problem becomes finding an optimal alignment between the proteoform mass graph and the spectrum mass graph. Peak error correction is an important issue for computing an optimal alignment between the two input mass graphs. RESULTS We propose a faster algorithm for the error correction alignment of spectrum mass graph and proteoform mass graph problem and produce a program package TopMGFast. The newly designed algorithms require less space and running time so that we are able to compute global optimal alignments for the two input mass graphs in a reasonable time. For the local alignment version, experiments show that the running time of the new algorithm is reduced by 2.5 times. For the global alignment version, experiments show that the maximum mass errors between any pair of matched nodes in the alignments obtained by our method are within a small range as designed, while the alignments produced by the state-of-the-art method, TopMG, have very large maximum mass errors for many cases. The obtained alignment sizes are roughly the same for both TopMG and TopMGFast. Of course, TopMGFast needs more running time than TopMG. Therefore, our new algorithm can obtain more reliable global alignments within a reasonable time. This is the first time that global optimal error correction alignments can be obtained using real datasets. AVAILABILITY AND IMPLEMENTATION The source code of the algorithm is available at https://github.com/Zeirdo/TopMGFast.
Affiliation(s)
- Zhaohui Zhan
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
- Lusheng Wang
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
- City University of Hong Kong Shenzhen Research Institution, ShenZhen, 518057, China

4
Walzer M, Jeong K, Tabb DL, Vizcaíno JA. TopDownApp: An open and modular platform for analysis and visualisation of top-down proteomics data. Proteomics 2024; 24:e2200403. [PMID: 37787899 DOI: 10.1002/pmic.202200403] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2023] [Revised: 09/13/2023] [Accepted: 09/13/2023] [Indexed: 10/04/2023]
Abstract
Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.
Affiliation(s)
- Mathias Walzer
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
- David L Tabb
- Institut Pasteur, Université Paris Cité, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris, France
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK

5
Zancolli G, von Reumont BM, Anderluh G, Caliskan F, Chiusano ML, Fröhlich J, Hapeshi E, Hempel BF, Ikonomopoulou MP, Jungo F, Marchot P, de Farias TM, Modica MV, Moran Y, Nalbantsoy A, Procházka J, Tarallo A, Tonello F, Vitorino R, Zammit ML, Antunes A. Web of venom: exploration of big data resources in animal toxin research. Gigascience 2024; 13:giae054. [PMID: 39250076 PMCID: PMC11382406 DOI: 10.1093/gigascience/giae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/01/2024] [Accepted: 07/13/2024] [Indexed: 09/10/2024] Open
Abstract
Research on animal venoms and their components spans multiple disciplines, including biology, biochemistry, bioinformatics, pharmacology, medicine, and more. Manipulating and analyzing the diverse array of data required for venom research can be challenging, and relevant tools and resources are often dispersed across different online platforms, making them less accessible to nonexperts. In this article, we address the multifaceted needs of the scientific community involved in venom and toxin-related research by identifying and discussing web resources, databases, and tools commonly used in this field. We have compiled these resources into a comprehensive table available on the VenomZone website (https://venomzone.expasy.org/10897). Furthermore, we highlight the challenges currently faced by researchers in accessing and using these resources and emphasize the importance of community-driven interdisciplinary approaches. We conclude by underscoring the significance of enhancing standards, promoting interoperability, and encouraging data and method sharing within the venom research community.
Affiliation(s)
- Giulia Zancolli
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Björn Marcus von Reumont
- Goethe University Frankfurt, Faculty of Biological Sciences, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Gregor Anderluh
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
- Figen Caliskan
- Department of Biology, Faculty of Science, Eskisehir Osmangazi University, 26040 Eskişehir, Turkey
- Maria Luisa Chiusano
- Department of Agricultural Sciences, University Federico II of Naples, 80055 Portici, Naples, Italy
- Department of Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
- Jacob Fröhlich
- Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
- Evroula Hapeshi
- Department of Health Sciences, School of Life and Health Sciences, University of Nicosia, 1700 Nicosia, Cyprus
- Benjamin-Florian Hempel
- Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
- Maria P Ikonomopoulou
- Madrid Institute of Advanced Studies in Food, Precision Nutrition & Aging Program, 28049 Madrid, Spain
- Florence Jungo
- SIB Swiss Institute of Bioinformatics, Swiss-Prot Group, 1211 Geneva, Switzerland
- Pascale Marchot
- Laboratory Architecture et Fonction des Macromolécules Biologiques, Aix-Marseille University, Centre National de la Recherche Scientifique, Faculté des Sciences, Campus Luminy, 13288 Marseille, France
- Tarcisio Mendes de Farias
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Maria Vittoria Modica
- Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, 00198 Rome, Italy
- Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, 9190401 Jerusalem, Israel
- Ayse Nalbantsoy
- Engineering Faculty, Bioengineering Department, Ege University, 35100 Bornova-Izmir, Turkey
- Jan Procházka
- Laboratory of Transgenic Models of Diseases, Institute of Molecular Genetics of the Czech Academy of Sciences, 252 50 Vestec, Czech Republic
- Andrea Tarallo
- Institute of Research on Terrestrial Ecosystems (IRET), National Research Council (CNR), 73100 Lecce, Italy
- Fiorella Tonello
- Neuroscience Institute, National Research Council (CNR), 35131 Padua, Italy
- Rui Vitorino
- Department of Medical Sciences, iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal
- Mark Lawrence Zammit
- Department of Clinical Pharmacology & Therapeutics, Faculty of Medicine & Surgery, University of Malta, 2090 Msida, Malta
- Malta National Poisons Centre, Malta Life Sciences Park, 3000 San Ġwann, Malta
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal

6
Po A, Eyers CE. Top-Down Proteomics and the Challenges of True Proteoform Characterization. J Proteome Res 2023; 22:3663-3675. [PMID: 37937372 PMCID: PMC10696603 DOI: 10.1021/acs.jproteome.3c00416] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/12/2023] [Revised: 10/09/2023] [Accepted: 10/20/2023] [Indexed: 11/09/2023]
Abstract
Top-down proteomics (TDP) aims to identify and profile intact protein forms (proteoforms) extracted from biological samples. True proteoform characterization requires that both the base protein sequence be defined and any mass shifts identified, ideally localizing their positions within the protein sequence. Being able to fully elucidate proteoform profiles lends insight into characterizing proteoform-unique roles, and is a crucial aspect of defining protein structure-function relationships and the specific roles of different (combinations of) protein modifications. However, defining and pinpointing protein post-translational modifications (PTMs) on intact proteins remains a challenge. Characterization of (heavily) modified proteins (>∼30 kDa) remains problematic, especially when they exist in a population of similarly modified, or kindred, proteoforms. This issue is compounded as the number of modifications increases, and thus the number of theoretical combinations. Here, we present our perspective on the challenges of analyzing kindred proteoform populations, focusing on annotation of protein modifications on an "average" protein. Furthermore, we discuss the technical requirements to obtain high quality fragmentation spectral data to robustly define site-specific PTMs, and the fact that this is tempered by the time requirements necessary to separate proteoforms in advance of mass spectrometry analysis.
Affiliation(s)
- Allen Po
- Centre for Proteome Research, Faculty of Health & Life Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.
- Department of Biochemistry, Cell & Systems Biology, Institute of Systems, Molecular & Integrative Biology, Faculty of Health & Life Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.
- Claire E. Eyers
- Centre for Proteome Research, Faculty of Health & Life Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.
- Department of Biochemistry, Cell & Systems Biology, Institute of Systems, Molecular & Integrative Biology, Faculty of Health & Life Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.

7
Chen W, Ding Z, Zang Y, Liu X. Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations. J Proteome Res 2023; 22:3178-3189. [PMID: 37728997 PMCID: PMC10563160 DOI: 10.1021/acs.jproteome.3c00207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Indexed: 09/22/2023]
Abstract
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledge bases, such as UniProt, provide valuable information for PTM characterization and verification. Here, we present a software pipeline PTM-TBA (PTM characterization by Top-down and Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as PTM annotations. We assessed PTM-TBA using a technical triplicate of bottom-up and top-down MS data of SW480 cells. On average, database search of the top-down MS data identified 2000 mass shifts, 814.5 (40.7%) of which were matched to 11 common PTMs and 423 of which were localized. Of the mass shifts identified by top-down MS, PTM-TBA verified 435 mass shifts using the bottom-up MS data and UniProt annotations.
Affiliation(s)
- Wenrong Chen
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Zhengming Ding
- Department of Computer Science, Tulane School of Science and Engineering, Tulane University, New Orleans, Louisiana 70118, United States
- Yong Zang
- Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
- Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States

8
Su T, Hollas MAR, Fellers RT, Kelleher NL. Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics. Annu Rev Biomed Data Sci 2023; 6:357-376. [PMID: 37561601 PMCID: PMC10840079 DOI: 10.1146/annurev-biodatasci-020722-044021] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 08/12/2023]
Abstract
Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.
Affiliation(s)
- Taojunfeng Su
- Department of Molecular Biosciences, Northwestern University, Evanston, Illinois, USA
- Michael A R Hollas
- Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Ryan T Fellers
- Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Neil L Kelleher
- Department of Molecular Biosciences, Northwestern University, Evanston, Illinois, USA
- Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Department of Chemistry, Northwestern University, Evanston, Illinois, USA

9
Tabb DL, Jeong K, Druart K, Gant MS, Brown KA, Nicora C, Zhou M, Couvillion S, Nakayasu E, Williams JE, Peterson HK, McGuire MK, McGuire MA, Metz TO, Chamot-Rooke J. Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection. J Proteome Res 2023. [PMID: 37235544 DOI: 10.1021/acs.jproteome.2c00673] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 05/28/2023]
Abstract
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
Affiliation(s)
- David L Tabb
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen 72076, Germany
- Karen Druart
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Megan S Gant
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyle A Brown
- School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin 53705, United States
- Carrie Nicora
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Mowei Zhou
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Sneha Couvillion
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ernesto Nakayasu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Janet E Williams
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Haley K Peterson
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Michelle K McGuire
- Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Mark A McGuire
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Thomas O Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Julia Chamot-Rooke
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France

10
Chen W, Ding Z, Zang Y, Liu X. Characterization of proteoform post-translational modifications by top-down and bottom-up mass spectrometry in conjunction with UniProt annotations. bioRxiv 2023:2023.04.04.535618. [PMID: 37066296 PMCID: PMC10104052 DOI: 10.1101/2023.04.04.535618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledgebases, such as UniProt, provide valuable information for PTM characterization and validation. Here, we present a software pipeline called PTM-TBA (PTM characterization by Top-down, Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as UniProt annotations. We identified 1,662 mass shifts from a top-down MS data set of SW480 cells, 545 (33%) of which were matched to 12 common PTMs, and 351 of which were localized. PTM-TBA validated 346 of the 1,662 mass shifts using UniProt annotations or a bottom-up MS data set of SW480 cells.
11
Chen W, McCool EN, Sun L, Zang Y, Ning X, Liu X. Evaluation of Machine Learning Models for Proteoform Retention and Migration Time Prediction in Top-Down Mass Spectrometry. J Proteome Res 2022; 21:1736-1747. [PMID: 35616364 PMCID: PMC9250612 DOI: 10.1021/acs.jproteome.2c00124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Reversed-phase liquid chromatography (RPLC) and capillary zone electrophoresis (CZE) are two primary proteoform separation methods in mass spectrometry (MS)-based top-down proteomics. Proteoform retention time (RT) prediction in RPLC and migration time (MT) prediction in CZE provide additional information for accurate proteoform identification and quantification. While existing methods are mainly focused on peptide RT and MT prediction in bottom-up MS, there is still a lack of methods for proteoform RT and MT prediction in top-down MS. We systematically evaluated eight machine learning models and a transfer learning method for proteoform RT prediction and five models and the transfer learning method for proteoform MT prediction. Experimental results showed that a gated recurrent unit (GRU)-based model with transfer learning achieved a high accuracy (R = 0.978) for proteoform RT prediction and that the GRU-based model and a fully connected neural network model obtained a high accuracy of R = 0.982 and 0.981 for proteoform MT prediction, respectively.
Affiliation(s)
- Wenrong Chen
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Elijah N McCool
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Yong Zang
- Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Xia Ning
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
- Xiaowen Liu
- Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
- Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States

12
Choi IK, Liu X. Top-Down Mass Spectrometry Data Analysis Using TopPIC Suite. Methods Mol Biol 2022; 2500:83-103. [PMID: 35657589 DOI: 10.1007/978-1-0716-2325-1_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
With the advances of mass spectrometry (MS) techniques, top-down MS-based proteomics has gained increasing attention because of its advantages over bottom-up MS in studying complex proteoforms. TopPIC Suite is a widely used software package for top-down MS-based proteoform identification and quantification. Here, we present the methods for top-down MS data analysis using TopPIC Suite.
Collapse
Affiliation(s)
- In Kwon Choi
- Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA
| | - Xiaowen Liu
- Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA.
| |
|
13
|
Choi IK, Jiang T, Kankara SR, Wu S, Liu X. TopMSV: A Web-Based Tool for Top-Down Mass Spectrometry Data Visualization. J Am Soc Mass Spectrom 2021; 32:1312-1318. [PMID: 33780241 PMCID: PMC8172439 DOI: 10.1021/jasms.0c00460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Top-down mass spectrometry (MS) investigates intact proteoforms for proteoform identification, characterization, and quantification. Data visualization plays an essential role in top-down MS data analysis because proteoform identification and characterization often involve manual data inspection to determine the molecular masses of highly charged ions and validate unexpected alterations in identified proteoforms. While many software tools have been developed for MS data visualization, there is still a lack of web-based visualization software designed for top-down MS. Here, we present TopMSV, a web-based tool for top-down MS data processing and visualization. TopMSV provides interactive views of top-down MS data using a web browser. It integrates software tools for spectral deconvolution and proteoform identification and uses analysis results of the tools to annotate top-down MS data.
Affiliation(s)
- In Kwon Choi
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
| | - Tianze Jiang
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
| | - Sreekanth Reddy Kankara
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| |
|
14
|
Protamine Characterization by Top-Down Proteomics: Boosting Proteoform Identification with DBSCAN. Proteomes 2021; 9:proteomes9020021. [PMID: 33946530 PMCID: PMC8162566 DOI: 10.3390/proteomes9020021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/05/2021] [Revised: 04/25/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022]
Abstract
Protamines replace histones as the main nuclear protein in the sperm cells of many species and play a crucial role in compacting the paternal genome. Human spermatozoa contain protamine 1 (P1) and the family of protamine 2 (P2) proteins. Alterations in protamine PTMs or the P1/P2 ratio may be associated with male infertility. Top-down proteomics enables large-scale analysis of intact proteoforms derived from alternative splicing, missense or nonsense genetic variants or PTMs. In contrast to current gold standard techniques, top-down proteomics permits a more in-depth analysis of protamine PTMs and proteoforms, thereby opening up new perspectives to unravel their impact on male fertility. We report on the analysis of two normozoospermic semen samples by top-down proteomics. We discuss the difficulties encountered with the data analysis and propose solutions, as this step is one of the current bottlenecks in top-down proteomics with the bioinformatics tools currently available. Our strategy for the data analysis combines two software packages, ProSight PD (PS) and TopPIC suite (TP), with a clustering algorithm to decipher protamine proteoforms. We identified up to 32 protamine proteoforms at different levels of characterization. This in-depth analysis of the protamine proteoform landscape of normozoospermic individuals represents the first step towards the future study of sperm pathological conditions, opening up the potential for personalized diagnosis of male infertility.
|
15
|
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
Affiliation(s)
- Wenrong Chen
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
| |
|
16
|
Choi IK, Abeysinghe E, Coulter E, Marru S, Pierce M, Liu X. TopPIC Gateway: A Web Gateway for Top-Down Mass Spectrometry Data Interpretation. PEARC20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave (July 27-31, 2020, Portland, OR, virtual conference) 2020; 2020:461-464. [PMID: 35615582 PMCID: PMC9128478 DOI: 10.1145/3311790.3400853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Top-down mass spectrometry-based proteomics has become the method of choice for identifying and quantifying intact proteoforms in biological samples. We present a web-based gateway for TopPIC suite, a widely used software suite consisting of four software tools for top-down mass spectrometry data interpretation: TopFD, TopPIC, TopMG, and TopDiff. The gateway enables the community to use a heterogeneous collection of computing resources, including high-performance computing clusters at Indiana University and virtual clusters on XSEDE's Jetstream Cloud resource, for top-down mass spectral data analysis using TopPIC suite. The gateway will be a useful resource for proteomics researchers and students who have limited access to high-performance computing resources or who are not familiar with interacting with server-side supercomputers.
Affiliation(s)
- In Kwon Choi
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis
| | - Eroma Abeysinghe
- Cyberinfrastructure Integration Research Center, Indiana University
| | - Eric Coulter
- Cyberinfrastructure Integration Research Center, Indiana University
| | - Suresh Marru
- Cyberinfrastructure Integration Research Center, Indiana University
| | - Marlon Pierce
- Cyberinfrastructure Integration Research Center, Indiana University
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis
| |
|
17
|
Hale OJ, Cooper HJ. In situ mass spectrometry analysis of intact proteins and protein complexes from biological substrates. Biochem Soc Trans 2020; 48:317-326. [PMID: 32010951 PMCID: PMC7054757 DOI: 10.1042/bst20190793] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/18/2019] [Revised: 01/09/2020] [Accepted: 01/09/2020] [Indexed: 12/15/2022]
Abstract
Advances in sample preparation, ion sources and mass spectrometer technology have enabled the detection and characterisation of intact proteins. The challenges associated include an appropriately soft ionisation event, efficient transmission and detection of the often delicate macromolecules. Ambient ion sources, in particular, offer a wealth of strategies for analysis of proteins from solution environments, and directly from biological substrates. The last two decades have seen rapid development in this area. Innovations include liquid extraction surface analysis, desorption electrospray ionisation and nanospray desorption electrospray ionisation. Similarly, developments in native mass spectrometry allow protein-protein and protein-ligand complexes to be ionised and analysed. Identification and characterisation of these large ions involves a suite of hyphenated mass spectrometry techniques, often including the coupling of ion mobility spectrometry and fragmentation techniques. The latter include collision, electron and photon-induced methods, each with their own characteristics and benefits for intact protein identification. In this review, recent developments for in situ protein analysis are explored, with a focus on ion sources and tandem mass spectrometry techniques used for identification.
Affiliation(s)
- Oliver J. Hale
- School of Biosciences, University of Birmingham, Edgbaston B15 2TT, U.K
| | - Helen J. Cooper
- School of Biosciences, University of Birmingham, Edgbaston B15 2TT, U.K
| |
|
18
|
Schaffer LV, Millikin RJ, Miller RM, Anderson LC, Fellers RT, Ge Y, Kelleher NL, LeDuc RD, Liu X, Payne SH, Sun L, Thomas PM, Tucholski T, Wang Z, Wu S, Wu Z, Yu D, Shortreed MR, Smith LM. Identification and Quantification of Proteoforms by Mass Spectrometry. Proteomics 2019; 19:e1800361. [PMID: 31050378 PMCID: PMC6602557 DOI: 10.1002/pmic.201800361] [Citation(s) in RCA: 134] [Impact Index Per Article: 26.8] [Received: 01/28/2019] [Revised: 04/07/2019] [Indexed: 12/29/2022]
Abstract
A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.
Affiliation(s)
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Robert J Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lissa C Anderson
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, FL, 32310, USA
| | - Ryan T Fellers
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
| | - Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Department of Cell and Regenerative Biology and Human Proteomics Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Neil L Kelleher
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
- Department of Chemistry and Molecular Biosciences and the Division of Hematology and Oncology, Northwestern University, Evanston, IL, 60208, USA
| | - Richard D LeDuc
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN, 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Samuel H Payne
- Department of Biology, Brigham Young University, Provo, UT, 84602
| | - Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
| | - Paul M Thomas
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
| | - Trisha Tucholski
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Zhe Wang
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
| | - Zhijie Wu
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Dahang Yu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| |
|
19
|
Kou Q, Wang Z, Lubeckyj RA, Wu S, Sun L, Liu X. A Markov Chain Monte Carlo Method for Estimating the Statistical Significance of Proteoform Identifications by Top-Down Mass Spectrometry. J Proteome Res 2019; 18:878-889. [PMID: 30638379 DOI: 10.1021/acs.jproteome.8b00562] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/13/2023]
Abstract
Top-down mass spectrometry is capable of identifying whole proteoform sequences with multiple post-translational modifications because it generates tandem mass spectra directly from intact proteoforms. Many software tools, such as ProSightPC, MSPathFinder, and TopMG, have been proposed for identifying proteoforms with modifications. In these tools, various methods are employed to estimate the statistical significance of identifications. However, most existing methods are designed for proteoform identifications without modifications, and the challenge remains for accurately estimating the statistical significance of proteoform identifications with modifications. Here we propose TopMCMC, a method that combines a Markov chain random walk algorithm and a greedy algorithm for assigning statistical significance to matches between spectra and protein sequences with variable modifications. Experimental results showed that TopMCMC achieved high accuracy in estimating E-values and false discovery rates of identifications in top-down mass spectrometry. Coupled with TopMG, TopMCMC identified more spectra than the generating function method from an MCF-7 top-down mass spectrometry data set.
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
| | - Zhe Wang
- Department of Chemistry and Biochemistry, The University of Oklahoma, Norman, Oklahoma 73019-5251, United States
| | - Rachele A Lubeckyj
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1332, United States
| | - Si Wu
- Department of Chemistry and Biochemistry, The University of Oklahoma, Norman, Oklahoma 73019-5251, United States
| | - Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1332, United States
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
| |
|
20
|
Li Z, He B, Kou Q, Wang Z, Wu S, Liu Y, Feng W, Liu X. Evaluation of top-down mass spectral identification with homologous protein sequences. BMC Bioinformatics 2018; 19:494. [PMID: 30591035 PMCID: PMC6309053 DOI: 10.1186/s12859-018-2462-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area search top-down mass spectra against a protein sequence database for proteoform identification. When the species studied in a mass spectrometry experiment lacks its proteome sequence database, a homologous protein sequence database can be used for proteoform identification. The accuracy of homologous protein sequences affects the sensitivity of proteoform identification and the accuracy of mass shift localization. RESULTS We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of identified spectra with the homologous database was about half of that with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results. CONCLUSIONS Experimental results demonstrated that TopPIC is capable of identifying many proteoform spectrum matches and localizing unknown alterations using homologous protein sequences containing no more than 2 mutations.
Affiliation(s)
- Ziwei Li
- College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
| | - Bo He
- College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
| | - Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
| | - Zhe Wang
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
| | - Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
| | - Weixing Feng
- College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
| |
|
21
|
Schaffer LV, Rensvold JW, Shortreed MR, Cesnik AJ, Jochem A, Scalf M, Frey BL, Pagliarini DJ, Smith LM. Identification and Quantification of Murine Mitochondrial Proteoforms Using an Integrated Top-Down and Intact-Mass Strategy. J Proteome Res 2018; 17:3526-3536. [PMID: 30180576 PMCID: PMC6201694 DOI: 10.1021/acs.jproteome.8b00469] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
Abstract
The development of effective strategies for the comprehensive identification and quantification of proteoforms in complex systems is a critical challenge in proteomics. Proteoforms, the specific molecular forms in which proteins are present in biological systems, are the key effectors of biological function. Thus, knowledge of proteoform identities and abundances is essential to unraveling the mechanisms that underlie protein function. We recently reported a strategy that integrates conventional top-down mass spectrometry with intact-mass determinations for enhanced proteoform identifications and the elucidation of proteoform families and applied it to the analysis of yeast cell lysate. In the present work, we extend this strategy to enable quantification of proteoforms, and we examine changes in the abundance of murine mitochondrial proteoforms upon differentiation of mouse myoblasts to myotubes. The integrated top-down and intact-mass strategy provided an increase of ∼37% in the number of identified proteoforms compared to top-down alone, which is in agreement with our previous work in yeast; 1779 unique proteoforms were identified using the integrated strategy compared to 1301 using top-down analysis alone. Quantitative comparison of proteoform differences between the myoblast and myotube cell types showed 129 observed proteoforms exhibiting statistically significant abundance changes (fold change >2 and false discovery rate <5%).
Affiliation(s)
- Leah V. Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | | | - Michael R. Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Anthony J. Cesnik
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Adam Jochem
- Morgridge Institute for Research, Madison, WI 53715, USA
| | - Mark Scalf
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Brian L. Frey
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - David J. Pagliarini
- Morgridge Institute for Research, Madison, WI 53715, USA
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Lloyd M. Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI 53706, USA
| |
|
22
|
Yang R, Zhu D. A graph-based filtering method for top-down mass spectral identification. BMC Genomics 2018; 19:666. [PMID: 30255788 PMCID: PMC6157290 DOI: 10.1186/s12864-018-5026-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, it is quite time consuming to align a query spectrum against all protein sequences without any PTMs and mutations in a large database. Consequently, it is essential to develop efficient and sensitive filtering algorithms for speeding up database search. RESULTS In this paper, we propose a spectrum graph matching (SGM) based protein sequence filtering method for top-down mass spectral identification. It uses the subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidates. Whereas the sequence tag and gapped tag approaches need a preprocessing step to extract and select tags, the SGM filtering method circumvents this step, thus simplifying data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli top-down mass spectrometry data set and compared the performances of the SGM filtering method and two tag-based filtering methods on a data set of MCF-7 cells. CONCLUSIONS Experimental results on the data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform spectrum-matches compared with the tag-based methods in top-down mass spectrometry data analysis.
Affiliation(s)
- Runmin Yang
- School of Computer Science and Technology, Shandong University, 1500, Shun Hua Lu, Jinan, 250101, China
| | - Daming Zhu
- School of Computer Science and Technology, Shandong University, 1500, Shun Hua Lu, Jinan, 250101, China.
| |
|
23
|
Zhu K, Liu X. A graph-based approach for proteoform identification and quantification using top-down homogeneous multiplexed tandem mass spectra. BMC Bioinformatics 2018; 19:280. [PMID: 30367573 PMCID: PMC6101081 DOI: 10.1186/s12859-018-2273-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
BACKGROUND Top-down homogeneous multiplexed tandem mass (HomMTM) spectra are generated from modified proteoforms of the same protein with different post-translational modification patterns. They are frequently observed in the analysis of ultramodified proteins, some proteoforms of which have similar molecular weights and cannot be well separated by liquid chromatography in mass spectrometry analysis. RESULTS We formulate the top-down HomMTM spectral identification problem as the minimum error k-splittable flow problem on graphs and propose a graph-based algorithm for the identification and quantification of proteoforms using top-down HomMTM spectra. CONCLUSIONS Experiments on a top-down mass spectrometry data set of the histone H4 protein showed that the proposed method identified many proteoform pairs that better explain the query spectra than single proteoforms.
Affiliation(s)
- Kaiyuan Zhu
- Department of Computer Science, Indiana University Bloomington, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202, USA.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 W. 10th Street, Indianapolis, IN, 46202, USA.
| |
|
24
|
Yang R, Zhu D, Kou Q, Bhat-Nakshatri P, Nakshatri H, Wu S, Liu X. A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2018; 2017:222-229. [PMID: 29503761 DOI: 10.1109/bibm.2017.8217653] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Database search is the main approach for identifying proteoforms using top-down tandem mass spectra. However, it is extremely slow to align a query spectrum against all protein sequences in a large database when the target proteoform that produced the spectrum contains post-translational modifications and/or mutations. As a result, efficient and sensitive protein sequence filtering algorithms are essential for speeding up database search. In this paper, we propose a novel filtering algorithm, which generates spectrum graphs from subspectra of the query spectrum and searches them against the protein database to find good candidates. Compared with the sequence tag and gapped tag approaches, the proposed method circumvents the step of tag extraction, thus simplifying data processing. Experimental results on real data showed that the proposed method achieved both high speed and high sensitivity in protein sequence filtration.
Affiliation(s)
- Runmin Yang
- School of Computer Science and Technology, Shandong University.,Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis
| | - Daming Zhu
- School of Computer Science and Technology, Shandong University
| | - Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis
| | | | | | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine
| |
|
25
|
Kou Q, Wu S, Liu X. Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry. Proteomics 2018; 18. [PMID: 29327814 DOI: 10.1002/pmic.201700306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/16/2017] [Revised: 11/20/2017] [Indexed: 01/19/2023]
Abstract
Complex proteoforms contain various primary structural alterations resulting from variations in genes, RNA, and proteins. Top-down mass spectrometry is commonly used for analyzing complex proteoforms because it provides whole sequence information of the proteoforms. Proteoform identification by top-down mass spectral database search is a challenging computational problem because the types and/or locations of some alterations in target proteoforms are in general unknown. Although spectral alignment and mass graph alignment algorithms have been proposed for identifying proteoforms with unknown alterations, they are extremely slow to align millions of spectra against tens of thousands of protein sequences in high throughput proteome level analyses. Many software tools in this area combine efficient protein sequence filtering algorithms and spectral alignment algorithms to speed up database search. As a result, the performance of these tools heavily relies on the sensitivity and efficiency of their filtering algorithms. Here, we propose two efficient approximate spectrum-based filtering algorithms for proteoform identification. We evaluated the performances of the proposed algorithms and four existing ones on simulated and real top-down mass spectrometry data sets. Experiments showed that the proposed algorithms outperformed the existing ones for complex proteoform identification. In addition, combining the proposed filtering algorithms and mass graph alignment algorithms identified many proteoforms missed by ProSightPC in proteome-level proteoform analyses.
Affiliation(s)
- Qiang Kou
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
- Si Wu
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
26
Affiliation(s)
- Bifan Chen
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Kyle A. Brown
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ziqing Lin
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
27
Schaffer LV, Shortreed MR, Cesnik AJ, Frey BL, Solntsev SK, Scalf M, Smith LM. Expanding Proteoform Identifications in Top-Down Proteomic Analyses by Constructing Proteoform Families. Anal Chem 2017; 90:1325-1333. [PMID: 29227670 DOI: 10.1021/acs.analchem.7b04221] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/15/2022]
Abstract
In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we have adapted it to provide identifications of proteoform masses in precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained using the MS2 fragmentation data. Proteoform Suite performs mass calibration using high-scoring top-down identifications and identifies additional proteoforms using calibrated, accurate intact masses. Proteoform families, the set of proteoforms from a given gene, are constructed and visualized from proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, yielding an approximately 40% increase over the original 1291 proteoform identifications observed using traditional top-down analysis alone.
Affiliation(s)
- Leah V Schaffer
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Anthony J Cesnik
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Brian L Frey
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Stefan K Solntsev
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Mark Scalf
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
  - Genome Center of Wisconsin, University of Wisconsin, 425G Henry Mall, Room 3420, Madison, Wisconsin 53706, United States