1
Xu T, Wang Q, Wang Q, Sun L. Mass spectrometry-intensive top-down proteomics: an update on technology advancements and biomedical applications. Analytical Methods: Advancing Methods and Applications 2024. [PMID: 38973469 DOI: 10.1039/d4ay00651h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/09/2024]
Abstract
Proteoforms are all the different molecular forms of a protein arising from the same gene because of variations at the DNA, RNA, and protein levels, e.g., alternative splicing and post-translational modifications (PTMs). Delineation of proteins in a proteoform-specific manner is crucial for understanding their biological functions. Mass spectrometry (MS)-intensive top-down proteomics (TDP) is promising for comprehensively characterizing intact proteoforms in complex biological systems. It has achieved substantial progress in technological development, including sample preparation, proteoform separations, MS instrumentation, and bioinformatics tools. In a single TDP study, thousands of proteoforms can be identified and quantified from a cell lysate. It has also been applied in various areas of biomedical research to better understand how proteins regulate cellular processes and to discover novel proteoform biomarkers of diseases for early diagnosis and therapeutic development. This review covers the most recent technological developments and biomedical applications of MS-intensive TDP.
Affiliation(s)
- Tian Xu
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA.
- Qianjie Wang
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA.
- Qianyi Wang
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA.
- Liangliang Sun
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA.
2
Wang CR, McFarlane LO, Pukala TL. Exploring snake venoms beyond the primary sequence: From proteoforms to protein-protein interactions. Toxicon 2024; 247:107841. [PMID: 38950738 DOI: 10.1016/j.toxicon.2024.107841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 06/26/2024] [Accepted: 06/28/2024] [Indexed: 07/03/2024]
Abstract
Snakebite envenomation has been a long-standing global issue that is difficult to treat, largely owing to the flawed nature of current immunoglobulin-based antivenom therapy and the complexity of snake venoms as sophisticated mixtures of bioactive proteins and peptides. Comprehensive characterisation of venom compositions is essential to better understand snake venom toxicity and to inform effective, rationally designed antivenoms. Additionally, a greater understanding of snake venom composition will likely unearth novel biologically active proteins and peptides that have promising therapeutic or biotechnological applications. While a bottom-up proteomic workflow has been the main approach for cataloguing snake venom compositions at the toxin family level, it is unable to capture snake venom heterogeneity in the form of protein isoforms and higher-order protein interactions, which are important in driving venom toxicity but remain underexplored. This review aims to highlight the importance of understanding snake venom heterogeneity beyond the primary sequence, in the form of post-translational modifications that give rise to different proteoforms and the myriad higher-order protein complexes in snake venoms. We focus on current top-down proteomic workflows to identify snake venom proteoforms and further discuss alternative or novel separation, instrumentation, and data processing strategies that may improve proteoform identification. The current higher-order structural characterisation techniques implemented for snake venom proteins are also discussed; we emphasise the need for complementary and higher resolution structural bioanalytical techniques, such as mass spectrometry-based approaches, X-ray crystallography and cryogenic electron microscopy, to elucidate poorly characterised tertiary and quaternary protein structures. We envisage that the expansion of the snake venom characterisation "toolbox" with top-down proteomics and high-resolution protein structure determination techniques will be pivotal in advancing structural understanding of snake venoms towards the development of improved therapeutic and biotechnology applications.
Affiliation(s)
- C Ruth Wang
- Discipline of Chemistry, School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide, 5005, Australia
- Lewis O McFarlane
- Discipline of Chemistry, School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide, 5005, Australia
- Tara L Pukala
- Discipline of Chemistry, School of Physics, Chemistry and Earth Sciences, The University of Adelaide, Adelaide, 5005, Australia
3
Carr AV, Bollis NE, Pavek JG, Shortreed MR, Smith LM. Spectral averaging with outlier rejection algorithms to increase identifications in top-down proteomics. Proteomics 2024; 24:e2300234. [PMID: 38487981 PMCID: PMC11216233 DOI: 10.1002/pmic.202300234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 02/15/2024] [Accepted: 02/29/2024] [Indexed: 04/05/2024]
Abstract
The identification of proteoforms by top-down proteomics requires both high-quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, as well as many isotopic peaks and charge states. The resulting lower signal-to-noise ratios for intact proteins complicate downstream analyses such as deconvolution. Averaging multiple scans is a common way to improve signal-to-noise, but mass spectrometry data contain characteristic artifacts that can degrade the quality of an averaged spectrum. To overcome these limitations and increase signal-to-noise, we have implemented outlier rejection algorithms to remove outlier measurements efficiently and robustly in a set of MS1 scans prior to averaging. We have implemented averaging with rejection algorithms in the open-source, freely available proteomics search engine MetaMorpheus. Herein, we report the application of the averaging with rejection algorithms to direct injection and online liquid chromatography mass spectrometry data. Averaging with rejection algorithms yielded a 45% increase in the number of proteoforms detected in Jurkat T cell lysate. We show that the increase is due to improved spectral quality, particularly in regions surrounding isotopic envelopes.
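Outlier rejection before averaging can be illustrated with sigma clipping, a widely used rejection scheme borrowed from astronomical image stacking. The Python sketch below is a minimal illustration and is not the MetaMorpheus implementation. It assumes the MS1 scans have already been resampled onto a shared m/z grid (real scans are not sampled identically). The function name `sigma_clip_average` and its default thresholds are hypothetical choices.

```python
import numpy as np

def sigma_clip_average(scans, k=3.0, max_iter=5):
    """Average aligned MS1 scans after iterative sigma clipping (illustrative sketch).

    scans: array of shape (n_scans, n_bins); each row holds one scan's
        intensities on a common m/z grid (resampling is assumed done upstream).
    k: rejection threshold, in standard deviations from the per-bin median.
    max_iter: maximum number of clipping passes.
    """
    data = np.asarray(scans, dtype=float)
    keep = np.ones(data.shape, dtype=bool)  # True = measurement still accepted
    for _ in range(max_iter):
        masked = np.where(keep, data, np.nan)
        center = np.nanmedian(masked, axis=0)  # robust per-bin center
        spread = np.nanstd(masked, axis=0)     # per-bin spread of accepted values
        # Reject values lying more than k standard deviations from the median;
        # previously rejected values stay rejected.
        new_keep = keep & (np.abs(data - center) <= k * spread)
        if np.array_equal(new_keep, keep):
            break  # no new rejections, so clipping has converged
        keep = new_keep
    # Mean of the accepted measurements in each m/z bin
    return np.nanmean(np.where(keep, data, np.nan), axis=0)
```

Here is a usage example: 20 identical scans are given a single intensity spike in one bin. A plain mean is pulled far upward by the spike, while the clipped average recovers the underlying value. With only a handful of scans, a single spike can inflate the standard deviation enough to escape rejection. This is one reason robust spread estimates such as the median absolute deviation are sometimes used instead.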
Affiliation(s)
- Austin V Carr
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Nicholas E Bollis
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- John G Pavek
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
4
Su T, Hollas MAR, Fellers RT, Kelleher NL. Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics. Annu Rev Biomed Data Sci 2023; 6:357-376. [PMID: 37561601 PMCID: PMC10840079 DOI: 10.1146/annurev-biodatasci-020722-044021] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 08/12/2023]
Abstract
Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.
Affiliation(s)
- Taojunfeng Su
- Department of Molecular Biosciences, Northwestern University, Evanston, Illinois, USA
- Michael A R Hollas
- Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Ryan T Fellers
- Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Neil L Kelleher
- Department of Molecular Biosciences, Northwestern University, Evanston, Illinois, USA
- Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Department of Chemistry, Northwestern University, Evanston, Illinois, USA
5
Larson EJ, Pergande MR, Moss ME, Rossler KJ, Wenger RK, Krichel B, Josyer H, Melby JA, Roberts DS, Pike K, Shi Z, Chan HJ, Knight B, Rogers HT, Brown KA, Ong IM, Jeong K, Marty MT, McIlwain SJ, Ge Y. MASH Native: a unified solution for native top-down proteomics data processing. Bioinformatics 2023; 39:btad359. [PMID: 37294807 PMCID: PMC10283151 DOI: 10.1093/bioinformatics/btad359] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/12/2023] [Revised: 04/13/2023] [Accepted: 06/07/2023] [Indexed: 06/11/2023]
Abstract
MOTIVATION Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software developments, a unified and user-friendly software package for analysis of nTDP data remains lacking. RESULTS We have developed MASH Native to provide a unified solution for nTDP to process complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a "one-stop shop" for characterizing both native protein complexes and proteoforms. AVAILABILITY AND IMPLEMENTATION The MASH Native app, video tutorials, written tutorials, and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHSoftware.php. All data files shown in user tutorials are included with the MASH Native software in the download .zip file.
Affiliation(s)
- Eli J Larson
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Melissa R Pergande
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Michelle E Moss
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kalina J Rossler
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- R Kent Wenger
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705, United States
- Boris Krichel
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Harini Josyer
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Jake A Melby
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- David S Roberts
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyndalanne Pike
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Zhuoxin Shi
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Hsin-Ju Chan
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Bridget Knight
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Holden T Rogers
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyle A Brown
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Irene M Ong
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53705, United States
- University of Wisconsin Carbone Cancer Center, University of Wisconsin–Madison, Madison, WI 53705, United States
- Department of Obstetrics and Gynecology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyowon Jeong
- Department of Applied Bioinformatics, University of Tübingen, Tübingen 72704, Germany
- Michael T Marty
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85719, United States
- Sean J McIlwain
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53705, United States
- University of Wisconsin Carbone Cancer Center, University of Wisconsin–Madison, Madison, WI 53705, United States
- Ying Ge
- Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705, United States
6
Tabb DL, Jeong K, Druart K, Gant MS, Brown KA, Nicora C, Zhou M, Couvillion S, Nakayasu E, Williams JE, Peterson HK, McGuire MK, McGuire MA, Metz TO, Chamot-Rooke J. Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection. J Proteome Res 2023. [PMID: 37235544 DOI: 10.1021/acs.jproteome.2c00673] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 05/28/2023]
Abstract
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
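The finding that deconvolution engines disagree on precursor charges, and therefore on masses, can be made concrete. For a protonated ion [M + zH]^z+ observed at a given m/z, the neutral mass is M = z × (m/z − m_p), where m_p ≈ 1.007276 Da is the proton mass. The minimal Python sketch below is illustrative only: the function name and the example peak are hypothetical. It also ignores the separate problem of picking the monoisotopic peak, where an error shifts the mass by multiples of about 1.00336 Da.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass (Da) of a protonated ion [M + zH]^z+ observed at m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each of the z charges is carried by one added proton.
    return charge * (mz - PROTON_MASS)

if __name__ == "__main__":
    mz = 1000.5  # hypothetical observed precursor peak
    for z in (19, 20, 21):
        # Adjacent charge assignments for the same peak give very different masses.
        print(z, round(neutral_mass(mz, z), 3))
```

Suppose one engine assigns charge 21 to this peak and another assigns charge 20. The computed masses then differ by roughly one m/z value, about 999.5 Da here. That is enough for a search engine to match a different proteoform, or none at all.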
Affiliation(s)
- David L Tabb
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen 72076, Germany
- Karen Druart
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Megan S Gant
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyle A Brown
- School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin 53705, United States
- Carrie Nicora
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Mowei Zhou
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Sneha Couvillion
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ernesto Nakayasu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Janet E Williams
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Haley K Peterson
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Michelle K McGuire
- Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Mark A McGuire
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Thomas O Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Julia Chamot-Rooke
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
7
Spatially Resolved Top-Down Proteomics of Tissue Sections Based on a Microfluidic Nanodroplet Sample Preparation Platform. Mol Cell Proteomics 2023; 22:100491. [PMID: 36603806 PMCID: PMC9944986 DOI: 10.1016/j.mcpro.2022.100491] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/28/2022] [Revised: 12/10/2022] [Accepted: 12/20/2022] [Indexed: 01/04/2023]
Abstract
Conventional proteomic approaches measure the averaged signal from mixed cell populations or bulk tissues, leading to the dilution of signals arising from subpopulations of cells that might serve as important biomarkers. Recent developments in bottom-up proteomics have enabled spatial mapping of cellular heterogeneity in tissue microenvironments. However, bottom-up proteomics cannot unambiguously define and quantify proteoforms, which are intact (i.e., functional) forms of proteins capturing genetic variations, alternatively spliced transcripts, and posttranslational modifications. Herein, we describe a spatially resolved top-down proteomics (TDP) platform for proteoform identification and quantitation directly from tissue sections. The spatial TDP platform consists of a sample preparation system based on nanodroplet processing in one pot for trace samples (nanoPOTS) and a cell isolation system based on laser capture microdissection (LCM). We improved nanoPOTS sample preparation by adding benzonase to the extraction buffer to enhance the coverage of nuclear proteins. Using ∼200 cultured cells as test samples, this approach increased total proteoform identifications from 493 to 700, with newly identified proteoforms primarily corresponding to nuclear proteins. To demonstrate the spatial TDP platform in tissue samples, we analyzed LCM-isolated tissue voxels from rat brain cortex and hypothalamus regions. We quantified 509 proteoforms by matching the union of identifications from TopPIC (top-down mass spectrometry-based proteoform identification and characterization) and TDPortal to features detected by ProMex (protein mass extractor). Several proteoforms corresponding to the same gene exhibited mixed abundance profiles between the two tissue regions, suggesting potential posttranslational modification-specific spatial distributions. The spatial TDP workflow has prospects for biomarker discovery at the proteoform level from small tissue sections.
8
Larson EJ, Pergande MR, Moss ME, Rossler KJ, Wenger RK, Krichel B, Josyer H, Melby JA, Roberts DS, Pike K, Shi Z, Chan HJ, Knight B, Rogers HT, Brown KA, Ong IM, Jeong K, Marty M, McIlwain SJ, Ge Y. MASH Native: A Unified Solution for Native Top-Down Proteomics Data Processing. bioRxiv: The Preprint Server for Biology 2023:2023.01.02.522513. [PMID: 36711733 PMCID: PMC9881860 DOI: 10.1101/2023.01.02.522513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software developments, a unified and user-friendly software package for analysis of nTDP data remains lacking. Herein, we have developed MASH Native to provide a unified solution for nTDP to process complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a one-stop shop for characterizing both native protein complexes and proteoforms. The MASH Native app, video tutorials, written tutorials and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHNativeSoftware.php . All data files shown in user tutorials are included with the MASH Native software in the download .zip file.
9
Desaire H, Go EP, Hua D. Advances, obstacles, and opportunities for machine learning in proteomics. Cell Reports Physical Science 2022; 3:101069. [PMID: 36381226 PMCID: PMC9648337 DOI: 10.1016/j.xcrp.2022.101069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
The fields of proteomics and machine learning are both large disciplines, each producing well over 5,000 publications per year. However, studies combining both fields are still relatively rare, with only about 2% of recent proteomics papers including machine learning. This review, which focuses on the intersection of the fields, is intended to inspire proteomics researchers to develop skills and knowledge in the application of machine learning. A brief tutorial introduction to machine learning is provided, and research advances that rely on both fields, particularly as they relate to proteomics tools development and biomarker discovery, are highlighted. Key knowledge gaps and opportunities for scientific advancement are also enumerated.
Affiliation(s)
- Heather Desaire
- Department of Chemistry, University of Kansas, Lawrence, KS 66045, USA
- Eden P. Go
- Department of Chemistry, University of Kansas, Lawrence, KS 66045, USA
- David Hua
- Department of Chemistry, University of Kansas, Lawrence, KS 66045, USA
10
Liu R, Xia S, Li H. Native top-down mass spectrometry for higher-order structural characterization of proteins and complexes. Mass Spectrometry Reviews 2022:e21793. [PMID: 35757976 DOI: 10.1002/mas.21793] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/28/2022] [Revised: 05/23/2022] [Accepted: 05/24/2022] [Indexed: 06/15/2023]
Abstract
Progress in structural biology research has led to a high demand for powerful yet complementary analytical tools for structural characterization of proteins and protein complexes. This demand has significantly increased interest in native mass spectrometry (nMS), particularly native top-down mass spectrometry (nTDMS), in the past decade. This review highlights recent advances in nTDMS for structural research of biological assemblies, with a particular focus on the additional layers of information enabled by TDMS. We include a short introduction to sample preparation and ionization for nMS, tandem fragmentation techniques, and the mass analyzers and software/analysis pipelines used for nTDMS. We highlight unique structural information offered by nTDMS and examples of its broad range of applications, from proteins, protein-ligand interactions (metal, cofactor/drug, DNA/RNA, and protein), therapeutic antibodies and antigen-antibody complexes, membrane proteins, and macromolecular machineries (ribosome, nucleosome, proteasome, and viruses) to endogenous protein complexes. The challenges, potential, and perspectives of nTDMS methods for the analysis of proteins and protein assemblies in recombinant and biological samples are discussed.
Affiliation(s)
- Ruijie Liu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Shujun Xia
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Huilin Li
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Guangdong Key Laboratory of Chiral Molecule and Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
11
Qin S, Tian Z. Proteoform Identification and Quantification Using Intact Protein Database Search Engine ProteinGoggle. Methods Mol Biol 2022; 2500:131-144. [PMID: 35657591 DOI: 10.1007/978-1-0716-2325-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Proteomics studies the proteome of organisms, especially proteins that are differentially expressed under certain physiological or pathological conditions; qualitative identification of protein sequences, posttranslational modifications (PTMs), and their positions can help us systematically understand the structure and function of proteoforms. With the development and growing adoption of soft ionization techniques (such as electrospray ionization) and high-resolution, high-mass-accuracy mass spectrometers (such as the Orbitrap), mass spectrometry (MS) characterization of intact proteins (so-called top-down proteomics) has become possible and increasingly popular. Corresponding database search engines and protein identification bioinformatics tools have also developed considerably. This chapter provides a brief overview of the intact protein database search algorithm "isotopic mass-to-charge ratio and envelope fingerprinting" and the search engine ProteinGoggle.
Affiliation(s)
- Suideng Qin
- School of Chemical Science & Engineering and Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China
- Zhixin Tian
- School of Chemical Science & Engineering and Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China
12
Sun RX, Wang RM, Luo L, Liu C, Chi H, Zeng WF, He SM. Accurate Proteoform Identification and Quantitation Using pTop 2.0. Methods Mol Biol 2022; 2500:105-129. [PMID: 35657590 DOI: 10.1007/978-1-0716-2325-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
The remarkable advancement of top-down proteomics in the past decade has been driven by technological developments in separation, mass spectrometry (MS) instrumentation, novel fragmentation, and bioinformatics. However, the accurate identification and quantification of proteoforms, all clearly defined molecular forms of protein products from a single gene, remain a challenging computational task. This is in part due to the complexity of mass spectra from intact proteoforms compared with those from digested peptides. Herein, pTop 2.0 was developed to fill the gap between large-scale, complex top-down MS data and the shortage of high-accuracy bioinformatic tools. Compared with the first version, pTop 1.0, pTop 2.0 concentrates mainly on the identification of proteoforms with unexpected modifications or a terminal truncation. Quantitation based on isotopic labeling is also a new function, which can be carried out by the convenient and user-friendly "one-key operation," integrated with the qualitative identifications. The accuracy and running speed of pTop 2.0 are significantly improved on the test data sets. This chapter introduces the main features, step-by-step running operations, and algorithmic developments of pTop 2.0, with the aim of pushing the identification and quantitation of intact proteoforms to a higher level of accuracy in top-down proteomics.
Affiliation(s)
- Rui-Xiang Sun
- National Institute of Biological Sciences, Beijing, China
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Rui-Min Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Lan Luo
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Chao Liu
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Hao Chi
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Wen-Feng Zeng
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Si-Min He
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
13
Tiambeng TN, Wu Z, Melby JA, Ge Y. Size Exclusion Chromatography Strategies and MASH Explorer for Large Proteoform Characterization. Methods Mol Biol 2022; 2500:15-30. [PMID: 35657584 PMCID: PMC9703982 DOI: 10.1007/978-1-0716-2325-1_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
Abstract
Top-down mass spectrometry (MS)-based analysis of larger proteoforms (>50 kDa) is typically challenging due to an exponential decay in the signal-to-noise ratio with increasing protein molecular weight (MW) and coelution with low-MW proteoforms. Size exclusion chromatography (SEC) fractionates proteins based on their size, separating larger proteoforms from those of smaller size in the proteome. In this protocol, we initially describe the use of SEC to fractionate high-MW proteoforms from low-MW proteoforms. Subsequently, the SEC fractions containing the proteoforms of interest are subjected to reverse-phase liquid chromatography (RPLC) coupled online with high-resolution MS. Finally, proteoforms are characterized using MASH Explorer, a user-friendly software environment for in-depth proteoform characterization.
Affiliation(s)
- Timothy N. Tiambeng
- Department of Chemistry, University of Wisconsin – Madison, Madison, WI 53706
- Zhijie Wu
- Department of Chemistry, University of Wisconsin – Madison, Madison, WI 53706
- Jake A. Melby
- Department of Chemistry, University of Wisconsin – Madison, Madison, WI 53706
- Ying Ge
- Department of Chemistry, University of Wisconsin – Madison, Madison, WI 53706
- Department of Cell and Regenerative Biology, University of Wisconsin – Madison, Madison, WI 53705
- Human Proteomics Program, University of Wisconsin – Madison, Madison, WI 53705
- To whom correspondence may be addressed: Dr. Ying Ge, 8551 WIMR-II, 1111 Highland Ave., Madison, Wisconsin 53705, USA; Tel: 608-265-4744

14
Yang Z, Sun L. Membrane Ultrafiltration-Based Sample Preparation Method and Sheath-Flow CZE-MS/MS for Top-Down Proteomics. Methods Mol Biol 2022; 2500:5-14. [PMID: 35657583 DOI: 10.1007/978-1-0716-2325-1_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Mass spectrometry (MS)-based denaturing top-down proteomics (dTDP) identifies proteoforms without prior enzymatic proteolysis. A universal sample preparation method that efficiently extracts proteins, reduces sample loss, maintains protein solubility, and is compatible with downstream liquid-phase separation, MS, and tandem MS (MS/MS) is vital for large-scale proteoform characterization. Here, membrane ultrafiltration (MU) was employed for buffer exchange to efficiently remove the sodium dodecyl sulfate (SDS) detergent used in protein samples for protein extraction and solubilization, followed by capillary zone electrophoresis (CZE)-MS/MS analysis. The MU method showed good protein recovery, minimal protein bias, and good compatibility with CZE-MS/MS. Single-shot CZE-MS/MS analysis of an Escherichia coli sample prepared by the MU method identified over 800 proteoforms.
Affiliation(s)
- Zhichang Yang
- Department of Chemistry, Michigan State University, East Lansing, MI, USA
- Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, MI, USA

15
Lima DB, Dupré M, Duchateau M, Gianetto QG, Rey M, Matondo M, Chamot-Rooke J. ProteoCombiner: integrating bottom-up with top-down proteomics data for improved proteoform assessment. Bioinformatics 2021; 37:2206-2208. [PMID: 33165572 DOI: 10.1093/bioinformatics/btaa958] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/28/2020] [Revised: 10/26/2020] [Accepted: 11/02/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION We present high-performance software that integrates shotgun with top-down proteomic data. The tool can handle multiple experiments and search engines, and it enables rapid and easy visualization, manual validation, and comparison of the identified proteoform sequences, including post-translational modification characterization. RESULTS We demonstrate the effectiveness of our approach on a large-scale Escherichia coli dataset; ProteoCombiner unambiguously shortlisted proteoforms among those identified by the multiple search engines. AVAILABILITY AND IMPLEMENTATION ProteoCombiner, a demonstration video, and a user tutorial are freely available for academic use at https://proteocombiner.pasteur.fr; all data are available from the ProteomeXchange consortium (identifier PXD017618). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Diogo B Lima
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Mathieu Dupré
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Magalie Duchateau
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Quentin Giai Gianetto
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Bioinformatics and Biostatistics HUB, Computational Biology Department, Institut Pasteur, CNRS USR 3756, Paris, France
- Martial Rey
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Mariette Matondo
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Julia Chamot-Rooke
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France

16
Khalid MF, Iman K, Ghafoor A, Saboor M, Ali A, Muaz U, Basharat AR, Tahir T, Abubakar M, Akhter MA, Nabi W, Vanderbauwhede W, Ahmad F, Wajid B, Chaudhary SU. PERCEPTRON: an open-source GPU-accelerated proteoform identification pipeline for top-down proteomics. Nucleic Acids Res 2021; 49:W510-W515. [PMID: 33999207 PMCID: PMC8262694 DOI: 10.1093/nar/gkab368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Revised: 04/10/2021] [Accepted: 04/25/2021] [Indexed: 11/12/2022]
Abstract
PERCEPTRON is a next-generation, freely available, web-based proteoform identification and characterization platform for top-down proteomics (TDP). The PERCEPTRON search pipeline brings together algorithms for (i) intact protein mass tuning, (ii) de novo sequence tag-based filtering, (iii) characterization of terminal as well as post-translational modifications, (iv) identification of truncated proteoforms, (v) in silico spectral comparison, and (vi) weight-based candidate protein scoring. High-throughput performance is achieved by executing optimized code in parallel across multiple threads on graphics processing units (GPUs) using the NVIDIA Compute Unified Device Architecture (CUDA) framework. An intuitive graphical web interface supports setting up search parameters and visualizing results. The accuracy and performance of the tool have been validated on several TDP datasets and against available TDP software. Specifically, results obtained from searching two published TDP datasets demonstrate that PERCEPTRON outperforms all other tools by up to 135% in terms of reported proteins and 10-fold in terms of runtime. In conclusion, the proposed tool significantly enhances the state of the art in TDP search software and is publicly available at https://perceptron.lums.edu.pk. Users can also create in-house deployments of the tool by building the code available on the GitHub repository (http://github.com/BIRL/Perceptron).
Affiliation(s)
- Muhammad Farhan Khalid
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Kanzal Iman
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Amna Ghafoor
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Mujtaba Saboor
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Ahsan Ali
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Urwa Muaz
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Abdul Rehman Basharat
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Taha Tahir
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Muhammad Abubakar
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Momina Amer Akhter
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Waqar Nabi
- School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
- Wim Vanderbauwhede
- School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
- Fayyaz Ahmad
- Department of Statistics, University of Gujrat, Gujrat, Pakistan
- Bilal Wajid
- Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Division of Research and Development, Sabz-Qalam, Lahore, Pakistan
- Safee Ullah Chaudhary
- Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan

17
Lu L, Scalf M, Shortreed MR, Smith LM. Mesh Fragmentation Improves Dissociation Efficiency in Top-down Proteomics. J Am Soc Mass Spectrom 2021; 32:1319-1325. [PMID: 33754701 PMCID: PMC8783543 DOI: 10.1021/jasms.0c00462] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 05/12/2023]
Abstract
Top-down proteomics is a key mass spectrometry-based technology for comprehensive analysis of proteoforms. Proteoforms exhibit multiple high charge states and isotopic forms in full MS scans. The dissociation behavior of proteoforms in different charge states and subjected to different collision energies is highly variable. The current widely employed data-dependent acquisition (DDA) method selects a narrow m/z range (corresponding to a single proteoform charge state) for dissociation from the most abundant precursors. We describe here Mesh, a novel dissociation strategy, to dissociate multiple charge states of one proteoform with multiple collision energies. We show that the Mesh strategy has the potential to generate fragment ions with improved sequence coverage and improve identification ratios in top-down proteomic analyses of complex samples. The strategy is implemented within an open-source instrument control software program named MetaDrive to perform real time deconvolution and precursor selection.
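The relation behind selecting several charge states of one proteoform can be sketched in Python: a proteoform of neutral mass M appears at m/z = (M + z·m_p)/z for each charge z, with m_p ≈ 1.007276 Da (proton mass). The 400 to 2000 m/z scan window, the maximum charge, and the 20 kDa example mass are assumptions chosen for illustration; the sketch only enumerates charge states and is not MetaDrive's real-time deconvolution or precursor-selection code.

```python
from typing import List, Tuple

PROTON_MASS = 1.007276  # Da, mass of a proton


def charge_state_mz(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion of a proteoform with neutral mass M."""
    return (neutral_mass + z * PROTON_MASS) / z


def charge_states_in_window(neutral_mass: float, mz_low: float = 400.0,
                            mz_high: float = 2000.0,
                            max_z: int = 100) -> List[Tuple[int, float]]:
    """All (z, m/z) pairs of one proteoform inside an assumed MS1 scan window."""
    return [
        (z, charge_state_mz(neutral_mass, z))
        for z in range(1, max_z + 1)
        if mz_low <= charge_state_mz(neutral_mass, z) <= mz_high
    ]


if __name__ == "__main__":
    states = charge_states_in_window(20000.0)  # hypothetical 20 kDa proteoform
    print(f"{len(states)} charge states, z = {states[0][0]}..{states[-1][0]}")
```

With these assumed settings a 20 kDa proteoform falls in the window at charge states 11 through 50, which illustrates how one proteoform's signal is spread across many charge states, only one of which a conventional narrow DDA isolation window would select.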
Affiliation(s)
- Lei Lu
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
- Mark Scalf
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
- Michael R. Shortreed
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
- Lloyd M. Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
- Corresponding Author Phone: (608) 263-2594. Fax: (608) 265-6780.

18
Melby JA, Roberts DS, Larson EJ, Brown KA, Bayne EF, Jin S, Ge Y. Novel Strategies to Address the Challenges in Top-Down Proteomics. J Am Soc Mass Spectrom 2021; 32:1278-1294. [PMID: 33983025 PMCID: PMC8310706 DOI: 10.1021/jasms.1c00099] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Indexed: 05/03/2023]
Abstract
Top-down mass spectrometry (MS)-based proteomics is a powerful technology for comprehensively characterizing proteoforms to decipher post-translational modifications (PTMs) together with genetic variations and alternative splicing isoforms toward a proteome-wide understanding of protein functions. In the past decade, top-down proteomics has experienced rapid growth benefiting from groundbreaking technological advances, which have begun to reveal the potential of top-down proteomics for understanding basic biological functions, unraveling disease mechanisms, and discovering new biomarkers. However, many challenges remain to be comprehensively addressed. In this Account & Perspective, we discuss the major challenges currently facing the top-down proteomics field, particularly in protein solubility, proteome dynamic range, proteome complexity, data analysis, proteoform-function relationship, and analytical throughput for precision medicine. We specifically review the major technology developments addressing these challenges with an emphasis on our research group's efforts, including the development of top-down MS-compatible surfactants for protein solubilization, functionalized nanoparticles for the enrichment of low-abundance proteoforms, strategies for multidimensional chromatography separation of proteins, and a new comprehensive user-friendly software package for top-down proteomics. We have also made efforts to connect proteoforms with biological functions and provide our visions on what the future holds for top-down proteomics.
Affiliation(s)
- Jake A Melby
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- David S Roberts
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Eli J Larson
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Kyle A Brown
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Department of Surgery, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Elizabeth F Bayne
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Song Jin
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States

19
Yang C, Shan YC, Zhang WJ, Dai ZP, Zhang LH, Zhang YK. Full-length Protein Sequencing Based on Continuous Digestion Using Non-specific Proteases. ACTA CHIMICA SINICA 2021. [DOI: 10.6023/a21010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/25/2022]

20
Zhou M, Malhan N, Ahkami AH, Engbrecht K, Myers G, Dahlberg J, Hollingsworth J, Sievert JA, Hutmacher R, Madera M, Lemaux PG, Hixson KK, Jansson C, Paša-Tolić L. Top-down mass spectrometry of histone modifications in sorghum reveals potential epigenetic markers for drought acclimation. Methods 2020; 184:29-39. [DOI: 10.1016/j.ymeth.2019.10.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/29/2019] [Revised: 10/10/2019] [Accepted: 10/21/2019] [Indexed: 12/30/2022]

21
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
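The core idea of a customized proteogenomic database, adding sample-specific sequence variants as extra entries so that a database search can match them, can be sketched as below. This is a minimal illustration under stated assumptions and not TopPG's method or input format: variants are hypothetical (position, reference residue, alternate residue) tuples covering single-amino-acid substitutions only, the accession naming scheme is invented, and alternative splicing events are ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical variant encoding: (1-based position, reference residue, alternate residue)
Variant = Tuple[int, str, str]


def apply_substitution(sequence: str, variant: Variant) -> str:
    """Return the sequence with one residue substituted, after checking the reference."""
    pos, ref, alt = variant
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} outside sequence of length {len(sequence)}")
    if sequence[pos - 1] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {sequence[pos - 1]}")
    return sequence[: pos - 1] + alt + sequence[pos:]


def build_variant_entries(accession: str, sequence: str,
                          variants: List[Variant]) -> Dict[str, str]:
    """Reference entry plus one entry per substitution (hypothetical naming, e.g. PROT1_T3S)."""
    entries = {accession: sequence}
    for pos, ref, alt in variants:
        entries[f"{accession}_{ref}{pos}{alt}"] = apply_substitution(sequence, (pos, ref, alt))
    return entries


def to_fasta(entries: Dict[str, str]) -> str:
    """Render the entries as FASTA text for a database search."""
    return "".join(f">{name}\n{seq}\n" for name, seq in entries.items())
```

For example, `build_variant_entries("PROT1", "MKTAYIAK", [(3, "T", "S")])` keeps the reference entry and adds `PROT1_T3S` with sequence `MKSAYIAK`. The reference check guards against variant coordinates that do not match the protein sequence.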
Affiliation(s)
- Wenrong Chen
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States

22
Brown KA, Melby JA, Roberts DS, Ge Y. Top-down proteomics: challenges, innovations, and applications in basic and clinical research. Expert Rev Proteomics 2020; 17:719-733. [PMID: 33232185 PMCID: PMC7864889 DOI: 10.1080/14789450.2020.1855982] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 09/04/2020] [Accepted: 11/23/2020] [Indexed: 12/14/2022]
Abstract
Introduction: A better understanding of the molecular mechanisms underlying diseases is critical for developing more effective diagnostic tools and therapeutics toward precision medicine. However, many challenges remain in unraveling the complex nature of diseases. Areas covered: Changes in protein isoform expression and post-translational modifications (PTMs) have gained recognition for their role in disease mechanisms. Top-down mass spectrometry (MS)-based proteomics is increasingly recognized as an important method for the comprehensive characterization of proteoforms that arise from alternative splicing events and/or PTMs in basic and clinical research. Here, we review the challenges, technological innovations, and recent studies that use top-down proteomics to elucidate changes in the proteome, with an emphasis on its use in studying heart diseases. Expert opinion: Proteoform-resolved information can substantially contribute to understanding the molecular mechanisms underlying various diseases and to identifying novel proteoform targets for better therapeutic development. Despite the challenges of sequencing intact proteins, top-down proteomics has yielded a wealth of information on protein isoform switching and changes in PTMs. Continuous developments in sample preparation, intact protein separation, and instrumentation for top-down MS have broadened its capabilities to characterize proteoforms from a range of samples on an increasingly global scale.
Affiliation(s)
- Kyle A. Brown
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States
- Jake A. Melby
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States
- David S. Roberts
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States
- Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin, United States
- Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin, United States

23
Wu Z, Roberts DS, Melby JA, Wenger K, Wetzel M, Gu Y, Ramanathan SG, Bayne EF, Liu X, Sun R, Ong IM, McIlwain SJ, Ge Y. MASH Explorer: A Universal Software Environment for Top-Down Proteomics. J Proteome Res 2020; 19:3867-3876. [PMID: 32786689 DOI: 10.1021/acs.jproteome.0c00469] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Indexed: 12/18/2022]
Abstract
Top-down mass spectrometry (MS)-based proteomics enables a comprehensive analysis of proteoforms with molecular specificity to achieve a proteome-wide understanding of protein functions. However, the lack of universal software for top-down proteomics is increasingly recognized as a major barrier, especially for newcomers. Here, we have developed MASH Explorer, a universal, comprehensive, and user-friendly software environment for top-down proteomics. MASH Explorer integrates multiple spectral deconvolution and database search algorithms into a single, universal platform that can, for the first time, process top-down proteomics data from various vendor formats. It addresses an urgent need in the rapidly growing top-down proteomics community and is freely available to all users worldwide. With the critical need and tremendous support from the community, we envision that MASH Explorer will play an integral role in advancing top-down proteomics to realize its full potential for biomedical research.
Affiliation(s)
- Zhijie Wu
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- David S Roberts
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Jake A Melby
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Kent Wenger
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Molly Wetzel
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Yiwen Gu
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Elizabeth F Bayne
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Ruixiang Sun
- National Institute of Biological Sciences, Beijing, 102206, China
- Irene M Ong
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- University of Wisconsin Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Department of Obstetrics and Gynecology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Sean J McIlwain
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- University of Wisconsin Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States

24
Basharat AR, Ning X, Liu X. EnvCNN: A Convolutional Neural Network Model for Evaluating Isotopic Envelopes in Top-Down Mass-Spectral Deconvolution. Anal Chem 2020; 92:7778-7785. [PMID: 32356965 PMCID: PMC7341906 DOI: 10.1021/acs.analchem.0c00903] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Top-down mass spectrometry has become the main method for intact proteoform identification, characterization, and quantitation. Because of the complexity of top-down mass spectrometry data, spectral deconvolution is an indispensable step in spectral data analysis, which groups spectral peaks into isotopic envelopes and extracts monoisotopic masses of precursor or fragment ions. The performance of spectral deconvolution methods relies heavily on their scoring functions, which distinguish correct envelopes from incorrect ones. A good scoring function increases the accuracy of deconvoluted masses reported from mass spectra. In this paper, we present EnvCNN, a convolutional neural network-based model for evaluating isotopic envelopes. We show that the model outperforms other scoring functions in distinguishing correct envelopes from incorrect ones and that it increases the number of identifications and improves the statistical significance of identifications in top-down spectral interpretation.
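The deconvolution step described above, grouping peaks into an isotopic envelope and extracting a monoisotopic mass, can be sketched with a deliberately simple rule-based score: count how many expected isotope positions, spaced by about 1.00235/z in m/z, have an observed peak, then convert the monoisotopic m/z to a neutral mass as z·(m/z − m_p). This is a stand-in for illustration only, not EnvCNN's convolutional model; the spacing constant, the 0.01 m/z tolerance, the five-isotope envelope length, and the example peaks are all assumptions.

```python
from typing import Iterable, Sequence, Tuple

PROTON_MASS = 1.007276     # Da
ISOTOPE_SPACING = 1.00235  # Da; assumed average spacing between adjacent isotopic peaks

Peak = Tuple[float, float]  # (m/z, intensity); intensity is unused by this simple score


def envelope_score(peaks: Sequence[Peak], mono_mz: float, z: int,
                   n_isotopes: int = 5, tol: float = 0.01) -> float:
    """Fraction of expected isotope positions (mono_mz + k * spacing / z) with an observed peak."""
    matched = 0
    for k in range(n_isotopes):
        target = mono_mz + k * ISOTOPE_SPACING / z
        if any(abs(mz - target) <= tol for mz, _ in peaks):
            matched += 1
    return matched / n_isotopes


def best_charge(peaks: Sequence[Peak], mono_mz: float, charges: Iterable[int]) -> int:
    """Charge whose theoretical envelope is best supported; ties go to the first charge listed."""
    return max(charges, key=lambda z: envelope_score(peaks, mono_mz, z))


def monoisotopic_mass(mono_mz: float, z: int) -> float:
    """Neutral monoisotopic mass from the monoisotopic m/z of an [M + zH]z+ ion."""
    return z * (mono_mz - PROTON_MASS)


if __name__ == "__main__":
    # Hypothetical centroided peaks of a 2+ envelope
    peaks = [(500.0, 100.0), (500.501175, 80.0), (501.00235, 40.0), (501.503525, 15.0)]
    z = best_charge(peaks, 500.0, range(1, 5))
    print(z, monoisotopic_mass(500.0, z))
```

With the hypothetical peaks, charge 2 scores 0.8, charge 4 scores 0.6, and charges 1 and 3 score 0.4, so `best_charge` returns 2. That the wrong charge 4 still scores fairly high, because its expected positions interleave with the true isotope peaks, is one example of why the quality of the envelope scoring function matters.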
Affiliation(s)
- Abdul Rehman Basharat
- School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, 46202, USA
- Xia Ning
- Department of Biomedical Informatics and Department of Computer Science and Engineering, Ohio State University, Columbus, Ohio, 43210, USA
- Xiaowen Liu
- School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, 46202, USA

25
Yang Z, Shen X, Chen D, Sun L. Toward a Universal Sample Preparation Method for Denaturing Top-Down Proteomics of Complex Proteomes. J Proteome Res 2020; 19:3315-3325. [PMID: 32419461 DOI: 10.1021/acs.jproteome.0c00226] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/29/2023]
Abstract
A universal and standardized sample preparation method is vital for denaturing top-down proteomics (dTDP) to advance the scale and accuracy of proteoform delineation in complex biological systems. Such a method needs high protein recovery, minimal bias, good reproducibility, and compatibility with downstream mass spectrometry (MS) analysis. Here, we employed a lysis buffer containing sodium dodecyl sulfate (SDS) for extracting proteoforms from cells and, for the first time, compared membrane ultrafiltration (MU), chloroform-methanol precipitation (CMP), and single-pot solid-phase sample preparation using magnetic beads (SP3) for proteoform cleanup for dTDP. The MU method outperformed the CMP and SP3 methods, yielding high and reproducible protein recovery from both Escherichia coli cell (59 ± 3%) and human HepG2 cell (86 ± 5%) samples without significant bias. Single-shot capillary zone electrophoresis (CZE)-MS/MS analyses of the E. coli and HepG2 cell samples prepared using the MU method identified 821 and 516 proteoforms, respectively. Nearly 30% and 50% of the identified E. coli and HepG2 proteins, respectively, are membrane proteins. CZE-MS/MS identified 94 histone proteoforms from the HepG2 sample with various post-translational modifications, including acetylation, methylation, and phosphorylation. Our results suggest that combining SDS-based protein extraction with MU-based protein cleanup could serve as a universal sample preparation method for dTDP. The MS raw data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD018248.
Affiliation(s)
- Zhichang Yang
- Department of Chemistry, Michigan State University, 578 S Shaw Ln, East Lansing, Michigan 48824 United States
- Xiaojing Shen
- Department of Chemistry, Michigan State University, 578 S Shaw Ln, East Lansing, Michigan 48824 United States
- Daoyang Chen
- Department of Chemistry, Michigan State University, 578 S Shaw Ln, East Lansing, Michigan 48824 United States
- Liangliang Sun
- Department of Chemistry, Michigan State University, 578 S Shaw Ln, East Lansing, Michigan 48824 United States

26
McIlwain SJ, Wu Z, Wetzel M, Belongia D, Jin Y, Wenger K, Ong IM, Ge Y. Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy. J Am Soc Mass Spectrom 2020; 31:1104-1113. [PMID: 32223200 PMCID: PMC7909725 DOI: 10.1021/jasms.0c00035] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 05/19/2023]
Abstract
Top-down mass spectrometry (MS) is a powerful tool for the identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. However, the complex data set generated from top-down MS experiments requires multiple sequential data processing steps to successfully interpret the data for identifying and characterizing proteoforms. One critical step is the deconvolution of the complex isotopic distribution that arises from naturally occurring isotopes. Multiple algorithms are currently available to deconvolute top-down mass spectra, resulting in different deconvoluted peak lists with varied accuracy compared to true positive annotations. In this study, we have designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. For the random forest algorithm, which had better predictive performance, the consensus peak lists on average could achieve a recall value (true positive rate) of 0.60 and a precision value (positive predictive value) of 0.78. It outperforms the single best algorithm, which achieved a recall value of only 0.47 and a precision value of 0.58. This machine learning strategy enhanced the accuracy and confidence in protein identification during database searches by accelerating the detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise in enhancing proteoform identification and characterization for high-throughput data analysis in top-down proteomics.
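The simpler of the two combination strategies, a voting ensemble, can be sketched as follows: pool the deconvoluted monoisotopic masses reported by several tools, group masses that agree within a ppm tolerance, keep groups reported by at least a minimum number of distinct tools, and score the consensus list by precision and recall against reference annotations. The greedy single-pass clustering, the 10 ppm tolerance, and the example masses are assumptions for illustration; the study's own clustering procedure and its random forest model are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def ppm_diff(mass: float, reference: float) -> float:
    """Absolute mass difference in parts per million relative to the reference."""
    return abs(mass - reference) / reference * 1e6


def vote_consensus(peak_lists: Dict[str, List[float]], min_votes: int = 2,
                   tol_ppm: float = 10.0) -> List[float]:
    """Greedy clustering of sorted masses; keep clusters supported by >= min_votes tools."""
    tagged = sorted((mass, tool) for tool, masses in peak_lists.items() for mass in masses)
    clusters: List[List[Tuple[float, str]]] = []
    for mass, tool in tagged:
        # Compare against the first (lowest) mass of the current cluster
        if clusters and ppm_diff(mass, clusters[-1][0][0]) <= tol_ppm:
            clusters[-1].append((mass, tool))
        else:
            clusters.append([(mass, tool)])
    consensus = []
    for cluster in clusters:
        if len({tool for _, tool in cluster}) >= min_votes:
            consensus.append(sum(m for m, _ in cluster) / len(cluster))
    return consensus


def precision_recall(predicted: Sequence[float], truth: Sequence[float],
                     tol_ppm: float = 10.0) -> Tuple[float, float]:
    """Precision over predicted masses and recall over true masses, matched within tol_ppm."""
    tp_pred = sum(any(ppm_diff(p, t) <= tol_ppm for t in truth) for p in predicted)
    tp_truth = sum(any(ppm_diff(p, t) <= tol_ppm for p in predicted) for t in truth)
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical deconvoluted mass lists (Da); tool names are used only as labels
    lists = {"THRASH": [1000.0, 2500.01, 3100.0],
             "TopFD": [1000.005, 2500.0],
             "MS-Deconv": [1000.002, 4200.0]}
    consensus = vote_consensus(lists, min_votes=2)
    print(consensus, precision_recall(consensus, [1000.0, 2500.0, 5000.0]))
```

With these hypothetical lists, the masses near 1000 Da and 2500 Da are each reported by at least two tools and survive the vote, while the single-tool masses at 3100 Da and 4200 Da are dropped; against a hypothetical truth list of 1000, 2500, and 5000 Da this gives a precision of 1.0 and a recall of 2/3. Raising `min_votes` trades recall for precision, the kind of threshold the abstract describes varying.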
Affiliation(s)
- Sean J. McIlwain
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53705, USA
- University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Zhijie Wu
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Molly Wetzel
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI 53705, USA
- Daniel Belongia
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI 53705, USA
- Yutong Jin
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Kent Wenger
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI 53705, USA
- Irene M. Ong
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53705, USA
- University of Wisconsin Carbone Comprehensive Cancer Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Department of Obstetrics & Gynecology, University of Wisconsin-Madison, Madison, WI 53705, USA
- Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI 53705, USA
- Human Proteomics Program, University of Wisconsin-Madison, Madison, WI 53705, USA

27
Zhong J, Sun Y, Xie M, Peng W, Zhang C, Wu FX, Wang J. Proteoform characterization based on top-down mass spectrometry. Brief Bioinform 2020; 22:1729-1750. [PMID: 32118252 DOI: 10.1093/bib/bbaa015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2019] [Revised: 01/23/2020] [Indexed: 12/16/2022]
Abstract
Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Yusui Sun
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Wei Peng
  - Kunming University of Science and Technology, Kunming, Yunnan, China
- Chushu Zhang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Fang-Xiang Wu
  - College of Engineering and the Department of Computer Science at University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, Hunan, China

28
Hale OJ, Cooper HJ. In situ mass spectrometry analysis of intact proteins and protein complexes from biological substrates. Biochem Soc Trans 2020; 48:317-326. [PMID: 32010951 PMCID: PMC7054757 DOI: 10.1042/bst20190793] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/18/2019] [Revised: 01/09/2020] [Accepted: 01/09/2020] [Indexed: 12/15/2022]
Abstract
Advances in sample preparation, ion sources and mass spectrometer technology have enabled the detection and characterisation of intact proteins. The associated challenges include achieving an appropriately soft ionisation event and efficiently transmitting and detecting these often delicate macromolecules. Ambient ion sources, in particular, offer a wealth of strategies for analysing proteins from solution environments and directly from biological substrates. The last two decades have seen rapid development in this area. Innovations include liquid extraction surface analysis, desorption electrospray ionisation and nanospray desorption electrospray ionisation. Similarly, developments in native mass spectrometry allow protein-protein and protein-ligand complexes to be ionised and analysed. Identification and characterisation of these large ions involves a suite of hyphenated mass spectrometry techniques, often including the coupling of ion mobility spectrometry and fragmentation techniques. The latter include collision, electron and photon-induced methods, each with their own characteristics and benefits for intact protein identification. In this review, recent developments for in situ protein analysis are explored, with a focus on ion sources and tandem mass spectrometry techniques used for identification.
Affiliation(s)
- Oliver J. Hale
  - School of Biosciences, University of Birmingham, Edgbaston B15 2TT, U.K.
- Helen J. Cooper
  - School of Biosciences, University of Birmingham, Edgbaston B15 2TT, U.K.

29
Shen X, Yang Z, McCool EN, Lubeckyj RA, Chen D, Sun L. Capillary zone electrophoresis-mass spectrometry for top-down proteomics. Trends Analyt Chem 2019; 120:115644. [PMID: 31537953 PMCID: PMC6752746 DOI: 10.1016/j.trac.2019.115644] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 02/06/2023]
Abstract
Mass spectrometry (MS)-based top-down proteomics characterizes complex proteomes at the intact proteoform level and provides an accurate picture of protein isoforms and protein post-translational modifications in the cell. The progress of top-down proteomics requires novel analytical tools with high peak capacity for proteoform separation and high sensitivity for proteoform detection. The requirements have made capillary zone electrophoresis (CZE)-MS an attractive approach for advancing large-scale top-down proteomics. CZE has achieved a peak capacity of 300 for separation of complex proteoform mixtures. CZE-MS has shown drastically better sensitivity than commonly used reversed-phase liquid chromatography (RPLC)-MS for proteoform detection. The advanced CZE-MS identified 6,000 proteoforms of nearly 1,000 proteoform families from a complex proteome sample, which represents one of the largest top-down proteomic datasets so far. In this review, we focus on the recent progress in CZE-MS-based top-down proteomics and provide our perspectives about its future directions.
Affiliation(s)
- Xiaojing Shen
  - Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States
- Zhichang Yang
  - Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States
- Elijah N. McCool
  - Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States
- Rachele A. Lubeckyj
  - Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States
- Daoyang Chen
  - Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States
- Liangliang Sun
  - Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States

30
SPECTRUM - A MATLAB Toolbox for Proteoform Identification from Top-Down Proteomics Data. Sci Rep 2019; 9:11267. [PMID: 31375721 PMCID: PMC6677810 DOI: 10.1038/s41598-019-47724-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/11/2018] [Accepted: 06/10/2019] [Indexed: 01/07/2023]
Abstract
Top-Down Proteomics (TDP) is an emerging proteomics protocol that involves identification, characterization, and quantitation of intact proteins using high-resolution mass spectrometry. TDP has an edge over other proteomics protocols in that it allows for: (i) accurate measurement of intact protein mass, (ii) high sequence coverage, and (iii) enhanced identification of post-translational modifications (PTMs). However, the complexity of TDP spectra poses a significant impediment to protein search and PTM characterization. Furthermore, limited software support is currently available in the form of search algorithms and pipelines. To address this need, we propose ‘SPECTRUM’, an open-architecture and open-source toolbox for TDP data analysis. Its salient features include: (i) MS2-based intact protein mass tuning, (ii) de novo peptide sequence tag analysis, (iii) propensity-driven PTM characterization, (iv) blind PTM search, (v) spectral comparison, (vi) identification of truncated proteins, (vii) multifactorial coefficient-weighted scoring, and (viii) intuitive graphical user interfaces to access the aforementioned functionalities and visualization of results. We have validated SPECTRUM using published datasets and benchmarked it against salient TDP tools. SPECTRUM provides significantly enhanced protein identification rates (91% to 177%) over its contemporaries. SPECTRUM has been implemented in MATLAB, and is freely available along with its source code and documentation at https://github.com/BIRL/SPECTRUM/.

31
Chen ZL, Meng JM, Cao Y, Yin JL, Fang RQ, Fan SB, Liu C, Zeng WF, Ding YH, Tan D, Wu L, Zhou WJ, Chi H, Sun RX, Dong MQ, He SM. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat Commun 2019; 10:3404. [PMID: 31363125 PMCID: PMC6667459 DOI: 10.1038/s41467-019-11337-z] [Citation(s) in RCA: 238] [Impact Index Per Article: 47.6] [Received: 12/28/2018] [Accepted: 06/20/2019] [Indexed: 01/05/2023]
Abstract
We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is ~40 times faster than pLink 1 and 3 to 10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, 15N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms these methods in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development. The identification of cross-linked peptides at a proteome scale for interactome analyses represents a complex challenge. Here the authors report an efficient and reliable search engine pLink 2 for proteome-scale cross-linking mass spectrometry analyses, and demonstrate how to systematically evaluate the credibility of search engines.
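The abstract credits part of pLink 2's speed to fragment indexing. The sketch below shows the general idea only and does not reproduce pLink 2's implementation, its two-stage open search, or its scoring. Theoretical fragment masses are computed once and stored in a single sorted array. Each observed peak then retrieves its candidate peptides with two binary searches, so the engine never has to score every peptide against every spectrum. The class name, peptide IDs, fragment masses, and the 0.02 Da tolerance are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from typing import Dict, List, Tuple


class FragmentIndex:
    """Toy fragment-ion index: all theoretical fragment masses kept in one sorted array."""

    def __init__(self, fragments: Dict[str, List[float]]) -> None:
        pairs = sorted((mass, pid) for pid, masses in fragments.items() for mass in masses)
        self.masses: List[float] = [mass for mass, _ in pairs]
        self.ids: List[str] = [pid for _, pid in pairs]

    def query(self, spectrum: List[float], tol: float = 0.02) -> List[Tuple[str, int]]:
        """Count fragment hits within tol (Da) per peptide, best-supported first."""
        hits: Counter = Counter()
        for peak in spectrum:
            lo = bisect_left(self.masses, peak - tol)   # first fragment >= peak - tol
            hi = bisect_right(self.masses, peak + tol)  # one past last fragment <= peak + tol
            for pid in self.ids[lo:hi]:
                hits[pid] += 1
        return hits.most_common()


# Hypothetical peptide IDs and theoretical fragment masses (Da).
index = FragmentIndex({
    "PEP1": [100.0, 200.0, 300.0],
    "PEP2": [150.0, 250.0, 300.01],
    "PEP3": [400.0],
})
candidates = index.query([100.005, 200.0, 300.0])
```

Here PEP1 collects three hits and PEP2 one, so only those two would be passed on to detailed scoring. The cost of each lookup grows with the logarithm of the index size plus the number of hits, not with the number of peptides in the database.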
Affiliation(s)
- Zhen-Lin Chen
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jia-Ming Meng
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yong Cao
  - National Institute of Biological Sciences, Beijing, 102206, China
- Ji-Li Yin
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Run-Qian Fang
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Sheng-Bo Fan
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Chao Liu
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Wen-Feng Zeng
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yue-He Ding
  - National Institute of Biological Sciences, Beijing, 102206, China
- Dan Tan
  - National Institute of Biological Sciences, Beijing, 102206, China
- Long Wu
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Wen-Jing Zhou
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Hao Chi
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Rui-Xiang Sun
  - National Institute of Biological Sciences, Beijing, 102206, China
- Meng-Qiu Dong
  - National Institute of Biological Sciences, Beijing, 102206, China
- Si-Min He
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China

32
Ghezellou P, Garikapati V, Kazemi SM, Strupat K, Ghassempour A, Spengler B. A perspective view of top-down proteomics in snake venom research. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2019; 33 Suppl 1:20-27. [PMID: 30076652 DOI: 10.1002/rcm.8255] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/04/2018] [Revised: 07/25/2018] [Accepted: 07/29/2018] [Indexed: 06/08/2023]
Abstract
The venom produced by snakes contains complex mixtures of pharmacologically active proteins and peptides, which play a crucial role in the pathophysiology of snakebite diseases. A deep understanding of venom proteomes can help to improve the treatment of this "neglected tropical disease" (as designated by the World Health Organization [WHO]) and to develop new drugs. The most widely used technique for venom analysis is liquid chromatography/tandem mass spectrometry (LC/MS/MS)-based bottom-up (BU) proteomics. Because multiple multi-locus gene families encode snake venom proteins, the major challenges for BU proteomics are limited sequence coverage and the "protein inference problem", which result in a loss of information for the identification and characterization of toxin proteoforms (genetic variation, alternative mRNA splicing, single nucleotide polymorphisms [SNPs] and post-translational modifications [PTMs]). In contrast, intact protein measurements with top-down (TD) MS strategies cover almost complete protein sequences and have proven able to identify venom proteoforms and to localize their modifications and sequence variations.
Affiliation(s)
- Parviz Ghezellou
  - Institute of Inorganic and Analytical Chemistry, Justus Liebig University Giessen, Germany
  - Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
- Seyed Mahdi Kazemi
  - Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
- Alireza Ghassempour
  - Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
- Bernhard Spengler
  - Institute of Inorganic and Analytical Chemistry, Justus Liebig University Giessen, Germany

33
Liu Z, Wang R, Liu J, Sun R, Wang F. Global Quantification of Intact Proteins via Chemical Isotope Labeling and Mass Spectrometry. J Proteome Res 2019; 18:2185-2194. [PMID: 30990045 DOI: 10.1021/acs.jproteome.9b00071] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/14/2022]
Abstract
Although thousands of intact proteins have been feasibly identified in recent years, global quantification of intact proteins is still challenging. Herein, we develop a high-throughput strategy for global intact protein quantification based on chemical isotope labeling. The isotope incorporation efficiency is as high as 99.2% for complex intact protein samples extracted from HeLa cells. Further, the pTop 2.0 software is developed for automated quantification of intact proteoforms in a high-throughput manner. The high quantification accuracy and reproducibility of this strategy have been demonstrated for both standard and complex cellular protein samples. A total of 2283 intact proteoforms originated from 660 protein accessions are successfully quantified under anaerobic and aerobic conditions and the differentially expressed proteins are observed to be involved in the important biological processes such as stress response.
Affiliation(s)
- Zheyi Liu
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Ruimin Wang
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jing Liu
  - College of Pharmacy, Dalian Medical University, Dalian, 116044, China
- Ruixiang Sun
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Fangjun Wang
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China

34
Li Z, He B, Kou Q, Wang Z, Wu S, Liu Y, Feng W, Liu X. Evaluation of top-down mass spectral identification with homologous protein sequences. BMC Bioinformatics 2018; 19:494. [PMID: 30591035 PMCID: PMC6309053 DOI: 10.1186/s12859-018-2462-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area search top-down mass spectra against a protein sequence database for proteoform identification. When the species studied in a mass spectrometry experiment lacks its proteome sequence database, a homologous protein sequence database can be used for proteoform identification. The accuracy of homologous protein sequences affects the sensitivity of proteoform identification and the accuracy of mass shift localization. RESULTS We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of identified spectra with the homologous database was about half of that with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results. CONCLUSIONS Experimental results demonstrated that TopPIC is capable of identifying many proteoform spectrum matches and localizing unknown alterations using homologous protein sequences containing no more than 2 mutations.
Affiliation(s)
- Ziwei Li
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Bo He
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Qiang Kou
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
- Zhe Wang
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
- Si Wu
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Weixing Feng
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA

35
Schaffer LV, Rensvold JW, Shortreed MR, Cesnik AJ, Jochem A, Scalf M, Frey BL, Pagliarini DJ, Smith LM. Identification and Quantification of Murine Mitochondrial Proteoforms Using an Integrated Top-Down and Intact-Mass Strategy. J Proteome Res 2018; 17:3526-3536. [PMID: 30180576 PMCID: PMC6201694 DOI: 10.1021/acs.jproteome.8b00469] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
Abstract
The development of effective strategies for the comprehensive identification and quantification of proteoforms in complex systems is a critical challenge in proteomics. Proteoforms, the specific molecular forms in which proteins are present in biological systems, are the key effectors of biological function. Thus, knowledge of proteoform identities and abundances is essential to unraveling the mechanisms that underlie protein function. We recently reported a strategy that integrates conventional top-down mass spectrometry with intact-mass determinations for enhanced proteoform identifications and the elucidation of proteoform families and applied it to the analysis of yeast cell lysate. In the present work, we extend this strategy to enable quantification of proteoforms, and we examine changes in the abundance of murine mitochondrial proteoforms upon differentiation of mouse myoblasts to myotubes. The integrated top-down and intact-mass strategy provided an increase of ∼37% in the number of identified proteoforms compared to top-down alone, which is in agreement with our previous work in yeast; 1779 unique proteoforms were identified using the integrated strategy compared to 1301 using top-down analysis alone. Quantitative comparison of proteoform differences between the myoblast and myotube cell types showed 129 observed proteoforms exhibiting statistically significant abundance changes (fold change >2 and false discovery rate <5%).
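The "~37%" gain follows directly from the two proteoform counts reported above:

```latex
\frac{N_{\text{integrated}} - N_{\text{top-down}}}{N_{\text{top-down}}}
  = \frac{1779 - 1301}{1301}
  = \frac{478}{1301}
  \approx 0.367
```

This is about 36.7%, which rounds to the stated ~37% increase.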
Affiliation(s)
- Leah V. Schaffer
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Michael R. Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Anthony J. Cesnik
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Adam Jochem
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Mark Scalf
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Brian L. Frey
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- David J. Pagliarini
  - Morgridge Institute for Research, Madison, WI 53715, USA
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Lloyd M. Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI 53706, USA

36
Yang R, Zhu D. A graph-based filtering method for top-down mass spectral identification. BMC Genomics 2018; 19:666. [PMID: 30255788 PMCID: PMC6157290 DOI: 10.1186/s12864-018-5026-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, it is quite time-consuming to align a query spectrum against all protein sequences without any PTMs and mutations in a large database. Consequently, it is essential to develop efficient and sensitive filtering algorithms for speeding up database search. RESULTS In this paper, we propose a spectrum graph matching (SGM) based protein sequence filtering method for top-down mass spectral identification. It uses the subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidates. Whereas the sequence tag and gapped tag approaches need a preprocessing step to extract and select tags, the SGM filtering method circumvents this step, thus simplifying data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli top-down mass spectrometry data set and compared the performances of the SGM filtering method and two tag-based filtering methods on a data set of MCF-7 cells. CONCLUSIONS Experimental results on the data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform spectrum-matches compared with the tag-based methods in top-down mass spectrometry data analysis.
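To make the spectrum-graph idea concrete, here is a minimal Python sketch built on a simplified model. Peaks are treated as neutral fragment masses. An edge joins two peaks whose mass difference matches a residue mass from a deliberately small residue table, within a hypothetical 0.01 Da tolerance. A protein is scored by the longest run of its residues that a path in the graph spells out. This is not the authors' SGM algorithm, which builds graphs from subspectra and uses its own scoring. The function names, masses, and protein sequences are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Monoisotopic residue masses (Da) for a deliberately small subset of amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}

Edges = Dict[Tuple[int, int], Set[str]]


def build_spectrum_graph(masses: List[float], tol: float = 0.01) -> Edges:
    """Nodes are the sorted peak masses; edge (i, j) carries every residue whose
    mass matches masses[j] - masses[i] within tol."""
    ordered = sorted(masses)
    edges: Edges = {}
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            diff = ordered[j] - ordered[i]
            labels = {aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol}
            if labels:
                edges[(i, j)] = labels
    return edges


def longest_path_match(edges: Edges, n_nodes: int, protein: str) -> int:
    """Length of the longest run of consecutive residues in `protein` spelled by a graph path."""
    adjacency: Dict[int, List[Tuple[int, Set[str]]]] = {i: [] for i in range(n_nodes)}
    for (i, j), labels in edges.items():
        adjacency[i].append((j, labels))

    def extend(node: int, pos: int) -> int:
        # Edges only point to heavier peaks, so the graph is acyclic and recursion ends.
        if pos >= len(protein):
            return 0
        return max(
            (1 + extend(nxt, pos + 1) for nxt, labels in adjacency[node] if protein[pos] in labels),
            default=0,
        )

    return max(
        (extend(start, pos) for start in range(n_nodes) for pos in range(len(protein))),
        default=0,
    )


def filter_candidates(masses: List[float], proteins: Dict[str, str], top_k: int = 2) -> List[str]:
    """Keep the top_k proteins whose sequences best agree with the spectrum graph."""
    edges = build_spectrum_graph(masses)
    scores = {name: longest_path_match(edges, len(masses), seq) for name, seq in proteins.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy spectrum: a ladder spelling G-A-S-P from an arbitrary 1000 Da offset, plus one noise peak.
masses = [1000.0, 1057.02146, 1128.05857, 1215.0906, 1250.0, 1312.14336]
proteins = {"prot_match": "MKGASPLE", "prot_partial": "TTGAKKK", "prot_none": "DDDDFF"}

if __name__ == "__main__":
    print(filter_candidates(masses, proteins))
```

As in the abstract, no explicit tag-extraction step is needed. The whole graph is matched against each sequence, so the noise peak at 1250 Da simply contributes no edges. Only the top-ranked candidates would then go on to the slower spectral alignment.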
Affiliation(s)
- Runmin Yang
  - School of Computer Science and Technology, Shandong University, 1500, Shun Hua Lu, Jinan, 250101, China
- Daming Zhu
  - School of Computer Science and Technology, Shandong University, 1500, Shun Hua Lu, Jinan, 250101, China

37
Zhu K, Liu X. A graph-based approach for proteoform identification and quantification using top-down homogeneous multiplexed tandem mass spectra. BMC Bioinformatics 2018; 19:280. [PMID: 30367573 PMCID: PMC6101081 DOI: 10.1186/s12859-018-2273-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Background Top-down homogeneous multiplexed tandem mass (HomMTM) spectra are generated from modified proteoforms of the same protein with different post-translational modification patterns. They are frequently observed in the analysis of ultramodified proteins, some proteoforms of which have similar molecular weights and cannot be well separated by liquid chromatography in mass spectrometry analysis. Results We formulate the top-down HomMTM spectral identification problem as the minimum error k-splittable flow problem on graphs and propose a graph-based algorithm for the identification and quantification of proteoforms using top-down HomMTM spectra. Conclusions Experiments on a top-down mass spectrometry data set of the histone H4 protein showed that the proposed method identified many proteoform pairs that better explain the query spectra than single proteoforms.
Affiliation(s)
- Kaiyuan Zhu
  - Department of Computer Science, Indiana University Bloomington, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 W. 10th Street, Indianapolis, IN, 46202, USA

38
Yang R, Zhu D, Kou Q, Bhat-Nakshatri P, Nakshatri H, Wu S, Liu X. A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2018; 2017:222-229. [PMID: 29503761 DOI: 10.1109/bibm.2017.8217653] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Database search is the main approach for identifying proteoforms using top-down tandem mass spectra. However, it is extremely slow to align a query spectrum against all protein sequences in a large database when the target proteoform that produced the spectrum contains post-translational modifications and/or mutations. As a result, efficient and sensitive protein sequence filtering algorithms are essential for speeding up database search. In this paper, we propose a novel filtering algorithm, which generates spectrum graphs from subspectra of the query spectrum and searches them against the protein database to find good candidates. Compared with the sequence tag and gapped tag approaches, the proposed method circumvents the step of tag extraction, thus simplifying data processing. Experimental results on real data showed that the proposed method achieved both high speed and high sensitivity in protein sequence filtration.
Collapse
Affiliation(s)
- Runmin Yang
- School of Computer Science and Technology, Shandong University; Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis
- Daming Zhu
- School of Computer Science and Technology, Shandong University
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis
- Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine
39
Kou Q, Wu S, Liu X. Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry. Proteomics 2018; 18. [PMID: 29327814 DOI: 10.1002/pmic.201700306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/16/2017] [Revised: 11/20/2017] [Indexed: 01/19/2023]
Abstract
Complex proteoforms contain various primary structural alterations resulting from variations in genes, RNA, and proteins. Top-down mass spectrometry is commonly used for analyzing complex proteoforms because it provides whole sequence information of the proteoforms. Proteoform identification by top-down mass spectral database search is a challenging computational problem because the types and/or locations of some alterations in target proteoforms are in general unknown. Although spectral alignment and mass graph alignment algorithms have been proposed for identifying proteoforms with unknown alterations, they are extremely slow when aligning millions of spectra against tens of thousands of protein sequences in high-throughput, proteome-level analyses. Many software tools in this area combine efficient protein sequence filtering algorithms and spectral alignment algorithms to speed up database search. As a result, the performance of these tools relies heavily on the sensitivity and efficiency of their filtering algorithms. Here, we propose two efficient approximate spectrum-based filtering algorithms for proteoform identification. We evaluated the performance of the proposed algorithms and four existing ones on simulated and real top-down mass spectrometry data sets. Experiments showed that the proposed algorithms outperformed the existing ones for complex proteoform identification. In addition, combining the proposed filtering algorithms and mass graph alignment algorithms identified many proteoforms missed by ProSightPC in proteome-level proteoform analyses.
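To make "spectrum-based filtering" concrete, the sketch below uses a generic shift-counting filter (a form of spectral convolution), not either of the two algorithms proposed in this paper. Each protein is scored by the largest number of spectrum masses that its theoretical prefix masses explain under a single common mass shift, so an unknown N-terminal alteration does not hide the true protein; only the top-k proteins would then go on to slow spectral alignment. The bin width, `k`, function names, and example data are assumptions.

```python
from collections import Counter
from itertools import accumulate

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def prefix_masses(seq):
    """Cumulative residue masses of every N-terminal prefix of seq."""
    return list(accumulate(RESIDUE_MASS[aa] for aa in seq))

def shift_score(spectrum, seq, bin_width=0.02):
    """Largest number of spectrum masses explained by one common mass shift."""
    theo = prefix_masses(seq)
    shifts = Counter(round((s - t) / bin_width) for s in spectrum for t in theo)
    return max(shifts.values(), default=0)

def top_candidates(spectrum, database, k=1, bin_width=0.02):
    """Keep the k proteins with the highest shift score for full alignment."""
    scores = {name: shift_score(spectrum, seq, bin_width) for name, seq in database.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical spectrum: prefix masses of GASPVTLND carrying an unknown +42.01057 Da shift.
spectrum = [m + 42.01057 for m in prefix_masses("GASPVTLND")]
database = {"target": "GASPVTLNDEK", "decoy": "KEDNLTVPSAG"}
print(top_candidates(spectrum, database))  # ['target']
```

Rounding differences into fixed-width bins is only an approximation of a mass tolerance: two nearly equal shifts can straddle a bin edge, so a practical filter would also merge neighboring bins.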
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
- Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
40
Avtonomov DM, Polasky DA, Ruotolo BT, Nesvizhskii AI. IMTBX and Grppr: Software for Top-Down Proteomics Utilizing Ion Mobility-Mass Spectrometry. Anal Chem 2018; 90:2369-2375. [PMID: 29278491 PMCID: PMC5826643 DOI: 10.1021/acs.analchem.7b04999] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
Top-down proteomics has emerged as a transformative method for the analysis of protein sequence and post-translational modifications (PTMs). Top-down experiments have historically been performed primarily on ultrahigh resolution mass spectrometers due to the complexity of spectra resulting from fragmentation of intact proteins, but recent advances in coupling ion mobility separations to faster, lower resolution mass analyzers now offer a viable alternative. However, software capable of interpreting the highly complex two-dimensional spectra that result from coupling ion mobility separation to top-down experiments is currently lacking. In this manuscript we present a software suite consisting of two programs, IMTBX ("IM Toolbox") and Grppr ("Grouper"), that enable fully automated processing of such data. We demonstrate the capabilities of this software suite by examining a series of intact proteins on a Waters Synapt G2 ion-mobility equipped mass spectrometer and compare the results to the manual and semiautomated data analysis procedures we have used previously.
Affiliation(s)
- Dmitry M Avtonomov
- Department of Pathology, Department of Chemistry, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States
- Daniel A Polasky
- Department of Pathology, Department of Chemistry, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States
- Brandon T Ruotolo
- Department of Pathology, Department of Chemistry, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States
- Alexey I Nesvizhskii
- Department of Pathology, Department of Chemistry, and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States
41
Kou Q, Wu S, Tolic N, Paša-Tolic L, Liu Y, Liu X. A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra. Bioinformatics 2018; 33:1309-1316. [PMID: 28453668 DOI: 10.1093/bioinformatics/btw806] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/13/2016] [Accepted: 12/15/2016] [Indexed: 11/14/2022]
Abstract
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms. Availability and implementation: http://proteomics.informatics.iupui.edu/software/topmg/. Contact: xwliu@iupui.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
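Why a graph tames the combinatorial explosion can be seen in a toy version. In the sketch below, node i sits after the first i residues, and each residue contributes parallel edges for its unmodified and modified forms, so every source-to-sink path is one proteoform. A dynamic program over the nodes then lists the distinct proteoform masses without enumerating every path. The modification table and sequence are hypothetical, and TopMG's actual mass graph and its alignment against spectra are considerably richer than this.

```python
# Monoisotopic residue masses (Da) for the residues used below.
RESIDUE_MASS = {"M": 131.04049, "S": 87.03203, "K": 128.09496, "T": 101.04768}
WATER = 18.010565  # added once to obtain the intact proteoform mass

# Hypothetical variable modifications: residue -> [(name, mass shift in Da)].
VARIABLE_MODS = {
    "S": [("Phospho", 79.96633)],
    "T": [("Phospho", 79.96633)],
    "K": [("Acetyl", 42.01057)],
    "M": [("Oxidation", 15.99491)],
}

def build_mass_graph(seq):
    """Layer i holds the parallel edges (label, mass) from node i to node i + 1."""
    layers = []
    for aa in seq:
        options = [(aa, RESIDUE_MASS[aa])]
        for name, shift in VARIABLE_MODS.get(aa, []):
            options.append((f"{aa}[{name}]", RESIDUE_MASS[aa] + shift))
        layers.append(options)
    return layers

def count_proteoforms(layers):
    """Each source-to-sink path is one proteoform: paths = product of layer widths."""
    total = 1
    for options in layers:
        total *= len(options)
    return total

def distinct_masses(layers, decimals=3):
    """Dynamic programming over nodes, merging paths that reach the same mass."""
    reachable = {0.0: 0.0}  # rounded prefix mass -> representative exact mass
    for options in layers:
        nxt = {}
        for mass in reachable.values():
            for _, edge_mass in options:
                total = mass + edge_mass
                nxt.setdefault(round(total, decimals), total)
        reachable = nxt
    return sorted(m + WATER for m in reachable.values())

layers = build_mass_graph("MSKT")
print(count_proteoforms(layers))     # 16 paths (2 options at each of 4 residues)
print(len(distinct_masses(layers)))  # 12: Phospho on S or on T gives the same mass
```

The graph stores only 8 edges for 16 proteoforms here; with n independently modifiable residues the path count grows as 2^n while the edge count grows linearly, which is the kind of compression a mass graph exploits.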
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
- Nikola Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Ljiljana Paša-Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
42
Affiliation(s)
- Bifan Chen
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Kyle A. Brown
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ziqing Lin
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
43
Schaffer LV, Shortreed MR, Cesnik AJ, Frey BL, Solntsev SK, Scalf M, Smith LM. Expanding Proteoform Identifications in Top-Down Proteomic Analyses by Constructing Proteoform Families. Anal Chem 2017; 90:1325-1333. [PMID: 29227670 DOI: 10.1021/acs.analchem.7b04221] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/15/2022]
Abstract
In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we have adapted it to provide identifications of proteoform masses in precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained using the MS2 fragmentation data. Proteoform Suite performs mass calibration using high-scoring top-down identifications and identifies additional proteoforms using calibrated, accurate intact masses. Proteoform families, the set of proteoforms from a given gene, are constructed and visualized from proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, yielding an approximately 40% increase over the original 1291 proteoform identifications observed using traditional top-down analysis alone.
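The calibration-then-intact-mass step described above can be sketched under simple assumptions: a single global ppm offset, estimated as the median error of confident top-down identifications, is divided out of every MS1 mass, and each corrected mass is then assigned to the closest theoretical proteoform within a ppm window. Proteoform Suite's own calibration and family-construction procedures are more involved; the function names, the 10 ppm window, and all masses below are hypothetical.

```python
from statistics import median

def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def calibrate(masses, confident_pairs):
    """Remove one global ppm offset estimated from (observed, theoretical) pairs."""
    offset = median(ppm_error(obs, theo) for obs, theo in confident_pairs)
    return [m / (1 + offset * 1e-6) for m in masses], offset

def match_intact(observed_masses, theoretical, tol_ppm=10.0):
    """Assign each MS1 mass to the closest theoretical proteoform within tol_ppm."""
    hits = {}
    for mass in observed_masses:
        name, theo = min(theoretical.items(), key=lambda kv: abs(ppm_error(mass, kv[1])))
        if abs(ppm_error(mass, theo)) <= tol_ppm:
            hits[mass] = name
    return hits

# Hypothetical data with a +20 ppm instrument bias.
theoretical = {"P1": 10000.0, "P2": 12000.0, "P3": 15000.5}
confident = [(10000.2, 10000.0), (12000.24, 12000.0)]  # from MS2-based identifications
ms1_only = [15000.80001]                               # detected but never fragmented

print(match_intact(ms1_only, theoretical))  # {} because 20 ppm exceeds the 10 ppm window
calibrated, offset = calibrate(ms1_only, confident)
print(round(offset, 3), list(match_intact(calibrated, theoretical).values()))  # 20.0 ['P3']
```

The example shows why calibration matters: the same MS1 mass is rejected before correction and accepted after it, which is how accurate intact masses can add identifications beyond the fragmented precursors.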
Affiliation(s)
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Michael R Shortreed
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Brian L Frey
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Stefan K Solntsev
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Mark Scalf
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States; Genome Center of Wisconsin, University of Wisconsin, 425G Henry Mall, Room 3420, Madison, Wisconsin 53706, United States
44
Yang H, Chi H, Zhou WJ, Zeng WF, Liu C, Wang RM, Wang ZW, Niu XN, Chen ZL, He SM. pSite: Amino Acid Confidence Evaluation for Quality Control of De Novo Peptide Sequencing and Modification Site Localization. J Proteome Res 2017; 17:119-128. [DOI: 10.1021/acs.jproteome.7b00428] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Wen-Jing Zhou
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Rui-Min Wang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhao-Wei Wang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiu-Nan Niu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhen-Lin Chen
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
45
Melani RD, Nogueira FCS, Domont GB. It is time for top-down venomics. J Venom Anim Toxins Incl Trop Dis 2017; 23:44. [PMID: 29075288 PMCID: PMC5648493 DOI: 10.1186/s40409-017-0135-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/19/2017] [Accepted: 09/21/2017] [Indexed: 12/19/2022]
Abstract
The protein composition of animal venoms is usually determined by peptide-centric proteomics approaches (bottom-up proteomics). However, this technique cannot, in most cases, distinguish among toxin proteoforms, herein called toxiforms, because of the protein inference problem. Top-down proteomics (TDP) analyzes intact proteins without digestion and provides high-quality data to identify and characterize toxiforms. Denaturing top-down proteomics is the most widespread subarea of TDP and performs qualitative and quantitative analyses of proteoforms up to ~30 kDa in a high-throughput, automated fashion. Native top-down proteomics, on the other hand, provides access to information on large proteins (> 50 kDa) and protein interactions by preserving non-covalent bonds and physiological complex stoichiometry. The use of native and denaturing top-down venomics introduced novel and useful techniques to toxinology, allowing an unprecedented characterization of venom proteins and protein complexes at the toxiform level. The collected data contribute to a deep understanding of venom natural history, open new possibilities to study toxin evolution, and help in the development of better biotherapeutics.
Affiliation(s)
- Rafael D. Melani
- Proteomics Unit, Department of Biochemistry, Institute of Chemistry, Federal University of Rio de Janeiro, Av. Athos da Silveira Ramos, 149, CT A-542, Cidade Universitária, Rio de Janeiro, RJ CEP 21941-909 Brazil
- Fabio C. S. Nogueira
- Proteomics Unit, Department of Biochemistry, Institute of Chemistry, Federal University of Rio de Janeiro, Av. Athos da Silveira Ramos, 149, CT A-542, Cidade Universitária, Rio de Janeiro, RJ CEP 21941-909 Brazil
- Gilberto B. Domont
- Proteomics Unit, Department of Biochemistry, Institute of Chemistry, Federal University of Rio de Janeiro, Av. Athos da Silveira Ramos, 149, CT A-542, Cidade Universitária, Rio de Janeiro, RJ CEP 21941-909 Brazil
46
Park J, Piehowski PD, Wilkins C, Zhou M, Mendoza J, Fujimoto GM, Gibbons BC, Shaw JB, Shen Y, Shukla AK, Moore RJ, Liu T, Petyuk VA, Tolic N, Pasa-Tolic L, Smith RD, Payne SH, Kim S. Informed-Proteomics: open-source software package for top-down proteomics. Nat Methods 2017; 14:909-914. [PMID: 28783154 PMCID: PMC5578875 DOI: 10.1038/nmeth.4388] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 10/25/2016] [Accepted: 06/21/2017] [Indexed: 12/12/2022]
Abstract
Top-down proteomics, the analysis of intact proteins in their endogenous form, preserves valuable information about post-translational modifications, isoforms and proteolytic processing. The quality of top-down liquid chromatography-tandem MS (LC-MS/MS) data sets is rapidly increasing on account of advances in instrumentation and sample-processing protocols. However, top-down mass spectra are substantially more complex than conventional bottom-up data. New algorithms and software tools for confident proteoform identification and quantification are needed. Here we present Informed-Proteomics, an open-source software suite for top-down proteomics analysis that consists of an LC-MS feature-finding algorithm, a database search algorithm, and an interactive results viewer. We compare our tool with several other popular tools using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.
Affiliation(s)
- Jungkap Park
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Paul D. Piehowski
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Christopher Wilkins
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Mowei Zhou
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington USA
- Joshua Mendoza
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Grant M Fujimoto
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Bryson C. Gibbons
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Jared B. Shaw
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington USA
- Yufeng Shen
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Anil K. Shukla
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Ronald J. Moore
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Vladislav A Petyuk
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Nikola Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington USA
- Ljiljana Pasa-Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington USA
- Richard D. Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Samuel H. Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
- Sangtae Kim
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington USA
47
Tholey A, Becker A. Top-down proteomics for the analysis of proteolytic events - Methods, applications and perspectives. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2017; 1864:2191-2199. [PMID: 28711385 DOI: 10.1016/j.bbamcr.2017.07.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 04/26/2017] [Revised: 07/07/2017] [Accepted: 07/09/2017] [Indexed: 02/06/2023]
Abstract
Mass spectrometry-based proteomics is an indispensable tool for almost all research areas relevant to the understanding of proteolytic processing, ranging from the identification of substrates, products and cleavage sites to the analysis of structural features influencing protease activity. The majority of methods for these studies are based on bottom-up proteomics, performing analysis at the peptide level. As this approach is characterized by a number of pitfalls, e.g. loss of molecular information, there is an ongoing effort to establish top-down proteomics, performing both separation and MS analysis at the intact protein level. We briefly introduce major approaches of bottom-up proteomics used in the field of protease research and highlight the shortcomings of these methods. We then discuss the present state of the art of top-down proteomics. Together with the discussion of known challenges, we show the potential of this approach and present a number of successful applications of top-down proteomics in protease research. This article is part of a Special Issue entitled: Proteolysis as a Regulatory Event in Pathophysiology edited by Stefan Rose-John.
Affiliation(s)
- Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany.
- Alexander Becker
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
48
Dryden MDM, Fobel R, Fobel C, Wheeler AR. Upon the Shoulders of Giants: Open-Source Hardware and Software in Analytical Chemistry. Anal Chem 2017; 89:4330-4338. [PMID: 28379683 DOI: 10.1021/acs.analchem.7b00485] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 12/17/2022]
Abstract
Isaac Newton famously observed that "if I have seen further it is by standing on the shoulders of giants." We propose that this sentiment is a powerful motivation for the "open-source" movement in scientific research, in which creators provide everything needed to replicate a given project online, as well as providing explicit permission for users to use, improve, and share it with others. Here, we write to introduce analytical chemists who are new to the open-source movement to best practices and concepts in this area and to survey the state of open-source research in analytical chemistry. We conclude by considering two examples of open-source projects from our own research group, with the hope that a description of the process, motivations, and results will provide a convincing argument about the benefits that this movement brings to both creators and users.
Affiliation(s)
- Michael D M Dryden
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto, Ontario M5S 3H6, Canada
- Ryan Fobel
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto, Ontario M5S 3H6, Canada; Donnelly Centre for Cellular and Biomolecular Research, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Christian Fobel
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto, Ontario M5S 3H6, Canada; Donnelly Centre for Cellular and Biomolecular Research, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Aaron R Wheeler
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto, Ontario M5S 3H6, Canada; Donnelly Centre for Cellular and Biomolecular Research, 160 College Street, Toronto, Ontario M5S 3E1, Canada; Institute of Biomaterials and Biomedical Engineering, University of Toronto, 164 College Street, Toronto, Ontario M5S 3G9, Canada
49
Kou Q, Xun L, Liu X. TopPIC: a software tool for top-down mass spectrometry-based proteoform identification and characterization. Bioinformatics 2016; 32:3495-3497. [PMID: 27423895 DOI: 10.1093/bioinformatics/btw398] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 02/24/2016] [Revised: 05/30/2016] [Accepted: 06/17/2016] [Indexed: 11/14/2022]
Abstract
Top-down mass spectrometry enables the observation of whole complex proteoforms in biological samples and provides crucial information complementary to bottom-up mass spectrometry. Because of the complexity of top-down mass spectra and proteoforms, it is a challenging problem to efficiently interpret top-down tandem mass spectra in high-throughput, proteome-level proteomics studies. We present TopPIC, a tool that efficiently identifies and characterizes complex proteoforms with unknown primary structure alterations, such as amino acid mutations and post-translational modifications, by searching top-down tandem mass spectra against a protein database. Availability and implementation: http://proteomics.informatics.iupui.edu/software/toppic/. Contact: xwliu@iupui.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Likun Xun
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA