1
Tabb DL, Jeong K, Druart K, Gant MS, Brown KA, Nicora C, Zhou M, Couvillion S, Nakayasu E, Williams JE, Peterson HK, McGuire MK, McGuire MA, Metz TO, Chamot-Rooke J. Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection. J Proteome Res 2023. [PMID: 37235544 DOI: 10.1021/acs.jproteome.2c00673] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 05/28/2023]
Abstract
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
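Disagreement over precursor charge matters because the neutral mass follows from the observed m/z and the charge z as M = z·(m/z − m_proton). The Python sketch below illustrates that standard relation. The m/z value in it is hypothetical and is not taken from the study's datasets.

```python
# Illustrative sketch: how a precursor charge call changes the inferred neutral mass.
PROTON_MASS = 1.007276  # mass of a proton in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass implied by an [M + zH]z+ ion observed at the given m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    mz = 1000.0  # hypothetical precursor m/z
    for z in (10, 11):
        print(z, round(neutral_mass(mz, z), 4))
```

At m/z 1000, a one-unit disagreement in charge shifts the inferred mass by about 999 Da. That is enough for two deconvolution engines to send the same spectrum to entirely different candidate proteoforms.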
Affiliation(s)
- David L Tabb
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyowon Jeong
  - Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen 72076, Germany
- Karen Druart
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Megan S Gant
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyle A Brown
  - School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin 53705, United States
- Carrie Nicora
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Mowei Zhou
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Sneha Couvillion
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ernesto Nakayasu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Janet E Williams
  - Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Haley K Peterson
  - Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Michelle K McGuire
  - Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Mark A McGuire
  - Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Thomas O Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Julia Chamot-Rooke
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
2
Singla R, Abidi SMS, Dar AI, Acharya A. Inhibition of Glycation-Induced Aggregation of Human Serum Albumin by Organic-Inorganic Hybrid Nanocomposites of Iron Oxide-Functionalized Nanocellulose. ACS Omega 2019; 4:14805-14819. [PMID: 31552320 PMCID: PMC6751540 DOI: 10.1021/acsomega.9b01392] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/14/2019] [Accepted: 07/31/2019] [Indexed: 05/06/2023]
Abstract
Protein aggregation transforms proteins from their soluble form into insoluble amyloid fibrils, and these aggregates are deposited in specific body tissues, accounting for various diseases. To prevent such aggregation, organic-inorganic hybrid nanocomposites of iron oxide nanoparticle (NP, ∼6.5-7.0 nm)-conjugated cellulose nanocrystals (CNCs) isolated from Syzygium cumini (SC) and Pinus roxburghii (PR) were chemically synthesized. Transmission electron microscopy (TEM) images of the nanocomposites suggested that the in situ-synthesized iron oxide NPs were bound to the CNC surface in a uniform and regular fashion. The ThT fluorescence assay, together with 8-anilino-1-naphthalenesulfonic acid, Congo Red, and circular dichroism (CD) studies, suggested that the short fiber-based SC nanocomposites showed better inhibition as well as dissociation of human serum albumin aggregates. TEM and fluorescence microscopy studies supported these observations. Native polyacrylamide gel electrophoresis results documented dissociation of higher protein aggregates in the presence of the developed nanocomposite. Interestingly, the dissociated proteins retained their biological function by maintaining a high α-helix content. In vitro studies with HEK-293 cells suggested that the developed nanocomposite reduces aggregation-induced cytotoxicity by scavenging intracellular reactive oxygen species and maintaining the Ca2+ ion channel. These results indicated that the hybrid organic-inorganic nanocomposite, with simultaneous sites for hydrophobic and hydrophilic interactions, tends to provide a larger surface area for nanocomposite-protein interactions, which ultimately disfavors the nucleation step of protein fibrillation.
Affiliation(s)
- Rubbel Singla
  - Biotechnology Division and Academy of Scientific & Innovative Research (AcSIR), CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh 176061, India
- Syed M. S. Abidi
  - Biotechnology Division and Academy of Scientific & Innovative Research (AcSIR), CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh 176061, India
- Aqib Iqbal Dar
  - Biotechnology Division and Academy of Scientific & Innovative Research (AcSIR), CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh 176061, India
- Amitabha Acharya
  - Biotechnology Division and Academy of Scientific & Innovative Research (AcSIR), CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh 176061, India
3
Polanski A, Marczyk M, Pietrowska M, Widlak P, Polanska J. Signal Partitioning Algorithm for Highly Efficient Gaussian Mixture Modeling in Mass Spectrometry. PLoS One 2015; 10:e0134256. [PMID: 26230717 PMCID: PMC4521892 DOI: 10.1371/journal.pone.0134256] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/10/2015] [Accepted: 07/07/2015] [Indexed: 12/02/2022]
Abstract
Mixture modeling of mass spectra is an approach with many potential applications, including peak detection and quantification, smoothing, de-noising, feature extraction, and spectral signal compression. However, existing algorithms do not allow automated analysis of whole spectra. Therefore, although several papers have highlighted potential advantages of mixture modeling of mass spectra of peptide/protein mixtures and presented some preliminary results, the mixture modeling approach has so far not been developed to a stage that enables systematic comparisons with existing software packages for proteomic mass spectra analysis. In this paper we present an efficient algorithm for Gaussian mixture modeling of proteomic mass spectra of different types (e.g., MALDI-ToF profiling, MALDI-IMS). The main idea is the automated partitioning of the protein mass spectral signal into fragments. The obtained fragments are separately decomposed into Gaussian mixture models. The parameters of the fragment mixture models are then aggregated to form the mixture model of the whole spectrum. We compare the proposed algorithm to existing algorithms for peak detection and demonstrate improvements in peak detection efficiency obtained by using Gaussian mixture modeling. We also show applications of the proposed algorithm to real proteomic datasets of low and high resolution.
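The partition, fit, and aggregate idea described above can be illustrated with a simplified Python sketch, assuming NumPy. The sketch makes three assumptions of its own:

- Fragments are split wherever intensity drops to a fixed threshold. The paper's own partitioning rule is more elaborate and is not reproduced here.
- Each fragment is fitted by intensity-weighted expectation-maximization (EM).
- The fragment models are rescaled by their share of total intensity and concatenated.

The helper names `partition`, `weighted_em`, and `fit_spectrum` are hypothetical.

```python
import numpy as np


def partition(intensity, threshold):
    """Split a spectrum into contiguous index ranges where intensity > threshold.

    Simplified rule (assumption): fragments are separated wherever the signal
    drops to or below a fixed threshold.
    """
    fragments, start = [], None
    for i, above in enumerate(intensity > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            fragments.append((start, i))
            start = None
    if start is not None:
        fragments.append((start, len(intensity)))
    return fragments


def weighted_em(x, w, k, iters=200):
    """EM for a 1-D Gaussian mixture in which point x[i] carries weight w[i]."""
    w = w / w.sum()
    mu = np.linspace(x.min(), x.max(), k + 2)[1:-1]  # evenly spaced starting means
    var = np.full(k, ((x.max() - x.min()) / (2 * k)) ** 2 + 1e-12)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of component j for point i
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: responsibilities are scaled by the intensity weights
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-300
        pi = nk / nk.sum()
        mu = (rw * x[:, None]).sum(axis=0) / nk
        var = (rw * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
    return pi, mu, var


def fit_spectrum(mz, intensity, threshold, k_per_fragment=1):
    """Fit each fragment separately, then aggregate into one list of components.

    Returns (weight, mean, sd) tuples; weights are scaled by each fragment's
    share of total intensity, so signal below the threshold is not modeled.
    """
    total = intensity.sum()
    components = []
    for a, b in partition(intensity, threshold):
        x, y = mz[a:b], intensity[a:b].astype(float)
        pi, mu, var = weighted_em(x, y, k_per_fragment)
        share = y.sum() / total
        components.extend(zip(share * pi, mu, np.sqrt(var)))
    return components
```

Fitting each fragment separately keeps every EM problem small, which is where the efficiency gain of partitioning comes from. For example, a synthetic profile with two well-separated Gaussian peaks at m/z 1000 and 1050 yields one component near each peak centre.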
Affiliation(s)
- Andrzej Polanski
  - Institute of Informatics, Silesian University of Technology, Gliwice, Poland
- Michal Marczyk
  - Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Monika Pietrowska
  - Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice, Poland
- Piotr Widlak
  - Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Gliwice, Poland
- Joanna Polanska
  - Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
4
Abstract
Accurate assignment of peptide sequences to observed fragmentation spectra is hindered by the large number of hypotheses that must be considered for each observed spectrum. A high score assigned to a particular peptide–spectrum match (PSM) may not end up being statistically significant after multiple testing correction. Researchers can mitigate this problem by controlling the hypothesis space in various ways: considering only peptides resulting from enzymatic cleavages, ignoring possible post-translational modifications or single nucleotide variants, etc. However, these strategies sacrifice identifications of spectra generated by rarer types of peptides. In this work, we introduce a statistical testing framework, cascade search, that directly addresses this problem. The method requires that the user specify a priori a statistical confidence threshold as well as a series of peptide databases. For instance, such a cascade of databases could include fully tryptic, semitryptic, and nonenzymatic peptides, or peptides with increasing numbers of modifications. Cascade search then gradually expands the list of candidate peptides from more likely peptides toward rare peptides, sequestering at each stage any spectrum that is identified with the specified statistical confidence. We compare cascade search to a standard procedure that lumps all of the peptides into a single database, as well as to a previously described group FDR procedure that computes the FDR separately within each database. We demonstrate, using simulated and real data, that cascade search identifies more spectra at a fixed FDR threshold than either the ungrouped or the grouped approach. Cascade search thus provides a general method for maximizing the number of identified spectra in a statistically rigorous fashion.
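The stage-wise procedure described above can be sketched in Python as follows. The sketch rests on two assumptions:

- Each spectrum's best match against each database is given as a hypothetical `(spectrum_id, score, is_decoy)` tuple.
- The FDR at a score cutoff is estimated with the simple decoy-to-target ratio from target-decoy competition. The authors' exact FDR estimator may differ.

The function names are illustrative only.

```python
from typing import List, Set, Tuple

# (spectrum_id, score, is_decoy): best match of one spectrum against one database
Match = Tuple[str, float, bool]


def accept_at_fdr(matches: List[Match], alpha: float) -> Set[str]:
    """Return target spectrum IDs whose target-decoy q-value is <= alpha."""
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple decoy/target ratio as the FDR estimate at this score cutoff
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR at this rank or at any lower-scoring rank
    qvals = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return {sid for (sid, _, dec), q in zip(ranked, qvals) if not dec and q <= alpha}


def cascade_search(stages: List[List[Match]], alpha: float) -> List[Set[str]]:
    """Search databases in order, sequestering spectra identified at earlier stages."""
    identified: Set[str] = set()
    accepted_per_stage = []
    for matches in stages:
        # Only spectra not yet identified compete at this stage
        remaining = [m for m in matches if m[0] not in identified]
        accepted = accept_at_fdr(remaining, alpha)
        accepted_per_stage.append(accepted)
        identified |= accepted
    return accepted_per_stage
```

Spectra accepted at an earlier stage are removed before the next stage. As a result, each later, larger database is tested only against the spectra that remain unexplained.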
Affiliation(s)
- Attila Kertesz-Farkas
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States; School of Mathematics and Statistics, University of Sydney, Camperdown, NSW 2006, Australia; Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States; School of Mathematics and Statistics, University of Sydney, Camperdown, NSW 2006, Australia; Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States; School of Mathematics and Statistics, University of Sydney, Camperdown, NSW 2006, Australia; Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
5
Kertész-Farkas A, Reiz B, Vera R, Myers MP, Pongor S. PTMTreeSearch: a novel two-stage tree-search algorithm with pruning rules for the identification of post-translational modification of proteins in MS/MS spectra. Bioinformatics 2013; 30:234-41. [PMID: 24215026 DOI: 10.1093/bioinformatics/btt642] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
MOTIVATION Tandem mass spectrometry has become a standard tool for identifying post-translational modifications (PTMs) of proteins. Algorithmic searches for PTMs from tandem mass spectrum data (MS/MS) tend to be hampered by noisy data as well as by a combinatorial explosion of search space. This leads to high uncertainty and long search-execution times. RESULTS To address this issue, we present PTMTreeSearch, a new algorithm that uses a large database of known PTMs to identify PTMs from MS/MS data. For a given peptide sequence, PTMTreeSearch builds a computational tree wherein each path from the root to the leaves is labeled with the amino acids of a peptide sequence. Branches then represent PTMs. Various empirical tree pruning rules have been designed to decrease the search-execution time by eliminating biologically unlikely solutions. PTMTreeSearch first identifies a relatively small set of high-confidence PTM types and, in a second stage, performs a more exhaustive search on this restricted set using relaxed search parameter settings. An analysis of experimental data shows that, using the same criteria for false discovery, PTMTreeSearch annotates more peptides than the current state-of-the-art methods and PTM identification algorithms, and achieves this at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring function in the X!Tandem search engine. AVAILABILITY The source code of PTMTreeSearch and a demo server application can be found at http://net.icgeb.org/ptmtreesearch
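The tree described above can be explored depth-first. Each level of the tree corresponds to one residue, and each branch either leaves that residue unmodified or attaches an allowed PTM. The Python sketch below is illustrative only:

- The PTM table and the function name `search_ptms` are hypothetical.
- It uses two generic branch-and-bound pruning rules: a cap on the number of modifications, and a check that the target mass shift is still reachable. These are not the empirical, biologically motivated rules of PTMTreeSearch.
- It matches precursor mass shifts only and does no spectrum scoring.

```python
from typing import Dict, List, Tuple

Ptm = Tuple[str, float]  # (name, monoisotopic mass shift in Da)


def search_ptms(seq: str, allowed: Dict[str, List[Ptm]], target_delta: float,
                tol: float = 0.01, max_mods: int = 2) -> List[List[Tuple[int, str]]]:
    """Enumerate PTM placements whose total mass shift matches target_delta."""
    n = len(seq)
    # For each suffix, the smallest and largest mass shift still reachable
    lo = [0.0] * (n + 1)
    hi = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        shifts = [0.0] + [d for _, d in allowed.get(seq[i], [])]
        lo[i] = lo[i + 1] + min(shifts)
        hi[i] = hi[i + 1] + max(shifts)
    solutions: List[List[Tuple[int, str]]] = []

    def dfs(i: int, acc: float, path: List[Tuple[int, str]]) -> None:
        if len(path) > max_mods:  # prune: too many modifications
            return
        # prune: target cannot be reached from this node
        if not (acc + lo[i] - tol <= target_delta <= acc + hi[i] + tol):
            return
        if i == n:  # leaf: a complete assignment within tolerance
            solutions.append(list(path))
            return
        dfs(i + 1, acc, path)  # branch: residue left unmodified
        for name, delta in allowed.get(seq[i], []):  # branch: one PTM per residue
            path.append((i, name))
            dfs(i + 1, acc + delta, path)
            path.pop()

    dfs(0, 0.0, [])
    return solutions
```

For a hypothetical peptide `PEPTSK` with a phosphorylation mass shift (79.96633 Da) allowed on S and T, the search returns the two possible single-site placements. In this search, precursor mass alone cannot tell the two sites apart; in practice that ambiguity would be resolved by fragment-ion evidence.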
Affiliation(s)
- Attila Kertész-Farkas
  - Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, AREA Research Park, 99 Padriciano, Trieste, Italy, 34149, Institute of Biophysics, Biological Research Centre, Temesvari krt. 62, H-6727 Szeged, Hungary, Protein Networks Group, International Centre for Genetic Engineering and Biotechnology, AREA Research Park, Padriciano 99, 34149 Trieste, Italy and Faculty of Information Technology, Pázmány Péter Catholic University, Práter u. 50/a, H-1083 Budapest, Hungary
6
The Expression, Purification, and Characterization of a Ras Oncogene (Bras2) in Silkworm (Bombyx mori). Int J Genomics 2013; 2013:269609. [PMID: 23781494 PMCID: PMC3678442 DOI: 10.1155/2013/269609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2013] [Revised: 04/10/2013] [Accepted: 04/28/2013] [Indexed: 11/23/2022]
Abstract
The Ras oncogene of silkworm pupae (Bras2) may belong to the Ras superfamily. It shares 77% amino acid identity with teratocarcinoma oncogene 21 (TC21) related ras viral oncogene homolog-2 (R-Ras2) and possesses an identical core effector region. The mRNA of Bombyx mori Bras2 is 1412 bp long. The open reading frame contains 603 bp, which encode 200 amino acid residues. Recombinant BmBras2 protein was used as an antigen to raise a rabbit polyclonal antibody. Western blotting and real-time PCR analyses showed that BmBras2 was expressed during four developmental stages. The BmBras2 expression level was highest in the pupae and low in the other life cycle stages. BmBras2 was expressed in all eight tested tissues and was highly expressed in the head, intestine, and epidermis. Subcellular localization studies indicated that BmBras2 was predominantly localized in the nuclei of Bm5 cells, although cytoplasmic staining was also observed to a lesser extent. A cell proliferation assay showed that rBmBras2 could stimulate the proliferation of hepatoma cells. The higher BmBras2 expression level in the pupal stage, the tissue expression pattern, and the cell proliferation assay indicated that BmBras2 promotes cell division and proliferation, most likely by influencing cell signal transduction.
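The ORF length and residue count reported above are consistent with each other: 603 bp is 201 codons, and excluding the stop codon leaves 200 residues. The small Python check below assumes that the reported ORF length includes the stop codon, which the abstract does not state explicitly. The function name is hypothetical.

```python
def orf_to_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by an open reading frame."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The terminal stop codon does not encode a residue
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(orf_to_residues(603))
```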
8
Wang P, Wilson SR. Mass spectrometry-based protein identification by integrating de novo sequencing with database searching. BMC Bioinformatics 2013; 14 Suppl 2:S24. [PMID: 23369017 PMCID: PMC3549845 DOI: 10.1186/1471-2105-14-s2-s24] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
BACKGROUND Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach first infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then uses these sequences in a database search to see whether a close peptide match can be found. However, current implementations of this integrative approach have several limitations. First, simplistic de novo sequencing is applied and only very short sequence tags are used. Second, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Third, in these methods de novo sequencing makes only a limited contribution to the scoring model, which is still largely based on database searching. RESULTS We have developed a new integrative protein identification method that integrates de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
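Error-tolerant tag matching, as opposed to the exact BLAST-like matching criticized above, can be illustrated with a brute-force Python scan. The sketch looks for protein windows that match a de novo tag with a bounded number of substitutions. It is not the authors' method or scoring model:

- The sequences, the tag, and the function name are hypothetical.
- Treating I and L as identical is an assumption motivated by their identical residue masses.

```python
from typing import Dict, List, Tuple


def tag_hits(tag: str, proteins: Dict[str, str],
             max_mismatches: int = 1) -> List[Tuple[str, int, int]]:
    """Find protein windows matching a de novo tag with up to max_mismatches substitutions.

    Leucine and isoleucine are isobaric, so an I/L difference is not counted.
    Returns (protein_id, 0-based start, mismatches) tuples.
    """
    hits = []
    k = len(tag)
    for pid, seq in proteins.items():
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            mismatches = sum(
                1 for a, b in zip(tag, window) if a != b and {a, b} != {"I", "L"}
            )
            if mismatches <= max_mismatches:
                hits.append((pid, start, mismatches))
    return hits
```

A real implementation would index the database instead of scanning every window. The scan is kept here only because it makes the mismatch rule explicit.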
Affiliation(s)
- Penghao Wang
  - Prince of Wales Clinical School, University of New South Wales, Australia.
9
Cantel S, Brunel L, Ohara K, Enjalbal C, Martinez J, Vasseur JJ, Smietana M. An innovative strategy for sulfopeptides analysis using MALDI-TOF MS reflectron positive ion mode. Proteomics 2012; 12:2247-57. [DOI: 10.1002/pmic.201100525] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Affiliation(s)
- Sonia Cantel
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
- Luc Brunel
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
- Keiichiro Ohara
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
- Christine Enjalbal
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
- Jean Martinez
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
- Jean-Jacques Vasseur
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
- Michael Smietana
  - Institut des Biomolécules Max Mousseron (IBMM) UMR 5247 CNRS-Université Montpellier 1 et 2; Montpellier France
10
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam principles). Proteomics Clin Appl 2012; 5:580-9. [PMID: 22213554 DOI: 10.1002/prca.201100097] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (i) an evolving list of comprehensive quality metrics and (ii) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in Proteomics, Proteomics Clinical Applications, Journal of Proteome Research, and Molecular and Cellular Proteomics, as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
Affiliation(s)
- Christopher R Kinsinger
  - Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
11
Dupré M, Coffinier Y, Boukherroub R, Cantel S, Martinez J, Enjalbal C. Laser desorption ionization mass spectrometry of protein tryptic digests on nanostructured silicon plates. J Proteomics 2012; 75:1973-90. [DOI: 10.1016/j.jprot.2011.12.039] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 11/24/2011] [Revised: 12/19/2011] [Accepted: 12/27/2011] [Indexed: 10/14/2022]
12
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam Principles). J Proteome Res 2012; 11:1412-9. [PMID: 22053864 PMCID: PMC3272102 DOI: 10.1021/pr201071t] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/18/2023]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (1) an evolving list of comprehensive quality metrics and (2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
Affiliation(s)
- Christopher R Kinsinger
  - Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States.
13
Dupré M, Cantel S, Martinez J, Enjalbal C. Occurrence of C-terminal residue exclusion in peptide fragmentation by ESI and MALDI tandem mass spectrometry. J Am Soc Mass Spectrom 2012; 23:330-346. [PMID: 22095165 DOI: 10.1007/s13361-011-0254-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/30/2011] [Revised: 09/14/2011] [Accepted: 09/14/2011] [Indexed: 05/31/2023]
Abstract
By screening a data set of MS/MS spectra from 392 synthetic peptides, we found that a known C-terminal rearrangement occurred unexpectedly frequently from monoprotonated molecular ions in both ESI and MALDI tandem mass spectrometry, upon low- and high-energy collision-activated dissociation with QqTOF and TOF/TOF mass analyzer configurations, respectively. Any residue located at the C-terminal carboxylic acid end, even a basic one, was lost, provided that a basic amino acid such as arginine (and, to a lesser extent, histidine or lysine) was present in the sequence. This led to a fragment ion, usually depicted as the (b(n-1) + H(2)O) ion, corresponding to a shortened, non-scrambled peptide chain. Far from being an epiphenomenon, this exclusion of the C-terminal residue gave a fragment ion that was the base peak of the MS/MS spectrum in certain cases. Within the framework of the mobile proton model, with the ionizing proton sequestered on the basic amino acid side chain, the charge-directed fragmentation mechanism is known to involve the C-terminal carboxylic acid function, forming an anhydride intermediate structure. The same mechanism was also demonstrated for cationized peptides. To confirm this assessment, we prepared some of the peptides that displayed such C-terminal residue exclusion as C-terminal backbone amides. As expected, the production of truncated chains was completely suppressed in this peptide amide series. Moreover, multiply charged molecular ions of all peptides recorded in ESI mass spectrometry did not undergo this fragmentation, confirming that any mobile ionizing proton prevents such a competitive C-terminal backbone rearrangement.
Among all well-known nondirect sequence fragment ions arising from nonspecific losses of neutral molecules (mainly H(2)O and NH(3)) and multiple backbone amide ruptures (b-type internal ions), the described C-terminal residue exclusion is highly identifiable, giving rise to a single fragment ion in the high mass range of the MS/MS spectra. The mass difference between this signal and the protonated molecular ion corresponds to the mass of the C-terminal residue, allowing straightforward identification of the amino acid at this extremity. It must be emphasized that such a neutral residue loss can be misattributed to the formation of a y(m-1) ion, i.e., to the loss of the N-terminal residue through the a(1)-y(m-1) fragmentation channel. Extreme caution must therefore be exercised when reading direct sequence ions in positive-ion MS/MS spectra of singly charged peptides, so as not to confuse the assignment of the N- and C-terminal amino acids. Although this peculiar fragmentation behavior is of obvious interest for de novo peptide sequencing, it can also be exploited in proteomics, especially in studies involving digestion protocols with proteolytic enzymes other than trypsin (Lys-N, Glu-C, and Asp-N) that produce arginine-containing peptides.
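The mass relationships behind both the C-terminal residue exclusion and the warning about y(m-1) ions can be checked numerically. The Python sketch below uses standard monoisotopic residue masses; the example peptides and function names are hypothetical. It shows three relationships:

- [M+H]+ minus the (b(n-1) + H2O) ion gives the C-terminal residue mass.
- [M+H]+ minus the y(m-1) ion gives the N-terminal residue mass.
- The two ions coincide exactly when the two terminal residues are isobaric (for example Leu and Ile).

```python
# Monoisotopic residue masses (Da) for the standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mh(seq: str) -> float:
    """[M+H]+ of a linear peptide with free termini."""
    return sum(RESIDUE[a] for a in seq) + WATER + PROTON


def b_minus_c_term_plus_water(seq: str) -> float:
    """(b(n-1) + H2O)+ ion: the peptide with its C-terminal residue excluded."""
    return mh(seq[:-1])


def y_minus_n_term(seq: str) -> float:
    """y(m-1)+ ion: the peptide with its N-terminal residue removed."""
    return mh(seq[1:])


if __name__ == "__main__":
    pep = "GASRF"  # hypothetical arginine-containing peptide
    print(round(mh(pep) - b_minus_c_term_plus_water(pep), 5))  # C-terminal residue (F) mass
    print(round(mh(pep) - y_minus_n_term(pep), 5))  # N-terminal residue (G) mass
```

Both ions correspond to a truncated, intact peptide, so they share the same [M+H]+ form. A neutral loss observed in a spectrum therefore identifies a residue mass, but not which terminus lost it.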
Affiliation(s)
- Mathieu Dupré
  - Institut des Biomolécules Max Mousseron (IBMM), UMR 5247, Bâtiment Chimie (17), Université Montpellier 2, Universités Montpellier 1 et 2 - CNRS, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
14
Chierici M, Albanese D, Franceschi P, Furlanello C. TOFwave: reproducibility in biomarker discovery from time-of-flight mass spectrometry data. Mol Biosyst 2012; 8:2845-9. [DOI: 10.1039/c2mb25223f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/23/2022]
15
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam principles). Proteomics 2011; 12:11-20. [PMID: 22069307 DOI: 10.1002/pmic.201100562] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 10/27/2011] [Accepted: 10/27/2011] [Indexed: 11/10/2022]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (i) an evolving list of comprehensive quality metrics and (ii) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in Proteomics, Proteomics Clinical Applications, Journal of Proteome Research, and Molecular and Cellular Proteomics, as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
Affiliation(s)
- Christopher R Kinsinger
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
16
Bern MW, Kil YJ. Two-dimensional target decoy strategy for shotgun proteomics. J Proteome Res 2011; 10:5296-301. [PMID: 22010998 DOI: 10.1021/pr200780j] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]
Abstract
The target-decoy approach to estimating and controlling false discovery rate (FDR) has become a de facto standard in shotgun proteomics, and it has been applied at both the peptide-to-spectrum match (PSM) and protein levels. Current bioinformatics methods control either the PSM- or the protein-level FDR, but not both. In order to obtain the most reliable information from their data, users must employ one method when the number of tandem mass spectra exceeds the number of proteins in the database and another method when the reverse is true. Here we propose a simple variation of the standard target-decoy strategy that estimates and controls PSM and protein FDRs simultaneously, regardless of the relative numbers of spectra and proteins. We demonstrate that even if the final goal is a list of PSMs with a fixed low FDR and not a list of protein identifications, the proposed two-dimensional strategy offers advantages over a pure PSM-level strategy.
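The abstract states the goal (controlling PSM- and protein-level FDR at the same time) but not the exact procedure. Purely as an illustration, the Python sketch below assumes the common decoy/target ratio as the FDR estimate, a hypothetical protein score (the best PSM score seen for that protein), and a brute-force grid search over a PSM-score cutoff and a protein-score cutoff. It keeps the pair that accepts the most target PSMs while both estimates stay at or below the limit. It is not Bern and Kil's published algorithm, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    score: float    # PSM score, higher is better
    protein: str    # accession; decoy accessions assumed distinct (e.g. "DECOY_...")
    is_decoy: bool  # True if the match is to the decoy database


def fdr_estimate(n_decoy: int, n_target: int) -> float:
    """Basic target-decoy FDR estimate: decoy hits divided by target hits."""
    if n_target == 0:
        return 0.0 if n_decoy == 0 else float("inf")
    return n_decoy / n_target


def two_d_thresholds(psms, max_fdr=0.01):
    """Return (n_target_psms, psm_cut, protein_cut) for the cutoff pair that keeps the most
    target PSMs while BOTH PSM-level and protein-level FDR estimates are <= max_fdr."""
    # Hypothetical protein score: the best PSM score observed for that protein.
    prot_score = {}
    for p in psms:
        prot_score[p.protein] = max(prot_score.get(p.protein, float("-inf")), p.score)

    best = (0, None, None)
    for psm_cut in sorted({p.score for p in psms}):
        for prot_cut in sorted(set(prot_score.values())):
            kept = [p for p in psms
                    if p.score >= psm_cut and prot_score[p.protein] >= prot_cut]
            t_psm = sum(not p.is_decoy for p in kept)
            d_psm = len(kept) - t_psm
            t_prot = len({p.protein for p in kept if not p.is_decoy})
            d_prot = len({p.protein for p in kept if p.is_decoy})
            if (fdr_estimate(d_psm, t_psm) <= max_fdr
                    and fdr_estimate(d_prot, t_prot) <= max_fdr
                    and t_psm > best[0]):
                best = (t_psm, psm_cut, prot_cut)
    return best


psms = [PSM(10, "P1", False), PSM(9, "P1", False), PSM(8, "P2", False),
        PSM(7, "DECOY_P9", True), PSM(6, "P3", False)]
print(two_d_thresholds(psms, max_fdr=0.0))  # (3, 6, 8)
```

The grid search is quadratic in the number of distinct scores and is meant only to make the "two cutoffs, two FDR constraints" idea concrete.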
Affiliation(s)
- Marshall W Bern
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, United States.
17
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam Principles). Mol Cell Proteomics 2011; 10:O111.015446. [PMID: 22052993 DOI: 10.1074/mcp.o111.015446] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the United States National Cancer Institute convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: 1) an evolving list of comprehensive quality metrics and 2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
Affiliation(s)
- Christopher R Kinsinger
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
18
Zhu P, Bowden P, Zhang D, Marshall JG. Mass spectrometry of peptides and proteins from human blood. Mass Spectrom Rev 2011; 30:685-732. [PMID: 24737629 DOI: 10.1002/mas.20291] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 07/18/2008] [Revised: 12/09/2009] [Accepted: 01/19/2010] [Indexed: 06/03/2023]
Abstract
It is difficult to convey the accelerating rate and growing importance of mass spectrometry applications to human blood proteins and peptides. Mass spectrometry can rapidly detect and identify the ionizable peptides from the proteins in a simple mixture and reveal many of their post-translational modifications. However, blood is a complex mixture that may contain many proteins first expressed in cells and tissues. The complete analysis of blood proteins is a daunting task that will rely on a wide range of disciplines from physics, chemistry, biochemistry, genetics, electromagnetic instrumentation, mathematics and computation. Therefore the comprehensive discovery and analysis of blood proteins will rank among the great technical challenges and require the cumulative sum of many of mankind's scientific achievements together. A variety of methods have been used to fractionate, analyze and identify proteins from blood, each yielding a small piece of the whole and throwing the great size of the task into sharp relief. The approaches attempted to date clearly indicate that enumerating the proteins and peptides of blood can be accomplished. There is no doubt that the mass spectrometry of blood will be crucial to the discovery and analysis of proteins, enzyme activities, and post-translational processes that underlie the mechanisms of disease. At present both discovery and quantification of proteins from blood are commonly reaching sensitivities of ∼1 ng/mL.
Affiliation(s)
- Peihong Zhu
- Department of Chemistry and Biology, Ryerson University, 350 Victoria Street, Toronto, Ontario, Canada M5B 2K3
19
Llerena-Suster CR, Obregón WD, Trejo SA, Morcelle SR. Papain Purification Insights: Monitoring by Electrophoretic Approaches and MALDI-TOF Peptide Mass Fingerprint Analyses. Anal Lett 2011. [DOI: 10.1080/00032719.2010.546022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
20
Li Y, Hao P, Zhang S, Li Y. Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting. Mol Cell Proteomics 2011; 10:M110.005785. [PMID: 21775775 DOI: 10.1074/mcp.m110.005785] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
Peptide mass fingerprinting, although it has become complementary to tandem mass spectrometry for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications compared with tandem mass spectrometry. In this study, we propose, implement and evaluate a uniform approach using support vector machines to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (masses) alignment. Eighty-one feature-matching patterns derived from cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks were proposed to characterize the matching profile of the peptide mass fingerprinting procedure. We developed a new strategy including the participation of matched peak intensity redistribution to handle shared peak intensities and 440 parameters were generated to digitalize each feature-matching pattern. A high performance for an evaluation data set of 137 items was finally achieved by the optimal multi-criteria support vector machines approach, with 491 final features out of a feature vector of 35,640 normalized features through cross training and validating a publicly available "gold standard" peptide mass fingerprinting data set of 1733 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm has a greater ability to clearly separate correct identifications and random matches with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%) of protein identification. Several conclusions reached via this research make general contributions to MS-based protein identification. 
Firstly, inherent attributes showed comparable or even greater robustness than other explicit attributes. As an inherent attribute of an experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive peptide mass fingerprinting. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.
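The reported F1-measure is consistent with the reported precision and sensitivity, since F1 is the harmonic mean of precision and recall (sensitivity). A short Python check of that arithmetic, with a hypothetical helper name:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 97%, sensitivity 82%.
print(round(f1_score(0.97, 0.82), 2))  # 0.89
```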
Affiliation(s)
- Youyuan Li
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, P R China
21
Kil YJ, Becker C, Sandoval W, Goldberg D, Bern M. Preview: a program for surveying shotgun proteomics tandem mass spectrometry data. Anal Chem 2011; 83:5259-67. [PMID: 21619057 DOI: 10.1021/ac200609a] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Abstract
Database search programs for peptide identification by tandem mass spectrometry ask their users to set various parameters, including precursor and fragment mass tolerances, digestion specificity, and allowed types of modifications. Even proteomics experts with detailed knowledge of their samples may find it difficult to make these choices without significant investigation, and poor choices can lead to missed identifications and misleading results. Here we describe a program called Preview that analyzes a set of mass spectra for mass errors, digestion specificity, and known and unknown modifications, thereby facilitating parameter selection. Moreover, Preview optionally recalibrates mass over charge measurements, leading to further improvement in identification results. In a study of Bruton's tyrosine kinase, we find that the use of Preview improved the number of confidently identified mass spectra and phosphorylation sites by about 50%.
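The abstract does not describe Preview's recalibration model. As a minimal sketch, assuming a single global systematic offset in parts per million, one could estimate that offset as the median error of confident (observed, theoretical) m/z pairs and divide it out of every measurement. The function names are hypothetical and this is not Preview's implementation.

```python
from statistics import median


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def recalibrate(observed_mz, confident_pairs):
    """Remove one global ppm offset estimated from confident matches.

    confident_pairs: (observed m/z, theoretical m/z) tuples from high-scoring
    identifications. Returns (corrected m/z list, estimated offset in ppm).
    """
    offset = median(ppm_error(o, t) for o, t in confident_pairs)
    # observed = true * (1 + offset * 1e-6), so divide the factor back out.
    return [mz / (1 + offset * 1e-6) for mz in observed_mz], offset


pairs = [(500.0025, 500.0), (750.00375, 750.0), (1000.005, 1000.0)]  # all about +5 ppm
corrected, offset = recalibrate([1200.006], pairs)
```

The spread of the same ppm errors around their median is one reasonable input when choosing a precursor or fragment tolerance, which is the kind of parameter selection the abstract describes.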
Affiliation(s)
- Yong J Kil
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, USA
22
Noy K, Towfic F, Wittenberg GM, Fasulo D. Shape-Based Feature Matching Improves Protein Identification via LC-MS and Tandem MS. J Comput Biol 2011; 18:547-57. [DOI: 10.1089/cmb.2010.0155] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Affiliation(s)
- Karin Noy
- Integrated Data Systems, Siemens Corporate Research, Princeton, New Jersey
- Fadi Towfic
- Department of Computer Science, Iowa State University, Ames, Iowa
- Daniel Fasulo
- Integrated Data Systems, Siemens Corporate Research, Princeton, New Jersey
23
Bern M, Kil YJ. Comment on "Unbiased statistical analysis for multi-stage proteomic search strategies". J Proteome Res 2011; 10:2123-7. [PMID: 21288048 DOI: 10.1021/pr101143m] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/30/2022]
Abstract
Everett et al. recently reported on a statistical bias that arises in the target-decoy approach to false discovery rate estimation in two-pass proteomics search strategies as exemplified by X!Tandem. This bias can cause serious underestimation of the false discovery rate. We argue here that the "unbiased" solution proposed by Everett et al., however, is also biased and under certain circumstances can also result in a serious underestimate of the FDR, especially at the protein level.
Affiliation(s)
- Marshall Bern
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, USA.
24
Kertész-Farkas A, Reiz B, Myers MP, Pongor S. PTMSearch: A Greedy Tree Traversal Algorithm for Finding Protein Post-Translational Modifications in Tandem Mass Spectra. ACTA ACUST UNITED AC 2011. [DOI: 10.1007/978-3-642-23783-6_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
25
Källberg M, Lu H. An improved machine learning protocol for the identification of correct Sequest search results. BMC Bioinformatics 2010; 11:591. [PMID: 21138573 PMCID: PMC3013103 DOI: 10.1186/1471-2105-11-591] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/16/2010] [Accepted: 12/07/2010] [Indexed: 11/18/2022]
Abstract
Background Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To fully take advantage of tandem mass spectrometry (MS/MS) techniques in large scale protein characterization studies robust and consistent data analysis procedures are crucial. In this work we present a machine learning based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols. Results The developed model improves on published machine learning classification procedures by 6% as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the 'black-box' notion often associated with machine learning classifiers, allowing for comparison with expert rule-of-thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested. Conclusions We demonstrate the construction of a high accuracy classification model for Sequest search results from MS/MS spectra obtained by using the MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework, to give additional discriminatory power, allows for future tailoring of the model to take advantage of information from specific instrument set-ups.
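The abstract reports its improvement as area under the ROC curve. For reference, the AUC of any PSM classifier equals the probability that a randomly chosen correct match outscores a randomly chosen incorrect one (the Mann-Whitney formulation). A minimal Python sketch, quadratic in the number of examples and intended only as an illustration, not as the authors' model:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: classifier outputs (higher means more likely correct);
    labels: True for correct peptide-spectrum matches, False otherwise.
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [True, False, True, False]))  # 0.75
```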
Affiliation(s)
- Morten Källberg
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA
26
Bythell BJ, Csonka IP, Suhai S, Barofsky DF, Paizs B. Gas-phase structure and fragmentation pathways of singly protonated peptides with N-terminal arginine. J Phys Chem B 2010; 114:15092-105. [PMID: 20973555 PMCID: PMC3664278 DOI: 10.1021/jp108452y] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Indexed: 12/16/2022]
Abstract
The gas-phase structures and fragmentation pathways of the singly protonated peptide arginylglycylaspartic acid (RGD) are investigated by means of collision-induced dissociation (CID) and detailed molecular mechanics and density functional theory (DFT) calculations. It is demonstrated that despite the ionizing proton being strongly sequestered at the guanidine group, protonated RGD can easily be fragmented on charge directed fragmentation pathways. This is due to facile mobilization of the C-terminal or aspartic acid COOH protons thereby generating salt-bridge (SB) stabilized structures. These SB intermediates can directly fragment to generate b(2) ions or facilely rearrange to form anhydrides from which both b(2) and b(2)+H(2)O fragments can be formed. The salt-bridge stabilized and anhydride transition structures (TSs) necessary to form b(2) and b(2)+H(2)O are much lower in energy than their traditional charge solvated counterparts. These mechanisms provide compelling evidence of the role of SB and anhydride structures in protonated peptide fragmentation which complements and supports our recent findings for tryptic systems (Bythell, B. J.; Suhai, S.; Somogyi, A.; Paizs, B. J. Am. Chem. Soc. 2009, 131, 14057-14065.). In addition to these findings we also report on the mechanisms for the formation of the b(1) ion, neutral loss (H(2)O, NH(3), guanidine) fragment ions, and the d(3) ion.
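To put the b(1), b(2), and b(2)+H(2)O ions of protonated RGD in numbers, the Python sketch below computes their singly charged m/z from standard monoisotopic residue masses. The b(2)+H(2)O value is simply b(2) plus one water, matching the ion's composition; the arithmetic says nothing about the salt-bridge or anhydride mechanisms the paper studies, and the helper name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues of RGD, plus proton and water.
RESIDUE_MONO = {"R": 156.101111, "G": 57.021464, "D": 115.026943}
PROTON = 1.007276
WATER = 18.010565


def b_ion_mz(sequence: str, n: int) -> float:
    """m/z of the singly charged b_n ion: the first n residue masses plus a proton."""
    return sum(RESIDUE_MONO[aa] for aa in sequence[:n]) + PROTON


b2 = b_ion_mz("RGD", 2)
print(f"b1 = {b_ion_mz('RGD', 1):.4f}")  # b1 = 157.1084
print(f"b2 = {b2:.4f}")                  # b2 = 214.1299
print(f"b2+H2O = {b2 + WATER:.4f}")      # b2+H2O = 232.1404
```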
Affiliation(s)
- Benjamin J. Bythell
- Computational Proteomics Group, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Department of Chemistry, Oregon State University, Corvallis, Oregon, USA
- István P. Csonka
- Department of Molecular Biophysics, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Sándor Suhai
- Department of Molecular Biophysics, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Béla Paizs
- Computational Proteomics Group, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Department of Molecular Biophysics, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
27
Chi-square comparison of tryptic peptide-to-protein distributions of tandem mass spectrometry from blood with those of random expectation. Anal Biochem 2010; 409:189-94. [PMID: 20977879 DOI: 10.1016/j.ab.2010.10.027] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 05/26/2010] [Revised: 09/19/2010] [Accepted: 10/18/2010] [Indexed: 11/23/2022]
Abstract
Proteomics uses tandem mass spectrometers and correlation algorithms to match peptides and their fragment spectra to amino acid sequences. The replication of multiple liquid chromatography experiments with electrospray ionization of peptides and tandem mass spectrometry (LC-ESI-MS/MS) produces large sets of MS/MS spectra. There is a need to assess the quality of large sets of experimental results by statistical comparison with that of random expectation. Classical frequency-based statistics such as goodness-of-fit tests for peptide-to-protein distributions could be used to calculate the probability that an entire set of experimental results has arisen by random chance. The frequency distributions of authentic MS/MS spectra from human blood were compared with those of false positive MS/MS spectra generated by a computer, or instrument noise, using the chi-square test. Here the mechanics of the chi-square test to compare the results in toto from a set of LC-ESI-MS/MS experiments with those of random expectation are detailed. The chi-square analysis of authentic spectra demonstrates unambiguously that the analysis of blood proteins separated by partition chromatography prior to tryptic digestions has a low probability that the cumulative peptide-to-protein distribution is the same as that of random or noise false positive spectra.
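A minimal Python sketch of the Pearson chi-square statistic over a peptides-per-protein histogram follows. The "random" model here, in which each peptide lands on one of the proteins uniformly at random so that per-protein counts are binomial, is an assumption for illustration only; the paper instead takes its random expectation from computer-generated or noise false-positive spectra. A p-value would come from the chi-square distribution with (number of bins - 1) degrees of freedom, for example via SciPy's `scipy.stats.chi2.sf`.

```python
from math import comb


def chi_square(observed, expected):
    """Pearson chi-square statistic; both lists must describe the same bins and totals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def random_expectation(n_peptides, n_proteins, max_k):
    """Expected number of proteins hit by exactly k = 1..max_k-1 peptides, plus a final
    bin for >= max_k peptides, if every peptide picks a protein uniformly at random.
    Proteins with zero peptides are dropped, since they are never observed."""
    p = 1.0 / n_proteins
    exact = [n_proteins * comb(n_peptides, k) * p ** k * (1 - p) ** (n_peptides - k)
             for k in range(max_k)]
    tail = n_proteins - sum(exact)  # proteins with >= max_k peptides
    return exact[1:] + [tail]


# Hypothetical histogram: proteins seen with 1, 2, and >= 3 distinct peptides.
observed = [30, 12, 8]
expected = random_expectation(n_peptides=80, n_proteins=200, max_k=3)
scale = sum(observed) / sum(expected)  # compare shapes: match the observed total
print(chi_square(observed, [e * scale for e in expected]))
```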
28
Barbarini N, Magni P. Accurate peak list extraction from proteomic mass spectra for identification and profiling studies. BMC Bioinformatics 2010; 11:518. [PMID: 20950483 PMCID: PMC2967564 DOI: 10.1186/1471-2105-11-518] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/25/2010] [Accepted: 10/16/2010] [Indexed: 08/30/2023]
Abstract
Background Mass spectrometry is an essential technique in proteomics both to identify the proteins of a biological sample and to compare proteomic profiles of different samples. In both cases, the main phase of the data analysis is the procedure to extract the significant features from a mass spectrum. Its final output is the so-called peak list which contains the mass, the charge and the intensity of every detected biomolecule. The main steps of the peak list extraction procedure are usually preprocessing, peak detection, peak selection, charge determination and monoisotoping operation. Results This paper describes an original algorithm for peak list extraction from low and high resolution mass spectra. It has been developed principally to improve the precision of peak extraction in comparison to other reference algorithms. It contains many innovative features, including a sophisticated method for managing overlapping isotopic distributions. Conclusions The performances of the basic version of the algorithm and of its optional functionalities have been evaluated in this paper on SELDI-TOF, MALDI-TOF and ESI-FTICR ECD mass spectra. Executable files of MassSpec, a MATLAB implementation of the peak list extraction procedure for Windows and Linux systems, can be downloaded free of charge for nonprofit institutions from the following web site: http://aimed11.unipv.it/MassSpec
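The charge determination and monoisotoping steps named above can be illustrated on the simplest possible case. The Python sketch below is not MassSpec's algorithm: it assumes one clean, non-overlapping isotopic cluster whose lowest peak is the monoisotopic one, estimates the charge from the isotope spacing (about 1.003355/z on the m/z axis), and converts the monoisotopic m/z to a neutral mass. Overlapping distributions, which the abstract highlights, would break these assumptions.

```python
from statistics import median

C13_C12 = 1.003355  # mass difference between 13C and 12C (Da)
PROTON = 1.007276   # proton mass (Da)


def charge_and_mass(cluster_mz):
    """Return (charge, neutral monoisotopic mass) for one isotopic cluster.

    cluster_mz: m/z values of a single cluster (at least two peaks), assumed to
    start at the monoisotopic peak and to be free of overlapping clusters.
    """
    mzs = sorted(cluster_mz)
    if len(mzs) < 2:
        raise ValueError("need at least two isotope peaks to estimate the charge")
    spacing = median(b - a for a, b in zip(mzs, mzs[1:]))
    z = max(1, round(C13_C12 / spacing))
    return z, (mzs[0] - PROTON) * z


# A hypothetical doubly charged cluster starting at m/z 500.0.
cluster = [500.0 + i * C13_C12 / 2 for i in range(3)]
print(charge_and_mass(cluster))
```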
Affiliation(s)
- Nicola Barbarini
- Dipartimento di Informatica e Sistemistica, Università degli Studi di Pavia, Pavia, Italy.
29
McHugh LC, Arthur JW. Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration. BMC Bioinformatics 2010; 11:448. [PMID: 20815925 PMCID: PMC2941693 DOI: 10.1186/1471-2105-11-448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/23/2010] [Accepted: 09/06/2010] [Indexed: 01/21/2023]
Abstract
Background Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of proteins correctly identified is dependent on the ability to include new knowledge about the mass spectrometry fragmentation process, into computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset. Results We present Harvest: an open source software tool for analysing fragmentation patterns and assessing the power of a new piece of information about the MS/MS fragmentation process to more clearly differentiate between correct and random peptide assignments. We demonstrate this functionality using data metrics derived from the properties of individual datasets in a peptide identification context. Using Harvest, we demonstrate how the development of such metrics may improve correct peptide assignment confidence in the context of a high-throughput proteomics experiment and characterise properties of peptide fragmentation. Conclusions Harvest provides a simple framework in C++ for analysing and prototyping metrics for peptide matching, the core of the protein identification problem. 
It is not a protein identification package and answers a different research question to packages such as Sequest, Mascot, X!Tandem, and other protein identification packages. It does not aim to maximise the number of assigned peptides from a set of unknown spectra, but instead provides a method by which researchers can explore fragmentation properties and assess the power of novel metrics for peptide matching in the context of a given experiment. Metrics developed using Harvest may then become candidates for later integration into protein identification packages.
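Harvest itself is a C++ framework, and its API is not described here. Purely to illustrate what a peptide-matching "metric" can look like, the Python sketch below computes one classic feature: the fraction of theoretical fragment m/z values that are matched by an observed peak within a tolerance. The function name and the 0.02 default tolerance are assumptions.

```python
from bisect import bisect_left


def matched_fraction(observed, theoretical, tol=0.02):
    """Fraction of theoretical fragment m/z values with an observed peak within +/- tol."""
    if not theoretical:
        return 0.0
    obs = sorted(observed)
    hits = 0
    for mz in theoretical:
        # First observed peak at or above the lower edge of the tolerance window.
        i = bisect_left(obs, mz - tol)
        if i < len(obs) and obs[i] <= mz + tol:
            hits += 1
    return hits / len(theoretical)


print(matched_fraction([100.0, 200.05, 300.2], [100.01, 200.0, 400.0], tol=0.06))
```

A metric like this could then be compared between correct and random assignments, which is the kind of assessment the abstract describes Harvest supporting.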
Affiliation(s)
- Leo C McHugh
- Discipline of Medicine, Sydney Medical School, University of Sydney, Sydney, Australia
30
Wang P, Yang P, Arthur J, Yang JYH. A dynamic wavelet-based algorithm for pre-processing tandem mass spectrometry data. Bioinformatics 2010; 26:2242-9. [PMID: 20628072 DOI: 10.1093/bioinformatics/btq403] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
MOTIVATION Mass spectrometry (MS)-based proteomics is one of the most commonly used research techniques for identifying and characterizing proteins in biological and medical research. The identification of a protein is the critical first step in elucidating its biological function. Successful protein identification depends on various interrelated factors, including effective analysis of MS data generated in a proteomic experiment. This analysis comprises several stages, often combined in a pipeline or workflow. The first component of the analysis is known as spectra pre-processing. In this component, the raw data generated by the mass spectrometer is processed to eliminate noise and identify the mass-to-charge ratio (m/z) and intensity for the peaks in the spectrum corresponding to the presence of certain peptides or peptide fragments. Since all downstream analyses depend on the pre-processed data, effective pre-processing is critical to protein identification and characterization. There is a critical need for more robust pre-processing algorithms that perform well on tandem mass spectra under a variety of different conditions and can be easily integrated into sophisticated data analysis pipelines for practical wet-lab applications. RESULT We have developed a new pre-processing algorithm. Based on wavelet theory, our method uses a dynamic peak model to identify peaks. It is designed to be easily integrated into a complete proteomic analysis workflow. We compared the method with other available algorithms using a reference library of raw MS and tandem MS spectra with known protein composition information. Our pre-processing algorithm results in the identification of significantly more peptides and proteins in the downstream analysis for a given false discovery rate. AVAILABILITY Software available at: http://www.maths.usyd.edu.au/u/penghao/index.html.
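The abstract does not specify the dynamic peak model, so the Python sketch below shows only the generic idea behind wavelet-based peak picking: correlate the spectrum with a Ricker ("Mexican hat") wavelet at one assumed scale and report positive local maxima of the resulting coefficients. The scale, window width, and threshold are illustrative assumptions, and a multi-scale or adaptive scheme would be needed for real spectra.

```python
import math


def ricker(width: int, a: float):
    """Ricker ("Mexican hat") wavelet with scale a, sampled at `width` points."""
    norm = 2 / (math.sqrt(3 * a) * math.pi ** 0.25)
    center = (width - 1) / 2
    return [norm * (1 - ((i - center) / a) ** 2) * math.exp(-((i - center) ** 2) / (2 * a * a))
            for i in range(width)]


def wavelet_peaks(intensity, a=2.0, width=15, min_coef=0.0):
    """Indices where the single-scale wavelet coefficient is a positive local maximum."""
    w = ricker(width, a)
    half = width // 2
    n = len(intensity)
    coef = []
    for i in range(n):
        # Correlate the signal with the (symmetric) wavelet centred on point i.
        s = 0.0
        for j, wj in enumerate(w):
            k = i + j - half
            if 0 <= k < n:
                s += intensity[k] * wj
        coef.append(s)
    return [i for i in range(1, n - 1)
            if coef[i] > min_coef and coef[i] >= coef[i - 1] and coef[i] > coef[i + 1]]


# Two Gaussian peaks centred at indices 20 and 50 on a zero baseline.
signal = [math.exp(-((i - 20) ** 2) / 8) + 0.5 * math.exp(-((i - 50) ** 2) / 8)
          for i in range(80)]
print(wavelet_peaks(signal))  # [20, 50]
```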
Affiliation(s)
- Penghao Wang
- School of Mathematics and Statistics, University of Sydney, Sydney, Australia.
31
Bowden P, Pendrak V, Zhu P, Marshall JG. Meta sequence analysis of human blood peptides and their parent proteins. J Proteomics 2010; 73:1163-75. [PMID: 20170764 DOI: 10.1016/j.jprot.2010.02.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 09/03/2009] [Revised: 01/23/2010] [Accepted: 02/09/2010] [Indexed: 11/19/2022]
Abstract
Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins.
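The abstract describes organizing identifications into an SQL database that holds, for each protein, up to five unique peptides in order of prevalence together with a peptide count. A minimal sketch with Python's built-in `sqlite3` module follows; the single-table schema and the table and column names are assumptions for illustration, not the authors' schema, and the peptide sequences in the usage line are only examples.

```python
import sqlite3
from collections import defaultdict


def top_peptides(rows, k=5):
    """rows: iterable of (protein_accession, peptide_sequence), one row per identification.
    Returns {protein: [(peptide, count), ...]} with at most k peptides, most prevalent first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ident (protein TEXT, peptide TEXT)")
    con.executemany("INSERT INTO ident VALUES (?, ?)", rows)
    cur = con.execute(
        "SELECT protein, peptide, COUNT(*) AS n FROM ident "
        "GROUP BY protein, peptide ORDER BY protein, n DESC, peptide")
    out = defaultdict(list)
    for protein, peptide, n in cur:
        if len(out[protein]) < k:
            out[protein].append((peptide, n))
    con.close()
    return dict(out)


rows = [("ALB", "LVNEVTEFAK")] * 3 + [("ALB", "AEFAEVSK")] + [("APOA1", "DLATVYVDVLK")] * 2
print(top_peptides(rows))
```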
Affiliation(s)
- Peter Bowden
- Department of Chemistry and Biology, Ryerson University, Toronto, Canada
32
D'Alessandro A, Liumbruno G, Grazzini G, Pupella S, Lombardini L, Zolla L. Umbilical cord blood stem cells: Towards a proteomic approach. J Proteomics 2010; 73:468-82. [DOI: 10.1016/j.jprot.2009.06.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/13/2009] [Revised: 06/04/2009] [Accepted: 06/16/2009] [Indexed: 02/07/2023]
33
Jain R, Wagner M. Kolmogorov-Smirnov Scores and Intrinsic Mass Tolerances for Peptide Mass Fingerprinting. J Proteome Res 2009; 9:737-42. [DOI: 10.1021/pr9005525] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Rachana Jain
- Department of Biomedical Engineering, University of Cincinnati, Cincinnati, Ohio 45219, Division of Biomedical Informatics, Cincinnati Children’s Hospital Research Foundation, Cincinnati, Ohio 45229, and Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45229
- Michael Wagner
- Department of Biomedical Engineering, University of Cincinnati, Cincinnati, Ohio 45219, Division of Biomedical Informatics, Cincinnati Children’s Hospital Research Foundation, Cincinnati, Ohio 45229, and Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45229
34
Bythell BJ, Suhai S, Somogyi Á, Paizs B. Proton-Driven Amide Bond-Cleavage Pathways of Gas-Phase Peptide Ions Lacking Mobile Protons. J Am Chem Soc 2009; 131:14057-65. [DOI: 10.1021/ja903883z] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 11/29/2022]
Affiliation(s)
- Benjamin J. Bythell
- Department of Molecular Biophysics, Im Neuenheimer Feld 580, German Cancer Research Center, 69120 Heidelberg, Germany, and Department of Chemistry, University of Arizona, Tucson, Arizona 85721
- Sándor Suhai
- Department of Molecular Biophysics, Im Neuenheimer Feld 580, German Cancer Research Center, 69120 Heidelberg, Germany, and Department of Chemistry, University of Arizona, Tucson, Arizona 85721
- Árpád Somogyi
- Department of Molecular Biophysics, Im Neuenheimer Feld 580, German Cancer Research Center, 69120 Heidelberg, Germany, and Department of Chemistry, University of Arizona, Tucson, Arizona 85721
- Béla Paizs
- Department of Molecular Biophysics, Im Neuenheimer Feld 580, German Cancer Research Center, 69120 Heidelberg, Germany, and Department of Chemistry, University of Arizona, Tucson, Arizona 85721
35
Bowden P, Beavis R, Marshall J. Tandem mass spectrometry of human tryptic blood peptides calculated by a statistical algorithm and captured by a relational database with exploration by a general statistical analysis system. J Proteomics 2009; 73:103-11. [PMID: 19703602 DOI: 10.1016/j.jprot.2009.08.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 05/08/2009] [Revised: 08/04/2009] [Accepted: 08/17/2009] [Indexed: 01/23/2023]
Abstract
A goodness-of-fit test may be used to assign tandem mass spectra of peptides to amino acid sequences and to directly calculate the expected probability of mis-identification. The product of the peptide expectation values directly yields the probability that the parent protein has been mis-identified. A relational database could capture the mass spectral data and the best-fit results, and permit subsequent calculations by a general statistical analysis system. The many files of the HUPO blood protein data correlated by X!TANDEM against the proteins of ENSEMBL were collected into a relational database. A redundant set of 247,077 proteins and peptides was correlated by X!TANDEM, and this set was collapsed to 34,956 peptides from 13,379 distinct proteins. About 6875 distinct proteins were represented by only a single distinct peptide, 2866 proteins showed 2 distinct peptides, and 3454 proteins showed at least three distinct peptides by X!TANDEM. More than 99% of the peptides were associated with proteins that had cumulative expectation values, i.e. probability of false-positive identification, of one in one hundred or less. The distribution of peptides per protein from X!TANDEM was significantly different from that expected from random assignment of peptides.
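The abstract states that multiplying peptide expectation values gives the probability that the parent protein was mis-identified, and it uses one in one hundred as the cutoff. A minimal sketch of that arithmetic follows; it assumes the peptide evidence is independent (which a plain product implies) and accumulates log10 values so proteins can still be ranked when the product itself would underflow to 0.0. The function names are hypothetical.

```python
import math
from typing import Iterable


def log10_protein_expectation(peptide_evalues: Iterable[float]) -> float:
    """log10 of the product of per-peptide expectation values."""
    log_sum = 0.0
    for e in peptide_evalues:
        if e <= 0:
            raise ValueError("expectation values must be positive")
        log_sum += math.log10(e)
    return log_sum


def protein_expectation(peptide_evalues: Iterable[float]) -> float:
    """Product of per-peptide expectation values (may underflow to 0.0)."""
    return 10.0 ** log10_protein_expectation(peptide_evalues)


def protein_passes(peptide_evalues: Iterable[float], cutoff: float = 0.01) -> bool:
    """True when the combined value is one in one hundred or less."""
    return protein_expectation(peptide_evalues) <= cutoff


print(protein_expectation([0.1, 0.05]))
print(protein_passes([0.1, 0.05]), protein_passes([0.5, 0.5]))
```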
Affiliation(s)
- Peter Bowden
- Department of Chemistry and Biology, Ryerson University, 350 Victoria Street, Toronto, ON, Canada M5B 2K3
36
Mead JA, Bianco L, Bessant C. Recent developments in public proteomic MS repositories and pipelines. Proteomics 2009; 9:861-81. [PMID: 19212957 DOI: 10.1002/pmic.200800553] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/08/2022]
Abstract
This article provides an overview of publicly available proteomic data repositories in a single document with a particular focus on the latest developments, many of which are not announced through traditional publications. The review is intended to inform the proteomics practitioner of the options for storage and dissemination of their MS/MS data in the public domain, and to help those who want to mine proteomic data generated by others. The latter area has arguably seen the most development in recent times, as repositories have sprouted new tools for data analysis, visualisation and experimental design. We also highlight key biological datasets available at each repository, including standard datasets. Finally, we touch upon areas of significant challenge and future directions.
37
Shenar N, Sommerer N, Martinez J, Enjalbal C. Comparison of LID versus CID activation modes in tandem mass spectrometry of peptides. J Mass Spectrom 2009; 44:621-32. [PMID: 19097045 DOI: 10.1002/jms.1535] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/27/2023]
Abstract
We report our contribution to the systematic investigation of peptide fragmentations performed on high-performance Tof equipment, operating in MS and MS/MS modes, such as ESI-QqTof and MALDI-Tof/Tof instruments that are commonly available today in proteomic laboratories. Whereas the former analyzer's configuration provides low-energy collision-induced dissociations (CID), the latter allows tunable activation methods of the selected parent ion to induce either metastable laser-induced dissociations (LID) or high-energy CID ('gas on spectra LID'). Fragmentation of the monoprotonated ion of 53 peptides (FW 807-2853 g/mol) was undertaken upon low-energy CID on an ESI-QTof mass spectrometer (Waters) as well as high-energy CID and LID conditions on a MALDI Ultraflex mass spectrometer (Bruker). Systematic comparison of MS/MS spectra provided useful information on the performance of each piece of equipment for efficient peptide sequencing and also insights into the observed fragmentation behaviors.
Affiliation(s)
- Nawar Shenar
- Institut des Biomolécules Max Mousseron, UMR 5247 CNRS-Universités Montpellier 1 et 2, Bâtiment Chimie (17), Université Montpellier 2, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
38
Edwards N, Wu X, Tseng CW. An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra. Clin Proteomics 2009. [DOI: 10.1007/s12014-009-9024-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 01/14/2023]
Abstract
As the speed of mass spectrometers, sophistication of sample fractionation, and complexity of experimental designs increase, the volume of tandem mass spectra requiring reliable automated analysis continues to grow. Software tools that quickly, effectively, and robustly determine the peptide associated with each spectrum with high confidence are sorely needed. Currently available tools that postprocess the output of sequence-database search engines use three techniques to distinguish the correct peptide identifications from the incorrect: statistical significance re-estimation, supervised machine learning scoring and prediction, and combining or merging of search engine results. We present a unifying framework that encompasses each of these techniques in a single model-free machine-learning framework that can be trained in an unsupervised manner. The predictor is trained on the fly for each new set of search results without user intervention, making it robust for different instruments, search engines, and search engine parameters. We demonstrate the performance of the technique using mixtures of known proteins and by using shuffled databases to estimate false discovery rates, from data acquired on three different instruments with two different ionization technologies. We show that this approach outperforms machine-learning techniques applied to a single search engine’s output, and demonstrate that combining search engine results provides additional benefit. We show that the performance of the commercial Mascot tool can be bested by the machine-learning combination of two open-source tools X!Tandem and OMSSA, but that the use of all three search engines boosts performance further still. The Peptide identification Arbiter by Machine Learning (PepArML) unsupervised, model-free, combining framework can be easily extended to support an arbitrary number of additional searches, search engines, or specialized peptide–spectrum match metrics for each spectrum data set. 
PepArML is open-source and is available from http://peparml.sourceforge.net.
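The abstract estimates false discovery rates with shuffled (decoy) databases. A common way to turn decoy hits into a score cutoff is sketched below: rank all matches by score and keep the lowest target score at which decoys/targets stays at or below the desired rate. This is a generic illustration, not PepArML's code; it assumes higher scores are better and uses the simple decoy/target estimator (other conventions exist, such as 2 x decoys / (targets + decoys) for concatenated target-decoy searches). The function name and scores are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def score_cutoff_at_fdr(psms: Iterable[Tuple[float, bool]],
                        max_fdr: float = 0.01) -> Optional[float]:
    """Lowest target score at which the running decoy/target ratio is <= max_fdr.

    psms holds (score, is_decoy) pairs; higher scores are treated as better.
    Returns None when no cutoff meets max_fdr. Tied scores are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff: Optional[float] = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        if decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical scores: ten target matches and two decoy matches.
psms = [(float(s), False) for s in range(10, 0, -1)] + [(5.5, True), (0.5, True)]
print(score_cutoff_at_fdr(psms, max_fdr=0.05))
print(score_cutoff_at_fdr(psms, max_fdr=0.15))
```

Keeping the lowest passing cutoff, rather than stopping at the first violation, accepts the largest set of matches whose estimated FDR meets the target, which is how q-value style filtering behaves.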
39
Bianco L, Mead JA, Bessant C. Comparison of Novel Decoy Database Designs for Optimizing Protein Identification Searches Using ABRF sPRG2006 Standard MS/MS Data Sets. J Proteome Res 2009; 8:1782-91. [DOI: 10.1021/pr800792z] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
Affiliation(s)
- Luca Bianco
- Bioinformatics Group, Building 63, Cranfield University, Cranfield, Bedfordshire, United Kingdom MK43 0AL
- Jennifer A. Mead
- Bioinformatics Group, Building 63, Cranfield University, Cranfield, Bedfordshire, United Kingdom MK43 0AL
- Conrad Bessant
- Bioinformatics Group, Building 63, Cranfield University, Cranfield, Bedfordshire, United Kingdom MK43 0AL
40
Yang C, He Z, Yu W. Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis. BMC Bioinformatics 2009; 10:4. [PMID: 19126200 PMCID: PMC2631518 DOI: 10.1186/1471-2105-10-4] [Citation(s) in RCA: 175] [Impact Index Per Article: 11.7] [Received: 07/19/2008] [Accepted: 01/06/2009] [Indexed: 01/21/2023]
Abstract
BACKGROUND In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorithms is yet available. The main objective of this paper is to provide such a survey and to compare the performance of single-spectrum-based peak detection methods. RESULTS In general, we can decompose a peak detection procedure into three consecutive parts: smoothing, baseline correction and peak finding. We first categorize existing peak detection algorithms according to the techniques used in different phases. Such a categorization reveals the differences and similarities among existing peak detection algorithms. Then, we choose five typical peak detection algorithms to conduct a comprehensive experimental study using both simulation data and real MALDI MS data. CONCLUSION The results of comparison show that the continuous wavelet-based algorithm provides the best average performance.
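The three-stage decomposition (smoothing, baseline correction, peak finding) can be illustrated with the simplest choice at each stage: a moving average, a rolling-minimum baseline, and local-maximum detection. This is a teaching sketch under those assumptions, not the continuous wavelet-based method the comparison found best; window sizes and the height threshold are arbitrary.

```python
from typing import List, Sequence


def moving_average(y: Sequence[float], half_window: int) -> List[float]:
    """Smoothing: mean over a window of 2*half_window + 1 points (truncated at the ends)."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def rolling_min_baseline(y: Sequence[float], half_window: int) -> List[float]:
    """Baseline estimate: local minimum in a wide window around each point."""
    n = len(y)
    return [min(y[max(0, i - half_window):min(n, i + half_window + 1)]) for i in range(n)]


def local_maxima(y: Sequence[float], min_height: float) -> List[int]:
    """Peak finding: indices higher than the left neighbour and not lower than the right."""
    return [
        i for i in range(1, len(y) - 1)
        if y[i] >= min_height and y[i] > y[i - 1] and y[i] >= y[i + 1]
    ]


def detect_peaks(intensities: Sequence[float], smooth_hw: int = 1,
                 baseline_hw: int = 10, min_height: float = 1.0) -> List[int]:
    smoothed = moving_average(intensities, smooth_hw)
    baseline = rolling_min_baseline(smoothed, baseline_hw)
    corrected = [s - b for s, b in zip(smoothed, baseline)]
    return local_maxima(corrected, min_height)


# Synthetic spectrum: flat background of 1 with two peaks.
signal = [1.0] * 50
signal[14], signal[15], signal[16] = 5.0, 10.0, 5.0
signal[34], signal[35], signal[36] = 8.0, 20.0, 8.0
print(detect_peaks(signal))
```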
Affiliation(s)
- Chao Yang
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, PR China.
41
Novel nicotine oxidoreductase-encoding gene involved in nicotine degradation by Pseudomonas putida strain S16. Appl Environ Microbiol 2008; 75:772-8. [PMID: 19060159 DOI: 10.1128/aem.02300-08] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 11/20/2022]
Abstract
There are quite a few ongoing biochemical investigations of nicotine degradation in different organisms. In this work, we identified and sequenced a gene (designated nicA) involved in nicotine degradation by Pseudomonas putida strain S16. The gene product, NicA, was heterologously expressed and characterized as a nicotine oxidoreductase catalyzing the initial steps of nicotine metabolism. Biochemical analyses using resting cells and the purified enzyme suggested that nicA encodes an oxidoreductase, which converts nicotine to 3-succinoylpyridine through pseudooxynicotine. Based on enzymatic reactions and direct evidence obtained using H₂¹⁸O labeling, the process may consist of enzyme-catalyzed dehydrogenation, followed by spontaneous hydrolysis and then repetition of the dehydrogenation and hydrolysis steps. Sequence comparisons revealed that the gene showed 40% similarity to genes encoding NADH dehydrogenase subunit I and cytochrome c oxidase subunit I in eukaryotes. Our findings demonstrate that the molecular mechanism for nicotine degradation in strain S16 involves the pyrrolidine pathway and is similar to the mechanism in mammals, in which pseudooxynicotine, the direct precursor of a potent tobacco-specific lung carcinogen, is produced.
42
Alves G, Ogurtsov AY, Yu YK. RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration. BMC Genomics 2008; 9:505. [PMID: 18954448 PMCID: PMC2605478 DOI: 10.1186/1471-2164-9-505] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 05/07/2008] [Accepted: 10/27/2008] [Indexed: 11/17/2022]
Abstract
Background Existing scientific literature is a rich source of biological information such as disease markers. Integration of this information with data analysis may help researchers to identify possible controversies and to form useful hypotheses for further validations. In the context of proteomics studies, the era of individualized proteomics may be approached through consideration of amino acid substitutions/modifications as well as information from disease studies. Integration of such information with peptide searches facilitates speedy, dynamic information retrieval that may significantly benefit clinical laboratory studies. Description We have integrated from various sources annotated single amino acid polymorphisms, post-translational modifications, and their documented disease associations (if they exist) into one enhanced database per organism. We have also augmented our peptide identification software RAId_DbS to take into account this information while analyzing a tandem mass spectrum. In principle, one may choose to respect or ignore the correlation of amino acid polymorphisms/modifications within each protein. The former leads to targeted searches and avoids scoring of unnecessary polymorphism/modification combinations; the latter explores possible polymorphisms in a controlled fashion. To facilitate new discoveries, RAId_DbS also allows users to conduct searches permitting novel polymorphisms as well as to search a knowledge database created by the users. Conclusion We have finished constructing enhanced databases for 17 organisms. The web link to RAId_DbS and the enhanced databases is . The relevant databases and binaries of RAId_DbS for Linux, Windows, and Mac OS X are available for download from the same web page.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.
43
Falkner JA, Falkner JW, Yocum AK, Andrews PC. A spectral clustering approach to MS/MS identification of post-translational modifications. J Proteome Res 2008; 7:4614-22. [PMID: 18800783 DOI: 10.1021/pr800226w] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Indexed: 11/30/2022]
Abstract
Unidentified tandem mass spectra typically represent 50-90% of the spectra acquired in proteomics studies. This manuscript describes a novel algorithm, "Bonanza", for clustering spectra without knowledge of peptide or protein identifications. Further analysis leverages existing peptide identifications to infer related, likely valid identifications. Significantly more spectra can be identified with this approach, including spectra with unexpected potential modifications or amino-acid substitutions.
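The abstract does not describe Bonanza's similarity measure. A generic way to cluster spectra without identifications is sketched here: bin each spectrum, compare bins by cosine similarity, and join spectra whose similarity passes a threshold (single linkage via union-find). The 1.0 m/z bin width, square-root intensity scaling, and 0.7 threshold are illustrative assumptions, not Bonanza's parameters.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def binned_vector(peaks: Sequence[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum square-root intensities into fixed-width m/z bins."""
    vec: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += math.sqrt(intensity)
    return vec


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_spectra(spectra: Sequence[Sequence[Peak]], threshold: float = 0.7) -> List[List[int]]:
    """Single-linkage clustering: spectra joined by any pair with cosine >= threshold."""
    vectors = [binned_vector(s) for s in spectra]
    parent = list(range(len(spectra)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(spectra)):
        groups[find(i)].append(i)
    return sorted(groups.values())


spectra = [
    [(100.2, 50.0), (200.4, 100.0), (300.1, 80.0)],
    [(100.3, 45.0), (200.5, 110.0), (300.2, 70.0)],
    [(150.0, 100.0), (250.0, 100.0), (350.0, 100.0)],
]
print(cluster_spectra(spectra))
```

Once clusters exist, an identification for one member can be proposed for the other members, which corresponds to the step where existing identifications are leveraged to infer related ones.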
Affiliation(s)
- Jayson A Falkner
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA.
44
Tabb DL, Ma ZQ, Martin DB, Ham AJL, Chambers MC. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J Proteome Res 2008; 7:3838-46. [PMID: 18630943 DOI: 10.1021/pr800154p] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Abstract
In shotgun proteomics, tandem mass spectra of peptides are typically identified through database search algorithms such as Sequest. We have developed DirecTag, an open-source algorithm to infer partial sequence tags directly from observed fragment ions. This algorithm is unique in its implementation of three separate scoring systems to evaluate each tag on the basis of peak intensity, m/z fidelity, and complementarity. In data sets from several types of mass spectrometers, DirecTag reproducibly exceeded the accuracy and speed of InsPecT and GutenTag, two previously published algorithms for this purpose. The source code and binaries for DirecTag are available from http://fenchurch.mc.vanderbilt.edu.
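DirecTag's three scoring systems are beyond a short example, but the underlying tag-generation idea can be sketched: treat peaks as nodes, connect two peaks when their m/z difference matches an amino acid residue mass, and read off paths of a fixed length. This sketch assumes singly charged fragments, a fixed 0.01 tolerance, and no scoring at all; it is not DirecTag's implementation, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Monoisotopic residue masses (Da); "L" stands for both leucine and isoleucine,
# and cysteine is unmodified.
RESIDUE_MASSES: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_tags(peaks: Sequence[float], length: int = 3, tol: float = 0.01) -> Set[str]:
    """Enumerate tags whose consecutive m/z gaps match residue masses (singly charged ions)."""
    mz = sorted(peaks)
    # edges[i] lists (j, residue) for every peak j whose gap from peak i matches a residue
    edges: List[List[Tuple[int, str]]] = [[] for _ in mz]
    for i in range(len(mz)):
        for j in range(i + 1, len(mz)):
            gap = mz[j] - mz[i]
            for aa, mass in RESIDUE_MASSES.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, aa))

    tags: Set[str] = set()

    def extend(i: int, tag: str) -> None:
        if len(tag) == length:
            tags.add(tag)
            return
        for j, aa in edges[i]:
            extend(j, tag + aa)

    for start in range(len(mz)):
        extend(start, "")
    return tags


# A synthetic ladder spelling P-E-P-T from an arbitrary starting m/z of 200.0.
ladder = [200.0, 297.05276, 426.09535, 523.14811, 624.19579]
print(sorted(candidate_tags(ladder)))
```

Tags are read in order of increasing m/z, so for b ions they run N- to C-terminal and for y ions they run in reverse; a scoring stage would then rank the many candidates a real spectrum produces.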
Affiliation(s)
- David L Tabb
- Mass Spectrometry Research Center, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8575, USA.
45
A novel gene, encoding 6-hydroxy-3-succinoylpyridine hydroxylase, involved in nicotine degradation by Pseudomonas putida strain S16. Appl Environ Microbiol 2008; 74:1567-74. [PMID: 18203859 DOI: 10.1128/aem.02529-07] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Indexed: 11/20/2022]
Abstract
Previous research suggested that Pseudomonas spp. may attack the pyrrolidine ring of nicotine in a way similar to mammalian metabolism, resulting in the formation of pseudooxynicotine, the direct precursor of a potent tobacco-specific lung carcinogen. In addition, the subsequent intermediates in the Pseudomonas nicotine degradation pathway, 6-hydroxy-3-succinoylpyridine (HSP) and 2,5-dihydroxypyridine (DHP), are two important precursors for drug syntheses. However, until now there has been little information on the molecular mechanism of nicotine degradation via the pyrrolidine pathway. In this study we cloned and sequenced a 4,879-bp gene cluster involved in nicotine degradation. Intermediates N-methylmyosmine, pseudooxynicotine, 3-succinoylpyridine, HSP, and DHP were identified from resting cell reactions of the transformant containing the gene cluster and shown to be identical to those of the pyrrolidine pathway reported in wild-type strain Pseudomonas putida S16. The gene for 6-hydroxy-3-succinoylpyridine hydroxylase (HSP hydroxylase), which catalyzes the direct conversion of HSP to DHP, was cloned, sequenced, and expressed in Escherichia coli, and the purified HSP hydroxylase (38 kDa) is NADH dependent. DNA sequence analysis of this 936-bp fragment reveals that the deduced amino acid sequence shows no similarity to any protein of known function.
46
Dodds ED, German JB, Lebrilla CB. Enabling MALDI-FTICR-MS/MS for high-performance proteomics through combination of infrared and collisional activation. Anal Chem 2007; 79:9547-56. [PMID: 18001128 DOI: 10.1021/ac701763t] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Abstract
Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS) is a central tool for proteomic analysis, yet the singly protonated tryptic peptide ions produced by MALDI are significantly more difficult to dissociate for tandem mass spectrometry (MS/MS) than the corresponding multiply protonated ions. In order to overcome this limitation, current proteomic approaches using MALDI-MS/MS involve high-energy collision-induced dissociation (CID). Unfortunately, the use of high-energy CID complicates product ion spectra with a significant proportion of irrelevant fragments while also reducing mass accuracy and mass resolution. In order to address the lack of a high-resolution, high mass accuracy MALDI-MS/MS platform for proteomics, Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) and a recently developed MS/MS technique termed CIRCA (for combination of infrared and collisional activation) have been applied to proteomic analysis. Here, CIRCA is shown to be suitable for dissociating singly protonated tryptic peptides, providing greater sequence coverage than either CID or infrared multiphoton dissociation (IRMPD) alone. Furthermore, the CIRCA fragmentation spectra are of sufficient quality to allow protein identification based on the MS/MS spectra alone or in concert with the peptide mass fingerprint (PMF). This is accomplished without compromising mass accuracy or mass resolution. As a result, CIRCA serves to enable MALDI-FTICR-MS/MS for high-performance proteomics experiments.
Affiliation(s)
- Eric D Dodds
- Department of Chemistry, School of Medicine, University of California Davis, One Shields Avenue, Davis, California 95616, USA
47
Noy K, Fasulo D. Improved model-based, platform-independent feature extraction for mass spectrometry. Bioinformatics 2007; 23:2528-35. [PMID: 17698491 DOI: 10.1093/bioinformatics/btm385] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Mass spectrometry (MS) is increasingly being used for biomedical research. The typical analysis of MS data consists of several steps. Feature extraction is a crucial step since subsequent analyses are performed only on the detected features. Current methodologies applied to low-resolution MS, in which features are peaks or wavelet functions, are parameter-sensitive and inaccurate in the sense that peaks and wavelet functions do not directly correspond to the underlying molecules under observation. In high-resolution MS, the model-based approach is more appealing as it can provide a better representation of the MS signals by incorporating information about peak shapes and isotopic distributions. Current model-based techniques are computationally expensive; various algorithms have been proposed to improve the computational efficiency of this paradigm. However, these methods cannot deal well with overlapping features, especially when they are merged to create one broad peak. In addition, no method has been proven to perform well across different MS platforms. RESULTS We suggest a new model-based approach to feature extraction in which spectra are decomposed into a mixture of distributions derived from peptide models. By incorporating kernel-based smoothing and perceptual similarity for matching distributions, our statistical framework improves existing methodologies in terms of computational efficiency and the accuracy of the results. Our model is parameterized by physical properties and is therefore applicable to different MS instruments and settings. We validate our approach on simulated data, and show that the performance is higher than commonly used tools on real high- and low-resolution MS, and MS/MS data sets.
Affiliation(s)
- Karin Noy
- Integrated Data System Department, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, USA