1
He T, Liu Y, Zhou Y, Li L, Wang H, Chen S, Gao J, Jiang W, Yu Y, Ge W, Chang HY, Fan Z, Nesvizhskii AI, Guo T, Sun Y. Comparative Evaluation of Proteome Discoverer and FragPipe for the TMT-Based Proteome Quantification. J Proteome Res 2022; 21:3007-3015. [PMID: 36315902 DOI: 10.1021/acs.jproteome.2c00390] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/05/2022]
Abstract
Isobaric labeling-based proteomics is widely applied in deep proteome quantification. Among the platforms for analyzing isobaric labeled proteomic data, the commercial software Proteome Discoverer (PD) is widely used and incorporates the search engine CHIMERYS, while FragPipe (FP) is relatively new, free for noncommercial purposes, and integrates the search engine MSFragger. Here, we compared PD and FP on three public proteomic data sets labeled using 6plex, 10plex, and 16plex tandem mass tags. Protein abundances generated by the two software packages were highly correlated. PD quantified more proteins than FP (10.02%, 15.44%, and 8.19% more in the three data sets) with comparable NA (missing-value) ratios (0.00% vs. 0.00%, 0.85% vs. 0.38%, and 11.74% vs. 10.52%). Using the 16plex data set, PD and FP outputs showed high consistency in quantifying technical replicates, batch effects, and functional enrichment in differentially expressed proteins. However, FP reduced processing time relative to PD by 93.93%, 96.65%, and 96.41% for the three data sets, respectively. In conclusion, while PD is a well-maintained commercial package that integrates various additional functions and quantifies more proteins, FP is freely available and achieves similar output in a shorter computational time. Our results will guide users in choosing the most suitable quantification software for their needs.
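The abstract reports two derived quantities without defining them. The sketch below assumes the usual readings: the NA ratio as the fraction of missing cells in a protein-by-sample abundance matrix, and time saved as the relative reduction in processing time versus PD. Both definitions, the function names, and the toy values are illustrative assumptions, not taken from the paper.

```python
import math


def na_ratio(matrix):
    """Fraction of missing cells (None or NaN) in a protein-by-sample matrix.

    Assumed definition of the 'NA ratio'; the paper's exact formula may differ.
    """
    cells = [v for row in matrix for v in row]
    if not cells:
        return 0.0
    missing = sum(1 for v in cells if v is None or (isinstance(v, float) and math.isnan(v)))
    return missing / len(cells)


def percent_time_saved(t_reference, t_other):
    """Relative processing-time reduction of t_other versus t_reference, in percent."""
    return 100.0 * (t_reference - t_other) / t_reference


# Toy inputs, not data from the study.
toy_matrix = [
    [10.2, 11.5, None],
    [9.8, float("nan"), 10.1],
]
print(na_ratio(toy_matrix))             # 2 of 6 cells missing
print(percent_time_saved(100.0, 6.07))  # roughly 93.93
```

Under this reading, a 93.93% saving means FP needed about 6% of PD's processing time on that data set.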
Affiliation(s)
- Tianen He
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China.,School of Life Sciences, Peking University, No.5 Yiheyuan Road, Beijing 100871, China
- Youqi Liu
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd., No.1 Yunmeng Road, Hangzhou 310024, China
- Yan Zhou
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
- Lu Li
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
- He Wang
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
- Shanjun Chen
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd., No.1 Yunmeng Road, Hangzhou 310024, China
- Jinlong Gao
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
- Wenhao Jiang
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
- Yi Yu
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd., No.1 Yunmeng Road, Hangzhou 310024, China
- Weigang Ge
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd., No.1 Yunmeng Road, Hangzhou 310024, China
- Hui-Yin Chang
- Department of Pathology; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States.,Department of Biomedical Sciences and Engineering, National Central University, Taoyuan City 320317, Taiwan
- Ziquan Fan
- Thermo Fisher Scientific, No.2517 Jinke Road, Shanghai 201203, China
- Alexey I Nesvizhskii
- Department of Pathology; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Tiannan Guo
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
- Yaoting Sun
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, No.18 Shilongshan Road, Hangzhou 310024, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, No.18 Shilongshan Road, Hangzhou 310024, China.,Research Center for Industries of the Future, Westlake University, No.600 Dunyu Road, Hangzhou 310030, China
2
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput mass spectrometry (MS)-based proteomics and next-generation sequencing-based genomics. The combined, integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Because of the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the annotation of genome sequences. To date, most such tools have not addressed the scalability issues inherent in proteogenomic data analysis, where the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a 3 GB genome. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, with a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and make recommendations that can be incorporated into the future design of these workflows to ensure scalability as proteogenomic data grow. Lastly, we make the case that high-performance computing (HPC) solutions may be the best way to ensure the scalability of future big-data proteogenomic analysis.
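Searching spectra against a six-frame translation means translating every genomic sequence in all three reading frames of both strands. A minimal Python sketch using the standard genetic code follows; it is a generic illustration, not code from any surveyed tool, and the function names and the minimum candidate length are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Return the three forward-frame and three reverse-frame translations."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def candidate_sequences(dna, min_length=7):
    """Split each frame at stop codons ('*') and keep stretches long enough to search."""
    return [seq for frame in six_frame(dna) for seq in frame.split("*") if len(seq) >= min_length]


print(six_frame("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Applied to a multi-gigabase genome, this expansion yields a search database far larger than a typical protein database, which is the scalability issue the survey describes.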
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
3
Moumbock AFA, Ntie-Kang F, Akone SH, Li J, Gao M, Telukunta KK, Günther S. An overview of tools, software, and methods for natural product fragment and mass spectral analysis. PHYSICAL SCIENCES REVIEWS 2019. [DOI: 10.1515/psr-2018-0126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/13/2023]
Abstract
One major challenge in natural product (NP) discovery is determining the chemical structure of unknown metabolites with automated software tools using only gas chromatography–mass spectrometry (GC–MS) or liquid chromatography–tandem MS (LC–MS/MS) data. This chapter reviews the existing spectral libraries and predictive computational tools used in MS-based untargeted metabolomics, currently a hot topic in NP structure elucidation. We begin with spectral databases and the general workflow of MS annotation. We then describe software and tools used in MS, particularly those that predict fragmentation patterns, mass spectral classifiers, and tools for fragmentation tree analysis. We conclude the chapter with more advanced approaches implemented in tools, including competitive fragmentation modeling and quantum chemical methods.
4
Li C, Li K, Li K, Lin F. MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture. BMC Bioinformatics 2019; 20:397. [PMID: 31315562 PMCID: PMC6637555 DOI: 10.1186/s12859-019-2980-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/14/2019] [Accepted: 07/02/2019] [Indexed: 11/17/2022]
Abstract
Background Tandem mass spectrometry (MS/MS)-based database searching is a widely used method for peptide identification in shotgun proteomics. However, because of the rapid growth in spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, current peptide database search methods cannot process large MS/MS spectra datasets rapidly and thoroughly. A breakthrough in efficient database search algorithms is therefore crucial for peptide identification in computational proteomics. Results This paper presents MCtandem, an efficient tool for large-scale peptide identification on the Intel Many Integrated Core (MIC) architecture. To support big-data processing, MCtandem's design introduces a novel parallel match scoring algorithm, MIC-SDP (spectrum dot product), together with its two-level parallelization. In addition, a series of optimization strategies on both the host CPU side and the MIC side, including pre-fetching, an optimized communication-overlapping scheme, multithreading, and hyper-threading, are exploited to improve execution performance. Conclusions For fair comparison, we first set up experiments and verified a 28-fold speedup on a single MIC over the original CPU-based implementation. We then ran MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem.
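The abstract names spectrum dot product (SDP) scoring as the kernel MCtandem parallelizes. The sketch below shows a plain serial dot product between binned experimental and theoretical spectra. The bin width, the cosine normalization, and the function names are assumptions for illustration; none of MCtandem's MIC-specific optimizations (pre-fetching, communication overlap, hyper-threading) are reproduced.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0005):
    """Sum (m/z, intensity) peaks into integer m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def spectrum_dot_product(experimental, theoretical, bin_width=1.0005):
    """Cosine-normalized dot product of two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(theoretical, bin_width)
    dot = sum(intensity * b[k] for k, intensity in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(experimental, candidates):
    """Return (peptide, score) for the highest-scoring theoretical spectrum.

    candidates: hypothetical {peptide: [(m/z, intensity), ...]} mapping; must be non-empty.
    """
    return max(
        ((peptide, spectrum_dot_product(experimental, peaks)) for peptide, peaks in candidates.items()),
        key=lambda pair: pair[1],
    )
```

Because each spectrum is scored independently, the outer loop over spectra is a natural unit to distribute across devices and threads.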
Affiliation(s)
- Chuang Li
- College of Computer Science and Electronic Engineering, Hunan University, Lushannan Road, Changsha, 410082, China.,School of Computer Science and Engineering, Nanyang Technological University, Nangyang Road, Singapore, 639798, Singapore
- Kenli Li
- College of Computer Science and Electronic Engineering, Hunan University, Lushannan Road, Changsha, 410082, China.,National Supercomputing Center in Changsha, Lushannan Road, Changsha, 410082, China
- Keqin Li
- College of Computer Science and Electronic Engineering, Hunan University, Lushannan Road, Changsha, 410082, China.,National Supercomputing Center in Changsha, Lushannan Road, Changsha, 410082, China.,Department of Computer Science, State University of New York, New Paltz, New York, 12561, USA
- Feng Lin
- School of Computer Science and Engineering, Nanyang Technological University, Nangyang Road, Singapore, 639798, Singapore
5
Li C, Li K, Li K, Xie X, Lin F. SWPepNovo: An Efficient De Novo Peptide Sequencing Tool for Large-scale MS/MS Spectra Analysis. Int J Biol Sci 2019; 15:1787-1801. [PMID: 31523183 PMCID: PMC6743289 DOI: 10.7150/ijbs.32142] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/10/2018] [Accepted: 04/09/2019] [Indexed: 12/17/2022]
Abstract
Tandem mass spectrometry (MS/MS)-based de novo peptide sequencing is a powerful method for high-throughput protein analysis. However, the explosive growth in the size of MS/MS spectra datasets inevitably and exponentially raises the computational demands of existing de novo peptide sequencing methods, an issue that urgently needs to be solved in computational biology. This paper introduces SWPepNovo, an efficient tool based on the SW26010 many-core processor that processes large-scale peptide MS/MS spectra using a parallel peptide-spectrum match (PSM) algorithm. Our design employs a two-level parallelization mechanism: (1) task-level parallelism between management processing elements (MPEs) using MPI, based on a data transformation method and a dynamic feedback task scheduling algorithm; and (2) thread-level parallelism across computing processing elements (CPEs) using asynchronous task transfer and multithreading. Moreover, three optimization strategies, namely vectorization, double buffering, and memory access optimizations, are employed to overcome both the compute-bound and the memory-bound bottlenecks in the parallel PSM algorithm. Experiments on multiple spectra datasets benchmark SWPepNovo against three state-of-the-art peptide sequencing tools: PepNovo+, PEAKS, and DeepNovo-DIA. SWPepNovo also shows high scalability in experiments on extremely large datasets of up to 11.22 GB. The software and the parameter settings are available at https://github.com/ChuangLi99/SWPepNovo.
Affiliation(s)
- Chuang Li
- College of Information Science and Engineering, Hunan University, Changsha, China
- Kenli Li
- College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha, China
- Keqin Li
- College of Information Science and Engineering, Hunan University, Department of Computer Science, State University of New York, NY, USA
- Xianghui Xie
- State Key Laboratory of Mathematic Engineering and Advance Computing, Wuxi Jiangnan Institute of Computing Technology, Jiangsu, China
- Feng Lin
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
6
d'Acierno A. IsAProteinDB: An Indexed Database of Trypsinized Proteins for Fast Peptide Mass Fingerprinting. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1195-1201. [PMID: 28113723 DOI: 10.1109/tcbb.2016.2564964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
In peptide mass fingerprinting, an unknown protein is digested into smaller peptides whose masses are accurately measured; the resulting list of masses is then compared with a reference database to obtain a set of matching proteins. The exponential growth in the number of known proteins discourages brute-force methods, in which the mass list is compared with each protein in the reference collection. Fortunately, the database literature shows that well-designed search algorithms, coupled with proper data organization, allow the identification problem to be solved quickly even on standard desktop computers. In this paper, IsAProteinDB, an indexed database of trypsinized proteins, is presented. The corresponding search algorithm has a time complexity that does not depend significantly on the size of the reference protein database.
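The abstract does not describe IsAProteinDB's index structure. As a stand-in, the sketch below digests proteins in silico with trypsin (cleaving after K or R unless the next residue is P, with no missed cleavages), stores peptide masses in one sorted array, and answers each observed mass with two binary searches, so per-query cost grows only logarithmically with database size. The data structure, tolerance, and names are assumptions, not the paper's design.

```python
import bisect
import re
from collections import Counter

# Monoisotopic residue masses (Da); water is added once per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(sequence):
    """Cleave after K/R unless followed by P (no missed cleavages); needs Python 3.7+."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Map {protein_id: sequence} to parallel mass-sorted lists (masses, protein_ids)."""
    entries = sorted(
        (peptide_mass(pep), pid)
        for pid, seq in proteins.items()
        for pep in tryptic_peptides(seq)
    )
    return [m for m, _ in entries], [pid for _, pid in entries]


def search(index, observed_masses, tolerance=0.02):
    """Count, per protein, how many observed masses match one of its peptides."""
    masses, pids = index
    hits = Counter()
    for observed in observed_masses:
        lo = bisect.bisect_left(masses, observed - tolerance)
        hi = bisect.bisect_right(masses, observed + tolerance)
        hits.update(set(pids[lo:hi]))  # each protein counted once per observed mass
    return hits.most_common()
```

The protein with the most matched masses ranks first; a real tool would also score match significance rather than rely on raw counts.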
7
Tripathi P, Rabara RC, Reese RN, Miller MA, Rohila JS, Subramanian S, Shen QJ, Morandi D, Bücking H, Shulaev V, Rushton PJ. A toolbox of genes, proteins, metabolites and promoters for improving drought tolerance in soybean includes the metabolite coumestrol and stomatal development genes. BMC Genomics 2016; 17:102. [PMID: 26861168 PMCID: PMC4746818 DOI: 10.1186/s12864-016-2420-0] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 09/18/2015] [Accepted: 01/26/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND The purpose of this project was to identify metabolites, proteins, genes, and promoters associated with water stress responses in soybean. A number of these may serve as new targets for the biotechnological improvement of drought responses in soybean (Glycine max). RESULTS We identified metabolites, proteins, and genes that are strongly up- or down-regulated during rapid water stress following removal from a hydroponics system. In roots, 163 metabolites showed significant changes during water stress, and 93 did so in leaves. The largest change was a root-specific 160-fold increase in the coumestan coumestrol, making it a potential biomarker for drought and a promising target for improving drought responses. Previous reports suggest that coumestrol stimulates mycorrhizal colonization and that, under certain conditions, mycorrhizal plants have improved drought tolerance. This suggests that coumestrol may be part of a call for help to the rhizobiome during stress. About 3,000 genes were strongly up-regulated by drought, and we identified regulators such as ERF, MYB, NAC, bHLH, and WRKY transcription factors, receptor-like kinases, and calcium signaling components as potential targets for soybean improvement, as well as the jasmonate and abscisic acid biosynthetic genes JMT, LOX1, and ABA1. Drought-stressed soybean leaves show reduced mRNA levels of stomatal development genes, including FAMA-like, MUTE-like, and SPEECHLESS-like bHLH transcription factors, and leaves formed after drought stress showed reductions of 22.34% in stomatal density and 17.56% in stomatal index. This suggests that reducing stomatal density may improve drought tolerance. MEME analyses suggest that ABRE (CACGT/CG), CRT/DRE (CCGAC), and a novel GTGCnTGC/G element play roles in transcriptional activation and could form components of synthetic promoters to drive expression of transgenes.
Using transformed hairy roots, we validated the increase in promoter activity of GmWRKY17 and GmWRKY67 during dehydration and after 20 μM ABA treatment. CONCLUSIONS Our toolbox provides new targets and strategies for improving soybean drought tolerance and includes the coumestan coumestrol, transcription factors that regulate stomatal density, water stress-responsive WRKY gene promoters and a novel DNA element that appears to be enriched in water stress responsive promoters.
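The abstract reports three promoter elements identified with MEME. Motif discovery itself is beyond a short example, but scanning promoters for the reported consensus patterns is simple. The sketch below reads the notation as regular expressions (CACGT/CG as CACGT followed by C or G; GTGCnTGC/G as GTGC, any base, TG, then C or G) and scans both strands; that reading, the names, and the example sequences are assumptions, and this is not the authors' pipeline.

```python
import re

# Assumed regex readings of the consensus notation in the abstract.
MOTIFS = {
    "ABRE": "CACGT[CG]",
    "CRT/DRE": "CCGAC",
    "GTGCnTGC/G": "GTGC[ACGT]TG[CG]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def scan_promoter(promoter):
    """Return {motif: [(start, strand), ...]} with 0-based starts on the forward strand."""
    seq = promoter.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = {}
    for name, pattern in MOTIFS.items():
        finder = re.compile(f"(?=({pattern}))")  # lookahead reports overlapping matches
        forward = [(m.start(), "+") for m in finder.finditer(seq)]
        # Map a reverse-strand match back to forward-strand coordinates.
        reverse = [(n - m.start() - len(m.group(1)), "-") for m in finder.finditer(rc)]
        hits[name] = sorted(forward + reverse)
    return hits


print(scan_promoter("AACCGACTT"))  # {'ABRE': [], 'CRT/DRE': [(2, '+')], 'GTGCnTGC/G': []}
```

Palindromic sites such as CACGTG match on both strands and are therefore reported twice, once per strand.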
Affiliation(s)
- Prateek Tripathi
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD57007, USA.
- Current address: Molecular and Computational Biology, Dana & David Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, 90089, USA.
- Roel C Rabara
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD57007, USA.
- Current address: Texas A&M AgriLife Research and Extension Center, Dallas, TX, 75252, USA.
- R Neil Reese
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD57007, USA.
- Marissa A Miller
- Texas A&M AgriLife Research and Extension Center, Dallas, TX, 75252, USA.
- Jai S Rohila
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD57007, USA.
- Senthil Subramanian
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD57007, USA.
- Qingxi J Shen
- School of Life Sciences, University of Nevada, Las Vegas, 89154, USA.
- Dominique Morandi
- INRA, UMR 1347 Agroécologie, 17 rue Sully, BP 86510, 21065, Dijon, CEDEX, France.
- Heike Bücking
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD57007, USA.
- Vladimir Shulaev
- Department of Biological Sciences, University of North Texas, Denton, TX, 76203, USA.
- Paul J Rushton
- Texas A&M AgriLife Research and Extension Center, Dallas, TX, 75252, USA.
- Current address: 22nd Century Group Inc., 9530 Main Street Clarence, New York, 14031, USA.
8
Taylor EB, Moulana M, Stuge TB, Quiniou SMA, Bengten E, Wilson M. A Leukocyte Immune-Type Receptor Subset Is a Marker of Antiviral Cytotoxic Cells in Channel Catfish, Ictalurus punctatus. THE JOURNAL OF IMMUNOLOGY 2016; 196:2677-89. [PMID: 26856701 DOI: 10.4049/jimmunol.1502166] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/06/2015] [Accepted: 01/03/2016] [Indexed: 11/19/2022]
Abstract
Channel catfish, Ictalurus punctatus, leukocyte immune-type receptors (LITRs) represent a multigene family that encodes Ig superfamily proteins that mediate activating or inhibitory signaling. In this study, we demonstrate the use of mAb CC41 to monitor viral cytotoxic responses in catfish and determine that CC41 binds to a subset of LITRs on the surface of catfish clonal CTLs. Homozygous gynogenetic catfish were immunized with channel catfish virus (CCV)-infected MHC-matched clonal T cells (G14D-CCV), and PBL were collected at various times after immunization for flow cytometric analyses. The percentage of CC41(+) cells was significantly increased 5 d after primary immunization with G14D-CCV and 3 d after a booster immunization, compared with control fish injected only with G14D. Moreover, CC41(+) cells magnetically isolated from the PBL specifically killed CCV-infected targets, as measured by (51)Cr release assays, and expressed transcripts for CD3γδ, perforin, and at least one of the CD4-like receptors, as analyzed by RNA flow cytometry. When MLC effector cells derived from a G14D-CCV-immunized fish were preincubated with CC41 mAb, killing of G14D-CCV targets was reduced by ∼40%, suggesting that at least some LITRs have a role in target cell recognition and/or cytotoxicity. The availability of a LITR-specific mAb has allowed, to our knowledge for the first time, functional characterization of LITRs in an autologous system. In addition, the identification of an LITR subset as a cytotoxic cell marker will allow for more effective monitoring of catfish immune responses to pathogens.
Affiliation(s)
- Erin B Taylor
- Department of Microbiology and Immunology, University of Mississippi Medical Center, Jackson, MS 39216
- Mohadetheh Moulana
- Warmwater Aquaculture Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Stoneville, MS 38776
- Tor B Stuge
- Immunology Research Group, Department of Medical Biology, Faculty of Health Sciences, University of Tromso-Arctic University of Norway, N-9037 Tromso, Norway
- Sylvie M A Quiniou
- Warmwater Aquaculture Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Stoneville, MS 38776
- Eva Bengten
- Department of Microbiology and Immunology, University of Mississippi Medical Center, Jackson, MS 39216
- Melanie Wilson
- Department of Microbiology and Immunology, University of Mississippi Medical Center, Jackson, MS 39216
9
Subbannayya Y, Pinto SM, Gowda H, Prasad TSK. Proteogenomics for understanding oncology: recent advances and future prospects. Expert Rev Proteomics 2016; 13:297-308. [PMID: 26697917 DOI: 10.1586/14789450.2016.1136217] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
The concept of proteogenomics has emerged rapidly as a valuable approach to integrate mass spectrometry-derived proteomic data with genomic and transcriptomic data. It helps harness the full potential of proteomic data for discovering potential biomarkers, therapeutic targets, and novel proteins associated with various biological processes, including diseases. Proteogenomic strategies have been successfully used to identify novel genes and to refine the annotation of existing gene models in various genomes. In recent years, this approach has been extended to cancer biology to unravel complexities in tumor genomes and proteomes. Standard proteomics workflows that search translated cancer genomes and transcriptomes can potentially identify peptides from mutant proteins, splice variants, and fusion proteins in the tumor proteome; alongside currently available biomarker panels, these peptides may serve as diagnostic and prognostic biomarkers and may also have therapeutic utility. This review focuses on the role of proteogenomics in understanding cancer biology.
Affiliation(s)
- Yashwanth Subbannayya
- YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore, India.,Institute of Bioinformatics, Bangalore, India
- Sneha M Pinto
- YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore, India.,Institute of Bioinformatics, Bangalore, India
- Harsha Gowda
- YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore, India.,Institute of Bioinformatics, Bangalore, India
- T S Keshava Prasad
- YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore, India.,Institute of Bioinformatics, Bangalore, India.,NIMHANS-IOB Proteomics and Bioinformatics Laboratory, Neurobiology Research Centre, National Institute of Mental Health and Neurosciences, Bangalore, India
10
Chi H, He K, Yang B, Chen Z, Sun RX, Fan SB, Zhang K, Liu C, Yuan ZF, Wang QH, Liu SQ, Dong MQ, He SM. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data". J Proteomics 2015; 129:33-41. [PMID: 26232248 DOI: 10.1016/j.jprot.2015.07.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/02/2015] [Revised: 05/04/2015] [Accepted: 05/10/2015] [Indexed: 01/23/2023]
Abstract
Database search is the dominant approach in high-throughput proteomic analysis. However, in this restricted mode the interpretation rate of MS/MS spectra is very low, mainly because of unexpected modifications and irregular digestion types. In this study, we developed a new algorithm, Alioth, to be integrated into the pFind search engine for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, which allows arbitrary digestion and a single site of any modification or mutation to be searched efficiently. A new re-ranking algorithm distinguishes correct peptide-spectrum matches from random ones. The algorithm was tested on several HCD datasets, and the interpretation rate of MS/MS spectra using Alioth reached 60%-80%. Peptides from semi- and non-specific digestion, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. Alioth is on average 5-10 times faster than some other unrestricted search engines and comparable to, or even faster than, the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics.
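The ion index is what lets a search engine avoid scoring every peptide against every spectrum. The sketch below shows only the general fragment-ion-index idea: theoretical singly charged b- and y-ion m/z values are binned, each bin records which peptides produce it, and a query spectrum collects peptides that share enough peaks. Alioth's precursor index, its handling of arbitrary digestion and a single unexpected modification, and its re-ranking step are not reproduced; the bin width, the shared-peak threshold, and all names are assumptions.

```python
from collections import Counter, defaultdict

PROTON = 1.007276
WATER = 18.010565
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(residues)
    ions, prefix = [], 0.0
    for mass in residues[:-1]:
        prefix += mass
        ions.append(prefix + PROTON)                  # b ion
        ions.append(total - prefix + WATER + PROTON)  # complementary y ion
    return ions


def build_ion_index(peptides, bin_width=0.02):
    """Map each m/z bin to the set of peptide ids that produce a fragment there."""
    index = defaultdict(set)
    for pid, peptide in enumerate(peptides):
        for mz in fragment_mz(peptide):
            index[round(mz / bin_width)].add(pid)
    return index


def candidates(index, peaks, bin_width=0.02, min_shared=2):
    """Peptide ids sharing at least min_shared peaks with the query, best first."""
    shared = Counter()
    for mz in peaks:
        b = round(mz / bin_width)
        matched = set()
        for neighbor in (b - 1, b, b + 1):  # tolerate bin-boundary effects
            matched |= index.get(neighbor, set())
        shared.update(matched)
    return [pid for pid, n in shared.most_common() if n >= min_shared]
```

Only the returned candidates would then go on to full scoring, which is where an index-based search saves time over exhaustive comparison.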
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Zhen Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Quan-Hui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Si-Qi Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
11
Skeie JM, Roybal CN, Mahajan VB. Proteomic insight into the molecular function of the vitreous. PLoS One 2015; 10:e0127567. [PMID: 26020955 PMCID: PMC4447289 DOI: 10.1371/journal.pone.0127567] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 02/07/2015] [Accepted: 04/16/2015] [Indexed: 02/06/2023]
Abstract
The human vitreous consists primarily of water, but it also contains proteins that have yet to be fully characterized. To gain insight into the four vitreous substructures and their potential functions, we isolated and analyzed the vitreous protein profiles of three non-diseased human eyes. The four analyzed substructures were the anterior hyaloid, the vitreous cortex, the vitreous core, and the vitreous base. Proteins were separated by multidimensional liquid chromatography and identified by tandem mass spectrometry. Bioinformatics tools were then used to extract the expression profiles, signaling pathways, and interactomes unique to each tissue. A mean of 2,062 unique proteins was identified in each substructure, many of them differentially expressed in a specific substructure: 278 proteins were unique to the anterior hyaloid, 322 to the vitreous cortex, 128 to the vitreous base, and 136 to the vitreous core. When the identified proteins were organized according to relevant functional pathways and networks, key patterns appeared. The blood coagulation pathway and extracellular matrix turnover networks were highly represented. Oxidative stress regulation and energy metabolism proteins were distributed throughout the vitreous. Immune functions were represented by high levels of immunoglobulin, the complement pathway, damage-associated molecular patterns (DAMPs), and evolutionarily conserved antimicrobial proteins. The majority of vitreous proteins detected were intracellular proteins, some of which originate from the retina, including rhodopsin (RHO), phosphodiesterase 6 (PDE6), and glial fibrillary acidic protein (GFAP). This comprehensive analysis portrays the vitreous as a biologically active tissue in which proteins localize to distinct substructures to protect the intraocular tissues from infection, oxidative stress, and energy disequilibrium. It also identifies the retina as a potential source of inflammatory mediators. The vitreous proteome catalogues the dynamic interactions between the vitreous and surrounding tissues, so it could serve as an indirect and effective means of surveying vitreoretinal disease for specific biomarkers.
Affiliation(s)
- Jessica M. Skeie
- Omics Laboratory, Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States of America
- C. Nathaniel Roybal
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States of America
- Vinit B. Mahajan
- Omics Laboratory, Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States of America
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, United States of America
12
Chi H, He K, Yang B, Chen Z, Sun RX, Fan SB, Zhang K, Liu C, Yuan ZF, Wang QH, Liu SQ, Dong MQ, He SM. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data. J Proteomics 2015; 125:89-97. [PMID: 25979774 DOI: 10.1016/j.jprot.2015.05.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 03/02/2015] [Revised: 05/04/2015] [Accepted: 05/10/2015] [Indexed: 10/23/2022]
Abstract
Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra in such a restricted mode is very low, mainly because of unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the pFind search engine, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, with which arbitrary digestions and a single site of any modification or mutation can be searched efficiently. A new re-ranking algorithm distinguishes correct peptide-spectrum matches from random ones. The algorithm was tested on several HCD datasets, and the interpretation rate of MS/MS spectra using Alioth reached 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. On average, Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to, or even faster than, the restricted search algorithms tested.
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Zhen Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Quan-Hui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Si-Qi Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
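The Alioth abstract above describes an ion index built over peptide precursors and fragment ions. The following is a minimal, generic sketch of the fragment side of such an index only, not pFind-Alioth's implementation. Theoretical singly charged b/y ions are binned once, and an observed spectrum is then scored by counting how many of its peaks fall in each peptide's bins. The partial residue-mass table, the 0.02 bin width and all function names are illustrative assumptions. Precursor indexing, unrestricted modifications and Alioth's re-ranking step are not modeled.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for a subset of amino acids (illustrative only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified linear peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i ion
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def build_fragment_index(peptides, bin_width=0.02):
    """Map each m/z bin to the set of peptide indices that have a fragment in it."""
    index = defaultdict(set)
    for pid, pep in enumerate(peptides):
        for mz in fragment_ions(pep):
            index[round(mz / bin_width)].add(pid)
    return index


def score_candidates(index, peaks, bin_width=0.02):
    """Count, per peptide, how many observed peaks land in (or next to) one of its bins."""
    counts = defaultdict(int)
    for mz in peaks:
        key = round(mz / bin_width)
        hits = set()
        for k in (key - 1, key, key + 1):  # neighbouring bins absorb rounding at bin edges
            hits |= index.get(k, set())
        for pid in hits:
            counts[pid] += 1
    return dict(counts)


peptides = ["GASPK", "VTLER", "LLEGK"]
scores = score_candidates(build_fragment_index(peptides), fragment_ions("VTLER"))
best = max(scores, key=scores.get)  # index of the best-matching peptide
```

The index is built once per database, so each query costs one dictionary lookup per peak instead of a comparison against every peptide. That lookup-per-peak cost is the property an ion index is meant to provide.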
13
Protein Vicinal Thiol Oxidations in the Healthy Brain: Not So Radical Links between Physiological Oxidative Stress and Neural Cell Activities. Neurochem Res 2014; 39:2030-9. [DOI: 10.1007/s11064-014-1378-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/22/2014] [Revised: 06/16/2014] [Accepted: 06/30/2014] [Indexed: 11/26/2022]
14
Li Y, Chi H, Xia L, Chu X. Accelerating the scoring module of mass spectrometry-based peptide identification using GPUs. BMC Bioinformatics 2014; 15:121. [PMID: 24773593 PMCID: PMC4049470 DOI: 10.1186/1471-2105-15-121] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/31/2012] [Accepted: 04/23/2014] [Indexed: 11/10/2022]
Abstract
Background Tandem mass spectrometry-based database searching is currently the main method for protein identification in shotgun proteomics. The explosive growth of protein and peptide databases, which results from genome translations, enzymatic digestions, and post-translational modifications (PTMs), is making computational efficiency in database searching a serious challenge. Profile analysis shows that most search engines spend 50%-90% of their total time in the scoring module, and that spectrum dot product (SDP)-based scoring is the most widely used. As general-purpose, high-performance parallel hardware, graphics processing units (GPUs) are promising platforms for speeding up database searches in protein identification. Results We designed and implemented a parallel SDP-based scoring module on GPUs that makes efficient use of GPU registers, constant memory, and shared memory. Compared with the CPU-based version, we achieved a 30- to 60-fold speedup using a single GPU. We also implemented our algorithm on a GPU cluster and achieved an approximately favorable speedup. Conclusions Our GPU-based SDP algorithm can significantly improve the speed of the scoring module in mass spectrometry-based protein identification. The algorithm can be easily implemented in many database search engines such as X!Tandem, SEQUEST, and pFind. A software tool implementing this algorithm is available at http://www.comp.hkbu.edu.hk/~youli/ProteinByGPU.html
Affiliation(s)
- Xiaowen Chu
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
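The abstract above centers on spectrum dot product (SDP) scoring. The following is a minimal CPU sketch of a normalized SDP, written in Python for clarity. It is not the authors' GPU kernel. Both spectra are binned into dense intensity vectors and compared by cosine similarity. The 1.0005 bin width, the 2000 m/z ceiling, the choice of the normalized (cosine) variant and the function names are assumptions. Some engines use an unnormalized or intensity-transformed dot product instead.

```python
import math


def bin_spectrum(peaks, bin_width=1.0005, max_mz=2000.0):
    """Turn (m/z, intensity) pairs into a dense vector, keeping the largest intensity per bin."""
    vec = [0.0] * (int(max_mz / bin_width) + 1)
    for mz, intensity in peaks:
        if 0.0 <= mz < max_mz:
            b = int(mz / bin_width)
            vec[b] = max(vec[b], intensity)
    return vec


def spectrum_dot_product(a, b):
    """Normalized SDP (cosine similarity) of two binned spectra; 0.0 if either is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_all(experimental, theoretical_spectra):
    """Score one experimental spectrum against every candidate spectrum.

    Each iteration is independent of the others, which is what makes this loop a
    natural fit for assigning one GPU thread (or thread block) per candidate.
    """
    return [spectrum_dot_product(experimental, t) for t in theoretical_spectra]
```

The per-candidate independence in `score_all` is the property a GPU port exploits. The memory-placement choices the abstract mentions (registers, constant memory, shared memory) concern how the vectors are stored during that loop.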
15
Skeie JM, Mahajan VB. Proteomic interactions in the mouse vitreous-retina complex. PLoS One 2013; 8:e82140. [PMID: 24312404 PMCID: PMC3843729 DOI: 10.1371/journal.pone.0082140] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 09/16/2013] [Accepted: 10/21/2013] [Indexed: 11/19/2022]
Abstract
PURPOSE Human vitreoretinal diseases are presumed to arise from abnormal mechanical interactions between the vitreous and retina, and translational models are limited. This study determined whether nonstructural proteins and potential retinal biomarkers were expressed by the normal mouse vitreous and retina. METHODS Vitreous and retina samples from mice were collected by evisceration and analyzed by liquid chromatography-tandem mass spectrometry. Identified proteins were further analyzed for differential expression and functional interactions using bioinformatic software. RESULTS We identified 1,680 unique proteins in the retina and 675 unique proteins in the vitreous. Unbiased clustering identified protein pathways that distinguish the retina, including oxidative phosphorylation and neurofilament cytoskeletal remodeling, from the vitreous, which expressed oxidative stress and innate immunology pathways. Some intracellular protein pathways, such as glycolysis and gluconeogenesis and neuronal signaling, were found in both retina and vitreous, suggesting that proteins might be shuttled between the two. We also identified human disease biomarkers represented in the mouse vitreous and retina, including carbonic anhydrase-2 and 3, crystallins, macrophage inhibitory factor, glutathione peroxidase, peroxiredoxins, S100 precursors, and von Willebrand factor. CONCLUSIONS Our analysis suggests that the vitreous expresses nonstructural proteins that functionally interact with the retina to manage oxidative stress and immune reactions, and that intracellular proteins may be exchanged between the retina and vitreous. This novel proteomic dataset can be used for investigating human vitreoretinopathies in mouse models. Validation of vitreoretinal biomarkers for human ocular diseases will provide a critical tool for diagnostics and an avenue for therapeutics.
Affiliation(s)
- Jessica M. Skeie
- Omics Laboratory, University of Iowa, Iowa City, Iowa, United States of America
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, Iowa, United States of America
- Vinit B. Mahajan
- Omics Laboratory, University of Iowa, Iowa City, Iowa, United States of America
- Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, Iowa, United States of America
16
Deletion in the N-terminal half of olfactomedin 1 modifies its interaction with synaptic proteins and causes brain dystrophy and abnormal behavior in mice. Exp Neurol 2013; 250:205-18. [PMID: 24095980 DOI: 10.1016/j.expneurol.2013.09.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 07/05/2013] [Revised: 09/16/2013] [Accepted: 09/19/2013] [Indexed: 11/24/2022]
Abstract
Olfactomedin 1 (Olfm1) is a secreted glycoprotein that is preferentially expressed in neuronal tissues. Here we show that deletion of exons 4 and 5 from the Olfm1 gene, which encode a 52-amino-acid region in the N-terminal part of the protein, increased neonatal death and reduced the body weight of surviving homozygous mice. Magnetic resonance imaging analyses revealed reduced brain volume and attenuated size of white matter tracts such as the anterior commissure, corpus callosum, and optic nerve. Compared with their wild-type littermates, adult Olfm1 mutant mice behaved abnormally in several tests, including reduced marble digging, the elevated plus maze, nesting activity, and latency on the balance beam. The olfactory system was both structurally and functionally disturbed by the mutation in the Olfm1 gene, as shown by functional magnetic resonance imaging analysis and a smell test. Deficiencies of the olfactory system may contribute to the neonatal death and loss of body weight of Olfm1 mutant mice. Shotgun proteomics revealed 59 candidate proteins that co-precipitated with wild-type or mutant Olfm1 proteins in postnatal day 1 brain. Olfm1-binding targets included GluR2, Cav2.1, teneurin-4 and Kidins220. Modified interaction of Olfm1 with its binding targets led to an increase in intracellular Ca(2+) concentration and activation of ERK1/2, MEK1 and CaMKII in the hippocampus and olfactory bulb of Olfm1 mutant mice compared with their wild-type littermates. Excessive activation of the CaMKII and Ras-ERK pathways in the Olfm1 mutant olfactory bulb and hippocampus by elevated intracellular calcium may contribute to the abnormal behavior and olfactory activity of Olfm1 mutant mice.
17
Helmy M, Sugiyama N, Tomita M, Ishihama Y. Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics. Genes Cells 2012; 17:633-44. [PMID: 22686349 DOI: 10.1111/j.1365-2443.2012.01615.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 03/19/2012] [Accepted: 04/14/2012] [Indexed: 01/18/2023]
Abstract
We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) to search large peptide spectra datasets produced by liquid chromatography/mass spectrometry (LC-MS/MS) against protein and large nucleotide sequence databases. The main principle of MSSS is to search the peptide spectra against the protein database first, then remove the spectra matched to identified peptides, leaving a smaller set of spectra to search against the nucleotide sequence database. This reduces the number of spectra to be searched and thereby limits the peptide search space. A comparison of MSSS with the conventional search approach on a dataset of 27 LC-MS/MS runs of rice culture cells showed that MSSS reduced the search queries to 50% and the search time to 75% on average. In addition, MSSS did not affect the identification false-positive rate (FPR) or the ability to identify novel peptide sequences. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, identifying 74 additional novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes plus 24 intergenic peptides. These results show the utility of MSSS for searching large databases with large MS/MS datasets in proteogenomics.
Affiliation(s)
- Mohamed Helmy
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
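The two-pass principle in the MSSS abstract above can be sketched as a small driver. The following is a minimal illustration under assumed interfaces, not the authors' software. The two search engines are passed in as hypothetical callables that take a list of spectra and return `{spectrum_id: peptide}`. Only the spectra the protein search leaves unexplained are forwarded to the much larger nucleotide search. FDR control is assumed to happen inside each search callable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representations: a spectrum is (spectrum_id, peak list), and a search
# function maps a list of spectra to {spectrum_id: identified peptide}.
Spectrum = Tuple[str, List[float]]
SearchFn = Callable[[List[Spectrum]], Dict[str, str]]


def sequential_subtraction(
    spectra: List[Spectrum],
    search_protein_db: SearchFn,
    search_nucleotide_db: SearchFn,
) -> Tuple[Dict[str, str], Dict[str, str], int]:
    """Two-pass search: protein database first, nucleotide database only for leftovers."""
    # Pass 1: every spectrum is searched against the (smaller) protein database.
    protein_hits = search_protein_db(spectra)
    # Subtract the spectra already explained by known proteins.
    remaining = [s for s in spectra if s[0] not in protein_hits]
    # Pass 2: only the leftover spectra are searched against the large nucleotide database.
    novel_hits = search_nucleotide_db(remaining)
    return protein_hits, novel_hits, len(remaining)
```

The saving comes from the third return value. The expensive second pass only ever sees `len(remaining)` spectra rather than the full dataset.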
18
Grover H, Gopalakrishnan V. Efficient Processing of Models for Large-scale Shotgun Proteomics Data. International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom) 2012; 2012:591-596. [PMID: 25309967 DOI: 10.4108/icst.collaboratecom.2012.250716] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
Mass-spectrometry (MS) based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in interpretation of large, and often replicated, MS datasets generated across laboratories and institutions. A significant amount of computational effort in the workflow is spent on the identification of protein and peptide components of complex biological samples, and consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in efficient handling of these large MS datasets through database indexing and parallelization based on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to the task of peptide and protein identification, and more generally to similar multi-step problems that are inherently parallelizable.
Affiliation(s)
- Himanshu Grover
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701 USA
- Vanathi Gopalakrishnan
- Department of Biomedical Informatics, and has joint appointments with the Intelligent Systems Program and the Computational & Systems Biology Department, University of Pittsburgh, Pittsburgh, PA 15206-3701 USA. She is also the corresponding author (phone: 412-624-3290; fax: 412-624-5310)
19
Mazur MT, Fyhr R. An algorithm for identifying multiply modified endogenous proteins using both full-scan and high-resolution tandem mass spectrometric data. Rapid Commun Mass Spectrom 2011; 25:3617-3626. [PMID: 22095511 DOI: 10.1002/rcm.5257] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
Mass spectrometry-based proteomic experiments have advanced considerably over the past decade, with high-resolution, high-mass-accuracy tandem mass spectrometry (MS/MS) now allowing routine interrogation of large peptides and proteins. Often a major bottleneck in 'top-down' proteomics, however, is the ability to identify and characterize complex peptides or proteins from the acquired high-resolution MS/MS spectra. For biological samples containing proteins with multiple unpredicted processing events, unsupervised identification can be particularly challenging. Described here is a newly created search algorithm (MAR) designed for the identification of experimentally detected peptides or proteins. This algorithm relies only on a predefined list of 'differential' modifications (e.g., phosphorylation) and a FASTA-formatted protein database, and is not restricted to full-length proteins for identification. The algorithm is further strengthened by its ability to use mass differences between chromatographically separated ions in full-scan MS spectra to automatically generate a list of likely 'differential' modifications to be searched. The utility of the algorithm is demonstrated by the identification of 54 unique polypeptides from human apolipoproteins enriched from the high-density lipoprotein (HDL) particle, and search-time benchmarks demonstrate scalability (12 high-resolution MS/MS scans searched per minute with modifications considered). This parallelizable algorithm provides an additional solution for converting high-quality MS/MS data of multiply processed proteins into reliable identifications.
Affiliation(s)
- Matthew T Mazur
- Department of Proteomics, Merck & Co., Inc., 126 E. Lincoln Avenue, P.O. Box 2000, Rahway, NJ 07065, USA
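The MAR abstract above mentions deriving likely modifications from mass differences between chromatographically separated ions in full-scan spectra. The following is a minimal sketch of that idea under stated assumptions, not MAR's algorithm. The inputs are assumed to be deconvoluted neutral masses of detected features. Every pairwise difference is compared with a small, illustrative table of known modification shifts, and only shifts observed between at least `min_count` feature pairs are proposed for the search. The table, the 0.01 Da tolerance and the counting threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Illustrative monoisotopic mass shifts (Da); a real list would be far longer.
KNOWN_DELTAS = {
    "Phospho": 79.96633,
    "Oxidation": 15.99491,
    "Acetyl": 42.01057,
    "Methyl": 14.01565,
}


def propose_modifications(neutral_masses, tol=0.01, min_count=2):
    """Return modification names whose shift separates at least min_count feature pairs."""
    counts = Counter()
    for m1, m2 in combinations(sorted(neutral_masses), 2):
        delta = m2 - m1  # non-negative because the masses are sorted
        for name, shift in KNOWN_DELTAS.items():
            if abs(delta - shift) <= tol:
                counts[name] += 1
    # most_common() orders by support, so the best-supported shifts come first.
    return [name for name, c in counts.most_common() if c >= min_count]
```

Requiring repeated support is one way to keep coincidental differences out of the proposed list. A real implementation would also need to account for retention-time proximity and charge states, which this sketch ignores.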
20
Diament BJ, Noble WS. Faster SEQUEST searching for peptide identification from tandem mass spectra. J Proteome Res 2011; 10:3871-9. [PMID: 21761931 DOI: 10.1021/pr101196n] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 01/19/2023]
Abstract
Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST is slow. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here combine algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.
Affiliation(s)
- Benjamin J Diament
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States
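The Tide abstract above credits part of the speedup in indexed SEQUEST variants to a precomputed index. The following is a minimal sketch of one common form of such an index, a peptide list sorted by precursor mass and queried by binary search. It is not Tide's actual data layout. The peptide names, masses, the 0.05 Da tolerance and the function names are hypothetical. The sketch uses only the standard-library `bisect` module.

```python
import bisect


def build_mass_index(peptide_masses):
    """Sort (precursor mass, peptide) pairs once, so each query becomes a range lookup."""
    entries = sorted((mass, pep) for pep, mass in peptide_masses.items())
    masses = [mass for mass, _ in entries]
    return masses, entries


def candidates(index, precursor_mass, tol_da=0.05):
    """Peptides whose mass lies within +/- tol_da of the precursor mass, in mass order."""
    masses, entries = index
    lo = bisect.bisect_left(masses, precursor_mass - tol_da)
    hi = bisect.bisect_right(masses, precursor_mass + tol_da)
    return [pep for _, pep in entries[lo:hi]]


index = build_mass_index({"PEPA": 500.25, "PEPB": 500.28, "PEPC": 612.30})
window = candidates(index, 500.26)  # both 500.25 and 500.28 fall inside +/- 0.05 Da
```

Building the index costs one sort. Each spectrum then needs only two O(log n) lookups to find its candidate peptides, rather than a scan of the whole database.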
21
Ning Z, Zhou H, Wang F, Abu-Farha M, Figeys D. Analytical Aspects of Proteomics: 2009–2010. Anal Chem 2011; 83:4407-26. [DOI: 10.1021/ac200857t] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 12/26/2022]
Affiliation(s)
- Hu Zhou
- Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China 201203
- Fangjun Wang
- Key Lab of Separation Sciences for Analytical Chemistry, National Chromatographic Research and Analysis Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China 116023
22
Kertész-Farkas A, Reiz B, Myers MP, Pongor S. PTMSearch: A Greedy Tree Traversal Algorithm for Finding Protein Post-Translational Modifications in Tandem Mass Spectra. ACTA ACUST UNITED AC 2011. [DOI: 10.1007/978-3-642-23783-6_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
23
Zhou C, Chi H, Wang LH, Li Y, Wu YJ, Fu Y, Sun RX, He SM. Speeding up tandem mass spectrometry-based database searching by longest common prefix. BMC Bioinformatics 2010; 11:577. [PMID: 21108792 PMCID: PMC3000425 DOI: 10.1186/1471-2105-11-577] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/10/2010] [Accepted: 11/25/2010] [Indexed: 11/10/2022]
Abstract
Background Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, constructing a peptide index requires a large amount of time and space, especially for non-specific digestion, and the resulting index is inflexible to use. Results We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the corresponding number of peptide-spectrum matching operations, thereby decreasing the identification time. This algorithm relies on a property of the longest common prefix. Enzymatic digestion does pose a challenge to this property, but the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions. Conclusions The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm
Affiliation(s)
- Chen Zhou
- Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, China
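The ABLCP abstract above relies on the suffix array and its longest-common-prefix (LCP) array to remove redundant candidate peptides. The following is a minimal sketch of that textbook property for non-specific digestion only, not the pFind implementation. In sorted suffix order, the first `lcp[k]` characters of suffix `k` were already enumerated from the previous suffix, so only longer prefixes yield new candidates. The naive suffix-array construction, the `$` protein separator, the length bounds and the function names are assumptions. The adjustments the authors describe for enzymatic digestion are not modeled.

```python
def suffix_array(text):
    """Naive suffix array (sort suffix start positions); fine for a sketch, not a real database."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


def distinct_candidates(proteins, min_len=2, max_len=4):
    """Enumerate each distinct non-specific peptide exactly once.

    Prefixes no longer than lcp[k] repeat the previous suffix's prefixes, so they are
    skipped; this is where the redundant candidates are eliminated.
    """
    text = "$".join(proteins) + "$"  # '$' marks protein boundaries
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    out = []
    for k, start in enumerate(sa):
        for length in range(max(min_len, lcp[k] + 1), max_len + 1):
            pep = text[start:start + length]
            if len(pep) < length or "$" in pep:
                break  # crossed a protein boundary; longer prefixes would too
            out.append(pep)
    return out


peps = distinct_candidates(["ACDE", "CDEF"], min_len=2, max_len=3)
```

For the two toy proteins, a naive enumeration of every length-2 and length-3 substring produces 10 candidates, because CD, DE and CDE occur in both proteins. The LCP-based loop emits each of the 7 distinct peptides once, so each is scored against a spectrum only once.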
24
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from the peptide to the protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
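One widely used way to obtain the global false discovery rates mentioned in the abstract above is the target-decoy strategy. The following is a minimal sketch of its simplest form, not code from the review. PSMs from a concatenated target-plus-decoy search are sorted by score. The FDR at each threshold is estimated as (decoys above threshold) / (targets above threshold). Each PSM's q-value is the smallest FDR at any threshold that still accepts it. The input format and function name are assumptions, and tied scores are not handled specially. Some variants use (decoys + 1) / targets or other corrections.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy) pairs; returns q-values in the input order."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this PSM.
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: running minimum of the FDR, scanning from the lowest score upward.
    q = [0.0] * len(psms)
    running = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running = min(running, fdrs[rank])
        q[order[rank]] = min(running, 1.0)
    return q


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
q = target_decoy_qvalues(psms)
accepted = [i for i, (_, decoy) in enumerate(psms) if not decoy and q[i] <= 0.01]
```

The running minimum makes q-values monotone in score, so accepting every target PSM with q ≤ 0.01 is equivalent to choosing the most permissive score threshold whose estimated FDR is at most 1%.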
25
Wang L, Wang W, Chi H, Wu Y, Li Y, Fu Y, Zhou C, Sun R, Wang H, Liu C, Yuan Z, Xiu L, He SM. An efficient parallelization of phosphorylated peptide and protein identification. Rapid Commun Mass Spectrom 2010; 24:1791-1798. [PMID: 20499324 DOI: 10.1002/rcm.4578] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
Protein sequence database search based on tandem mass spectrometry is an essential method for protein identification. As the computational demand increases, parallel computing has become an important technique for accelerating proteomics data analysis. In this paper, we discuss several factors which could affect the runtime of the pFind search engine and build an estimation model. Based on this model, effective on-line and off-line scheduling methods were developed. An experiment on the public dataset from PhosphoPep consisting of 100 RAW files of phosphopeptides shows that the speedup on 100 processors is 83.7. The parallel version can complete the identification task within 9 min, while a stand-alone process on a single PC takes more than 10 h. On another larger dataset consisting of 1,366,471 spectra, the speedup on 320 processors is 258.9 and the efficiency is 80.9%. Our approach can be applied to other similar search engines.
Affiliation(s)
- Leheng Wang
- Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, P.R. China