1
Mao J, Zhu H, Liu L, Fang Z, Dong M, Qin H, Ye M. MS-Decipher: a user-friendly proteome database search software with an emphasis on deciphering the spectra of O-linked glycopeptides. Bioinformatics 2022; 38:1911-1919. [PMID: 35020790 DOI: 10.1093/bioinformatics/btac014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/03/2021] [Revised: 12/29/2021] [Accepted: 01/08/2022] [Indexed: 02/03/2023]
Abstract
Motivation: The interpretation of mass spectrometry (MS) data is a crucial step in proteomics analysis, and the identification of post-translational modifications (PTMs) is vital for understanding the regulation mechanisms of living systems. Among the various PTMs, glycosylation is one of the most diverse. Though many search engines have been developed to decipher proteomic data, some of them are difficult to operate and perform poorly on glycoproteomic datasets compared to advanced glycoproteomic software.
Results: To simplify the analysis of proteomic datasets, especially O-glycoproteomic datasets, we present a user-friendly proteomic database search platform, MS-Decipher, for the identification of peptides from MS data. Two scoring schemes can be chosen for peptide-spectrum matching. MS-Decipher was found to have the same sensitivity and confidence in peptide identification as traditional database searching software. In addition, a special search mode, O-Search, is integrated into MS-Decipher to identify O-glycopeptides for O-glycoproteomic analysis. Compared with Mascot, MetaMorpheus and MSFragger, MS-Decipher can obtain about 139.9%, 48.8% and 6.9% more O-glycopeptide-spectrum matches, respectively. MS-Decipher provides a tool for the visualization of O-glycopeptide-spectrum matches. MS-Decipher has a user-friendly graphical user interface, making it easier to operate. Several file formats are available in the searching and validation steps. MS-Decipher is implemented in Java and can be used cross-platform.
Availability and implementation: MS-Decipher is freely available at https://github.com/DICP-1809/MS-Decipher for academic use. For detailed implementation steps, please see the user guide.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiawei Mao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- He Zhu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Luyao Liu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zheng Fang
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Mingming Dong
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Hongqiang Qin
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- Mingliang Ye
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
2
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023]
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavages. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications, from clinical to environmental settings, and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically its performance lagged behind database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm's correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations for further improvements to the algorithms and offer potential avenues to overcome current inherent data limitations.
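To make the notion of a missing fragmentation cleavage concrete, the following is a minimal Python sketch. It assumes a simplified definition that is not taken from the paper: a cleavage site counts as missing when neither its singly charged b ion nor its complementary y ion has a peak within a fixed m/z tolerance. The function names, the 0.02 tolerance and the example peptide are illustrative assumptions, and modifications and other ion types are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z values, one pair per cleavage site."""
    masses = [MONO[aa] for aa in peptide]
    total = sum(masses)
    b_ions, y_ions = [], []
    prefix = 0.0
    for i in range(len(peptide) - 1):  # a peptide of length n has n-1 sites
        prefix += masses[i]
        b_ions.append(prefix + PROTON)                   # N-terminal fragment
        y_ions.append(total - prefix + WATER + PROTON)   # complementary C-terminal fragment
    return b_ions, y_ions


def count_missing_cleavages(peptide, peaks, tol=0.02):
    """Count sites with neither a b-ion nor a y-ion peak within tol (assumed definition)."""
    def observed(mz):
        return any(abs(mz - p) <= tol for p in peaks)

    b_ions, y_ions = fragment_mz(peptide)
    return sum(
        1 for b, y in zip(b_ions, y_ions)
        if not observed(b) and not observed(y)
    )


# Hypothetical spectrum: every b ion of PEPTIDE except the third, and no y ions.
b, y = fragment_mz("PEPTIDE")
example_peaks = b[:2] + b[3:]
print(count_missing_cleavages("PEPTIDE", example_peaks))
```

Under this definition, a spectrum with no matching peaks yields one missing cleavage per site, which is the regime where the abstract reports that correct predictions become rare.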
3
Dai J, Yu F, Zhou C, Yu W. Understanding the Limit of Open Search in the Identification of Peptides With Post-translational Modifications - A Simulation-Based Study. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2884-2890. [PMID: 32356758 DOI: 10.1109/tcbb.2020.2991207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Peptide identification from tandem mass spectrometry data is a fundamental task in computational proteomics. Traditional algorithms perform well when facing unmodified peptides. However, when peptides have post-translational modifications (PTMs), these methods cannot provide satisfactory results. Recently, open search methods have been proposed to identify peptides with PTMs. While the performance of these new methods is promising, the identification results vary greatly with respect to the quality of tandem mass spectra and the number of PTMs in peptides. This motivates us to systematically study the relationship between the performance of open search methods and the quality parameters of tandem mass spectrometry data as well as the number of PTMs in peptides. In this paper, we have proposed an analytical model derived from simulated data to describe the relationship between the probability of obtaining correct results and the spectrum quality as well as the number of PTMs. The proposed model is verified using 1,464,146 real experimental spectra. The consistent trend observed in both simulated data and real data reveals the necessary conditions to effectively apply open search methods. Source code of our study is available at http://bioinformatics.ust.hk/PST.html.
4
Aksenov AA, Laponogov I, Zhang Z, Doran SLF, Belluomo I, Veselkov D, Bittremieux W, Nothias LF, Nothias-Esposito M, Maloney KN, Misra BB, Melnik AV, Smirnov A, Du X, Jones KL, Dorrestein K, Panitchpakdi M, Ernst M, van der Hooft JJJ, Gonzalez M, Carazzone C, Amézquita A, Callewaert C, Morton JT, Quinn RA, Bouslimani A, Orio AA, Petras D, Smania AM, Couvillion SP, Burnet MC, Nicora CD, Zink E, Metz TO, Artaev V, Humston-Fulmer E, Gregor R, Meijler MM, Mizrahi I, Eyal S, Anderson B, Dutton R, Lugan R, Boulch PL, Guitton Y, Prevost S, Poirier A, Dervilly G, Le Bizec B, Fait A, Persi NS, Song C, Gashu K, Coras R, Guma M, Manasson J, Scher JU, Barupal DK, Alseekh S, Fernie AR, Mirnezami R, Vasiliou V, Schmid R, Borisov RS, Kulikova LN, Knight R, Wang M, Hanna GB, Dorrestein PC, Veselkov K. Auto-deconvolution and molecular networking of gas chromatography-mass spectrometry data. Nat Biotechnol 2021; 39:169-173. [PMID: 33169034 PMCID: PMC7971188 DOI: 10.1038/s41587-020-0700-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 01/13/2020] [Revised: 08/26/2020] [Accepted: 09/09/2020] [Indexed: 12/23/2022]
Abstract
We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography-mass spectrometry (GC-MS) data. We then designed workflows to enable the community to store, process, share, annotate, compare and perform molecular networking of GC-MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples.
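As a rough illustration of how non-negative matrix factorization can separate co-eluting compounds, the sketch below factorizes a toy scans-by-m/z intensity matrix using the standard Lee-Seung multiplicative updates. It is not MSHub's implementation: the matrix layout (rows as scans, columns as m/z channels), the toy data, the function names and the iteration count are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (m x n) into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return w, h


def toy_matrix():
    """Two hypothetical compounds with overlapping elution profiles."""
    elution_a, elution_b = [0, 1, 3, 1, 0, 0], [0, 0, 1, 3, 1, 0]
    spectrum_a, spectrum_b = [5, 0, 2, 0], [0, 4, 0, 3]
    return [[ea * sa + eb * sb for sa, sb in zip(spectrum_a, spectrum_b)]
            for ea, eb in zip(elution_a, elution_b)]


V = toy_matrix()
W, H = nmf(V, k=2)
print("residual:", frobenius_error(V, W, H))
for row in H:  # each row of H approximates one component's fragmentation pattern
    print([round(x, 2) for x in row])
```

In this layout the columns of W play the role of elution profiles and the rows of H the role of fragmentation patterns. Comparing the recovered patterns across samples is one conceivable way to assess reproducibility, though the abstract does not say how MSHub quantifies it.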
Affiliation(s)
- Alexander A Aksenov
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Ivan Laponogov
  - Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
- Zheng Zhang
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Sophie L F Doran
  - Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
- Ilaria Belluomo
  - Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
- Dennis Veselkov
  - Intelligify Limited, London, UK
  - Department of Computing, Imperial College, South Kensington Campus, London, UK
- Wout Bittremieux
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Louis Felix Nothias
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Mélissa Nothias-Esposito
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Katherine N Maloney
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Department of Chemistry, Point Loma Nazarene University, San Diego, CA, USA
- Biswapriya B Misra
  - Center for Precision Medicine, Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Alexey V Melnik
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Aleksandr Smirnov
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Xiuxia Du
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Kenneth L Jones
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Kathleen Dorrestein
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Morgan Panitchpakdi
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Madeleine Ernst
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Section for Clinical Mass Spectrometry, Department of Congenital Disorders, Danish Center for Neonatal Screening, Statens Serum Institut, Copenhagen, Denmark
- Justin J J van der Hooft
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Mabel Gonzalez
  - Department of Chemistry, Universidad de los Andes, Bogotá, Colombia
- Chiara Carazzone
  - Department of Chemistry, Universidad de los Andes, Bogotá, Colombia
- Adolfo Amézquita
  - Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Chris Callewaert
  - Center for Microbial Ecology and Technology, Ghent, Belgium
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- James T Morton
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Robert A Quinn
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Amina Bouslimani
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Andrea Albarracín Orio
  - IRNASUS, Universidad Católica de Córdoba, CONICET, Facultad de Ciencias Agropecuarias, Córdoba, Argentina
- Daniel Petras
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Andrea M Smania
  - Universidad Nacional de Córdoba, Facultad de Ciencias Químicas, Departamento de Química Biológica Ranwel Caputto, Córdoba, Argentina
  - CONICET, Universidad Nacional de Córdoba, Centro de Investigaciones en Química Biológica de Córdoba (CIQUIBIC), Córdoba, Argentina
- Sneha P Couvillion
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Meagan C Burnet
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Carrie D Nicora
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Erika Zink
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Thomas O Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Rachel Gregor
  - Department of Chemistry and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Michael M Meijler
  - Department of Chemistry and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Itzhak Mizrahi
  - Department of Life Sciences and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Stav Eyal
  - Department of Life Sciences and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Brooke Anderson
  - Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Rachel Dutton
  - Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Raphaël Lugan
  - UMR Qualisud, Université d'Avignon et des Pays du Vaucluse, Agrosciences, Avignon, France
- Pauline Le Boulch
  - UMR Qualisud, Université d'Avignon et des Pays du Vaucluse, Agrosciences, Avignon, France
- Yann Guitton
  - Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
- Stephanie Prevost
  - Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
- Audrey Poirier
  - Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
- Gaud Dervilly
  - Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
- Bruno Le Bizec
  - Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
- Aaron Fait
  - The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
- Noga Sikron Persi
  - The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
- Chao Song
  - The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
- Kelem Gashu
  - The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
- Roxana Coras
  - Division of Rheumatology, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Monica Guma
  - Division of Rheumatology, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Julia Manasson
  - Division of Rheumatology, Department of Medicine, New York University School of Medicine, New York, NY, USA
- Jose U Scher
  - Division of Rheumatology, Department of Medicine, New York University School of Medicine, New York, NY, USA
- Dinesh Kumar Barupal
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Saleh Alseekh
  - Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
  - Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
- Alisdair R Fernie
  - Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
  - Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
- Reza Mirnezami
  - Department of Colorectal Surgery, Royal Free Hospital NHS Foundation Trust, Hampstead, London, UK
- Vasilis Vasiliou
  - Department of Environmental Health Sciences, Yale School of Public Health, Yale University, New Haven, CT, USA
- Robin Schmid
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Roman S Borisov
  - A.V. Topchiev Institute of Petrochemical Synthesis RAS, Moscow, Russian Federation
- Larisa N Kulikova
  - Peoples' Friendship University of Russia (RUDN University), Moscow, Russian Federation
- Rob Knight
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - UCSD Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
  - Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Mingxun Wang
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- George B Hanna
  - Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
- Pieter C Dorrestein
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
  - UCSD Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
- Kirill Veselkov
  - Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
5
|
Aksenov AA, Laponogov I, Zhang Z, Doran SLF, Belluomo I, Veselkov D, Bittremieux W, Nothias LF, Nothias-Esposito M, Maloney KN, Misra BB, Melnik AV, Smirnov A, Du X, Jones KL, Dorrestein K, Panitchpakdi M, Ernst M, van der Hooft JJJ, Gonzalez M, Carazzone C, Amézquita A, Callewaert C, Morton JT, Quinn RA, Bouslimani A, Orio AA, Petras D, Smania AM, Couvillion SP, Burnet MC, Nicora CD, Zink E, Metz TO, Artaev V, Humston-Fulmer E, Gregor R, Meijler MM, Mizrahi I, Eyal S, Anderson B, Dutton R, Lugan R, Boulch PL, Guitton Y, Prevost S, Poirier A, Dervilly G, Le Bizec B, Fait A, Persi NS, Song C, Gashu K, Coras R, Guma M, Manasson J, Scher JU, Barupal DK, Alseekh S, Fernie AR, Mirnezami R, Vasiliou V, Schmid R, Borisov RS, Kulikova LN, Knight R, Wang M, Hanna GB, Dorrestein PC, Veselkov K. Auto-deconvolution and molecular networking of gas chromatography-mass spectrometry data. Nat Biotechnol 2021. [PMID: 33169034 DOI: 10.1038/s41587-41020-40700-41583] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography-mass spectrometry (GC-MS) data. We then designed workflows to enable the community to store, process, share, annotate, compare and perform molecular networking of GC-MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples.
Collapse
Affiliation(s)
- Alexander A Aksenov
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
| | - Ivan Laponogov
- Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
| | - Zheng Zhang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Sophie L F Doran
- Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
| | - Ilaria Belluomo
- Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
| | - Dennis Veselkov
- Intelligify Limited, London, UK
- Department of Computing, Imperial College, South Kensington Campus, London, UK
| | - Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - Louis Felix Nothias
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
| | - Mélissa Nothias-Esposito
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
| | - Katherine N Maloney
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Department of Chemistry, Point Loma Nazarene University, San Diego, CA, USA
| | - Biswapriya B Misra
- Center for Precision Medicine, Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Alexey V Melnik
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Aleksandr Smirnov
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
| | - Xiuxia Du
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
| | - Kenneth L Jones
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Kathleen Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
| | - Morgan Panitchpakdi
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Madeleine Ernst
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Section for Clinical Mass Spectrometry, Department of Congenital Disorders, Danish Center for Neonatal Screening, Statens Serum Institut, Copenhagen, Denmark
| | - Justin J J van der Hooft
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
| | - Mabel Gonzalez
- Department of Chemistry, Universidad de los Andes, Bogotá, Colombia
| | - Chiara Carazzone
- Department of Chemistry, Universidad de los Andes, Bogotá, Colombia
| | - Adolfo Amézquita
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Chris Callewaert
- Center for Microbial Ecology and Technology, Ghent, Belgium
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
| | - James T Morton
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
| | - Robert A Quinn
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
| | - Amina Bouslimani
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
| | - Andrea Albarracín Orio
- IRNASUS, Universidad Católica de Córdoba, CONICET, Facultad de Ciencias Agropecuarias, Córdoba, Argentina
| | - Daniel Petras
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California,San Diego, La Jolla, CA, USA
| | - Andrea M Smania
- Universidad Nacional de Córdoba, Facultad de Ciencias Químicas, Departamento de Química Biológica Ranwel Caputto, Córdoba, Argentina
- CONICET, Universidad Nacional de Córdoba, Centro de Investigaciones en Química Biológica de Córdoba (CIQUIBIC), Córdoba, Argentina
| | - Sneha P Couvillion
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Meagan C Burnet
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Carrie D Nicora
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Erika Zink
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Thomas O Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | | | | | - Rachel Gregor
- Department of Chemistry and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Michael M Meijler
- Department of Chemistry and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Itzhak Mizrahi
- Department of Life Sciences and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Stav Eyal
- Department of Life Sciences and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Brooke Anderson
- Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Rachel Dutton
- Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Raphaël Lugan
- UMR Qualisud, Université d'Avignon et des Pays du Vaucluse, Agrosciences, Avignon, France
| | - Pauline Le Boulch
- UMR Qualisud, Université d'Avignon et des Pays du Vaucluse, Agrosciences, Avignon, France
| | - Yann Guitton
- Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
| | - Stephanie Prevost
- Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
| | - Audrey Poirier
- Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
| | - Gaud Dervilly
- Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
| | - Bruno Le Bizec
- Laboratoire d'Etude des Résidus et Contaminants dans les Aliments (LABERCA), Oniris, INRAe, Nantes, France
| | - Aaron Fait
- The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
| | - Noga Sikron Persi
- The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
| | - Chao Song
- The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
| | - Kelem Gashu
- The French Associates Institute for Agriculture and Biotechnology of Dryland, The Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Sede Boqer Campus, Beer Sheva, Israel
| | - Roxana Coras
- Division of Rheumatology, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Monica Guma
- Division of Rheumatology, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Julia Manasson
- Division of Rheumatology, Department of Medicine, New York University School of Medicine, New York, NY, USA
| | - Jose U Scher
- Division of Rheumatology, Department of Medicine, New York University School of Medicine, New York, NY, USA
| | - Dinesh Kumar Barupal
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Saleh Alseekh
- Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
- Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
| | - Alisdair R Fernie
- Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
- Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
| | - Reza Mirnezami
- Department of Colorectal Surgery, Royal Free Hospital NHS Foundation Trust, Hampstead, London, UK
| | - Vasilis Vasiliou
- Department of Environmental Health Sciences, Yale School of Public Health, Yale University, New Haven, CT, USA
| | - Robin Schmid
- Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
| | - Roman S Borisov
- A.V. Topchiev Institute of Petrochemical Synthesis RAS, Moscow, Russian Federation
| | - Larisa N Kulikova
- Peoples' Friendship University of Russia (RUDN University), Moscow, Russian Federation
| | - Rob Knight
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- UCSD Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Mingxun Wang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - George B Hanna
- Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
| | - Pieter C Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- UCSD Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
| | - Kirill Veselkov
- Department of Surgery and Cancer, Imperial College London, South Kensington Campus, London, UK
|
6
|
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Proteins are crucial for every cellular activity, and unraveling their sequence and structure is a key step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques, along with various software tools, are currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for the interpretation of protein sequence data. EXPERT OPINION As mass-spectrometry sequencing performance improves with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.
Affiliation(s)
- Rui Vitorino
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
| | - Sofia Guedes
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
| | - Fabio Trindade
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
| | - Inês Correia
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| | - Gabriela Moura
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| | - Paulo Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering, Brazil
| | - Manuel A S Santos
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| | - Francisco Amado
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
|
7
|
DeLaney K, Cao W, Ma Y, Ma M, Zhang Y, Li L. PRESnovo: Prescreening Prior to de novo Sequencing to Improve Accuracy and Sensitivity of Neuropeptide Identification. J Am Soc Mass Spectrom 2020; 31:1358-1371. [PMID: 32266812 PMCID: PMC7332408 DOI: 10.1021/jasms.0c00013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 05/15/2023]
Abstract
Identification of peptides in species lacking fully sequenced genomes is challenging due to the lack of prior knowledge. De novo sequencing is the method of choice, but its performance is less than satisfactory due to algorithmic bias and interference in complex MS/MS spectra. The task becomes even more challenging for endogenous peptides that do not involve an enzymatic digestion step, such as neuropeptides. However, many neuropeptides possess common sequence motifs that are conserved across members of the same family. Taking advantage of this feature to improve de novo sequencing of neuropeptides, we have developed a method named PRESnovo (prescreening precursors prior to de novo sequencing) to predict the motif from a MS/MS spectrum. A neuropeptide sequence is broken into a motif with conserved amino acid residues and the remaining partial sequence. By searching against a predefined motif database constructed from known homologous sequences, PRESnovo assigns the most probable motif to each precursor via a sophisticated scoring function. Performance analysis was conducted with 15 neuropeptide standards, and 11 neuropeptides were correctly identified with PRESnovo compared to 1 identification by PEAKS only. We applied PRESnovo to assign motifs to peptide sequences in conjunction with PEAKS for assigning the rest of the peptide sequence in order to discover neuropeptides in tissue samples of green crab, C. maenas, and Jonah crab, C. borealis. Collectively, a large number of neuropeptides were identified, including 13 putative neuropeptides identified in green crab brain, 77 in Jonah crab brain, and 47 in Jonah crab sinus glands for the first time. This PRESnovo strategy greatly simplifies de novo sequencing and enhances the accuracy and sensitivity of neuropeptide identification when common motifs are present.
|
8
|
Tagirdzhanov AM, Shlemov A, Gurevich A. NPS: scoring and evaluating the statistical significance of peptidic natural product-spectrum matches. Bioinformatics 2020; 35:i315-i323. [PMID: 31510666 PMCID: PMC6612854 DOI: 10.1093/bioinformatics/btz374] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Peptidic natural products (PNPs) are considered a promising compound class that has many applications in medicine. Recently developed mass spectrometry-based pipelines are transforming PNP discovery into a high-throughput technology. However, the current computational methods for PNP identification via database search of mass spectra are still in their infancy and could be substantially improved. RESULTS Here we present NPS, a statistical learning-based approach for scoring PNP-spectrum matches. We incorporated NPS into two leading PNP discovery tools and benchmarked them on millions of natural product mass spectra. The results demonstrate more than 45% increase in the number of identified spectra and 20% more found PNPs at a false discovery rate of 1%. AVAILABILITY AND IMPLEMENTATION NPS is available as a command line tool and as a web application at http://cab.spbu.ru/software/NPS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Azat M Tagirdzhanov
- Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
- Department of Higher Mathematics, St. Petersburg Electrotechnical University "LETI", St. Petersburg, Russia
| | - Alexander Shlemov
- Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
|
9
|
Zhong J, Sun Y, Xie M, Peng W, Zhang C, Wu FX, Wang J. Proteoform characterization based on top-down mass spectrometry. Brief Bioinform 2020; 22:1729-1750. [PMID: 32118252 DOI: 10.1093/bib/bbaa015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2019] [Revised: 01/23/2020] [Indexed: 12/16/2022] Open
Abstract
Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.
Affiliation(s)
- Jiancheng Zhong
- College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
| | - Yusui Sun
- College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
| | - Wei Peng
- Kunming University of Science and Technology, Kunming, Yunnan, China
| | - Chushu Zhang
- College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
| | - Fang-Xiang Wu
- College of Engineering and the Department of Computer Science at University of Saskatchewan, Saskatoon, Canada
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, Hunan, China
|
10
|
Mao Y, Daly TJ, Li N. Lys-Sequencer: An algorithm for de novo sequencing of peptides by paired single residue transposed Lys-C and Lys-N digestion coupled with high-resolution mass spectrometry. Rapid Commun Mass Spectrom 2020; 34:e8574. [PMID: 31499586 DOI: 10.1002/rcm.8574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2019] [Revised: 08/27/2019] [Accepted: 09/02/2019] [Indexed: 06/10/2023]
Abstract
RATIONALE Database-dependent identification of proteins by mass spectrometry is well established, but has limitations when there are novel proteins, mutations, splice variants, and post-translational modifications (PTMs) not available in the established reference database. De novo sequencing as a database-independent approach could address these limitations by deducing peptide sequences directly from experimental tandem mass spectrometry spectra, while concomitantly yielding residue-by-residue confidence metrics. METHODS Equal amounts of bovine serum albumin (BSA) sample aliquots were digested separately with Lys-C and Lys-N complementary peptidases, separated by reversed-phase ultra-high-performance liquid chromatography (UPLC), and analyzed by collision-induced dissociation (CID)-based mass spectrometry on an Orbitrap mass spectrometer. In the Lys-Sequencer algorithm, matched tandem mass spectra with equal precursor ion mass from complementary digestions were paired, and fragment ion types were identified based on the unique mass relationship between fragment ions extracted from a spectrum pair followed by de novo sequencing of peptides with identification confidence assigned at the residue level. RESULTS In all the matched spectrum pairs, 34 top-ranked BSA peptides were identified, from which 391 amino acid residues were identified correctly, covering ~67% of the full sequence of BSA (583 residues) with only ~6% (35 residues) exhibiting ambiguity in the sequence order (although amino acid compositions were still correctly assigned). Of note, this approach identified peptide sequences up to 17 amino acids in length without ambiguity, with the exception of the N-terminal or C-terminal peptides containing lysine (18-mer). CONCLUSIONS The algorithm ("Lys-Sequencer") developed in this work achieves high precision for de novo sequencing of peptides. This method facilitates the identification of point mutation and new PTMs in the protein characterization and discovery of new peptides and proteins with varying levels of confidence.
Affiliation(s)
- Yuan Mao
- Department of Analytical Chemistry, Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
| | - Thomas J Daly
- Department of Analytical Chemistry, Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
| | - Ning Li
- Department of Analytical Chemistry, Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
|
11
|
Chi H, Liu C, Yang H, Zeng WF, Wu L, Zhou WJ, Wang RM, Niu XN, Ding YH, Zhang Y, Wang ZW, Chen ZL, Sun RX, Liu T, Tan GM, Dong MQ, Xu P, Zhang PH, He SM. Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat Biotechnol 2018; 36:nbt.4236. [PMID: 30295672 DOI: 10.1038/nbt.4236] [Citation(s) in RCA: 219] [Impact Index Per Article: 36.5] [Received: 12/03/2017] [Accepted: 08/03/2018] [Indexed: 12/27/2022]
Abstract
We present a sequence-tag-based search engine, Open-pFind, to identify peptides in an ultra-large search space that includes coeluting peptides, unexpected modifications and digestions. Our method detects peptides with higher precision and speed than seven other search engines. Open-pFind identified 70-85% of the tandem mass spectra in four large-scale datasets and 14,064 proteins, each supported by at least two protein-unique peptides, in a human proteome dataset.
Affiliation(s)
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Chao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hao Yang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Long Wu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wen-Jing Zhou
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Rui-Min Wang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xiu-Nan Niu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yue-He Ding
- National Institute of Biological Sciences, Beijing, Beijing, China
| | - Yao Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, College of Ecology and Evolution, Sun Yat-Sen University, Guangzhou, China
| | - Zhao-Wei Wang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhen-Lin Chen
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Rui-Xiang Sun
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Tao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
| | - Guang-Ming Tan
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
| | - Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing, China
| | - Ping Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Pei-Heng Zhang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
| | - Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
|
12
|
Kou Q, Wu S, Liu X. Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry. Proteomics 2018; 18. [PMID: 29327814 DOI: 10.1002/pmic.201700306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/16/2017] [Revised: 11/20/2017] [Indexed: 01/19/2023]
Abstract
Complex proteoforms contain various primary structural alterations resulting from variations in genes, RNA, and proteins. Top-down mass spectrometry is commonly used for analyzing complex proteoforms because it provides whole sequence information of the proteoforms. Proteoform identification by top-down mass spectral database search is a challenging computational problem because the types and/or locations of some alterations in target proteoforms are in general unknown. Although spectral alignment and mass graph alignment algorithms have been proposed for identifying proteoforms with unknown alterations, they are extremely slow to align millions of spectra against tens of thousands of protein sequences in high throughput proteome level analyses. Many software tools in this area combine efficient protein sequence filtering algorithms and spectral alignment algorithms to speed up database search. As a result, the performance of these tools heavily relies on the sensitivity and efficiency of their filtering algorithms. Here, we propose two efficient approximate spectrum-based filtering algorithms for proteoform identification. We evaluated the performances of the proposed algorithms and four existing ones on simulated and real top-down mass spectrometry data sets. Experiments showed that the proposed algorithms outperformed the existing ones for complex proteoform identification. In addition, combining the proposed filtering algorithms and mass graph alignment algorithms identified many proteoforms missed by ProSightPC in proteome-level proteoform analyses.
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
|
13
|
Dimitrakopoulos L, Prassas I, Diamandis EP, Charames GS. Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction. Crit Rev Clin Lab Sci 2017; 54:414-432. [DOI: 10.1080/10408363.2017.1384446] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023]
Affiliation(s)
- Lampros Dimitrakopoulos
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Ioannis Prassas
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
| | - Eleftherios P. Diamandis
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
| | - George S. Charames
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
|
14
|
Abstract
In computational proteomics, the identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost associated with database search increases exponentially with respect to the number of modified amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible PTM patterns. To address this issue, one group of methods, named restricted tools (including Mascot, Comet, and MS-GF+), allows only a small number of PTM types in the database search process. Alternatively, the other group of methods, named unrestricted tools (including MS-Alignment, ProteinProspector, and MODa), avoids enumerating PTM patterns by using an alignment-based approach to localize and characterize modified amino acids. However, because of the large search space and the PTM localization issue, the sensitivity of these unrestricted tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI belongs to the category of unrestricted tools. It first codes peptide sequences into Boolean vectors and codes experimental spectra into real-valued vectors. For each coded spectrum, it then searches the coded sequence database to find the top scored peptide sequences as candidates. After that, PIPI uses dynamic programming to localize and characterize modified amino acids in each candidate. We used simulation experiments and real data experiments to evaluate the performance in comparison with restricted tools (i.e., Mascot, Comet, and MS-GF+) and unrestricted tools (i.e., Mascot with error tolerant search, MS-Alignment, ProteinProspector, and MODa). Comparison with restricted tools shows that PIPI has comparable sensitivity and running speed. Comparison with unrestricted tools shows that PIPI has the highest sensitivity except for Mascot with error tolerant search and ProteinProspector. These two tools simplify the task by only considering up to one modified amino acid in each peptide, which results in a higher sensitivity but has difficulty in dealing with multiple modified amino acids. The simulation experiments also show that PIPI has the lowest false discovery proportion, the highest PTM characterization accuracy, and the shortest running time among the unrestricted tools.
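The growth of the search space described in this abstract can be made concrete with a small counting sketch. It is not part of PIPI or of any tool named above. It assumes a simplified model in which each of `n_sites` residues is either unmodified or carries exactly one of `n_types` PTM types; the function name and its parameters are illustrative only.

```python
from math import comb


def count_ptm_patterns(n_sites: int, n_types: int, max_modified: int) -> int:
    """Count modification patterns with at most `max_modified` modified residues.

    Simplified model (an assumption, not taken from the paper): each residue is
    either unmodified or carries exactly one of `n_types` PTM types.
    """
    # Choose which k residues are modified, then one PTM type for each of them.
    return sum(comb(n_sites, k) * n_types ** k for k in range(max_modified + 1))


if __name__ == "__main__":
    # Hypothetical peptide with 10 modifiable residues and 3 PTM types per residue.
    for k in (1, 2, 3, 10):
        print(k, count_ptm_patterns(n_sites=10, n_types=3, max_modified=k))
```

Under this model, allowing at most one modified residue gives a count that grows linearly in `n_types`, while each additional modified residue adds a higher-order term, which is why exhaustive enumeration quickly becomes impractical.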
Affiliation(s)
- Fengchao Yu
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Ning Li
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Weichuan Yu
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
|
15
|
Gorshkov V, Hotta SYK, Verano-Braga T, Kjeldsen F. Peptide de novo sequencing of mixture tandem mass spectra. Proteomics 2016; 16:2470-9. [PMID: 27329701 PMCID: PMC5297990 DOI: 10.1002/pmic.201500549] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/26/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 02/02/2023]
Abstract
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co‐isolation and thus prone to false identifications. The deconvolution approach matched complementary b‐, y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
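The complementary-ion matching mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes centroided, singly charged fragment peaks and known neutral masses for the co-isolated precursors, and it relies on the standard relation that the m/z values of complementary singly charged b- and y-ions sum to the neutral peptide mass plus two proton masses. The function name, the tolerance and the example masses are all hypothetical.

```python
PROTON = 1.007276  # proton mass in Da


def split_by_precursor(fragment_mz, precursor_masses, tol=0.02):
    """Assign singly charged fragment peaks to co-isolated precursors.

    A pair of peaks (a, b) is assigned to a precursor with neutral mass M when
    a + b lies within `tol` Da of M + 2 * PROTON, the relation that holds for
    complementary singly charged b- and y-ions of the same peptide.
    """
    peaks = sorted(fragment_mz)
    virtual = {m: set() for m in precursor_masses}
    for m in precursor_masses:
        target = m + 2 * PROTON
        lo, hi = 0, len(peaks) - 1
        while lo < hi:  # two-pointer scan over the sorted peak list
            total = peaks[lo] + peaks[hi]
            if abs(total - target) <= tol:
                virtual[m].update((peaks[lo], peaks[hi]))
                lo += 1
                hi -= 1
            elif total < target:
                lo += 1
            else:
                hi -= 1
    # One "virtual spectrum" (sorted peak list) per precursor.
    return {m: sorted(v) for m, v in virtual.items()}


if __name__ == "__main__":
    # Synthetic mixture: two precursors plus one unrelated noise peak (123.4).
    mixture = [300.0, 702.014552, 450.5, 551.514552, 400.0, 802.014552, 123.4]
    print(split_by_precursor(mixture, [1000.0, 1200.0]))
```

The two-pointer scan keeps the sketch short but can miss pairs when several peaks fall inside the tolerance window; a production tool would also need to handle higher charge states and peaks shared between precursors.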
Affiliation(s)
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark.
| | | | - Thiago Verano-Braga
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
- Department of Physiology and Biophysics, Federal University of Minas Gerais Belo Horizonte - MG, Belo Horizonte, Brazil
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
|
16
|
Gillet LC, Leitner A, Aebersold R. Mass Spectrometry Applied to Bottom-Up Proteomics: Entering the High-Throughput Era for Hypothesis Testing. Annu Rev Anal Chem (Palo Alto Calif) 2016; 9:449-72. [PMID: 27049628 DOI: 10.1146/annurev-anchem-071015-041535] [Citation(s) in RCA: 218] [Impact Index Per Article: 27.3] [Indexed: 05/03/2023]
Abstract
Proteins constitute a key class of molecular components that perform essential biochemical reactions in living cells. Whether the aim is to extensively characterize a given protein or to perform high-throughput qualitative and quantitative analysis of the proteome content of a sample, liquid chromatography coupled to tandem mass spectrometry has become the technology of choice. In this review, we summarize the current state of mass spectrometry applied to bottom-up proteomics, the approach that focuses on analyzing peptides obtained from proteolytic digestion of proteins. With the recent advances in instrumentation and methodology, we show that the field is moving away from providing qualitative identification of long lists of proteins to delivering highly consistent and accurate quantification values for large numbers of proteins across large numbers of samples. We believe that this shift will have a profound impact for the field of proteomics and life science research in general.
Affiliation(s)
- Ludovic C Gillet
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
| | - Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Faculty of Science, University of Zürich, 8057 Zürich, Switzerland
| |
|
17
|
Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annual Review of Analytical Chemistry (Palo Alto, Calif.) 2016; 9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 05/09/2023]
Abstract
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Affiliation(s)
- Gloria M Sheynkman
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | - Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706
| |
|
18
|
Xiong Y, Guo Y, Xiao W, Cao Q, Li S, Qi X, Zhang Z, Wang Q, Shui W. An NGS-Independent Strategy for Proteome-Wide Identification of Single Amino Acid Polymorphisms by Mass Spectrometry. Anal Chem 2016; 88:2784-91. [PMID: 26810586 DOI: 10.1021/acs.analchem.5b04417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Detection of proteins containing single amino acid polymorphisms (SAPs) encoded by nonsynonymous SNPs (nsSNPs) can aid researchers in studying the functional significance of protein variants. Most proteogenomic approaches for large-scale SAP mapping require construction of a sample-specific database containing protein variants predicted from next-generation sequencing (NGS) data. Searching shotgun proteomic data sets against these NGS-derived databases allows identification of SAP peptides, thus validating the proteome-level sequence variation. Contrary to the conventional approaches, our study presents a novel strategy for proteome-wide SAP detection without relying on sample-specific NGS data. By searching a deep-coverage proteomic data set from an industrial thermotolerant yeast strain using our strategy, we identified 337 putative SAPs compared to the reference genome. Among the SAP peptides identified with stringent criteria, 85.2% of SAP sites were validated using whole-genome sequencing data obtained for this organism, which indicates high accuracy of SAP identification with our strategy. More interestingly, for certain SAP peptides that cannot be predicted by genomic sequencing, we used synthetic peptide standards to verify expression of peptide variants in the proteome. Our study has provided a unique tool for proteogenomics to enable proteome-wide direct SAP identification and capture nongenetic protein variants not linked to nsSNPs.
Affiliation(s)
- Yun Xiong
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Yufeng Guo
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Weidi Xiao
- College of Life Sciences, Nankai University, Tianjin 300071, China
| | - Qichen Cao
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Shanshan Li
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Xianni Qi
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Zhidan Zhang
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Qinhong Wang
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| | - Wenqing Shui
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| |
|
19
|
Abstract
Every molecular player in the cast of biology’s central dogma is being sequenced and quantified with increasing ease and coverage. To bring the resulting genomic, transcriptomic, and proteomic data sets into coherence, tools must be developed that do not constrain data acquisition and analytics in any way but rather provide simple links across previously acquired data sets with minimal preprocessing and hassle. Here we present such a tool: PGx, which supports proteogenomic integration of mass spectrometry proteomics data with next-generation sequencing by mapping identified peptides onto their putative genomic coordinates.
Affiliation(s)
- Manor Askenazi
- Biomedical Hosting LLC, 33 Lewis Avenue, Arlington, Massachusetts 02474, United States
| | - Kelly V Ruggles
- NYU Langone Medical Center, 227 East 30th Street, New York, New York 10016, United States
| | - David Fenyö
- NYU Langone Medical Center, 227 East 30th Street, New York, New York 10016, United States
| |
|
20
|
Sadygov RG. Using SEQUEST with theoretically complete sequence databases. Journal of the American Society for Mass Spectrometry 2015; 26:1858-1864. [PMID: 26238326 PMCID: PMC4607654 DOI: 10.1007/s13361-015-1228-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/02/2015] [Revised: 05/08/2015] [Accepted: 06/17/2015] [Indexed: 06/04/2023]
Abstract
SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven to be hugely successful for its sensitivity and specificity in identifying peptides/proteins, the sequences of which are present in the protein sequence databases. In this work, we attempt a new use for the algorithm: searching a complete list of theoretically possible peptides, a de novo-like form of sequencing. We used freely available mass spectral data and determined a number of unique peptides as identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
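The two steps named in this abstract (enumerating amino acid compositions within a mass interval, then generating sequences in lexicographic order) can be sketched as follows. This is a naive illustration, not the author's algorithm: the five-residue alphabet, the function names, and the brute-force permutation step are assumptions chosen to keep the example small.

```python
from itertools import permutations

WATER = 18.010565  # mass of H2O in Da
# Monoisotopic residue masses for a deliberately tiny alphabet (assumption).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841}


def compositions(peptide_mass, tol=0.001, residues=RESIDUE_MASS):
    """Enumerate residue-count dicts whose peptide mass is within tol."""
    names = sorted(residues)
    found = []

    def extend(idx, remaining, counts):
        if idx == len(names):
            if abs(remaining) <= tol and counts:
                found.append(dict(counts))
            return
        name, mass = names[idx], residues[names[idx]]
        n = 0
        # Try 0, 1, 2, ... copies of this residue while mass budget allows.
        while n * mass <= remaining + tol:
            if n:
                counts[name] = n
            extend(idx + 1, remaining - n * mass, counts)
            n += 1
        counts.pop(name, None)

    extend(0, peptide_mass - WATER, {})
    return found


def sequences(composition):
    """All distinct sequences for a composition, in lexicographic order."""
    letters = "".join(name * n for name, n in sorted(composition.items()))
    return sorted({"".join(p) for p in permutations(letters)})
```

The number of sequences grows factorially with peptide length, which is consistent with the abstract's remark that the theoretical database is many-fold more complex than a protein sequence database.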
Affiliation(s)
- Rovshan G Sadygov
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, 77555, USA.
- Sealy Center for Molecular Medicine, The University of Texas Medical Branch, Galveston, TX, 77555, USA.
| |
|
21
|
Chi H, He K, Yang B, Chen Z, Sun RX, Fan SB, Zhang K, Liu C, Yuan ZF, Wang QH, Liu SQ, Dong MQ, He SM. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data". J Proteomics 2015; 129:33-41. [PMID: 26232248 DOI: 10.1016/j.jprot.2015.07.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/02/2015] [Revised: 05/04/2015] [Accepted: 05/10/2015] [Indexed: 01/23/2023]
Abstract
Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics.
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
| | - Zhen Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Quan-Hui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Si-Qi Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
| | - Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
| |
|
22
|
Chick JM, Kolippakkam D, Nusinow DP, Zhai B, Rad R, Huttlin EL, Gygi SP. A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol 2015; 33:743-9. [PMID: 26076430 PMCID: PMC4515955 DOI: 10.1038/nbt.3267] [Citation(s) in RCA: 284] [Impact Index Per Article: 31.6] [Received: 10/28/2014] [Accepted: 05/11/2015] [Indexed: 12/17/2022]
Abstract
Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ± 500 Da. In a proteome-wide data set on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies >90%. We conclude that at least one-third of unassigned spectra arise from peptides with substoichiometric modifications.
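The idea of grouping unknown modification masses into mass bins can be sketched as a histogram of precursor mass differences. This is only an illustration under stated assumptions: the function name, the input format (pairs of observed and theoretical peptide masses), and the 0.01 Da bin width are hypothetical; only the ±500 Da window comes from the abstract.

```python
from collections import Counter


def delta_mass_histogram(matches, max_shift=500.0, bin_width=0.01):
    """Count mass shifts per bin (hypothetical helper).

    matches: iterable of (observed_mass, theoretical_mass) pairs in Da.
    Returns a Counter keyed by integer bin index; the bin centre in Da
    is index * bin_width.
    """
    bins = Counter()
    for observed, theoretical in matches:
        delta = observed - theoretical
        if abs(delta) <= max_shift:  # keep only shifts inside the window
            bins[round(delta / bin_width)] += 1
    return bins
```

Heavily populated bins away from zero then point to recurring modifications; for example, a cluster near +79.97 Da would be consistent with phosphorylation.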
Affiliation(s)
- Joel M. Chick
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| | - Deepak Kolippakkam
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| | - David P. Nusinow
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| | - Bo Zhai
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| | - Ramin Rad
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| | - Edward L. Huttlin
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| | - Steven P. Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, USA
| |
|
23
|
Chi H, He K, Yang B, Chen Z, Sun RX, Fan SB, Zhang K, Liu C, Yuan ZF, Wang QH, Liu SQ, Dong MQ, He SM. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data. J Proteomics 2015; 125:89-97. [PMID: 25979774 DOI: 10.1016/j.jprot.2015.05.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 03/02/2015] [Revised: 05/04/2015] [Accepted: 05/10/2015] [Indexed: 10/23/2022]
Abstract
Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested.
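A fragment ion index of the general kind mentioned in this abstract can be sketched as an inverted map from binned fragment m/z values to the peptides that produce them. This is a generic illustration, not the Alioth implementation: the function names, the reduced residue table, the 0.02 bin width, and the shared-peak count used as a score are all assumptions.

```python
from collections import Counter, defaultdict

PROTON = 1.007276  # proton mass in Da
WATER = 18.010565  # mass of H2O in Da
# Monoisotopic residue masses for a small illustrative alphabet (assumption).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841, "K": 128.09496}


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions, prefix = [], 0.0
    for mass in masses[:-1]:
        prefix += mass
        ions.append(prefix + PROTON)                  # b-ion
        ions.append(total - prefix + WATER + PROTON)  # complementary y-ion
    return ions


def build_index(peptides, bin_width=0.02):
    """Map each m/z bin to the set of peptides with a fragment in it."""
    index = defaultdict(set)
    for pep in peptides:
        for mz in fragment_mzs(pep):
            index[round(mz / bin_width)].add(pep)
    return index


def query(index, peaks, bin_width=0.02):
    """Count, per peptide, how many spectrum peaks hit one of its bins."""
    hits = Counter()
    for mz in peaks:
        for pep in index.get(round(mz / bin_width), ()):
            hits[pep] += 1
    return hits
```

Because lookup cost depends on the number of peaks rather than the number of candidate peptides, an index like this is one way a search can stay fast when the precursor mass window is wide. A single-bin lookup misses peaks that fall just across a bin border, so a practical version would also check neighbouring bins.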
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
| | - Zhen Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
| | - Quan-Hui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Si-Qi Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
| | - Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
| |
|
24
|
Medzihradszky KF, Chalkley RJ. Lessons in de novo peptide sequencing by tandem mass spectrometry. Mass Spectrometry Reviews 2015; 34:43-63. [PMID: 25667941 PMCID: PMC4367481 DOI: 10.1002/mas.21406] [Citation(s) in RCA: 137] [Impact Index Per Article: 15.2] [Indexed: 05/03/2023]
Abstract
Mass spectrometry has become the method of choice for the qualitative and quantitative characterization of protein mixtures isolated from all kinds of living organisms. The raw data in these studies are MS/MS spectra, usually of peptides produced by proteolytic digestion of a protein. These spectra are "translated" into peptide sequences, normally with the help of various search engines. Data acquisition and interpretation have both been automated, and most researchers look only at the summary of the identifications without ever viewing the underlying raw data used for assignments. Automated analysis of data is essential due to the volume produced. However, being familiar with the finer intricacies of peptide fragmentation processes, and experiencing the difficulties of manual data interpretation allow a researcher to be able to more critically evaluate key results, particularly because there are many known rules of peptide fragmentation that are not incorporated into search engine scoring. Since the most commonly used MS/MS activation method is collision-induced dissociation (CID), in this article we present a brief review of the history of peptide CID analysis. Next, we provide a detailed tutorial on how to determine peptide sequences from CID data. Although the focus of the tutorial is de novo sequencing, the lessons learned and resources supplied are useful for data interpretation in general.
|
25
|
De Haes W, Van Sinay E, Detienne G, Temmerman L, Schoofs L, Boonen K. Functional neuropeptidomics in invertebrates. Biochimica et Biophysica Acta-Proteins and Proteomics 2014; 1854:812-26. [PMID: 25528324 DOI: 10.1016/j.bbapap.2014.12.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 08/21/2014] [Revised: 11/27/2014] [Accepted: 12/10/2014] [Indexed: 10/24/2022]
Abstract
Neuropeptides are key messengers in almost all physiological processes. They originate from larger precursors and are extensively processed to become bioactive. Neuropeptidomics aims to comprehensively identify the collection of neuropeptides in an organism, organ, tissue or cell. The neuropeptidome of several invertebrates is thoroughly explored since they are important model organisms (and models for human diseases), disease vectors and pest species. The charting of the neuropeptidome is the first step towards understanding peptidergic signaling. This review will first discuss the latest developments in exploring the neuropeptidome. The physiological roles and modes of action of neuropeptides can be explored in two ways, which are largely orthogonal and therefore complementary. The first way consists of inferring the functions of neuropeptides by a forward approach where neuropeptide profiles are compared under different physiological conditions. The second is the reverse approach, where neuropeptide collections are used to screen for receptor binding. This is followed by localization studies and functional tests. This review will focus on how these different functional screening methods contributed to the field of invertebrate neuropeptidomics and expanded our knowledge of peptidergic signaling. This article is part of a Special Issue entitled: Neuroproteomics: Applications in Neuroscience and Neurology.
Affiliation(s)
- Wouter De Haes
- Functional Genomics and Proteomics, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, 3000 Leuven, Belgium
| | - Elien Van Sinay
- Functional Genomics and Proteomics, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, 3000 Leuven, Belgium
| | - Giel Detienne
- Functional Genomics and Proteomics, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, 3000 Leuven, Belgium
| | - Liesbet Temmerman
- Functional Genomics and Proteomics, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, 3000 Leuven, Belgium
| | - Liliane Schoofs
- Functional Genomics and Proteomics, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, 3000 Leuven, Belgium
| | - Kurt Boonen
- Functional Genomics and Proteomics, Department of Biology, University of Leuven (KU Leuven), Naamsestraat 59, 3000 Leuven, Belgium.
| |
|
26
|
MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun 2014; 5:5277. [PMID: 25358478 PMCID: PMC5036525 DOI: 10.1038/ncomms6277] [Citation(s) in RCA: 764] [Impact Index Per Article: 76.4] [Received: 04/02/2014] [Accepted: 09/16/2014] [Indexed: 02/06/2023] Open
Abstract
Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyze tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral datasets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these datasets, MS-GF+ significantly increases the number of identified peptides compared to commonly used methods for peptide identifications. We emphasize that while MS-GF+ is not specifically designed for any particular experimental set-up, it improves upon the performance of tools specifically designed for these applications (e.g., specialized tools for phosphoproteomics).
|
27
|
Deng F, Wang L, Liu X. An efficient algorithm for the blocked pattern matching problem. Bioinformatics 2014; 31:532-8. [DOI: 10.1093/bioinformatics/btu678] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
|
28
|
Wang J, Bourne PE, Bandeira N. MixGF: spectral probabilities for mixture spectra from more than one peptide. Mol Cell Proteomics 2014; 13:3688-97. [PMID: 25225354 DOI: 10.1074/mcp.o113.037218] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022] Open
Abstract
In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, California
| | - Philip E Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
| | - Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
- Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, California
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92092
| |
|
29
|
Su ZD, Sheng QH, Li QR, Chi H, Jiang X, Yan Z, Fu N, He SM, Khaitovich P, Wu JR, Zeng R. De novo identification and quantification of single amino-acid variants in human brain. J Mol Cell Biol 2014; 6:421-33. [PMID: 25007923 DOI: 10.1093/jmcb/mju031] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023] Open
Abstract
The detection of single amino-acid variants (SAVs) usually depends on a single-nucleotide polymorphism (SNP) database. Here, we describe a novel method that discovers SAVs at the proteome level independent of SNP data. Using a mass spectrometry-based de novo sequencing algorithm, peptide candidates are identified and compared with a theoretical protein database to generate SAVs under a pairing strategy, which is followed by database re-searching to control the false discovery rate. In human brain tissues, we can confidently identify known and novel protein variants with diverse origins. Combined with DNA/RNA sequencing, we verify SAVs derived from DNA mutations, RNA alternative splicing, and unknown post-transcriptional mechanisms. Furthermore, quantitative analysis in human brain tissues reveals several tissue-specific differential expressions of SAVs. This approach provides a novel route to high-throughput detection of protein variants, which may offer the potential for clinical biomarker discovery and mechanistic research.
Affiliation(s)
- Zhi-Duan Su
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
| | - Quan-Hu Sheng
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
| | - Qing-Run Li
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
| | - Hao Chi
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Xi Jiang
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
| | - Zheng Yan
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
| | - Ning Fu
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
| | - Si-Min He
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Philipp Khaitovich
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
| | - Jia-Rui Wu
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
| | - Rong Zeng
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
| |
Collapse
|
30
|
Wang J, Anania VG, Knott J, Rush J, Lill JR, Bourne PE, Bandeira N. Combinatorial approach for large-scale identification of linked peptides from tandem mass spectrometry spectra. Mol Cell Proteomics 2014; 13:1128-36. [PMID: 24493012 DOI: 10.1074/mcp.m113.035758] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022] Open
Abstract
The combination of chemical cross-linking and mass spectrometry has recently been shown to constitute a powerful tool for studying protein-protein interactions and elucidating the structure of large protein complexes. However, computational methods for interpreting the complex MS/MS spectra from linked peptides are still in their infancy, making the high-throughput application of this approach largely impractical. Because of the lack of large annotated datasets, most current approaches do not capture the specific fragmentation patterns of linked peptides and therefore are not optimal for the identification of cross-linked peptides. Here we propose a generic approach to address this problem and demonstrate it using disulfide-bridged peptide libraries to (i) efficiently generate large mass spectral reference data for linked peptides at a low cost and (ii) automatically train an algorithm that can efficiently and accurately identify linked peptides from MS/MS spectra. We show that using this approach we were able to identify thousands of MS/MS spectra from disulfide-bridged peptides through comparison with proteome-scale sequence databases and significantly improve the sensitivity of cross-linked peptide identification. This allowed us to identify 60% more direct pairwise interactions between the protein subunits in the 20S proteasome complex than existing tools on cross-linking studies of the proteasome complexes. The basic framework of this approach and the MS/MS reference dataset generated should be valuable resources for the future development of new tools for the identification of linked peptides.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, California
31
Liu X, Hengel S, Wu S, Tolić N, Pasa-Tolić L, Pevzner PA. Identification of ultramodified proteins using top-down tandem mass spectra. J Proteome Res 2013; 12:5830-8. [PMID: 24188097 DOI: 10.1021/pr400849y] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 01/07/2023]
Abstract
Post-translational modifications (PTMs) play an important role in various biological processes through changing protein structure and function. Some ultramodified proteins (like histones) have multiple PTMs forming PTM patterns that define the functionality of a protein. While bottom-up mass spectrometry (MS) has been successful in identifying individual PTMs within short peptides, it is unable to identify PTM patterns spreading along entire proteins in a coordinated fashion. In contrast, top-down MS analyzes intact proteins and reveals PTM patterns along the entire proteins. However, while recent advances in instrumentation have made top-down MS accessible to many laboratories, most computational tools for top-down MS focus on proteins with few PTMs and are unable to identify complex PTM patterns. We propose a new algorithm, MS-Align-E, that identifies both expected and unexpected PTMs in ultramodified proteins. We demonstrate that MS-Align-E identifies many proteoforms of histone H4 and benchmark it against the currently accepted software tools.
Affiliation(s)
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis , Indianapolis, IN 46202, United States
32
Mazin P, Xiong J, Liu X, Yan Z, Zhang X, Li M, He L, Somel M, Yuan Y, Phoebe Chen YP, Li N, Hu Y, Fu N, Ning Z, Zeng R, Yang H, Chen W, Gelfand M, Khaitovich P. Widespread splicing changes in human brain development and aging. Mol Syst Biol 2013; 9:633. [PMID: 23340839 PMCID: PMC3564255 DOI: 10.1038/msb.2012.67] [Citation(s) in RCA: 147] [Impact Index Per Article: 13.4] [Received: 03/06/2012] [Revised: 11/14/2012] [Accepted: 12/16/2012] [Indexed: 02/07/2023] Open
Abstract
While splicing differences between tissues, sexes and species are well documented, little is known about the extent and the nature of splicing changes that take place during human or mammalian development and aging. Here, using high-throughput transcriptome sequencing, we have characterized splicing changes that take place during whole human lifespan in two brain regions: prefrontal cortex and cerebellum. Identified changes were confirmed using independent human and rhesus macaque RNA-seq data sets, exon arrays and PCR, and were detected at the protein level using mass spectrometry. Splicing changes across lifespan were abundant in both of the brain regions studied, affecting more than a third of the genes expressed in the human brain. Approximately 15% of these changes differed between the two brain regions. Across lifespan, splicing changes followed discrete patterns that could be linked to neural functions, and associated with the expression profiles of the corresponding splicing factors. More than 60% of all splicing changes represented a single splicing pattern reflecting preferential inclusion of gene segments potentially targeting transcripts for nonsense-mediated decay in infants and elderly.
Affiliation(s)
- Pavel Mazin
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
33
Abstract
Motivation: Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms to analyze tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD), Higher-energy Collisional Dissociation (HCD) spectra or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectra, we develop a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra or even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependencies between different ion types, where such dependencies are learned automatically using a modified offset frequency function. Results: The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that the performance of UniNovo is superior to other tools for ETD spectra and superior or comparable with others for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides). Availability: UniNovo is implemented in JAVA and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual. Contact: kwj@ucsd.edu or ppevzner@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kyowon Jeong
- Department of Electrical and Computer Engineering and Department of Computer Science and Engineering, University of California-San Diego, CA 92093, USA.
34
Costa EP, Menschaert G, Luyten W, De Grave K, Ramon J. PIUS: peptide identification by unbiased search. Bioinformatics 2013; 29:1913-4. [DOI: 10.1093/bioinformatics/btt298] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
35
Faccin M, Bruscolini P. MS/MS Spectra Interpretation as a Statistical–Mechanics Problem. Anal Chem 2013; 85:4884-92. [DOI: 10.1021/ac4005666] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Mauro Faccin
- Departamento de Física Teórica & Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, c/Mariano Esquillor s/n, 50018 Zaragoza, Spain
- Pierpaolo Bruscolini
- Departamento de Física Teórica & Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, c/Mariano Esquillor s/n, 50018 Zaragoza, Spain
36
Zhang Y, Fonslow BR, Shan B, Baek MC, Yates JR. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013; 113:2343-94. [PMID: 23438204 PMCID: PMC3751594 DOI: 10.1021/cr3003533] [Citation(s) in RCA: 986] [Impact Index Per Article: 89.6] [Indexed: 02/07/2023]
Affiliation(s)
- Yaoyang Zhang
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bryan R. Fonslow
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bing Shan
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Moon-Chang Baek
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Department of Molecular Medicine, Cell and Matrix Biology Research Institute, School of Medicine, Kyungpook National University, Daegu 700-422, Republic of Korea
- John R. Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
37
Abstract
Historically many genome annotation strategies have lacked experimental evidence at the protein level and have instead relied heavily on ab initio gene prediction tools, which consequently resulted in many incorrectly annotated genomic sequences. Proteogenomics aims to address these issues using mass spectrometry (MS)-based proteomics and genomic mapping, and by providing statistical significance measures such as false discovery rates (FDRs) to validate the mapped peptides. Presented here is a tool capable of meeting this goal, the UCSD proteogenomic pipeline, which maps peptide-spectrum matches (PSMs) to the genome using the Inspect MS/MS database search tool and assigns a statistical significance to the match using a target-decoy search approach to assign estimated FDRs. This pipeline also provides the option of using a more reliable approach to proteogenomics by determining the precise false-positive rates (FPRs) and p-values of each PSM by calculating their spectral probabilities and rescoring each PSM accordingly. In addition to the protein prediction challenges in the rapidly growing number of sequenced plant genomes, it is difficult to extract high-quality protein samples from many plant species. For that reason, this chapter contains methods for protein extraction and trypsin digestion that reliably produce samples suitable for proteogenomic analysis.
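The target-decoy idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common simple estimate (decoy hits divided by target hits above a score threshold), which may differ from the exact formula used by the pipeline described above; the function names and the `(score, is_decoy)` input layout are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Each PSM is (score, is_decoy); higher scores are assumed to be better.
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring >= threshold.

    Uses the simple ratio decoys / targets (an assumption of this sketch).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing can be a false discovery
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    # Scan candidate thresholds from strictest to most permissive.
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best
```

Because the estimated FDR is not monotone in the threshold, the sketch scans every observed score rather than stopping at the first failure; real tools often convert these estimates into q-values instead.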
38
Identification of Ultramodified Proteins Using Top-Down Spectra. Lecture Notes in Computer Science 2013. [DOI: 10.1007/978-3-642-37195-0_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/29/2023]
39
Hoopmann MR, Moritz RL. Current algorithmic solutions for peptide-based proteomics data generation and identification. Curr Opin Biotechnol 2012; 24:31-8. [PMID: 23142544 DOI: 10.1016/j.copbio.2012.10.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 07/19/2012] [Revised: 10/08/2012] [Accepted: 10/18/2012] [Indexed: 12/28/2022]
Abstract
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics.
40
Guthals A, Bandeira N. Peptide identification by tandem mass spectrometry with alternate fragmentation modes. Mol Cell Proteomics 2012; 11:550-7. [PMID: 22595789 PMCID: PMC3434779 DOI: 10.1074/mcp.r112.018556] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 03/05/2012] [Revised: 05/04/2012] [Indexed: 11/06/2022] Open
Abstract
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
41
Key issues in the acquisition and analysis of qualitative and quantitative mass spectrometry data for peptide-centric proteomic experiments. Amino Acids 2012; 43:1075-85. [PMID: 22821266 DOI: 10.1007/s00726-012-1287-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/09/2010] [Accepted: 04/03/2012] [Indexed: 01/05/2023]
Abstract
Proteomic technologies have matured to a level enabling accurate and reproducible quantitation of peptides and proteins from complex biological matrices. Analysis of samples as diverse as assembled protein complexes, whole cell lysates or sub-cellular proteomes from cell cultures, and direct analysis of animal and human tissues and fluids demonstrate the incredible versatility of the fundamental nature of the technique that forms the basis of most proteomic applications today (mass spectrometry). Determining the mass of biomolecules and their fragments or related products with high accuracy can convey a highly specific assay for detection and identification. Importantly, ion currents representative of these specifically identified analytes can be accurately quantified with the correct application of smart isobaric tagging chemistries, heavy and light isotopically derivatised samples or standards, or by careful application of workflows to compare unlabelled samples in so-called 'label-free' and targeted selected reaction monitoring experiments. In terms of exploring biology, a myriad of protein changes and modifications are being increasingly probed and quantified, including diverse chemical changes from relatively decisive modifications such as protein splicing and truncation, to more transient dynamic modifications such as phosphorylation, acetylation and ubiquitination. Proteomic workflows can be complex beasts and several key considerations to ensure effective applications have been outlined in the recent literature. The past year has witnessed the publication of several excellent reviews that thoroughly describe the fundamental principles underlying the state of the art. 
This review further elaborates on specific critical issues introduced by these publications and raises other important unaddressed considerations and new developments that directly impact on the effectiveness of proteomic technologies, in particular for, but not necessarily exclusive to peptide-centric experiments. These factors are discussed both in terms of qualitative analyses, including dynamic range and sampling issues, and developments to improve the translation of peptide fragmentation data into peptide and protein identities, as well as quantitative analyses, including data normalisation and the utility of ontology or functional annotation, the effects of modified peptides, and considered experimental design to facilitate the use of robust statistical methods.
42
Medzihradszky KF, Bohlen CJ. Partial de novo sequencing and unusual CID fragmentation of a 7 kDa, disulfide-bridged toxin. Journal of the American Society for Mass Spectrometry 2012; 23:923-34. [PMID: 22351294 PMCID: PMC4367482 DOI: 10.1007/s13361-012-0350-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/06/2011] [Revised: 01/12/2012] [Accepted: 01/22/2012] [Indexed: 05/12/2023]
Abstract
A 7 kDa toxin isolated from the venom of the Texas coral snake (Micrurus tener tener) was subjected to collision-induced dissociation (CID) and electron-transfer dissociation (ETD) analyses both before and after reduction at low pH. Manual and automated approaches to de novo sequencing are compared in detail. Manual de novo sequencing utilizing the combination of high accuracy CID and ETD data and an acid-related cleavage yielded the N-terminal half of the sequence from the reduced species. The intact polypeptide, containing 3 disulfide bridges, produced a series of unusual fragments in ion trap CID experiments: abundant internal amino acid losses were detected, and one of the disulfide-linkage positions could be determined from fragments formed by the cleavage of two bonds. In addition, internal and c-type fragments were observed.
Affiliation(s)
- Katalin F Medzihradszky
- Mass Spectrometry Facility, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA 94158-2517, USA.
43
Allmer J. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Rev Proteomics 2012; 8:645-57. [PMID: 21999834 DOI: 10.1586/epr.11.54] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 01/27/2023]
Abstract
Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of the de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field.
Affiliation(s)
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir 35430, Turkey.
44
Cantarel BL, Erickson AR, VerBerkmoes NC, Erickson BK, Carey PA, Pan C, Shah M, Mongodin EF, Jansson JK, Fraser-Liggett CM, Hettich RL. Strategies for metagenomic-guided whole-community proteomics of complex microbial environments. PLoS One 2011; 6:e27173. [PMID: 22132090 PMCID: PMC3223167 DOI: 10.1371/journal.pone.0027173] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 06/16/2011] [Accepted: 10/11/2011] [Indexed: 11/05/2022] Open
Abstract
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.
Affiliation(s)
- Brandi L. Cantarel
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Alison R. Erickson
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
- Graduate School of Genome Science & Technology, University of Tennessee, Knoxville, Tennessee, United States of America
- Nathan C. VerBerkmoes
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
- Brian K. Erickson
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
- Graduate School of Genome Science & Technology, University of Tennessee, Knoxville, Tennessee, United States of America
- Patricia A. Carey
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
- Chongle Pan
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
- Manesh Shah
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
- Emmanuel F. Mongodin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Janet K. Jansson
- Lawrence Berkeley National Laboratory, Earth Sciences Division, Department of Ecology, Berkeley, California, United States of America
- Claire M. Fraser-Liggett
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Robert L. Hettich
- Oak Ridge National Laboratory, Chemical Sciences Division, Oak Ridge, Tennessee, United States of America
45
Proteomics in molecular diagnosis: typing of amyloidosis. J Biomed Biotechnol 2011; 2011:754109. [PMID: 22131817 PMCID: PMC3205904 DOI: 10.1155/2011/754109] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 04/28/2011] [Revised: 07/01/2011] [Accepted: 07/11/2011] [Indexed: 12/21/2022] Open
Abstract
Amyloidosis is a group of disorders caused by deposition of misfolded proteins as aggregates in the extracellular tissues of the body, leading to impairment of organ function. Correct identification of the causal amyloid protein is absolutely crucial for clinical management in order to avoid misdiagnosis and inappropriate, potentially harmful treatment, to assess prognosis and to offer genetic counselling if relevant. Current diagnostic methods, including antibody-based amyloid typing, have limited ability to detect the full range of amyloid forming proteins. Recent investigations into proteomic identification of amyloid protein have shown promise. This paper will review the current state of the art in proteomic analysis of amyloidosis, discuss the suitability of techniques based on the properties of amyloidosis, and further suggest potential areas of development. Establishment of mass spectrometry aided amyloid typing procedures in the pathology laboratory will allow accurate amyloidosis diagnosis in a timely manner and greatly facilitate clinical management of the disease.
46
Mohimani H, Liu WT, Yang YL, Gaudêncio SP, Fenical W, Dorrestein PC, Pevzner PA. Multiplex de novo sequencing of peptide antibiotics. J Comput Biol 2011; 18:1371-81. [PMID: 22035290 DOI: 10.1089/cmb.2011.0158] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022] Open
Abstract
Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. The isolation and sequencing of cyclic peptide antibiotics, unlike the same activity with linear peptides, is time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is nuclear magnetic resonance (NMR)-based and requires large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Given these facts, there is a need for new tools to sequence cyclic non-ribosomal peptides (NRPs) using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as sequencing a previously unknown family of cyclic NRPs produced by marine bacteria. Supplementary Material is available online at www.liebertonline.com/cmb.
Affiliation(s)
- Hosein Mohimani
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, California 92092, USA
47
Mohimani H, Liu WT, Mylne JS, Poth AG, Colgrave ML, Tran D, Selsted ME, Dorrestein PC, Pevzner PA. Cycloquest: identification of cyclopeptides via database search of their mass spectra against genome databases. J Proteome Res 2011; 10:4505-12. [PMID: 21851130 PMCID: PMC3242011 DOI: 10.1021/pr200323a] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority having been reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, with many such peptides remaining uncharacterized. Tandem mass spectrometry has occupied a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance spectroscopy (NMR). MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization and different fragmentation patterns of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque.
Affiliation(s)
- Hosein Mohimani
  - Department of Electrical and Computer Engineering, UC San Diego
- Wei-Ting Liu
  - Department of Chemistry and Biochemistry, UC San Diego
- Joshua S. Mylne
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane
- Aaron G. Poth
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane
  - Division of Livestock Industries, CSIRO, Brisbane
- Dat Tran
  - Department of Pathology and Laboratory Medicine, School of Medicine, UC Irvine
  - Center for Immunology, UC Irvine
  - Department of Pathology and Laboratory Medicine, Keck School of Medicine, USC
- Michael E. Selsted
  - Department of Pathology and Laboratory Medicine, School of Medicine, UC Irvine
  - Center for Immunology, UC Irvine
  - Department of Pathology and Laboratory Medicine, Keck School of Medicine, USC
- Pieter C. Dorrestein
  - Department of Chemistry and Biochemistry, UC San Diego
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego
|
48
|
Wang J, Bourne PE, Bandeira N. Peptide identification by database search of mixture tandem mass spectra. Mol Cell Proteomics 2011; 10:M111.010017. [PMID: 21862760 DOI: 10.1074/mcp.m111.010017] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/06/2022] Open
Abstract
In high-throughput proteomics, the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still assume that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, CA 92093, USA
|
49
|
Gupta N, Bandeira N, Keich U, Pevzner PA. Target-decoy approach and false discovery rate: when things may go wrong. J Am Soc Mass Spectrom 2011; 22:1111-20. [PMID: 21953092 PMCID: PMC3220955 DOI: 10.1007/s13361-011-0139-3] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Received: 11/01/2010] [Revised: 02/19/2011] [Accepted: 02/22/2011] [Indexed: 05/12/2023]
Abstract
The target-decoy approach (TDA) has done the field of proteomics a great service by filling the need to estimate the false discovery rates (FDR) of peptide identifications. While TDA is often viewed as a universal solution to the problem of FDR evaluation, we argue that the time has come to critically re-examine TDA and to acknowledge not only its merits but also its demerits. We demonstrate that some popular MS/MS search tools are not TDA-compliant and that it is easy to develop a non-TDA-compliant tool that outperforms all TDA-compliant tools. Since the distinction between TDA-compliant and non-TDA-compliant tools remains elusive, we are concerned about a possible proliferation of non-TDA-compliant tools in the future (developed with the best intentions). We are also concerned that estimation of the FDR by TDA awkwardly depends on a virtual coin toss and argue that it is important to take the coin toss factor out of our estimation of the FDR. Since computing FDR via TDA suffers from various restrictions, we argue that TDA is not needed when accurate p-values of individual Peptide-Spectrum Matches are available.
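For readers unfamiliar with TDA, the following is a minimal sketch of one commonly used FDR estimator, assuming a concatenated target-decoy search in which every peptide-spectrum match (PSM) is labeled as a target or decoy hit. The function name `tda_fdr` and the toy scores are hypothetical; the paper itself discusses when estimates of this kind can be misleading.

```python
def tda_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    psms: list of (score, is_decoy) pairs from a concatenated target-decoy search.
    Assumption: the number of decoy hits approximates the number of false target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target PSMs, so nothing to estimate
    return decoys / targets


# Toy data: (score, is_decoy)
example_psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(tda_fdr(example_psms, 6))  # 1 decoy / 4 targets = 0.25
```

Raising the threshold to 9 leaves two target hits and no decoys, so the estimate drops to 0.0; in practice the threshold is chosen to meet a desired FDR level.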
Affiliation(s)
- Nitin Gupta
- Bioinformatics Program, University of California San Diego, La Jolla, CA, USA
|
50
|
Jefferys SR, Giddings MC. Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data. Bioinformatics 2011; 27:844-52. [PMID: 21389073 DOI: 10.1093/bioinformatics/btr027] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/30/2022]
Abstract
MOTIVATION Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques, especially bottom-up and top-down mass spectrometry, provides complementary information. When integrated with background knowledge, this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated. RESULTS This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE), applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein, the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses. AVAILABILITY Software, demo projects and source can be downloaded from http://pie.giddingslab.org/
|