1
Tariq MU, Ebert S, Saeed F. Making MS Omics Data ML-Ready: SpeCollate Protocols. Methods Mol Biol 2024; 2836:135-155. [PMID: 38995540 DOI: 10.1007/978-1-0716-4007-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
The increasing complexity and volume of mass spectrometry (MS) data have presented new challenges and opportunities for proteomics data analysis and interpretation. In this chapter, we provide a comprehensive guide to transforming MS data for machine learning (ML) training, inference, and applications. The chapter is organized into three parts. The first part describes the data analysis needed for MS-based experiments and gives a general introduction to our deep learning model SpeCollate, which we will use throughout the chapter for illustration. The second part of the chapter explores the transformation of MS data for inference, providing a step-by-step guide for users to deduce peptides from their MS data. This section aims to bridge the gap between data acquisition and practical applications by detailing the necessary steps for data preparation and interpretation. In the final part, we present a demonstrative example of SpeCollate, a deep learning-based peptide database search engine that overcomes the problems of simplistic simulation of theoretical spectra and heuristic scoring functions for peptide-spectrum matches by generating joint embeddings for spectra and peptides. SpeCollate is a user-friendly tool with an intuitive command-line interface to perform the search, showcasing the effectiveness of the techniques and methodologies discussed in the earlier sections and highlighting the potential of machine learning in the context of mass spectrometry data analysis. By offering a comprehensive overview of data transformation, inference, and ML model applications for mass spectrometry, this chapter aims to empower researchers and practitioners in leveraging the power of machine learning to unlock novel insights and drive innovation in the field of mass spectrometry-based omics.
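The idea of searching with joint embeddings can be illustrated with a minimal sketch. It is not SpeCollate's code or command-line interface: the function name `nearest_peptides`, the three-dimensional toy vectors, and the choice of Euclidean distance as the similarity measure are all assumptions made for illustration, and a real search would obtain the vectors from trained spectrum and peptide encoders.

```python
import math

def nearest_peptides(spectrum_vec, peptide_vecs, top_k=1):
    """Rank candidate peptides by Euclidean distance to one spectrum embedding.

    spectrum_vec: embedding of an experimental spectrum (hypothetical values).
    peptide_vecs: dict mapping peptide sequence -> embedding of the same size.
    """
    scored = [(math.dist(spectrum_vec, vec), pep) for pep, vec in peptide_vecs.items()]
    scored.sort()  # smallest distance first
    return [(pep, dist) for dist, pep in scored[:top_k]]

# Toy embeddings with made-up values; real embeddings come from trained networks.
peptide_vecs = {
    "PEPTIDE": (0.9, 0.1, 0.0),
    "ELVISK": (0.0, 1.0, 0.2),
    "SAMPLER": (0.1, 0.0, 1.0),
}
spectrum_vec = (0.8, 0.2, 0.1)
best = nearest_peptides(spectrum_vec, peptide_vecs, top_k=2)
```

In this toy case the closest candidate is the one whose vector lies nearest the spectrum vector; a practical search would also restrict candidates by precursor mass before ranking.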
Affiliation(s)
- Muhammad Usman Tariq
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
- Samuel Ebert
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
2
Tariq MU, Saeed F. SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions. PLoS One 2021; 16:e0259349. [PMID: 34714871 PMCID: PMC8555789 DOI: 10.1371/journal.pone.0259349] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/10/2021] [Accepted: 10/18/2021] [Indexed: 11/19/2022]
Abstract
Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. 
SpeCollate is available at https://deepspecs.github.io/.
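The conventional step that the abstract contrasts against, turning a candidate peptide into a theoretical spectrum, can be sketched as follows. This is a simplified illustration, not the procedure of Crux, MSFragger, or SpeCollate: it computes only singly charged b and y ions from standard monoisotopic residue masses, ignores modifications and neutral losses, and the function name `theoretical_by_ions` is hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def theoretical_by_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z values.

    b_ions run from b1 to b(n-1); y_ions run from y(n-1) down to y1.
    Assumes an unmodified peptide made of the 20 common residues.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions

b_ions, y_ions = theoretical_by_ions("GAS")
```

A heuristic scoring function would then compare these predicted m/z values with the peaks of an experimental spectrum, which is the matching step that a learned similarity function replaces.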
Affiliation(s)
- Muhammad Usman Tariq
  - School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
- Fahad Saeed
  - School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
3
Novel bioactive peptides of Achillea eriophora show anticancer and antioxidant activities. Bioorg Chem 2021; 110:104777. [PMID: 33714023 DOI: 10.1016/j.bioorg.2021.104777] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 12/19/2020] [Accepted: 02/21/2021] [Indexed: 12/24/2022]
Abstract
Given the limitations of current methods in cancer treatment, bioactive peptides can serve as an alternative treatment. Therefore, isolation and relative purification of bioactive peptides were carried out from Achillea eriophora using a Sep-Pak C18 SPE cartridge and Amicon® Ultra Centrifugal Filters. The presence of the desired peptides was checked using RP-HPLC and confirmed using LC-MS. The results of the anticancer assay showed that the peptide mixture inhibits the growth of the MCF-7 cancer cell line with IC50, GI50, and LC50 values of 18.73 ± 0.22, 7.52 ± 0.15, and 56.73 ± 0.18 µg/mL, respectively. It also showed DPPH radical scavenging activity and cupric-ion reducing power with IC50 values of 5.095 ± 0.23 and 63.3 ± 0.44 µg/mL, respectively. Although flavonoids were present in the sample along with the peptides, their amount was trivial (18.097 ± 1.36 μg/mL). Nevertheless, the results of the LC-MS showed mass-to-charge ratios of 301.17, 261.22, and 243.25, which correspond to dipeptides or tripeptides in comparison to enzyme-digested BSA as a standard. In addition, SEM analysis of the purified peptide mixture showed that it kills the MCF-7 cancer cell line by creating pores in the membrane. Therefore, it might be valuable to have these peptides sequenced and studied for their physicochemical properties. Animal and clinical studies could help their application in drug development.
4
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE Access 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues that are inherent in proteogenomic data analysis, where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
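The six-frame translation mentioned above can be sketched in a few lines. This is an illustrative sketch rather than the method of any surveyed tool: it assumes the standard genetic code, an input containing only A, C, G, and T, and a hypothetical function name `six_frame_translate`; stop codons are written as `*` and incomplete trailing codons are dropped.

```python
BASES = "TCAG"
# Standard genetic code, ordered with the third base varying fastest (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate one reading frame, ignoring an incomplete final codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def six_frame_translate(dna):
    """Return the three forward-strand frames followed by the three reverse-strand frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]

frames = six_frame_translate("ATGGCC")
```

Because every nucleotide position is translated in six ways, the resulting search space is far larger than a curated protein database, which is the scalability concern the survey raises.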
Affiliation(s)
- Muhammad Usman Tariq
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
5
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
INTRODUCTION: Proteins are crucial for every cellular activity, and unraveling their sequence and structure is a crucial step to fully understand their biology. Early methods of protein sequencing were mainly based on the use of enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and with the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED: De novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques are covered, along with the various software currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for interpretation of protein sequence data. EXPERT OPINION: As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.
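The target/decoy strategy referred to above can be illustrated with a small sketch. It assumes the simplest common estimate, decoys divided by targets above a score cutoff, and ignores tied scores and q-value smoothing; the function name `score_threshold_at_fdr` and the example scores are hypothetical and are not taken from the review.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score meaning a better match.
    The FDR at a cutoff is estimated as (#decoys) / (#targets) at or above it.
    Returns None if no cutoff satisfies the limit.
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold

# Made-up matches: three target hits and one decoy hit.
psms = [(9.0, False), (8.0, False), (7.0, True), (6.0, False)]
strict = score_threshold_at_fdr(psms, max_fdr=0.01)
loose = score_threshold_at_fdr(psms, max_fdr=0.4)
```

With the strict limit, only matches scoring above the first decoy are accepted; relaxing the limit admits lower-scoring matches at the cost of a higher estimated error rate.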
Affiliation(s)
- Rui Vitorino
  - QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
  - Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
- Sofia Guedes
  - QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
- Fabio Trindade
  - Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
- Inês Correia
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Gabriela Moura
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Paulo Carvalho
  - Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering, Brazil
- Manuel A S Santos
  - iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Francisco Amado
  - QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
6
Software-aided detection and structural characterization of cyclic peptide metabolites in biological matrix by high-resolution mass spectrometry. J Pharm Anal 2020; 10:240-246. [PMID: 32612870 PMCID: PMC7322757 DOI: 10.1016/j.jpha.2020.05.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/28/2020] [Revised: 05/25/2020] [Accepted: 05/25/2020] [Indexed: 11/21/2022]
Abstract
Compared to their linear counterparts, cyclic peptides show better biological activities, such as antibacterial, immunosuppressive, and anti-tumor activities, and better pharmaceutical properties due to their conformational rigidity. However, cyclic peptides can form numerous putative metabolites from potential hydrolytic cleavages, and their fragments are very difficult to interpret. These characteristics pose a great challenge when analyzing metabolites of cyclic peptides by mass spectrometry. This study aimed to assess and apply a software-aided analytical workflow for the detection and structural characterization of cyclic peptide metabolites. Insulin and atrial natriuretic peptide (ANP) as model cyclic peptides were incubated with trypsin/chymotrypsin and/or rat liver S9, followed by data acquisition using TripleTOF® 5600. The resultant full-scan MS and MS/MS datasets were automatically processed through a combination of targeted and untargeted peak finding strategies. MS/MS spectra of predicted metabolites were interrogated against putative metabolite sequences, in light of a, b, y and internal fragment series. The resulting fragment assignments led to the confirmation and ranking of the metabolite sequences and identification of metabolic modification. As a result, 29 metabolites with linear or cyclic structures were detected in the insulin incubation with the hydrolytic enzymes. Sequences of twenty insulin metabolites were further determined, which were consistent with the hydrolytic sites of these enzymes. In the same manner, multiple metabolites of insulin and ANP formed in rat liver S9 incubation were detected and structurally characterized, some of which have not been previously reported. The results demonstrated the utility of a software-aided data processing tool in the detection and identification of cyclic peptide metabolites.
A software-aided workflow enabling detection and characterization of cyclic peptide metabolites by LC/HRMS. Automatic data processing through a combination of targeted and untargeted peak finding strategies. MS/MS spectra of predicted metabolites interrogated against putative metabolite sequences. Rapidly determining metabolite profiles of insulin and atrial natriuretic peptide in rat liver S9. Potentially applicable to metabolic soft spot analysis and in vitro metabolism across species in drug discovery.
7
Silla Y, Varshney S, Ray A, Basak T, Zinellu A, Sabareesh V, Carru C, Sengupta S. Hydrolysis of homocysteine thiolactone results in the formation of Protein-Cys-S-S-homocysteinylation. Proteins 2019; 87:625-634. [PMID: 30869815 DOI: 10.1002/prot.25681] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/06/2018] [Revised: 01/31/2019] [Accepted: 02/17/2019] [Indexed: 11/07/2022]
Abstract
An increased level of homocysteine, a reactive thiol amino acid, is associated with several complex disorders and is an independent risk factor for cardiovascular disease. A majority (>80%) of circulating homocysteine is protein bound. Homocysteine exclusively binds to protein cysteine residues via a thiol-disulfide exchange reaction, the mechanism of which has been reported. In contrast, homocysteine thiolactone (HTL), the cyclic thioester of homocysteine, is believed to bind exclusively to the primary amine group of lysine residues, leading to N-homocysteinylation of proteins; hence, studies on the binding of homocysteine thiolactone to proteins have thus far focused only on N-homocysteinylation. Although it is known that homocysteine thiolactone can hydrolyze to homocysteine at physiological pH, surprisingly the extent of S-homocysteinylation during the exposure of proteins to homocysteine thiolactone has never been examined. In this study, we clearly show that the hydrolysis of homocysteine thiolactone is pH dependent, and at physiological pH, 1 mM homocysteine thiolactone is hydrolysed to ~0.71 mM homocysteine within 24 h. Using albumin, we also show that incubation of HTL with albumin leads to a greater proportion of S-homocysteinylation (0.41 mol/mol of albumin) than N-homocysteinylation (0.14 mol/mol of albumin). S-homocysteinylation at Cys34 of HSA on treatment with homocysteine thiolactone was confirmed using LC-MS. Further, contrary to earlier reports, our results indicate that there is no cross talk between the cysteine attached to Cys34 of albumin and homocysteine attached to lysine residues.
Affiliation(s)
- Yumnam Silla
  - Department of Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, Delhi, India
- Swati Varshney
  - Department of Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, Delhi, India
- Arjun Ray
  - Department of Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, Delhi, India
- Trayambak Basak
  - Department of Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, Delhi, India
- Angelo Zinellu
  - Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Varatharajan Sabareesh
  - Department of Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, Delhi, India
- Ciriaco Carru
  - Department of Biomedical Sciences, University of Sassari, Sassari, Italy
  - Quality Control Unit, University Hospital of Sassari (AOU Sassari), Sassari, Italy
- Shantanu Sengupta
  - Department of Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, Delhi, India
8
Annotating and Interpreting Linear and Cyclic Peptide Tandem Mass Spectra. Methods Mol Biol 2016. [PMID: 26831710 DOI: 10.1007/978-1-4939-3375-4_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Nonribosomal peptides often possess pronounced bioactivity, and thus, they are often interesting hit compounds in natural product-based drug discovery programs. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and, especially in the case of cyclic peptides, the complex fragmentation patterns observed. This makes nonribosomal peptide tandem mass spectra annotation challenging and time-consuming. To meet this challenge, software tools for this task have been developed. In this chapter, the workflow for using the software mMass for the annotation of experimentally obtained peptide tandem mass spectra is described. mMass is freely available (http://www.mmass.org), open-source, and the most advanced and user-friendly software tool for this purpose. The software enables the analyst to concisely annotate and interpret tandem mass spectra of linear and cyclic peptides. Thus, it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
9
Pascale R, Grossi G, Cruciani G, Mecca G, Santoro D, Sarli Calace R, Falabella P, Bianco G. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach. European Journal of Mass Spectrometry (Chichester, England) 2016; 22:193-198. [PMID: 27882884 DOI: 10.1255/ejms.1434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
The Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS) software package has been developed at the University of Basilicata in Potenza (Italy). It is designed to facilitate the determination of the amino acid sequence of a peptide as well as the unequivocal identification of proteins in a high-throughput manner, with enormous advantages in time, economic resources, and expertise. The software package is a valid tool for automating a de novo sequencing approach, overcoming its main limits, and a versatile platform useful in the proteomic field for the unequivocal identification of proteins starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.
Affiliation(s)
- Raffaella Pascale
  - Scuola di Ingegneria, Università degli Studi della Basilicata, Via dell'Ateneo Lucano, 10-85100 Potenza, Italy
- Gerarda Grossi
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano, 10-85100 Potenza, Italy
- Gabriele Cruciani
  - Dipartimento di Chimica, Università di Perugia, via Elce di Sotto, 8-06123 Perugia, Italy
- Giansalvatore Mecca
  - Dipartimento di Matematica, Informatica ed Economia, Via dell'Ateneo Lucano, 10-85100 Potenza, Italy
- Donatello Santoro
  - Dipartimento di Matematica, Informatica ed Economia, Via dell'Ateneo Lucano, 10-85100 Potenza, Italy
- Patrizia Falabella
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano, 10-85100 Potenza, Italy
- Giuliana Bianco
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano, 10-85100 Potenza, Italy
10
Labella C, Kanawati B, Vogel H, Schmitt-Kopplin P, Laurino S, Bianco G, Falabella P. Identification of two arginine kinase forms of endoparasitoid Leptomastix dactylopii venom by bottom up-sequence tag approach. Journal of Mass Spectrometry 2015; 50:756-765. [PMID: 26259659 DOI: 10.1002/jms.3585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/23/2014] [Revised: 02/20/2015] [Accepted: 02/21/2015] [Indexed: 06/04/2023]
Abstract
Leptomastix dactylopii (Howard) is an endoparasitoid wasp and a natural enemy of the mealybug Planococcus citri (Risso). Despite the acquired knowledge regarding this host-parasitoid interaction, only little information is available on the factors of parasitoid origin able to modulate the mealybug physiology. The major alteration observed in P. citri is a strong reduction in fecundity, which is evident soon after parasitization by L. dactylopii or venom injection in unparasitized hosts, indicating that this proteinaceous secretion injected at oviposition plays a key role in host regulation. Protein identification of L. dactylopii venom has been limited by the lack of literature sources and public protein databases. Here, we identified two venom proteins by an integrated transcriptomic and proteomic approach. A custom-made transcriptomic database from the L. dactylopii venom glands was created by applying the high-throughput RNA sequencing approach. Trypsinized protein spots from two-dimensional gel electrophoresis (2DE) were analyzed by high-resolution mass spectrometry (FTICRMS-12 T). The most abundant peptide ions were fragmented by collision-induced dissociation, and the obtained sequence tags were subjected to custom-made protein database searching. Two putative arginine kinases (full-length and truncated forms) were identified. This is the first case in which both truncated and full-length arginine kinases are identified in an endoparasitoid non-paralyzing venom.
Affiliation(s)
- Cristiana Labella
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano 10, 85100, Potenza, Italy
- Basem Kanawati
  - Department of Environmental Sciences, Research Unit Analytical BioGeoChemistry (BGC), Ingolstaedter Landstrasse, 85764, Neuherberg, Germany
- Heiko Vogel
  - Department of Entomology, Host Plant Adaptation, Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745, Jena, Germany
- Philippe Schmitt-Kopplin
  - Department of Environmental Sciences, Research Unit Analytical BioGeoChemistry (BGC), Ingolstaedter Landstrasse, 85764, Neuherberg, Germany
  - Chair of Analytical Food Chemistry, Technische Universität München, Alte Akademie 10, D-85354, Freising-Weihenstephan, Germany
- Simona Laurino
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano 10, 85100, Potenza, Italy
- Giuliana Bianco
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano 10, 85100, Potenza, Italy
- Patrizia Falabella
  - Dipartimento di Scienze, Università degli Studi della Basilicata, Via dell'Ateneo Lucano 10, 85100, Potenza, Italy
11
Chi H, Chen H, He K, Wu L, Yang B, Sun RX, Liu J, Zeng WF, Song CQ, He SM, Dong MQ. pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra. J Proteome Res 2012; 12:615-25. [DOI: 10.1021/pr3006843] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Affiliation(s)
- Hao Chi
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Haifeng Chen
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Kun He
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Long Wu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Yang
  - National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Rui-Xiang Sun
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Jianyun Liu
  - Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media, Beihang University, Beijing, 100191, China
- Wen-Feng Zeng
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Chun-Qing Song
  - National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Meng-Qiu Dong
  - National Institute of Biological Sciences, Beijing, Beijing 102206, China
12
Niedermeyer THJ, Strohalm M. mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 2012; 7:e44913. [PMID: 23028676 PMCID: PMC3441486 DOI: 10.1371/journal.pone.0044913] [Citation(s) in RCA: 210] [Impact Index Per Article: 17.5] [Received: 05/15/2012] [Accepted: 08/09/2012] [Indexed: 11/19/2022]
Abstract
Natural or synthetic cyclic peptides often possess pronounced bioactivity. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and the complex fragmentation patterns observed. Even though several software tools for cyclic peptide tandem mass spectra annotation have been published, these tools are still unable to annotate a majority of the signals observed in experimentally obtained mass spectra. They are thus not suitable for extensive mass spectrometric characterization of these compounds. This lack of advanced and user-friendly software tools has motivated us to extend the fragmentation module of a freely available open-source software, mMass (http://www.mmass.org), to allow for cyclic peptide tandem mass spectra annotation and interpretation. The resulting software has been tested on several cyanobacterial and other naturally occurring peptides. It has been found to be superior to other currently available tools concerning both usability and annotation extensiveness. Thus it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
13
Allmer J. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Rev Proteomics 2012; 8:645-57. [PMID: 21999834 DOI: 10.1586/epr.11.54] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 01/27/2023]
Abstract
Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of the de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field.
Affiliation(s)
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir 35430, Turkey.
|
14
|
Chi H, Sun RX, Yang B, Song CQ, Wang LH, Liu C, Fu Y, Yuan ZF, Wang HP, He SM, Dong MQ. pNovo: de novo peptide sequencing and identification using HCD spectra. J Proteome Res 2010; 9:2713-24. [PMID: 20329752 DOI: 10.1021/pr100182k] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 11/28/2022]
Abstract
De novo peptide sequencing has improved remarkably in the past decade as a result of better instruments and computational algorithms. However, de novo sequencing can correctly interpret only approximately 30% of high- and medium-quality spectra generated by collision-induced dissociation (CID), which is much less than database search. This is mainly due to incomplete fragmentation and overlap of different ion series in CID spectra. In this study, we show that higher-energy collisional dissociation (HCD) is of great help to de novo sequencing because it produces high mass accuracy tandem mass spectrometry (MS/MS) spectra without the low-mass cutoff associated with CID in ion trap instruments. Besides, abundant internal and immonium ions in the HCD spectra can help differentiate similar peptide sequences. Taking advantage of these characteristics, we developed an algorithm called pNovo for efficient de novo sequencing of peptides from HCD spectra. pNovo gave correct identifications to 80% or more of the HCD spectra identified by database search. The number of correct full-length peptides sequenced by pNovo is comparable with that obtained by database search. A distinct advantage of de novo sequencing is that deamidated peptides and peptides with amino acid mutations can be identified efficiently without extra cost in computation. In summary, implementation of the HCD characteristics makes pNovo an excellent tool for de novo peptide sequencing from HCD spectra.
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
|
15
|
Liu WT, Ng J, Meluzzi D, Bandeira N, Gutierrez M, Simmons TL, Schultz AW, Linington RG, Moore BS, Gerwick WH, Pevzner PA, Dorrestein PC. Interpretation of tandem mass spectra obtained from cyclic nonribosomal peptides. Anal Chem 2009; 81:4200-9. [PMID: 19413302 PMCID: PMC2765223 DOI: 10.1021/ac900114t] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 11/28/2022]
Abstract
Natural and non-natural cyclic peptides are a crucial component in drug discovery programs because of their considerable pharmaceutical properties. Cyclosporin, microcystins, and nodularins are all notable pharmacologically important cyclic peptides. Because these biologically active peptides are often biosynthesized nonribosomally, they often contain nonstandard amino acids, thus increasing the complexity of the resulting tandem mass spectrometry data. In addition, because of their cyclic nature, the fragmentation patterns of many of these peptides showed much higher complexity when compared to related counterparts. Therefore, at the present time it is still difficult to annotate cyclic peptide MS/MS spectra. In the current work, an annotation program was developed for the annotation and characterization of tandem mass spectra obtained from cyclic peptides. This program, which we call MS-CPA, is available as a web tool (http://lol.ucsd.edu/ms-cpa_v1/Input.py). Using this program, we have successfully annotated the sequence of representative cyclic peptides, such as seglitide, tyrothricin, desmethoxymajusculamide C, dudawalamide A, and cyclomarins, in a rapid manner and also were able to provide the first-pass structure evidence of a newly discovered natural product based on predicted sequence. This compound is not available in sufficient quantities for structural elucidation by other means such as NMR. In addition to the development of this cyclic annotation program, it was observed that some cyclic peptides fragmented in unexpected ways, resulting in the scrambling of sequences. In summary, MS-CPA not only provides a platform for rapid confirmation and annotation of tandem mass spectrometry data obtained with cyclic peptides but also enables quantitative analysis of the ion intensities. This program facilitates cyclic peptide analysis and sequencing, acts as a useful tool to investigate the uncommon fragmentation phenomena of cyclic peptides, and aids the characterization of newly discovered cyclic peptides encountered in drug discovery programs.
Affiliation(s)
- Wei-Ting Liu
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0636, USA
- Julio Ng
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA
- Dario Meluzzi
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0636, USA
- Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA
- Marcelino Gutierrez
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0204, USA
- Thomas L. Simmons
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0204, USA
- Andrew W. Schultz
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0204, USA
- Roger G. Linington
- Department of Chemistry and Biochemistry, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Bradley S. Moore
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0204, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, La Jolla, CA 92093-0636, USA
- William H. Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0204, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, La Jolla, CA 92093-0636, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA
- Pieter C. Dorrestein
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0636, USA
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0204, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, La Jolla, CA 92093-0636, USA
|
16
|
Stoppacher N, Zeilinger S, Omann M, Lassahn PG, Roitinger A, Krska R, Schuhmacher R. Characterisation of the peptaibiome of the biocontrol fungus Trichoderma atroviride by liquid chromatography/tandem mass spectrometry. Rapid Commun Mass Spectrom 2008; 22:1889-1898. [PMID: 18470867 DOI: 10.1002/rcm.3568] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/26/2023]
Abstract
The present study describes the liquid chromatography/tandem mass spectrometry (LC/MS/MS)-based screening and characterisation of linear antibiotic alpha-aminoisobutyric acid (Aib)-containing non-ribosomal peptides (NRP) in culture samples of the filamentous fungus Trichoderma atroviride ATCC 74058. Fungal culture filtrates were enriched by solid-phase extraction (SPE) and separated by reversed-phase high-performance liquid chromatography (HPLC), prior to mass spectrometric (MS) and tandem mass spectrometric (MS/MS) analysis on a triple quadrupole-linear ion trap tandem mass spectrometer. A workflow consisting of two alternative screening strategies was applied to search for NRP. Various MS full scan and MS/MS measurement modes led to the identification of 16 trichorzianines and diagnostic in-source fragment ions of another four trichorzianines. Furthermore, we detected 15 novel Aib-containing peptides with putative molecular weights ranging from 951.7 to 1043.7 g/mol (monoisotopic masses), composed of up to 9 amino acids. While the amino acid sequences of the novel peptaibiotics showed typical microheterogeneity and consisted of the amino acids Leu/Ile, Aib, Ser, Val/Iva, Gly, Ac-Aib, Tyr and Phe, the mass increments at the C-termini of the peptides were not assignable to any residues described in the literature. The amino acid sequences were confirmed and structure proposals made for both molecule termini by high-resolution MS and MS/MS analysis. We propose the group name 'trichoatrokontins' for the newly identified peptaibiotics. As no other peptaibiotics were found in the culture samples, the peptaibiome of the investigated strain of T. atroviride consists of at least 20 trichorzianines and 15 trichoatrokontins.
Affiliation(s)
- Norbert Stoppacher
- Department for Agrobiotechnology, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, A-3430 Tulln, Austria
|