1
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
INTRODUCTION: Proteins are essential for every cellular activity, and unraveling their sequence and structure is a crucial step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED: De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques are covered, along with the various software currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for interpretation of protein sequence data. EXPERT OPINION: As mass-spectrometry sequencing performance improves with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.
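The target/decoy strategy mentioned in the expert opinion can be illustrated with a minimal sketch. It assumes each peptide-spectrum match (PSM) carries a score and a decoy flag, and it uses the simple decoys/targets ratio as the false discovery rate estimate. The function name, the data layout, and the choice of estimator are illustrative assumptions and are not taken from the review.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among PSMs scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) tuples (hypothetical layout).
    The estimate is (#decoy hits) / (#target hits), one common convention.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target hits, so nothing to estimate
    return decoys / targets


# Made-up scores, for illustration only.
example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
fdr_at_7 = target_decoy_fdr(example_psms, 7.0)
```

In practice the threshold is usually lowered until the estimated FDR reaches a chosen limit; other estimators exist and may be preferable depending on how the decoy database is built.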
Affiliation(s)
- Rui Vitorino
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal; iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal; Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
- Sofia Guedes
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
- Fabio Trindade
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
- Inês Correia
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Gabriela Moura
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Paulo Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering, Brazil
- Manuel A S Santos
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Francisco Amado
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
2
Renard BY, Xu B, Kirchner M, Zickmann F, Winter D, Korten S, Brattig NW, Tzur A, Hamprecht FA, Steen H. Overcoming species boundaries in peptide identification with Bayesian information criterion-driven error-tolerant peptide search (BICEPS). Mol Cell Proteomics 2012; 11:M111.014167. [PMID: 22493179 PMCID: PMC3394943 DOI: 10.1074/mcp.m111.014167] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/31/2022]
Abstract
Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis.
Affiliation(s)
- Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin 13353, Germany.
3
Wright P, Noirel J, Ow SY, Fazeli A. A review of current proteomics technologies with a survey on their widespread use in reproductive biology investigations. Theriogenology 2012; 77:738-765.e52. [DOI: 10.1016/j.theriogenology.2011.11.012] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 09/02/2011] [Revised: 11/08/2011] [Accepted: 11/11/2011] [Indexed: 12/27/2022]
4
Proteomics in molecular diagnosis: typing of amyloidosis. J Biomed Biotechnol 2011; 2011:754109. [PMID: 22131817 PMCID: PMC3205904 DOI: 10.1155/2011/754109] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 04/28/2011] [Revised: 07/01/2011] [Accepted: 07/11/2011] [Indexed: 12/21/2022]
Abstract
Amyloidosis is a group of disorders caused by deposition of misfolded proteins as aggregates in the extracellular tissues of the body, leading to impairment of organ function. Correct identification of the causal amyloid protein is absolutely crucial for clinical management in order to avoid misdiagnosis and inappropriate, potentially harmful treatment, to assess prognosis and to offer genetic counselling if relevant. Current diagnostic methods, including antibody-based amyloid typing, have limited ability to detect the full range of amyloid forming proteins. Recent investigations into proteomic identification of amyloid protein have shown promise. This paper will review the current state of the art in proteomic analysis of amyloidosis, discuss the suitability of techniques based on the properties of amyloidosis, and further suggest potential areas of development. Establishment of mass spectrometry aided amyloid typing procedures in the pathology laboratory will allow accurate amyloidosis diagnosis in a timely manner and greatly facilitate clinical management of the disease.
5
Alexandridou A, Dovrolis N, Tsangaris GT, Nikita K, Spyrou G. PepServe: a web server for peptide analysis, clustering and visualization. Nucleic Acids Res 2011; 39:W381-4. [PMID: 21572105 PMCID: PMC3125752 DOI: 10.1093/nar/gkr318] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
Abstract
Peptides, either as protein fragments or as naturally occurring entities, are characterized by their sequence and function features. Researchers often need to manage large peptide lists concerning protein identification, biomarker discovery, bioactivity, immune response or other functionalities. We present a web server that manages peptide lists in terms of feature analysis as well as interactive clustering and visualization of the given peptides. PepServe is a useful tool for understanding the distribution of features among a group of peptides. The PepServe web application is freely available at http://bioserver-1.bioacademy.gr/Bioserver/PepServe/.
Affiliation(s)
- Anastasia Alexandridou
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 115 27 Athens, Greece
6
A hybrid, de novo based, genome-wide database search approach applied to the sea urchin neuropeptidome. J Proteome Res 2010; 9:990-6. [PMID: 20000637 DOI: 10.1021/pr900885k] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 12/22/2022]
Abstract
Peptidomics is the identification and study of the in vivo biologically active peptide profile. A combination of high performance liquid chromatography, mass spectrometry, and bioinformatics tools such as database search engines is commonly used to perform the analysis. We report a methodology based on a database system holding the completed translated genome, whereby de novo sequencing and genome-wide database searching are combined. The methodology was applied to the sea urchin neuropeptidome, resulting in a 30% increase in identification rate.
7
Abstract
Proteomics has advanced in leaps and bounds over the past couple of decades. However, the continuing dependency of mass spectrometry-based protein identification on the searching of spectra against protein sequence databases limits many proteomics experiments. If there is no sequenced genome for a given species, then cross species proteomics is required, attempting to identify proteins across the species boundary, typically using the sequenced genome of a closely related species. Unlike sequence searching for homologues, the proteomics equivalent is confounded by small differences in amino acid sequences, leading to large differences in peptide masses; this renders mass matching of peptides and their product ions difficult. Therefore, the phylogenetic distance between the two species and the attendant level of conservation between the homologous proteins play a huge part in determining the extent of protein identification that is possible across the species boundary. In this chapter, we review the cross species challenge itself, as well as various approaches taken to deal with it and the success met with in past studies. This is followed by recommendations of best practice and suggestions to researchers facing this challenge as well as a final section predicting developments, which may help improve cross species proteomics in the future.
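The point that a small sequence difference produces a large mass difference can be made concrete with a short sketch. It computes monoisotopic peptide masses from standard residue masses and compares a peptide with a single-residue variant. The peptide sequences and function name are hypothetical, chosen only for illustration; modifications and charge states are ignored.

```python
# Standard monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Hypothetical homologous peptides differing by one residue (I -> V).
reference = peptide_mass("PEPTIDE")
variant = peptide_mass("PEPTVDE")
shift = reference - variant  # about 14 Da, i.e. one CH2 group
```

A shift of roughly 14 Da is far outside the precursor mass tolerance typically used in database searches, so the variant peptide would not be matched to the reference sequence by mass alone, even though the two sequences differ at a single position.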
Affiliation(s)
- J C Wright
- Department Veterinary Preclinical Sciences, University of Liverpool, Crown Street, Liverpool, UK
8
Critical Evaluation of Product Ion Selection and Spectral Correlation Analysis for Biomarker Screening Using Targeted Peptide Multiple Reaction Monitoring. Clin Proteomics 2009. [DOI: 10.1007/s12014-009-9023-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
Abstract
Introduction
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomic screens aimed at discovering putative protein biomarkers of disease with potential clinical applications. Systematic validation of lead candidates in large numbers of samples from patient cohorts remains an important challenge. One particularly promising high-throughput technique is multiple reaction monitoring (MRM), a targeted form of MS/MS by which precise peptide precursor–product ion combinations, or transitions, are selectively tracked as informative probes. Despite recent progress, however, many important computational and statistical issues remain unresolved. These include the selection of an optimal set of transitions so as to achieve sufficiently high specificity and sensitivity when profiling complex biological specimens, and the corresponding generation of a suitable scoring function to reliably confirm tentative molecular identities based on noisy spectra.
Methods
In this study, we investigate various empirical criteria that are helpful to consider when developing and interpreting MRM-style assays based on the similarity between experimental and annotated reference spectra. We also rigorously evaluate and compare the performance of conventional spectral similarity measures, based on only a few pre-selected representative transitions, with a generic scoring metric, termed Tcorr, wherein a selected product ion profile is used to score spectral comparisons.
Conclusions
Our analyses demonstrate that Tcorr is potentially more suitable and effective for detecting biomarkers in complex biological mixtures than more traditional spectral library searches.
9
Xu C, Ma B. Software for computational peptide identification from MS-MS data. Drug Discov Today 2006; 11:595-600. [PMID: 16793527 DOI: 10.1016/j.drudis.2006.05.011] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Received: 12/23/2005] [Revised: 04/07/2006] [Accepted: 05/16/2006] [Indexed: 01/22/2023]
Abstract
Protein identification in biological samples is an important task in drug discovery research. Protein identification is nowadays regularly performed by tandem mass spectrometry (MS-MS). Because of the difficulty of measuring intact proteins using MS-MS, typically a protein is enzymically digested into peptides and the MS-MS spectrum of each peptide is measured. Computational methods are then invoked to identify the peptides, which are later combined together to identify the protein. The most recognized peptide identification software packages can be classified into four categories: database searching, de novo sequencing, sequence tagging and consensus of multiple engines.
Affiliation(s)
- Changjiang Xu
- Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada
10
Liu J, Bell AW, Bergeron JJM, Yanofsky CM, Carrillo B, Beaudrie CEH, Kearney RE. Methods for peptide identification by spectral comparison. Proteome Sci 2007; 5:3. [PMID: 17227583 PMCID: PMC1783643 DOI: 10.1186/1477-5956-5-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Received: 06/07/2006] [Accepted: 01/16/2007] [Indexed: 11/15/2022]
Abstract
Background: Tandem mass spectrometry followed by database search is currently the predominant technology for peptide sequencing in shotgun proteomics experiments. Most methods compare experimentally observed spectra to the theoretical spectra predicted from the sequences in protein databases. There is a growing interest, however, in comparing unknown experimental spectra to a library of previously identified spectra. This approach has the advantage of taking into account instrument-dependent factors and peptide-specific differences in fragmentation probabilities. It is also computationally more efficient for high-throughput proteomics studies. Results: This paper investigates computational issues related to this spectral comparison approach. Different methods have been empirically evaluated over several large sets of spectra. First, we illustrate that the peak intensities follow a Poisson distribution. This implies that applying a square root transform will optimally stabilize the peak intensity variance. Our results show that the square root did indeed outperform other transforms, resulting in improved accuracy of spectral matching. Second, different measures of spectral similarity were compared, and the results illustrated that the correlation coefficient was most robust. Finally, we examine how to assemble multiple spectra associated with the same peptide to generate a synthetic reference spectrum. Ensemble averaging is shown to provide the best combination of accuracy and efficiency. Conclusion: Our results demonstrate that when combined, these methods can boost the sensitivity and specificity of spectral comparison. Therefore they are capable of enhancing and complementing existing tools for consistent and accurate peptide identification.
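The three steps named in the abstract (square root transform, correlation coefficient as the similarity measure, and ensemble averaging to build a reference spectrum) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes spectra have already been binned onto a common m/z grid and are given as equal-length intensity lists, and all function names are hypothetical.

```python
import math


def sqrt_transform(intensities):
    """Square root transform, which stabilizes variance for Poisson-like counts."""
    return [math.sqrt(max(x, 0.0)) for x in intensities]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no shape information
    return cov / math.sqrt(var_a * var_b)


def ensemble_average(spectra):
    """Bin-wise mean of several spectra of the same peptide (reference spectrum)."""
    return [sum(column) / len(column) for column in zip(*spectra)]


def similarity(query, reference):
    """Correlation between square-root-transformed spectra."""
    return pearson(sqrt_transform(query), sqrt_transform(reference))


# Made-up binned intensities, for illustration only.
replicates = [[0.0, 90.0, 0.0, 30.0], [0.0, 110.0, 0.0, 20.0]]
library_spectrum = ensemble_average(replicates)
score = similarity([0.0, 400.0, 0.0, 100.0], library_spectrum)
```

Because the correlation coefficient is invariant to overall scaling, the query does not need to be normalized to the same total intensity as the reference before comparison.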
Affiliation(s)
- Jian Liu
- Center for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada
- John JM Bergeron
- Department of Anatomy and Cell Biology, McGill University, Montreal, Canada
- Corey M Yanofsky
- Department of Biomedical Engineering, McGill University, Montreal, Canada
- Brian Carrillo
- Department of Biomedical Engineering, McGill University, Montreal, Canada
- Robert E Kearney
- Department of Biomedical Engineering, McGill University, Montreal, Canada