1
Hagemeijer YP, Guryev V, Horvatovich P. Accurate Prediction of Protein Sequences for Proteogenomics Data Integration. Methods in Molecular Biology (Clifton, N.J.) 2021; 2420:233-260. [PMID: 34905178 DOI: 10.1007/978-1-0716-1936-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
This book chapter discusses proteogenomics data integration and provides an overview of the different omics layers involved in defining the proteome of a living organism. Various aspects of genome variability affecting either the sequence or the abundance level of proteins are discussed, such as the effect of single-nucleotide variants or larger genomic structural variants on the proteome. Next, various sequencing technologies, such as those providing short- and long-read sequencing, are introduced and discussed from a proteogenomics data integration perspective, listing their respective advantages and shortcomings for accurate protein variant prediction using genomic/transcriptomic sequencing data. Finally, the various bioinformatics tools used to process and analyze DNA/RNA sequencing data are discussed with the ultimate goal of obtaining accurately predicted sample-specific protein sequences that can be used as a drop-in replacement in existing approaches for peptide and protein identification using popular database search engines such as MSFragger and SearchGUI/PeptideShaker.
Affiliation(s)
- Yanick Paco Hagemeijer
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands; European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands
2
Li Z, He B, Feng W. Evaluation of bottom-up and top-down mass spectrum identifications with different customized protein sequences databases. Bioinformatics 2020; 36:1030-1036. [PMID: 31584612 DOI: 10.1093/bioinformatics/btz733] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/14/2018] [Revised: 08/12/2019] [Accepted: 09/25/2019] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION Bottom-up and top-down are two complementary approaches for proteoform identification. The inference of proteoforms relies on searching mass spectra against an accurate proteoform sequence database. A customized protein sequence database derived from RNA-Seq data can be used to better identify the proteoforms existing in a studied species. However, the quality of the sequences in customized databases constructed by different strategies affects the performance of mass spectrometry (MS) identification. Additionally, the identification performance of bottom-up and top-down approaches using customized databases also needs to be evaluated. RESULTS Three customized databases were constructed with different strategies. Two of them were based on translating assembled transcripts with or without genomic annotation, and the third is a variant-extending protein database. When tested with bottom-up and top-down MS data separately, the variant-extending protein database identified not only the largest number of spectra but also the alleles expressed at the same time in diploid cells. An assembled database could identify spectra missed with the reference database and amino acid (AA) alterations existing in the studied species. AVAILABILITY AND IMPLEMENTATION Experimental results demonstrated that the proteoform sequences in an annotated database are more suitable for identifying AA alterations and peptide sequences missed with the reference database. An unannotated database, instead of a reference proteome database, achieves sufficiently high sensitivity in identifying mass spectra. The variant-extending reference database is the most sensitive for identifying mass spectra and single AA variants. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziwei Li
  - College of Automation, Harbin Engineering University, Harbin, Heilongjiang 150001, China
- Bo He
  - College of Automation, Harbin Engineering University, Harbin, Heilongjiang 150001, China
- Weixing Feng
  - College of Automation, Harbin Engineering University, Harbin, Heilongjiang 150001, China
3
Bults P, Spanov B, Olaleye O, van de Merbel NC, Bischoff R. Intact protein bioanalysis by liquid chromatography – High-resolution mass spectrometry. J Chromatogr B Analyt Technol Biomed Life Sci 2019; 1110-1111:155-167. [DOI: 10.1016/j.jchromb.2019.01.032] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/02/2018] [Revised: 01/20/2019] [Accepted: 01/31/2019] [Indexed: 02/07/2023]
4
Kwiatkowski M, Krösser D, Wurlitzer M, Steffen P, Barcaru A, Krisp C, Horvatovich P, Bischoff R, Schlüter H. Application of Displacement Chromatography to Online Two-Dimensional Liquid Chromatography Coupled to Tandem Mass Spectrometry Improves Peptide Separation Efficiency and Detectability for the Analysis of Complex Proteomes. Anal Chem 2018; 90:9951-9958. [PMID: 30014690 PMCID: PMC6106052 DOI: 10.1021/acs.analchem.8b02189] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/30/2022]
Abstract
The complexity of mammalian proteomes is a challenge in bottom-up proteomics. For a comprehensive proteome analysis, multidimensional separation strategies are necessary. Online two-dimensional liquid chromatography–tandem mass spectrometry (2D-LC-MS/MS) combining strong cation exchange (SCX) in the first dimension with reversed-phase (RP) chromatography in the second dimension provides a powerful approach to analyze complex proteomes. Although the combination of SCX with RP chromatography provides a good orthogonality, only a moderate separation is achieved in the first dimension for peptides with two (+2) or three (+3) positive charges. The aim of this study was to improve the performance of online SCX-RP-MS/MS by applying displacement chromatography to the first separation dimension. Compared to gradient chromatography mode (GCM), displacement chromatography mode (DCM) was expected to improve the separation of +2-peptides and +3-peptides, thus reducing complexity and increasing ionization and detectability. The results show that DCM provided a separation of +2-peptides and +3-peptides in remarkably sharp zones with a low degree of coelution, thus providing fractions with significantly higher purities compared to GCM. In particular, +2-peptides were separated over several fractions, which was not possible to achieve in GCM. The better separation in DCM resulted in a higher reproducibility and significantly higher identification rates for both peptides and proteins including a 2.6-fold increase for +2-peptides. The higher number of identified peptides in DCM resulted in significantly higher protein sequence coverages and a considerably higher number of unique peptides per protein. Compared to conventionally used salt-based GCM, DCM increased the performance of online SCX-RP-MS/MS and enabled comprehensive proteome profiling in the low microgram range.
Affiliation(s)
- Marcel Kwiatkowski
  - Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany; Department of Pharmacokinetics, Toxicology and Targeting, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Dennis Krösser
  - Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Marcus Wurlitzer
  - Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Pascal Steffen
  - Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Andrei Barcaru
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Christoph Krisp
  - Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Péter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Rainer Bischoff
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Hartmut Schlüter
  - Mass Spectrometric Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
5
Barbieri R, Guryev V, Brandsma CA, Suits F, Bischoff R, Horvatovich P. Proteogenomics: Key Driver for Clinical Discovery and Personalized Medicine. Advances in Experimental Medicine and Biology 2017; 926:21-47. [PMID: 27686804 DOI: 10.1007/978-3-319-42316-6_3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 10/24/2022]
Abstract
Proteogenomics is a multi-omics research field that has the aim to efficiently integrate genomics, transcriptomics and proteomics. With this approach it is possible to identify new patient-specific proteoforms that may have implications in disease development, specifically in cancer. Understanding the impact of a large number of mutations detected at the genomics level is needed to assess the effects at the proteome level. Proteogenomics data integration would help in identifying molecular changes that are persistent across multiple molecular layers and enable better interpretation of molecular mechanisms of disease, such as the causal relationship between single nucleotide polymorphisms (SNPs) and the expression of transcripts and translation of proteins compared to mainstream proteomics approaches. Identifying patient-specific protein forms and getting a better picture of molecular mechanisms of disease opens the avenue for precision and personalized medicine. Proteogenomics is, however, a challenging interdisciplinary science that requires the understanding of sample preparation, data acquisition and processing for genomics, transcriptomics and proteomics. This chapter aims to guide the reader through the technology and bioinformatics aspects of these multi-omics approaches, illustrated with proteogenomics applications having clinical or biological relevance.
Affiliation(s)
- Ruggero Barbieri
  - Department of Gastroenterology and Hepatology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Corry-Anke Brandsma
  - Department of Pathology & Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Frank Suits
  - IBM T.J. Watson Research Centre, 1101 Kitchawan Road, Yorktown Heights, New York, 10598, NY, USA
- Rainer Bischoff
  - Department of Analytical Biochemistry, Research Institute of Pharmacy, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, Research Institute of Pharmacy, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
6
Re A, Waldron L, Quattrone A. Control of Gene Expression by RNA Binding Protein Action on Alternative Translation Initiation Sites. PLoS Comput Biol 2016; 12:e1005198. [PMID: 27923063 PMCID: PMC5140048 DOI: 10.1371/journal.pcbi.1005198] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/04/2015] [Accepted: 10/13/2016] [Indexed: 11/18/2022] Open
Abstract
Transcript levels do not faithfully predict protein levels, due to post-transcriptional regulation of gene expression mediated by RNA binding proteins (RBPs) and non-coding RNAs. We developed a multivariate linear regression model integrating RBP levels and predicted RBP-mRNA regulatory interactions from matched transcript and protein datasets. RBPs significantly improved the accuracy in predicting protein abundance of a portion of the total modeled mRNAs in three panels of tissues and cells and for different methods employed in the detection of mRNA and protein. The presence of upstream translation initiation sites (uTISs) at the mRNA 5’ untranslated regions was strongly associated with improvement in predictive accuracy. On the basis of these observations, we propose that the recently discovered widespread uTISs in the human genome can be a previously unappreciated substrate of translational control mediated by RBPs.
Gene expression is a dynamic program by which the information stored in the genome is rendered functional by production and degradation of two types of macromolecules, RNAs and proteins. mRNAs are templates for proteins; therefore we expect correspondence between quantities of mRNAs and proteins. Genome-wide studies instead indicate a marked discrepancy between them, when considering their steady-state levels or their variations across different conditions. We employed linear regression approaches with paired mRNA/protein datasets in order to develop a model predicting the protein level of a gene from both the mRNA level and the protein levels of RBPs inferred to bind the mRNA untranslated regions. The results of our analyses restricted the utility of RBPs to improve accuracy of predicted protein abundance to a small fraction of the total modelled genes, and identified a novel association of the improvement induced by RBPs with the presence of upstream translation sites. This finding suggests a new avenue of experimental studies aimed at exploring the hypothesis that RBPs could influence protein abundance by changing the preference for certain translation initiation sites.
Affiliation(s)
- Angela Re
  - Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Polo Scientifico e Tecnologico Fabio Ferrari, Trento, Italy
- Levi Waldron
  - City University of New York Graduate School of Public Health and Health Policy, New York, New York, United States of America
- Alessandro Quattrone
  - Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Polo Scientifico e Tecnologico Fabio Ferrari, Trento, Italy
7
Timms JF, Hale OJ, Cramer R. Advances in mass spectrometry-based cancer research and analysis: from cancer proteomics to clinical diagnostics. Expert Rev Proteomics 2016; 13:593-607. [DOI: 10.1080/14789450.2016.1182431] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
8
Jungblut P, Thiede B, Schlüter H. Towards deciphering proteomes via the proteoform, protein speciation, moonlighting and protein code concepts. J Proteomics 2016; 134:1-4. [DOI: 10.1016/j.jprot.2016.01.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 12/15/2022]