1
Richardson MO, Eddy SR. ORFeus: a computational method to detect programmed ribosomal frameshifts and other non-canonical translation events. BMC Bioinformatics 2023; 24:471. [PMID: 38093195 PMCID: PMC10720069 DOI: 10.1186/s12859-023-05602-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 12/05/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND: In canonical protein translation, ribosomes initiate translation at a specific start codon, maintain a single reading frame throughout elongation, and terminate at the first in-frame stop codon. However, ribosomal behavior can deviate at each of these steps, sometimes in a programmed manner. Certain mRNAs contain sequence and structural elements that cause ribosomes to begin translation at alternative start codons, shift reading frame, read through stop codons, or reinitiate on the same mRNA. These processes represent important translational control mechanisms that can allow an mRNA to encode multiple functional protein products or regulate protein expression. The prevalence of these events remains uncertain, due to the difficulty of systematic detection. RESULTS: We have developed a computational model to infer non-canonical translation events from ribosome profiling data. CONCLUSION: ORFeus identifies known examples of alternative open reading frames and recoding events across different organisms and enables transcriptome-wide searches for novel events.
Affiliation(s)
- Mary O Richardson
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Sean R Eddy
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA
2
Zhang B, Bassani-Sternberg M. Current perspectives on mass spectrometry-based immunopeptidomics: the computational angle to tumor antigen discovery. J Immunother Cancer 2023; 11:e007073. [PMID: 37899131 PMCID: PMC10619091 DOI: 10.1136/jitc-2023-007073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/21/2023] [Indexed: 10/31/2023] Open
Abstract
Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means for identifying tumor antigens encoded within the canonical open reading frames of annotated coding genes and non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA and ribosomal sequencing data analysis, careful incorporation of predicted novel protein sequences into the reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. Advancements in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
Affiliation(s)
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
3
Bogaert A, Fijalkowska D, Staes A, Van de Steene T, Vuylsteke M, Stadler C, Eyckerman S, Spirohn K, Hao T, Calderwood MA, Gevaert K. N-terminal proteoforms may engage in different protein complexes. Life Sci Alliance 2023; 6:e202301972. [PMID: 37316325 PMCID: PMC10267514 DOI: 10.26508/lsa.202301972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 05/26/2023] [Accepted: 05/30/2023] [Indexed: 06/16/2023] Open
Abstract
Alternative translation initiation and alternative splicing may give rise to N-terminal proteoforms, proteins that differ at their N-terminus compared with their canonical counterparts. Such proteoforms can have altered localizations, stabilities, and functions. Although proteoforms generated from splice variants can be engaged in different protein complexes, it remained to be studied to what extent this applies to N-terminal proteoforms. To address this, we mapped the interactomes of several pairs of N-terminal proteoforms and their canonical counterparts. First, we generated a catalogue of N-terminal proteoforms found in the HEK293T cellular cytosol from which 22 pairs were selected for interactome profiling. In addition, we provide evidence for the expression of several N-terminal proteoforms, identified in our catalogue, across different human tissues, as well as tissue-specific expression, highlighting their biological relevance. Protein-protein interaction profiling revealed that the overlap of the interactomes for both proteoforms is generally high, showing their functional relation. We also showed that N-terminal proteoforms can be engaged in new interactions and/or lose several interactions compared with their canonical counterparts, thus further expanding the functional diversity of proteomes.
Affiliation(s)
- Annelies Bogaert
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Daria Fijalkowska
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- An Staes
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Tessa Van de Steene
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Charlotte Stadler
- Department of Protein Science, KTH Royal Institute of Technology and Science for Life Laboratories, Stockholm, Sweden
- Sven Eyckerman
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Kris Gevaert
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
4
Fedorova AD, Tierney JA, Michel AM, Baranov PV. RiboGalaxy: A Galaxy-based Web Platform for Ribosome Profiling Data Processing – 2023 Update. J Mol Biol 2023. [DOI: 10.1016/j.jmb.2023.168043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
5
Cormican JA, Soh WT, Mishto M, Liepe J. iBench: A ground truth approach for advanced validation of mass spectrometry identification method. Proteomics 2023; 23:e2200271. [PMID: 36189881 PMCID: PMC10078205 DOI: 10.1002/pmic.202200271] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2022] [Revised: 09/23/2022] [Accepted: 09/28/2022] [Indexed: 01/19/2023]
Abstract
The discovery of many noncanonical peptides detectable with sensitive mass spectrometry inside, outside, and on cells shepherded the development of novel methods for their identification, often not supported by a systematic benchmarking with other methods. We here propose iBench, a bioinformatic tool that can construct ground truth proteomics datasets and cognate databases, thereby generating a training court wherein methods, search engines, and proteomics strategies can be tested, and their performances estimated by the same tool. iBench can be coupled to the main database search engines, allows the selection of customized features of mass spectrometry spectra and peptides, provides standard benchmarking outputs, and is open source. The proof-of-concept application to tryptic proteome digestions, immunopeptidomes, and synthetic peptide libraries dissected the impact that noncanonical peptides could have on the identification of canonical peptides by Mascot search with rescoring via Percolator (Mascot+Percolator).
Affiliation(s)
- John A. Cormican
- Max‐Planck‐Institute for Multidisciplinary Sciences (MPI‐NAT), Göttingen, Germany
- Wai Tuck Soh
- Max‐Planck‐Institute for Multidisciplinary Sciences (MPI‐NAT), Göttingen, Germany
- Michele Mishto
- Centre for Inflammation Biology and Cancer Immunology (CIBCI) & Peter Gorer Department of Immunobiology, King's College London, London, UK
- The Francis Crick Institute, London, UK
- Juliane Liepe
- Max‐Planck‐Institute for Multidisciplinary Sciences (MPI‐NAT), Göttingen, Germany
6
Bogaert A, Fijalkowska D, Staes A, Van de Steene T, Demol H, Gevaert K. Limited evidence for protein products of non-coding transcripts in the HEK293T cellular cytosol. Mol Cell Proteomics 2022; 21:100264. [PMID: 35788065 PMCID: PMC9396073 DOI: 10.1016/j.mcpro.2022.100264] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/21/2022] [Revised: 06/22/2022] [Accepted: 06/30/2022] [Indexed: 10/25/2022] Open
Abstract
Ribosome profiling has revealed translation outside of canonical coding sequences (CDSs) including translation of short upstream ORFs, long non-coding RNAs, overlapping ORFs, ORFs in UTRs or ORFs in alternative reading frames. Studies combining mass spectrometry, ribosome profiling and CRISPR-based screens showed that hundreds of ORFs derived from non-coding transcripts produce (micro)proteins, while other studies failed to find evidence for such types of non-canonical translation products. Here, we attempted to discover translation products from non-coding regions by strongly reducing the complexity of the sample prior to mass spectrometric analysis. We used an extended database as the search space and applied stringent filtering of the identified peptides to find evidence for novel translation events. We show that, theoretically, our strategy facilitates the detection of translation events of transcripts from non-coding regions, but experimentally we only find 19 peptides that might originate from such translation events. Finally, Virotrap-based interactome analysis of two N-terminal proteoforms originating from non-coding regions showed the functional potential of these novel proteins.
Affiliation(s)
- Annelies Bogaert
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Daria Fijalkowska
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- An Staes
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Tessa Van de Steene
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Hans Demol
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Kris Gevaert
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
7
Tay AP, Hamey JJ, Martyn GE, Wilson LOW, Wilkins MR. Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing. J Proteome Res 2022; 21:1628-1639. [PMID: 35612954 DOI: 10.1021/acs.jproteome.1c00968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
Affiliation(s)
- Aidan P Tay
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia; Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia; Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Joshua J Hamey
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gabriella E Martyn
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Laurence O W Wilson
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia; Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Marc R Wilkins
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
8
Fijalkowski I, Willems P, Jonckheere V, Simoens L, Van Damme P. Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides. microLife 2022; 3:uqac005. [PMID: 37223358 PMCID: PMC10117744 DOI: 10.1093/femsml/uqac005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/13/2021] [Revised: 04/18/2022] [Accepted: 04/29/2022] [Indexed: 05/25/2023]
Abstract
Genomic studies of bacteria have long pointed toward widespread prevalence of small open reading frames (sORFs) encoding short proteins, <100 amino acids in length. Despite the mounting genomic evidence of their robust expression, relatively little progress has been made in their mass spectrometry-based detection and various blanket statements have been used to explain this observed discrepancy. In this study, we provide a large-scale riboproteogenomics investigation of the challenging nature of proteomic detection of such small proteins as informed by conditional translation data. A panel of physicochemical properties alongside recently developed mass spectrometry detectability metrics was interrogated to provide a comprehensive evidence-based assessment of sORF-encoded polypeptide (SEP) detectability. Moreover, a large-scale proteomics and translatomics compendium of proteins produced by Salmonella Typhimurium (S. Typhimurium), a model human pathogen, across a panel of growth conditions is presented and used in support of our in silico SEP detectability analysis. This integrative approach is used to provide a data-driven census of small proteins expressed by S. Typhimurium across growth phases and infection-relevant conditions. Taken together, our study pinpoints current limitations in proteomics-based detection of novel small proteins currently missing from bacterial genome annotations.
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Patrick Willems
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Veronique Jonckheere
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
9
Cope AL, Anderson F, Favate J, Jackson M, Mok A, Kurowska A, Liu J, MacKenzie E, Shivakumar V, Tilton P, Winterbourne SM, Xue S, Kavoussanakis K, Lareau LF, Shah P, Wallace EWJ. riboviz 2: a flexible and robust ribosome profiling data analysis and visualization workflow. Bioinformatics 2022; 38:2358-2360. [PMID: 35157051 PMCID: PMC9004635 DOI: 10.1093/bioinformatics/btac093] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/19/2021] [Revised: 09/28/2021] [Accepted: 02/09/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: Ribosome profiling, or Ribo-seq, is the state-of-the-art method for quantifying protein synthesis in living cells. Computational analysis of Ribo-seq data remains challenging due to the complexity of the procedure, as well as variations introduced for specific organisms or specialized analyses. RESULTS: We present riboviz 2, an updated riboviz package, for the comprehensive transcript-centric analysis and visualization of Ribo-seq data. riboviz 2 includes an analysis workflow built on the Nextflow workflow management system for end-to-end processing of Ribo-seq data. riboviz 2 has been extensively tested on diverse species and library preparation strategies, including multiplexed samples. riboviz 2 is flexible and uses open, documented file formats, allowing users to integrate new analyses with the pipeline. AVAILABILITY AND IMPLEMENTATION: riboviz 2 is freely available at github.com/riboviz/riboviz.
Affiliation(s)
- Alexander L Cope
- Department of Genetics, Rutgers University, Piscataway, NJ 08854-8082, USA
- Felicity Anderson
- Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
- John Favate
- Department of Genetics, Rutgers University, Piscataway, NJ 08854-8082, USA
- Amanda Mok
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Anna Kurowska
- Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
- Junchen Liu
- EPCC, The University of Edinburgh, Edinburgh EH8 9BT, UK
- Emma MacKenzie
- Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
- Vikram Shivakumar
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Peter Tilton
- Department of Genetics, Rutgers University, Piscataway, NJ 08854-8082, USA
- Sophie M Winterbourne
- Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
- Siyin Xue
- Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
- Liana F Lareau
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Premal Shah
- Department of Genetics, Rutgers University, Piscataway, NJ 08854-8082, USA
- Edward W J Wallace
- Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
10
Identification and characterisation of sPEPs in Cryptococcus neoformans. Fungal Genet Biol 2022; 160:103688. [PMID: 35339703 DOI: 10.1016/j.fgb.2022.103688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/02/2022] [Accepted: 03/21/2022] [Indexed: 11/24/2022]
Abstract
Short open reading frame (sORF)-encoded peptides (sPEPs) have been found across a wide range of genomic locations in a variety of species. To date, their identification, validation, and characterisation in the human fungal pathogen Cryptococcus neoformans has been limited due to a lack of standardised protocols. We have developed an enrichment process that enables sPEP detection within a protein sample from this polysaccharide-encapsulated yeast, and implemented proteogenomics to provide insights into the validity of predicted and hypothetical sORFs annotated in the C. neoformans genome. Novel sORFs were discovered within the 5' and 3' UTRs of known transcripts as well as in "non-coding" RNAs. One novel candidate, dubbed NPB1, that resided in an RNA annotated as "non-coding", was chosen for characterisation. Through the creation of both specific point mutations and a full deletion allele, the function of the new sPEP, Npb1, was shown to resemble that of the bacterial trans-translation protein SmpB.
11
Gelhausen R, Müller T, Svensson SL, Alkhnbashi OS, Sharma CM, Eggenhofer F, Backofen R. RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria. Brief Bioinform 2022; 23:bbab549. [PMID: 35037022 PMCID: PMC8921622 DOI: 10.1093/bib/bbab549] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/25/2021] [Revised: 11/22/2021] [Accepted: 11/29/2021] [Indexed: 11/19/2022] Open
Abstract
Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.
Affiliation(s)
- Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Teresa Müller
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Sarah L Svensson
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Omer S Alkhnbashi
- Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Saudi Arabia
- SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRC-AI), King Fahd University of Petroleum and Minerals, Saudi Arabia
- Cynthia M Sharma
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, 79104, State, Germany
12
Kute PM, Soukarieh O, Tjeldnes H, Trégouët DA, Valen E. Small Open Reading Frames, How to Find Them and Determine Their Function. Front Genet 2022; 12:796060. [PMID: 35154250 PMCID: PMC8831751 DOI: 10.3389/fgene.2021.796060] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/15/2021] [Accepted: 12/30/2021] [Indexed: 12/12/2022] Open
Abstract
Advances in genomics and molecular biology have revealed an abundance of small open reading frames (sORFs) across all types of transcripts. While these sORFs are often assumed to be non-functional, many have been implicated in physiological functions and a significant number of sORFs have been described in human diseases. Thus, sORFs may represent a hidden repository of functional elements that could serve as therapeutic targets. Unlike protein-coding genes, it is not necessarily the encoded peptide of an sORF that enacts its function; sometimes simply the act of translating an sORF might have a regulatory role. Indeed, the most studied sORFs are located in the 5′UTRs of coding transcripts and can have a regulatory impact on the translation of the downstream protein-coding sequence. However, sORFs have also been abundantly identified in non-coding RNAs including lncRNAs, circular RNAs and ribosomal RNAs, suggesting that sORFs may be diverse in function. Of the many different experimental methods used to discover sORFs, the most commonly used are ribosome profiling and mass spectrometry. These can confirm interactions between transcripts and ribosomes and the production of a peptide, respectively. Extensions to ribosome profiling, which also capture scanning ribosomes, have further made it possible to see how sORFs impact the translation initiation of mRNAs. While high-throughput techniques have made the identification of sORFs less difficult, defining their function, if any, is typically more challenging. Together, the abundance and potential function of many of these sORFs argue for the necessity of including sORFs in gene annotations and systematically characterizing these to understand their potential functional roles. In this review, we will focus on the high-throughput methods used in the detection and characterization of sORFs and discuss techniques for validation and functional characterization.
Affiliation(s)
- Preeti Madhav Kute
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Omar Soukarieh
- Department of Molecular Epidemiology Of Vascular and Brain Disorders, INSERM, BPH, U1219, University of Bordeaux, Bordeaux, France
- Håkon Tjeldnes
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- David-Alexandre Trégouët
- Department of Molecular Epidemiology Of Vascular and Brain Disorders, INSERM, BPH, U1219, University of Bordeaux, Bordeaux, France
- Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- *Correspondence: Eivind Valen
13
Willems P, Ndah E, Jonckheere V, Van Breusegem F, Van Damme P. To New Beginnings: Riboproteogenomics Discovery of N-Terminal Proteoforms in Arabidopsis Thaliana. Front Plant Sci 2022; 12:778804. [PMID: 35069635 PMCID: PMC8770321 DOI: 10.3389/fpls.2021.778804] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/17/2021] [Accepted: 11/18/2021] [Indexed: 06/14/2023]
Abstract
Alternative translation initiation is a widespread event in biology that can shape multiple protein forms or proteoforms from a single gene. However, the respective contribution of alternative translation to protein complexity remains largely enigmatic. By complementary ribosome profiling and N-terminal proteomics (i.e., riboproteogenomics), we provide clear-cut evidence for ~90 N-terminal proteoform pairs shaped by (alternative) translation initiation in Arabidopsis thaliana. Next to several cases additionally confirmed by directed mutagenesis, identified alternative protein N-termini follow the enzymatic rules of co-translational N-terminal protein acetylation and initiator methionine removal. In contrast to other eukaryotic models, N-terminal acetylation in plants cannot generally be considered as a proxy of translation initiation because of its posttranslational occurrence on mature proteolytic neo-termini (N-termini) localized in the chloroplast stroma. Quantification of N-terminal acetylation revealed differing co- vs. posttranslational N-terminal acetylation patterns. Intriguingly, our data additionally hint at alternative translation initiation serving as a common mechanism to supply protein copies in multiple cellular compartments, as alternative translation sites are often in close proximity to cleavage sites of N-terminal transit sequences of nuclear-encoded chloroplastic and mitochondrial proteins. Overall, riboproteogenomics screening enables the identification of (differentially localized) N-terminal proteoforms raised upon alternative translation.
Affiliation(s)
- Patrick Willems
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Vlaams Instituut voor Biotechnologie (VIB)-Center for Plant Systems Biology, Ghent, Belgium
| | - Elvis Ndah
- integrative Riboproteogenomics, Interactomics and Proteomics Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Veronique Jonckheere
- integrative Riboproteogenomics, Interactomics and Proteomics Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Frank Van Breusegem
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Vlaams Instituut voor Biotechnologie (VIB)-Center for Plant Systems Biology, Ghent, Belgium
| | - Petra Van Damme
- integrative Riboproteogenomics, Interactomics and Proteomics Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
|
14
|
Vitorino R, Choudhury M, Guedes S, Ferreira R, Thongboonkerd V, Sharma L, Amado F, Srivastava S. Peptidomics and proteogenomics: background, challenges and future needs. Expert Rev Proteomics 2021; 18:643-659. [PMID: 34517741 DOI: 10.1080/14789450.2021.1980388] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/20/2022]
Abstract
INTRODUCTION With available genomic data and related information, it is becoming possible to better highlight mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development. AREAS COVERED Peptidomics and proteogenomics are the two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both technologies add a remarkably large amount of data to the pool of information on various peptides associated with gene mutations or genome remodeling. Literature search was performed in the PubMed database and is up to date. EXPERT OPINION This article lists various techniques used for peptidomic and proteogenomic analyses. It also explains various bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. Their role in deciphering disease pathways, cancer research, and biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements to overcome the current limitations for their effective clinical use are also discussed.
Affiliation(s)
- Rui Vitorino
- Faculdade de Medicina da Universidade do Porto, Porto, Portugal; iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal; LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Manisha Choudhury
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
| | - Sofia Guedes
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Rita Ferreira
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Visith Thongboonkerd
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | | | - Francisco Amado
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
|
15
|
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022] Open
Abstract
It is becoming evident that holistic perspectives toward cancer are crucial in deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is becoming more apparent than ever, that cancer should not only be viewed as a disease of the genome but as a disease of the cellular system. Integrative multilayer approaches are emerging as vigorous assets in our endeavors to achieve systemic views on cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer approaches of omics analyses of cellular systems and move on to multilayer integrative approaches in which in-depth descriptions of proteogenomics and network-based data analysis are provided. Proteogenomics is a remarkable example of how the integration of multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations and network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
| | | | | | - Mehdi Sadeghi
- Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
| | - Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
|
16
|
Gelhausen R, Svensson SL, Froschauer K, Heyl F, Hadjeras L, Sharma CM, Eggenhofer F, Backofen R. HRIBO: high-throughput analysis of bacterial ribosome profiling data. Bioinformatics 2021; 37:2061-2063. [PMID: 33175953 PMCID: PMC8337001 DOI: 10.1093/bioinformatics/btaa959] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/17/2020] [Revised: 09/25/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Ribosome profiling (Ribo-seq) is a powerful approach based on deep sequencing of cDNA libraries generated from ribosome-protected RNA fragments to explore the translatome of a cell, and is especially useful for the detection of small proteins (50-100 amino acids) that are recalcitrant to many standard biochemical and in silico approaches. While pipelines are available to analyze Ribo-seq data, none are designed explicitly for the automatic processing and analysis of data from bacteria, nor are they focused on the discovery of unannotated open reading frames (ORFs). RESULTS We present HRIBO (High-throughput annotation by Ribo-seq), a workflow to enable reproducible and high-throughput analysis of bacterial Ribo-seq data. The workflow performs all required pre-processing and quality control steps. Importantly, HRIBO outputs annotation-independent ORF predictions based on two complementary bacteria-focused tools, and integrates them with additional feature information and expression values. This facilitates the rapid and high-confidence discovery of novel ORFs and their prioritization for functional characterization. AVAILABILITY AND IMPLEMENTATION HRIBO is a free and open source project available under the GPL-3 license at: https://github.com/RickGelhausen/HRIBO.
Affiliation(s)
- Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
| | - Sarah L Svensson
- Chair of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, 97080 Würzburg, Germany
| | - Kathrin Froschauer
- Chair of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, 97080 Würzburg, Germany
| | - Florian Heyl
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
| | - Lydia Hadjeras
- Chair of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, 97080 Würzburg, Germany
| | - Cynthia M Sharma
- Chair of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, 97080 Würzburg, Germany
| | - Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, 79104 Freiburg, Germany
|
17
|
Kiniry SJ, Judge CE, Michel AM, Baranov PV. Trips-Viz: an environment for the analysis of public and user-generated ribosome profiling data. Nucleic Acids Res 2021; 49:W662-W670. [PMID: 33950201 PMCID: PMC8262740 DOI: 10.1093/nar/gkab323] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/19/2021] [Revised: 04/11/2021] [Accepted: 04/20/2021] [Indexed: 02/07/2023] Open
Abstract
Trips-Viz (https://trips.ucc.ie/) is an interactive platform for the analysis and visualization of ribosome profiling (Ribo-Seq) and shotgun RNA sequencing (RNA-seq) data. This includes publicly available and user-generated data, hence Trips-Viz can be classified as a database and as a server. As a database it provides access to many processed Ribo-Seq and RNA-seq datasets aligned to reference transcriptomes, a collection that has been expanded considerably since its inception. Here, we focus on the server functionality of Trips-Viz, which has also been greatly improved. Trips-Viz now enables visualisation of proteomics data from a large number of processed mass spectrometry datasets, which can be used to support translation inferred from Ribo-Seq data. Users are now able to upload a custom reference transcriptome as well as data types other than Ribo-Seq/RNA-Seq. Incorporating custom data has been streamlined with RiboGalaxy (https://ribogalaxy.ucc.ie/) integration. The other new functionality is the rapid detection of translated open reading frames (ORFs) through a simple, easy-to-use interface. The analysis of differential expression has also been improved via integration of DESeq2 and Anota2seq, in addition to a number of other improvements to existing Trips-Viz features.
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| | - Ciara E Judge
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| | - Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Ribomaps Ltd, Western Gateway Bld, Western Rd, Cork, Ireland
| | - Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia
|
18
|
Tjeldnes H, Labun K, Torres Cleuren Y, Chyżyńska K, Świrski M, Valen E. ORFik: a comprehensive R toolkit for the analysis of translation. BMC Bioinformatics 2021; 22:336. [PMID: 34147079 PMCID: PMC8214792 DOI: 10.1186/s12859-021-04254-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/18/2021] [Accepted: 06/09/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays. RESULTS Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5' UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5' UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions. CONCLUSION In summary, ORFik introduces hundreds of tested, documented and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving speed and scope of many core Bioconductor functions, ORFik offers enhancement benefiting the entire Bioconductor environment. AVAILABILITY http://bioconductor.org/packages/ORFik.
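To make the uORF annotation step mentioned in this abstract concrete, the following is a minimal Python sketch of scanning a 5' UTR sequence for upstream ORFs. It is not ORFik code and does not use the ORFik API; the function name `find_uorfs`, the restriction to ATG starts, and the requirement that the stop codon also lies inside the UTR are all assumptions made for illustration.

```python
# Illustrative only: a naive uORF scan, not the ORFik implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(utr5, start_codon="ATG"):
    """Return 0-based, half-open (start, end) spans of ORFs that both
    begin and terminate inside the given 5' UTR sequence."""
    utr5 = utr5.upper()
    uorfs = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != start_codon:
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(utr5) - 2, 3):
            if utr5[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))  # include the stop codon
                break
    return uorfs

# Hypothetical 5' UTR used only to exercise the function.
example_utr = "GGCATGAAATGACCATGCCC"
uorf_spans = find_uorfs(example_utr)
```

A production tool would additionally handle near-cognate start codons, uORFs that overlap the main coding sequence, and transcript coordinates mapped back to the genome; none of that is attempted here.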
Affiliation(s)
- Håkon Tjeldnes
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Kornel Labun
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Yamila Torres Cleuren
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway; Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
| | - Katarzyna Chyżyńska
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Michał Świrski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
| | - Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway; Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
|
19
|
Tsang O, Wong JWH. Proteogenomic interrogation of cancer cell lines: an overview of the field. Expert Rev Proteomics 2021; 18:221-232. [PMID: 33877947 DOI: 10.1080/14789450.2021.1914594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Introduction: Cancer cell lines (CCLs) have been a major resource for cancer research. Over the past couple of decades, they have been instrumental in omic profiling method development and as model systems to generate new knowledge in cell and cancer biology. More recently, with the increasing amount of genomic, transcriptomic and proteomic data being generated in hundreds of CCLs, there is growing potential for integrative proteogenomic data analyses to be performed. Areas covered: In this review, we first describe the most commonly used proteome profiling methods in CCLs. We then discuss how these proteomics data can be integrated with genomics data for proteogenomics analyses. Finally, we highlight some of the recent biological discoveries that have arisen from proteogenomics analyses of CCLs. Expert opinion: Proteogenomics analyses of CCLs have so far enabled the discovery of novel proteins and proteoforms. They have also improved our understanding of biological processes including post-transcriptional regulation of protein abundance and the presentation of antigens by major histocompatibility complex alleles. With proteomics data to be generated in hundreds to thousands of CCLs in coming years, there will be further potential for large-scale proteogenomics analyses and data integration with the phenotypically well-characterized CCLs.
Affiliation(s)
- Olson Tsang
- Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
| | - Jason W H Wong
- Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR; School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
|
20
|
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022] Open
Abstract
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
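The stringency discussed in this abstract is usually expressed through target-decoy false discovery rates. As a hedged illustration, the Python sketch below estimates q-values from a list of scored peptide-to-spectrum matches (PSMs). It is a generic target-decoy calculation, not Percolator and not the authors' pipeline; the function name `target_decoy_qvalues`, the input layout, and the example scores are assumptions.

```python
# Illustrative only: plain target-decoy q-values, not Percolator's algorithm.
def target_decoy_qvalues(psms):
    """psms: iterable of (score, is_decoy) with higher score = better match.
    Returns [(score, q_value), ...] for target PSMs, best score first."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this rank.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this rank or any lower-scoring rank.
    qvals = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(s, q) for (s, d), q in zip(ranked, qvals) if not d]

# Made-up scores, used only to exercise the function.
example_psms = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
example_qvalues = target_decoy_qvalues(example_psms)
```

In this framing, a larger search database tends to push more decoy hits into the high-scoring region, which raises the q-values of true matches; rescoring with additional features aims to separate targets from decoys more sharply so that more targets pass a fixed q-value cut-off.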
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
| | - Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Ralf Gabriels
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
| | | | | | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Sven Degroeve
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
| | - Lennart Martens
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
| | - Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium.
|
21
|
Vitorino R, Guedes S, Amado F, Santos M, Akimitsu N. The role of micropeptides in biology. Cell Mol Life Sci 2021; 78:3285-3298. [PMID: 33507325 PMCID: PMC11073438 DOI: 10.1007/s00018-020-03740-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/10/2020] [Revised: 12/01/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022]
Abstract
Micropeptides are small polypeptides coded by small open-reading frames. Progress in computational biology and the analyses of large-scale transcriptomes and proteomes have revealed that mammalian genomes produce a large number of transcripts encoding micropeptides. Many of these have been previously annotated as long noncoding RNAs. The role of micropeptides in cellular homeostasis maintenance has been demonstrated. This review discusses different types of micropeptides as well as methods to identify them, such as computational approaches, ribosome profiling, and mass spectrometry.
Affiliation(s)
- Rui Vitorino
- Departamento de Cirurgia E Fisiologia, Faculdade de Medicina da Universidade Do Porto, UnIC, Porto, Portugal.
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal.
| | - Sofia Guedes
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Francisco Amado
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Manuel Santos
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal
| | | |
|
22
|
Fijalkowska D, Fijalkowski I, Willems P, Van Damme P. Bacterial riboproteogenomics: the era of N-terminal proteoform existence revealed. FEMS Microbiol Rev 2021; 44:418-431. [PMID: 32386204 DOI: 10.1093/femsre/fuaa013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/19/2019] [Accepted: 05/07/2020] [Indexed: 12/17/2022] Open
Abstract
With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome reannotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms.
Affiliation(s)
- Daria Fijalkowska
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
| | - Igor Fijalkowski
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
| | - Patrick Willems
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
| | - Petra Van Damme
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
|
23
|
Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage. mSystems 2020; 5:5/5/e00833-20. [PMID: 33109751 PMCID: PMC7593589 DOI: 10.1128/msystems.00833-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/12/2023] Open
Abstract
Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation. IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.
|
24
|
Zhou B, Yang H, Yang C, Bao YL, Yang SM, Liu J, Xiao YF. Translation of noncoding RNAs and cancer. Cancer Lett 2020; 497:89-99. [PMID: 33038492 DOI: 10.1016/j.canlet.2020.10.002] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 04/24/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 02/07/2023]
Abstract
The human genome contains thousands of noncoding RNAs (ncRNAs), which are thought to lack open reading frames (ORFs) and therefore not to be translated. Some ncRNAs reportedly have important functions, including epigenetic regulation, chromatin remodeling, protein modification, and RNA degradation, but the functions of most ncRNAs remain elusive. Through the application and development of ribosome profiling and sequencing technologies, an increasing number of studies have discovered the translation of ncRNAs. Although ncRNAs were initially defined as noncoding RNAs, a number of ncRNAs actually contain ORFs that are translated into peptides. Here, we summarize the available methods, tools, and databases for identifying and validating ncRNA-encoded peptides/proteins, and the recent findings regarding ncRNA-encoded small peptides/proteins in cancer are compiled and synthesized. Importantly, ncRNA-encoded peptides/proteins have application prospects in cancer research, but some potential challenges remain unresolved. The aim of this review is to provide a theoretical basis that might promote the discovery of more peptides/proteins encoded by ncRNAs and aid the further development of novel diagnostic and prognostic cancer markers and therapeutic targets.
Affiliation(s)
- Bo Zhou
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Huan Yang
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Chuan Yang
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Yu-Lu Bao
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Shi-Ming Yang
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Jiao Liu
- Department of Endoscope, General Hospital of Northern Theater Command, Shenyang, 110016, Liaoning, China.
| | - Yu-Feng Xiao
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China.
|
25
|
Liu Q, Shvarts T, Sliz P, Gregory RI. RiboToolkit: an integrated platform for analysis and annotation of ribosome profiling data to decode mRNA translation at codon resolution. Nucleic Acids Res 2020; 48:W218-W229. [PMID: 32427338 PMCID: PMC7319539 DOI: 10.1093/nar/gkaa395] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 03/06/2020] [Revised: 04/23/2020] [Accepted: 05/15/2020] [Indexed: 12/31/2022] Open
Abstract
Ribosome profiling (Ribo-seq) is a powerful technology for globally monitoring RNA translation, with applications ranging from codon occupancy profiling and the identification of actively translated open reading frames (ORFs) to the quantification of translational efficiency under various physiological or experimental conditions. However, analyzing and decoding translation information from Ribo-seq data is not trivial. Although there are many existing tools to analyze Ribo-seq data, most of these tools are designed for specific or limited functionalities and an easy-to-use integrated tool to analyze Ribo-seq data is lacking. Fortunately, the small size (26–34 nt) of ribosome protected fragments (RPFs) in Ribo-seq and the relatively small amount of sequencing data greatly facilitate the development of such a web platform, which is easy to manipulate for users with or without bioinformatic expertise. Thus, we developed RiboToolkit (http://rnabioinfor.tch.harvard.edu/RiboToolkit), a convenient, freely available, web-based service to centralize Ribo-seq data analyses, including data cleaning and quality evaluation, expression analysis based on RPFs, codon occupancy, translation efficiency analysis, differential translation analysis, functional annotation, translation metagene analysis, and identification of actively translated ORFs. In addition, easy-to-use web interfaces were developed to facilitate data analysis and intuitively visualize results. Thus, RiboToolkit will greatly facilitate the study of mRNA translation based on ribosome profiling.
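Two of the steps named in this abstract, selecting reads in the RPF size range and computing translation efficiency, can be sketched in a few lines of Python. This is an illustrative sketch only, not RiboToolkit code: the function names, the pseudocount, the counts-per-million normalisation and the example numbers are assumptions; only the 26–34 nt window comes from the abstract.

```python
# Illustrative only: not the RiboToolkit implementation.
RPF_MIN_NT, RPF_MAX_NT = 26, 34  # RPF size range quoted in the abstract

def filter_rpf_lengths(read_lengths, lo=RPF_MIN_NT, hi=RPF_MAX_NT):
    """Keep only read lengths inside the expected RPF window (inclusive)."""
    return [n for n in read_lengths if lo <= n <= hi]

def translation_efficiency(rpf_count, rna_count, rpf_total, rna_total, pseudo=1.0):
    """Ratio of RPF to RNA-seq abundance for one gene, each expressed as
    counts per million of its own library, with a pseudocount for stability."""
    rpf_cpm = (rpf_count + pseudo) / rpf_total * 1e6
    rna_cpm = (rna_count + pseudo) / rna_total * 1e6
    return rpf_cpm / rna_cpm

# Made-up read lengths and counts, used only to exercise the functions.
kept_lengths = filter_rpf_lengths([21, 26, 28, 29, 30, 34, 35, 40])
example_te = translation_efficiency(
    rpf_count=199, rna_count=99, rpf_total=1_000_000, rna_total=1_000_000
)
```

With these example inputs, reads of 21, 35 and 40 nt are discarded, and the gene's RPF abundance is about twice its RNA abundance. Real analyses typically normalise by feature length as well and test differences statistically rather than comparing raw ratios.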
Affiliation(s)
- Qi Liu
- Stem Cell Program, Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA.,Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Tanya Shvarts
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA
| | - Piotr Sliz
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.,Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA
| | - Richard I Gregory
- Stem Cell Program, Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA.,Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.,Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA.,Harvard Initiative for RNA Medicine, Boston, MA 02115, USA.,Harvard Stem Cell Institute, Cambridge, MA 02138, USA
| |
|
26
|
Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, Lam MPY. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep 2020; 29:3751-3765.e5. [PMID: 31825849 PMCID: PMC6961840 DOI: 10.1016/j.celrep.2019.11.026] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 04/18/2019] [Revised: 09/24/2019] [Accepted: 11/06/2019] [Indexed: 12/18/2022] Open
Abstract
The protein-level translational status and function of many alternative splicing events remain poorly understood. We use an RNA sequencing (RNA-seq)-guided proteomics method to identify protein alternative splicing isoforms in the human proteome by constructing tissue-specific protein databases that prioritize transcript splice junction pairs with high translational potential. Using the custom databases to reanalyze ~80 million mass spectra in public proteomics datasets, we identify more than 1,500 noncanonical protein isoforms across 12 human tissues, including ~400 sequences undocumented on TrEMBL and RefSeq databases. We apply the method to original quantitative mass spectrometry experiments and observe widespread isoform regulation during human induced pluripotent stem cell cardiomyocyte differentiation. On a proteome scale, alternative isoform regions overlap frequently with disordered sequences and post-translational modification sites, suggesting that alternative splicing may regulate protein function through modulating intrinsically disordered regions. The described approach may help elucidate functional consequences of alternative splicing and expand the scope of proteomics investigations in various systems.
The translation and function of many alternative splicing events await confirmation at the protein level. Lau et al. use an integrated proteotranscriptomics approach to identify non-canonical and undocumented isoforms from 12 organs in the human proteome. Alternative isoforms interfere with functional sequence features and are differentially regulated during iPSC cardiomyocyte differentiation.
Affiliation(s)
- Edward Lau
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
| | - Yu Han
- Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA; Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
| | - Damon R Williams
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
| | - Cody T Thomas
- Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
| | - Rajani Shrestha
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
| | - Joseph C Wu
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA; Department of Radiology, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Maggie P Y Lam
- Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA; Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA.
| |
|
27
|
Calviello L, Hirsekorn A, Ohler U. Quantification of translation uncovers the functions of the alternative transcriptome. Nat Struct Mol Biol 2020; 27:717-725. [DOI: 10.1038/s41594-020-0450-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/22/2020] [Accepted: 05/13/2020] [Indexed: 11/09/2022]
|
28
|
The Archaeal Proteome Project advances knowledge about archaeal cell biology through comprehensive proteomics. Nat Commun 2020; 11:3145. [PMID: 32561711 PMCID: PMC7305310 DOI: 10.1038/s41467-020-16784-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 12/30/2019] [Accepted: 05/18/2020] [Indexed: 11/08/2022] Open
Abstract
While many aspects of archaeal cell biology remain relatively unexplored, systems biology approaches like mass spectrometry (MS) based proteomics offer an opportunity for rapid advances. Unfortunately, the enormous amount of MS data generated often remains incompletely analyzed due to a lack of sophisticated bioinformatic tools and field-specific biological expertise for data interpretation. Here we present the initiation of the Archaeal Proteome Project (ArcPP), a community-based effort to comprehensively analyze archaeal proteomes. Starting with the model archaeon Haloferax volcanii, we reanalyze MS datasets from various strains and culture conditions. Optimized peptide spectrum matching, with strict control of false discovery rates, facilitates identifying >72% of the reference proteome, with a median protein sequence coverage of 51%. These analyses, together with expert knowledge in diverse aspects of cell biology, provide meaningful insights into processes such as N-terminal protein maturation, N-glycosylation, and metabolism. Altogether, ArcPP serves as an invaluable blueprint for comprehensive prokaryotic proteomics.
|
29
|
McGowan T, Johnson JE, Kumar P, Sajulga R, Mehta S, Jagtap PD, Griffin TJ. Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration. Gigascience 2020; 9:giaa025. [PMID: 32236523 PMCID: PMC7102281 DOI: 10.1093/gigascience/giaa025] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/22/2019] [Revised: 02/13/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate 'omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. FINDINGS MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input: a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrative Genomics Viewer JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. CONCLUSIONS MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.
Affiliation(s)
- Thomas McGowan
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
| | - Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Bioinformatics and Computational Biology program, University of Minnesota-Rochester, 111 South Broadway, Suite 300, Rochester, MN 55904, USA
| | - Ray Sajulga
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
| |
|
30
|
Peeters MKR, Menschaert G. The hunt for sORFs: A multidisciplinary strategy. Exp Cell Res 2020; 391:111923. [PMID: 32135166 DOI: 10.1016/j.yexcr.2020.111923] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/08/2019] [Revised: 02/21/2020] [Accepted: 02/23/2020] [Indexed: 11/28/2022]
Abstract
Growing evidence illustrates the shortcomings in the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields. Continuous development of techniques and tools has improved sORF discovery, with candidates originating from bioinformatics analyses, sequencing routines, or proteomics approaches. In this mini review, we discuss the ongoing trends in the three fields and suggest some strategies for further characterization of high-potential candidates.
Affiliation(s)
- Marlies K R Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 900, Gent, Belgium
| | - Gerben Menschaert
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 900, Gent, Belgium.
| |
|
31
|
Boonen K, Hens K, Menschaert G, Baggerman G, Valkenborg D, Ertaylan G. Beyond Genes: Re-Identifiability of Proteomic Data and Its Implications for Personalized Medicine. Genes (Basel) 2019; 10:E682. [PMID: 31492022 PMCID: PMC6770961 DOI: 10.3390/genes10090682] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/19/2019] [Revised: 08/30/2019] [Accepted: 09/01/2019] [Indexed: 02/07/2023] Open
Abstract
The increasing availability of high throughput proteomics data provides us with opportunities as well as posing new ethical challenges regarding data privacy and re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further exacerbates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we try to differentiate proteomics from genomics data and cover the ethical challenges related to proteomics data sharing. Finally, we give an overview of the proposed solutions and the outlook for future studies.
Affiliation(s)
- Kurt Boonen
- VITO Health, Boeretang 200, Mol 2400, Belgium.
- Centre for Proteomics, University of Antwerpen, Antwerp 2020, Belgium.
| | - Kristien Hens
- Department of Philosophy, University of Antwerp, Antwerp 2000 & Institute of Philosophy, KU Leuven, Leuven 3000, Belgium.
| | - Gerben Menschaert
- Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent 9000, Belgium.
| | - Geert Baggerman
- VITO Health, Boeretang 200, Mol 2400, Belgium.
- Centre for Proteomics, University of Antwerpen, Antwerp 2020, Belgium.
| | | | | |
|
32
|
Zhang B, Kuster B. Proteomics Is Not an Island: Multi-omics Integration Is the Key to Understanding Biological Systems. Mol Cell Proteomics 2019; 18:S1-S4. [PMID: 31399542 PMCID: PMC6692779 DOI: 10.1074/mcp.e119.001693] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 07/22/2019] [Indexed: 12/18/2022] Open
Affiliation(s)
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technische Universität München, Freising, Germany
- Bavarian Biomolecular Mass Spectrometry Center, Technische Universität München, Freising, Germany
| |
|
33
|
Binz PA, Shofstahl J, Vizcaíno JA, Barsnes H, Chalkley RJ, Menschaert G, Alpi E, Clauser K, Eng JK, Lane L, Seymour SL, Sánchez LFH, Mayer G, Eisenacher M, Perez-Riverol Y, Kapp EA, Mendoza L, Baker PR, Collins A, Van Den Bossche T, Deutsch EW. Proteomics Standards Initiative Extended FASTA Format. J Proteome Res 2019; 18:2686-2692. [PMID: 31081335 DOI: 10.1021/acs.jproteome.9b00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
Affiliation(s)
- Pierre-Alain Binz
- CHUV Centre Hospitalier Universitaire Vaudois, CH-1011 Lausanne 14, Switzerland
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, N-5009 Bergen, Norway.,Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
| | - Robert J Chalkley
- University California at San Francisco, San Francisco, California 94143, United States
| | - Gerben Menschaert
- Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Emanuele Alpi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Karl Clauser
- Broad Institute, Cambridge, Massachusetts 02142, United States
| | - Jimmy K Eng
- University of Washington, Seattle, Washington 98195, United States
| | - Lydie Lane
- SIB Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland.,Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva 4, Switzerland
| | - Sean L Seymour
- Seymour Data Science, LLC, San Francisco, California 95000, United States
| | - Luis Francisco Hernández Sánchez
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5021 Bergen, Norway.,Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5021 Bergen, Norway
| | - Gerhard Mayer
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
| | - Martin Eisenacher
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Eugene A Kapp
- Walter & Eliza Hall Institute of Medical Research and the University of Melbourne, Melbourne, VIC 3052, Australia
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Peter R Baker
- University California at San Francisco, San Francisco, California 94143, United States
| | - Andrew Collins
- Department of Functional and Comparative Genomics, Institute of Integrated Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
|