1
|
Lee S, Kim H. Bidirectional de novo peptide sequencing using a transformer model. PLoS Comput Biol 2024; 20:e1011892. [PMID: 38416757 PMCID: PMC10901305 DOI: 10.1371/journal.pcbi.1011892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 02/02/2024] [Indexed: 03/01/2024] Open
Abstract
In proteomics, a crucial aspect is to identify peptide sequences. De novo sequencing methods have been widely employed to identify peptide sequences, and numerous tools have been proposed over the past two decades. Recently, deep learning approaches have been introduced for de novo sequencing. Previous methods focused on encoding tandem mass spectra and predicting peptide sequences from the first amino acid onwards. However, when predicting peptides using tandem mass spectra, the peptide sequence can be predicted not only from the first amino acid but also from the last amino acid due to the coexistence of b-ion (or a- or c-ion) and y-ion (or x- or z-ion) fragments in the tandem mass spectra. Therefore, it is essential to predict peptide sequences bidirectionally. Our approach, called NovoB, utilizes a Transformer model to predict peptide sequences bidirectionally, starting with both the first and last amino acids. In comparison to Casanovo, our method achieved an improvement of the average peptide-level accuracy rate of approximately 9.8% across all species.
Affiliation(s)
- Sangjeong Lee
- Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
| | - Hyunwoo Kim
- Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
| |
|
2
|
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
| | - Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
| |
|
3
|
Fan SM, Li ZQ, Zhang SZ, Chen LY, Wei XY, Liang J, Zhao XQ, Su C. Multi-integrated approach for unraveling small open reading frames potentially associated with secondary metabolism in Streptomyces. mSystems 2023; 8:e0024523. [PMID: 37712700 PMCID: PMC10654065 DOI: 10.1128/msystems.00245-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 07/20/2023] [Indexed: 09/16/2023] Open
Abstract
IMPORTANCE Due to their small size and special chemical features, small open reading frame (smORF)-encoded peptides (SEPs) are often neglected. However, they may play critical roles in regulating gene expression, enzyme activity, and metabolite production. Studies on bacterial microproteins have mainly focused on pathogenic bacteria, so it is important to systematically investigate SEPs in streptomycetes, which are rich sources of bioactive secondary metabolites. Our study is the first to perform a global identification of smORFs in streptomycetes. We established a peptidogenomic workflow for non-model microbial strains and identified multiple novel smORFs that are potentially linked to secondary metabolism in streptomycetes. Our multi-integrated approach helps improve the quality and quantity of the detected smORFs. Ultimately, the workflow we established could be extended to other organisms and would benefit the genome mining of microproteins with critical functions for regulation and the engineering of useful microorganisms.
Affiliation(s)
- Si-Min Fan
- National Engineering Laboratory for Resource Developing of Endangered Chinese Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Shaanxi, China
| | - Ze-Qi Li
- National Engineering Laboratory for Resource Developing of Endangered Chinese Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Shaanxi, China
| | - Shi-Zhe Zhang
- National Engineering Laboratory for Resource Developing of Endangered Chinese Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Shaanxi, China
| | - Liang-Yu Chen
- ProteinT (Tianjin) biotechnology Co. Ltd., Tianjin, China
| | - Xi-Ying Wei
- National Engineering Laboratory for Resource Developing of Endangered Chinese Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Shaanxi, China
| | - Jian Liang
- National Engineering Laboratory for Resource Developing of Endangered Chinese Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Shaanxi, China
- College of Biology and Geography, Yili Normal University, Yining, China
| | - Xin-Qing Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Chun Su
- National Engineering Laboratory for Resource Developing of Endangered Chinese Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Shaanxi, China
| |
|
4
|
McDonnell K, Howley E, Abram F. Critical evaluation of the use of artificial data for machine learning based de novo peptide identification. Comput Struct Biotechnol J 2023; 21:2732-2743. [PMID: 37168871 PMCID: PMC10165132 DOI: 10.1016/j.csbj.2023.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/16/2023] [Indexed: 05/13/2023] Open
Abstract
Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide reaching applications. Peptide identification in proteomics typically relies on matching high resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in depth analysis of de novo algorithms as well as alleviating the reliance on database searches for training data.
Affiliation(s)
- Kevin McDonnell
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- School of Computer Science, University of Galway, Ireland
- Corresponding author at: Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland.
| | - Enda Howley
- School of Computer Science, University of Galway, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- Corresponding author.
| |
|
5
|
McDonnell K, Abram F, Howley E. Application of a Novel Hybrid CNN-GNN for Peptide Ion Encoding. J Proteome Res 2022; 22:323-333. [PMID: 36534699 PMCID: PMC9903319 DOI: 10.1021/acs.jproteome.2c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research we extracted and evaluated the encoding modules from 3 state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
Affiliation(s)
- Kevin McDonnell
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Enda Howley
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
| |
|
6
|
Yang Y, Wang H, Zhang Y, Chen L, Chen G, Bao Z, Yang Y, Xie Z, Zhao Q. An Optimized Proteomics Approach Reveals Novel Alternative Proteins in Mouse Liver Development. Mol Cell Proteomics 2022; 22:100480. [PMID: 36494044 PMCID: PMC9823216 DOI: 10.1016/j.mcpro.2022.100480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 11/15/2022] [Accepted: 12/04/2022] [Indexed: 12/12/2022] Open
Abstract
Alternative ORFs (AltORFs) are unannotated sequences in the genome that encode novel peptides or proteins named alternative proteins (AltProts). Although ribosome profiling and bioinformatics predict a large number of AltProts, mass spectrometry, the only direct way of identification, is hampered by the short lengths and relatively low abundance of AltProts. There is an urgent need for improvement of mass spectrometry methodologies for AltProt identification. Here, we report an approach based on size-exclusion chromatography for simultaneous enrichment and fractionation of AltProts from complex proteomes. This method greatly simplifies the variance of AltProts discovery by enriching proteins smaller than 40 kDa. In a systematic comparison between 10 methods, the approach we reported enabled the discovery of more AltProts with overall higher intensities, at a lower cost in time and effort than other workflows. We applied this approach to identify 89 novel AltProts from mouse liver, 39 of which were differentially expressed between embryonic and adult mice. During embryonic development, the upregulated AltProts were mainly involved in RNA splicing and processing pathways, whereas the AltProts involved in metabolism were more active in adult livers. Our study not only provides an effective approach for identifying AltProts but also reveals novel AltProts that are potentially important in developmental biology.
Affiliation(s)
- Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
| | - Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
| | - Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
| | - Gennong Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Zhaoshi Bao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical School, Beijing, China
| | - Yang Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
| | - Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
- For correspondence: Qian Zhao
| |
|
7
|
Dual-RNAseq Analysis Unravels Virus-Host Interactions of MetSV and Methanosarcina mazei. Viruses 2022; 14:v14112585. [PMID: 36423194 PMCID: PMC9694453 DOI: 10.3390/v14112585] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/27/2022] [Revised: 11/05/2022] [Accepted: 11/19/2022] [Indexed: 11/23/2022] Open
Abstract
Methanosarcina spherical virus (MetSV), infecting Methanosarcina species, encodes 22 genes, but their role in the infection process in combination with host genes has remained unknown. To study the infection process in detail, infected and uninfected M. mazei cultures were compared using dual-RNAseq, qRT-PCRs, and transmission electron microscopy (TEM). The transcriptome analysis strongly indicates a combined role of virus and host genes in replication, virus assembly, and lysis. Thereby, 285 host and virus genes were significantly regulated. Within these 285 regulated genes, a network of the viral polymerase, MetSVORF6, MetSVORF5, MetSVORF2, and the host genes encoding NrdD, NrdG, a CDC48 family protein, and a SSB protein with a role in viral replication was postulated. Ultrastructural analysis at 180 min p.i. revealed many infected cells with virus particles randomly scattered throughout the cytoplasm or attached at the cell surface, and membrane fragments indicating cell lysis. Dual-RNAseq and qRT-PCR analyses suggested a multifactorial lysis reaction in potential connection to the regulation of a cysteine proteinase, a pirin-like protein and a HicB-solo protein. Our study's results led to the first preliminary infection model of MetSV infecting M. mazei, summarizing the key infection steps as follows: replication, assembly, and host cell lysis.
|
8
|
Zhang Z, Li Y, Yuan W, Wang Z, Wan C. Proteomic-driven identification of short open reading frame-encoded peptides. Proteomics 2022; 22:e2100312. [PMID: 35384297 DOI: 10.1002/pmic.202100312] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 11/10/2022]
Abstract
Accumulating evidence has shown that a large number of short open reading frames (sORFs) also have the ability to encode proteins. The discovery of sORFs opens up a new research area, leading to the identification and functional study of sORF encoded peptides (SEPs) at the omics level. Besides bioinformatics prediction and ribosomal profiling, mass spectrometry (MS) has become a significant tool as it directly detects the sequence of SEPs. Though MS-based proteomics methods have proved to be effective for qualitative and quantitative analysis of SEPs, the detection of SEPs is still a great challenge due to their low abundance and short sequence. To illustrate the progress in method development, we describe and discuss the main steps of large-scale proteomics identification of SEPs, including SEP extraction and enrichment, MS detection, data processing and quality control, quantification, and function prediction and validation methods.
Affiliation(s)
- Zheng Zhang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Yujie Li
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Wenqian Yuan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Zhiwei Wang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Cuihong Wan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| |
|
9
|
Weidenbach K, Gutt M, Cassidy L, Chibani C, Schmitz RA. Small Proteins in Archaea, a Mainly Unexplored World. J Bacteriol 2022; 204:e0031321. [PMID: 34543104 PMCID: PMC8765429 DOI: 10.1128/jb.00313-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/12/2023] Open
Abstract
In recent years, increasing numbers of small proteins have moved into the focus of science. Small proteins have been identified and characterized in all three domains of life, but the majority remain functionally uncharacterized, lack secondary structure, and exhibit limited evolutionary conservation. While quite a few have already been described for bacteria and eukaryotic organisms, the number of known and functionally analyzed archaeal small proteins is still very limited. In this review, we compile the current state of research, show strategies for systematic approaches for global identification of small archaeal proteins, and address selected functionally characterized examples. In addition, using one archaeon as an example, we document the tool development and optimization used to identify small proteins with genome-wide approaches.
Affiliation(s)
- Katrin Weidenbach
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| | - Miriam Gutt
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| | - Liam Cassidy
- AG Proteomics & Bioanalytics, Institute for Experimental Medicine, Christian Albrechts University, Kiel, Germany
| | - Cynthia Chibani
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| | - Ruth A. Schmitz
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| |
|
10
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with fewer than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence has suggested that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
| | - Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
| | - Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| |
|
11
|
Abstract
Transcriptional regulators that integrate cellular and environmental signals to control cell division are well known in bacteria and eukaryotes, but their existence is poorly understood in archaea. We identified a conserved gene (cdrS) that encodes a small protein and is highly transcribed in the model archaeon Haloferax volcanii. The cdrS gene could not be deleted, but CRISPR interference (CRISPRi)-mediated repression of the cdrS gene caused slow growth and cell division defects and changed the expression of multiple genes and their products associated with cell division, protein degradation, and metabolism. Consistent with this complex regulatory network, overexpression of cdrS inhibited cell division, whereas overexpression of the operon encoding both CdrS and a tubulin-like cell division protein (FtsZ2) stimulated division. Chromatin immunoprecipitation-DNA sequencing (ChIP-Seq) identified 18 DNA-binding sites of the CdrS protein, including one upstream of the promoter for a cell division gene, ftsZ1, and another upstream of the essential gene dacZ, encoding diadenylate cyclase involved in c-di-AMP signaling, which is implicated in the regulation of cell division. These findings suggest that CdrS is a transcription factor that plays a central role in a regulatory network coordinating metabolism and cell division. IMPORTANCE Cell division is a central mechanism of life and is essential for growth and development. Members of the Bacteria and Eukarya have different mechanisms for cell division, which have been studied in detail. In contrast, cell division in members of the Archaea is still understudied, and its regulation is poorly understood. Interestingly, different cell division machineries appear in members of the Archaea, with the Euryarchaeota using a cell division apparatus based on the tubulin-like cytoskeletal protein FtsZ, as in bacteria. 
Here, we identify the small protein CdrS as essential for survival and a central regulator of cell division in the euryarchaeon Haloferax volcanii. CdrS also appears to coordinate other cellular pathways, including synthesis of signaling molecules and protein degradation. Our results show that CdrS plays a sophisticated role in cell division, including regulation of numerous associated genes. These findings are expected to initiate investigations into conditional regulation of division in archaea.
|
12
|
Cassidy L, Kaulich PT, Maaß S, Bartel J, Becher D, Tholey A. Bottom-up and top-down proteomic approaches for the identification, characterization, and quantification of the low molecular weight proteome with focus on short open reading frame-encoded peptides. Proteomics 2021; 21:e2100008. [PMID: 34145981 DOI: 10.1002/pmic.202100008] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/03/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 01/14/2023]
Abstract
The recent discovery of alternative open reading frames creates a need for suitable analytical approaches to verify their translation and to characterize the corresponding gene products at the molecular level. As the analysis of small proteins within a background proteome by means of classical bottom-up proteomics is challenging, method development for the analysis of small open reading frame encoded peptides (SEPs) has become a focal point for research. Here, we highlight bottom-up and top-down proteomics approaches established for the analysis of SEPs in both pro- and eukaryotes. Major steps of analysis, including sample preparation and (small) proteome isolation, separation and mass spectrometry, data interpretation and quality control, quantification, the analysis of post-translational modifications, and exploration of functional aspects of the SEPs by means of proteomics technologies are described. These methods do not exclusively cover the analytics of SEPs but simultaneously include the low molecular weight proteome, and moreover, can also be used for the proteome-wide analysis of proteolytic processing events.
Affiliation(s)
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| |
|
13
|
Fuchs S, Kucklick M, Lehmann E, Beckmann A, Wilkens M, Kolte B, Mustafayeva A, Ludwig T, Diwo M, Wissing J, Jänsch L, Ahrens CH, Ignatova Z, Engelmann S. Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach. PLoS Genet 2021; 17:e1009585. [PMID: 34061833 PMCID: PMC8195425 DOI: 10.1371/journal.pgen.1009585] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/24/2020] [Revised: 06/11/2021] [Accepted: 05/07/2021] [Indexed: 01/08/2023] Open
Abstract
Small proteins play essential roles in bacterial physiology and virulence; however, automated algorithms for genome annotation are often not yet able to accurately predict the corresponding genes. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Of these, 24 (ranging from 9 to 99 aa) were novel and not contained in the genome annotation used. A total of 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
Affiliation(s)
- Stephan Fuchs
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
| | - Martin Kucklick
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Erik Lehmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Alexander Beckmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maya Wilkens
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Baban Kolte
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Ayten Mustafayeva
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Tobias Ludwig
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maurice Diwo
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Josef Wissing
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Lothar Jänsch
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Zoya Ignatova
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Susanne Engelmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| |
|
14
|
Kaulich PT, Cassidy L, Bartel J, Schmitz RA, Tholey A. Multi-protease Approach for the Improved Identification and Molecular Characterization of Small Proteins and Short Open Reading Frame-Encoded Peptides. J Proteome Res 2021; 20:2895-2903. [PMID: 33760615 DOI: 10.1021/acs.jproteome.1c00115] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/19/2022]
Abstract
The identification of proteins below approximately 70-100 amino acids in bottom-up proteomics is still a challenging task due to the limited number of peptides generated by proteolytic digestion. This includes the short open reading frame-encoded peptides (SEPs), which are a subset of the small proteins that were not previously annotated or that are alternatively encoded. Here, we systematically investigated the use of multiple proteases (trypsin, chymotrypsin, LysC, LysargiNase, and GluC) in GeLC-MS/MS analysis to improve the sequence coverage and the number of identified peptides for small proteins, with a focus on SEPs, in the archaeon Methanosarcina mazei. Combining the data of all proteases, we identified 63 small proteins and an additional 28 SEPs with at least two unique peptides, while only 55 small proteins and 22 SEPs could be identified using trypsin only. For 27 small proteins and 12 SEPs, complete sequence coverage was achieved. Moreover, for five SEPs, incorrectly predicted translation start points or potential in vivo proteolytic processing were identified, confirming the data of a previous top-down proteomics study of this organism. The results clearly show that a multi-protease approach improves the identification and molecular characterization of small proteins and SEPs. LC-MS data: ProteomeXchange PXD023921.
Affiliation(s)
- Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel 24105, Germany
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel 24105, Germany
| | - Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald 17489, Germany
| | - Ruth A Schmitz
- Institute for General Microbiology, Christian-Albrechts-Universität zu Kiel, Kiel 24118, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel 24105, Germany
| |
|
15
|
Gutt M, Jordan B, Weidenbach K, Gudzuhn M, Kiessling C, Cassidy L, Helbig A, Tholey A, Pyper DJ, Kubatova N, Schwalbe H, Schmitz RA. High complexity of Glutamine synthetase regulation in Methanosarcina mazei: Small protein 26 interacts and enhances glutamine synthetase activity. FEBS J 2021; 288:5350-5373. [DOI: 10.1111/febs.15799] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/18/2020] [Revised: 01/05/2021] [Accepted: 03/02/2021] [Indexed: 12/13/2022]
Affiliation(s)
- Miriam Gutt
- Institute for General Microbiology Christian‐Albrechts‐University Kiel Germany
| | - Britta Jordan
- Institute for General Microbiology Christian‐Albrechts‐University Kiel Germany
| | - Katrin Weidenbach
- Institute for General Microbiology Christian‐Albrechts‐University Kiel Germany
| | - Mirja Gudzuhn
- Institute for General Microbiology Christian‐Albrechts‐University Kiel Germany
| | - Claudia Kiessling
- Institute for General Microbiology Christian‐Albrechts‐University Kiel Germany
| | - Liam Cassidy
- AG Proteomics & Bioanalytics Institute for Experimental Medicine Christian‐Albrechts‐University Kiel Germany
| | - Andreas Helbig
- AG Proteomics & Bioanalytics Institute for Experimental Medicine Christian‐Albrechts‐University Kiel Germany
| | - Andreas Tholey
- AG Proteomics & Bioanalytics Institute for Experimental Medicine Christian‐Albrechts‐University Kiel Germany
| | - Dennis Joshua Pyper
- Institute of Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance (BMRZ) Johann Wolfgang Goethe University Frankfurt am Main Germany
| | - Nina Kubatova
- Institute of Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance (BMRZ) Johann Wolfgang Goethe University Frankfurt am Main Germany
| | - Harald Schwalbe
- Institute of Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance (BMRZ) Johann Wolfgang Goethe University Frankfurt am Main Germany
| | - Ruth Anne Schmitz
- Institute for General Microbiology Christian‐Albrechts‐University Kiel Germany
| |
|
16
|
Petruschke H, Schori C, Canzler S, Riesbeck S, Poehlein A, Daniel R, Frei D, Segessemann T, Zimmerman J, Marinos G, Kaleta C, Jehmlich N, Ahrens CH, von Bergen M. Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome. Microbiome 2021; 9:55. [PMID: 33622394 PMCID: PMC7903761 DOI: 10.1186/s40168-020-00981-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/25/2020] [Accepted: 12/16/2020] [Indexed: 05/13/2023]
Abstract
BACKGROUND: The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with fewer than 100 amino acids (= sProteins), some of which may contribute to shaping the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. RESULTS: We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community, indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx.
CONCLUSIONS: We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins in the functionality of bacterial communities such as those of the human intestinal tract.
Affiliation(s)
- Hannes Petruschke
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Christian Schori
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Sebastian Canzler
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Sarah Riesbeck
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Anja Poehlein
- Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
| | - Rolf Daniel
- Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
| | - Daniel Frei
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Tina Segessemann
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Johannes Zimmerman
- Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| | - Georgios Marinos
- Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| | - Christoph Kaleta
- Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| | - Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Christian H Ahrens
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland.
| | - Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany.
- Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany.
| |
|
17
|
Fabre B, Combier JP, Plaza S. Recent advances in mass spectrometry-based peptidomics workflows to identify short-open-reading-frame-encoded peptides and explore their functions. Curr Opin Chem Biol 2021; 60:122-130. [PMID: 33401134 DOI: 10.1016/j.cbpa.2020.12.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 10/16/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 12/12/2022]
Abstract
Short open reading frame (sORF)-encoded polypeptides (SEPs) have recently emerged as key regulators of major cellular processes. Computational methods for the annotation of sORFs combined with transcriptomics and ribosome profiling approaches predicted the existence of tens of thousands of SEPs across the kingdoms of life, although we still lack unambiguous evidence for most of them. The method of choice to validate the expression of SEPs is mass spectrometry (MS)-based peptidomics. Peptides are less abundant than proteins, which tends to hinder their detection. Therefore, optimization and enrichment methods are necessary to validate the existence of SEPs. In this article, we discuss the challenges for the detection of SEPs by MS and recent developments of biochemical approaches applied to the study of these peptides. We detail the advances made in the different key steps of a typical peptidomics workflow and highlight possible alternatives that have not been explored yet.
Affiliation(s)
- Bertrand Fabre
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France.
| | - Jean-Philippe Combier
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France
| | - Serge Plaza
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France
| |
|
18
|
Cardon T, Fournier I, Salzet M. Shedding Light on the Ghost Proteome. Trends Biochem Sci 2020; 46:239-250. [PMID: 33246829 DOI: 10.1016/j.tibs.2020.10.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/02/2020] [Revised: 10/21/2020] [Accepted: 10/22/2020] [Indexed: 01/19/2023]
Abstract
Conventionally, eukaryotic mRNAs were thought to be monocistronic, leading to the translation of a single protein. However, large-scale proteomics has led to the identification of proteins translated from alternative open reading frames (AltORFs) in mRNAs. AltORFs are found in addition to predicted reference ORFs and noncoding RNA. Alternative proteins are not represented in the conventional protein databases, and this 'Ghost proteome' was not considered until recently. Some of these proteins are functional, and there is growing evidence that they are involved in central functions in physiological and physiopathological contexts. Here, we review how this Ghost proteome fills the gap in our understanding of signaling pathways, establishes new markers of pathologies, and highlights therapeutic targets.
Affiliation(s)
- Tristan Cardon
- Laboratoire Protéomique, Réponse Inflammatoire Spectrométrie de Masse (PRISM), Inserm U1192, University of Lille, CHU Lille, F-59000 Lille, France.
| | - Isabelle Fournier
- Laboratoire Protéomique, Réponse Inflammatoire Spectrométrie de Masse (PRISM), Inserm U1192, University of Lille, CHU Lille, F-59000 Lille, France; Institut Universitaire de France, Paris, France.
| | - Michel Salzet
- Laboratoire Protéomique, Réponse Inflammatoire Spectrométrie de Masse (PRISM), Inserm U1192, University of Lille, CHU Lille, F-59000 Lille, France; Institut Universitaire de France, Paris, France.
| |
|
19
|
Villalobos Solis MI, Poudel S, Bonnot C, Shrestha HK, Hettich RL, Veneault-Fourrey C, Martin F, Abraham PE. A Viable New Strategy for the Discovery of Peptide Proteolytic Cleavage Products in Plant-Microbe Interactions. Mol Plant Microbe Interact 2020; 33:1177-1188. [PMID: 32597696 DOI: 10.1094/mpmi-04-20-0082-ta] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/14/2023]
Abstract
Small peptides that are proteolytic cleavage products (PCPs) of less than 100 amino acids are emerging as key signaling molecules that mediate cell-to-cell communication and biological processes that occur between and within plants, fungi, and bacteria. Yet, the discovery and characterization of these molecules are largely overlooked. Today, selective enrichment and subsequent characterization by mass spectrometry-based sequencing offer the greatest potential for their comprehensive characterization; however, qualitative and quantitative performance metrics are rarely captured. Herein, we addressed this need by benchmarking the performance of an enrichment strategy, optimized specifically for small PCPs, using state-of-the-art de novo-assisted peptide sequencing. As a case study, we implemented this approach to identify PCPs from different root and foliar tissues of the hybrid poplar Populus × canescens 717-1B4 in interaction with the ectomycorrhizal basidiomycete Laccaria bicolor. In total, we identified 1,660 and 2,870 Populus and L. bicolor unique PCPs, respectively. Qualitative results supported the identification of well-known PCPs, like the mature form of the photosystem II complex 5-kDa protein (approximately 3 kDa). A total of 157 PCPs were determined to be significantly more abundant in root tips with established ectomycorrhiza when compared with root tips without established ectomycorrhiza and extramatrical mycelium of L. bicolor. These PCPs mapped to 64 Populus proteins and 69 L. bicolor proteins in our database, with several of them previously implicated in biologically relevant associations between plant and fungus.
Affiliation(s)
- Manuel I Villalobos Solis
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
- Department of Genome Science and Technology, University of Tennessee-Knoxville, Knoxville, TN 37996, U.S.A
| | - Suresh Poudel
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
| | - Clemence Bonnot
- UMR 1136 INRA-Université de Lorraine 'Interactions Arbres/Microorganismes', Laboratoire d'Excellence ARBRE, Centre INRA-Lorraine, 54280 Champenoux, France
| | - Him K Shrestha
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
- Department of Genome Science and Technology, University of Tennessee-Knoxville, Knoxville, TN 37996, U.S.A
| | - Robert L Hettich
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
| | - Claire Veneault-Fourrey
- UMR 1136 INRA-Université de Lorraine 'Interactions Arbres/Microorganismes', Laboratoire d'Excellence ARBRE, Centre INRA-Lorraine, 54280 Champenoux, France
| | - Francis Martin
- UMR 1136 INRA-Université de Lorraine 'Interactions Arbres/Microorganismes', Laboratoire d'Excellence ARBRE, Centre INRA-Lorraine, 54280 Champenoux, France
| | - Paul E Abraham
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
| |
|
20
|
Zahn S, Kubatova N, Pyper DJ, Cassidy L, Saxena K, Tholey A, Schwalbe H, Soppa J. Biological functions, genetic and biochemical characterization, and NMR structure determination of the small zinc finger protein HVO_2753 from Haloferax volcanii. FEBS J 2020; 288:2042-2062. [DOI: 10.1111/febs.15559] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/16/2020] [Revised: 06/26/2020] [Accepted: 09/02/2020] [Indexed: 12/26/2022]
Affiliation(s)
- Sebastian Zahn
- Institute for Molecular Biosciences Goethe‐University Frankfurt Germany
| | - Nina Kubatova
- Institute for Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance Goethe‐University Frankfurt/Main Germany
| | - Dennis J. Pyper
- Institute for Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance Goethe‐University Frankfurt/Main Germany
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel Germany
| | - Krishna Saxena
- Institute for Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance Goethe‐University Frankfurt/Main Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel Germany
| | - Harald Schwalbe
- Institute for Organic Chemistry and Chemical Biology Center for Biomolecular Magnetic Resonance Goethe‐University Frankfurt/Main Germany
| | - Jörg Soppa
- Institute for Molecular Biosciences Goethe‐University Frankfurt Germany
- Johann Wolfgang Goethe‐Universität Frankfurt am Main Germany
| |
|
21
|
Cassidy L, Helbig AO, Kaulich PT, Weidenbach K, Schmitz RA, Tholey A. Multidimensional separation schemes enhance the identification and molecular characterization of low molecular weight proteomes and short open reading frame-encoded peptides in top-down proteomics. J Proteomics 2020; 230:103988. [PMID: 32949814 DOI: 10.1016/j.jprot.2020.103988] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/08/2020] [Revised: 08/17/2020] [Accepted: 09/14/2020] [Indexed: 12/13/2022]
Abstract
Short open reading frame-encoded peptides (SEP) represent a widely undiscovered part of the proteome. The detailed analysis of SEP has, despite inherent limitations such as incomplete sequence coverage, challenges encountered with protein inference, the identification of posttranslational modifications and the assignment of potential N- and C-terminal truncations, predominantly been assessed using bottom-up proteomic workflows. The use of top-down based proteomic workflows is capable of providing an unparalleled level of characterization information, which is of increased importance in the case of alternatively encoded protein products. However, top-down based analysis is not without its own limitations, for which efficient separation prior to MS analysis is a major issue. We established a sample preparation approach for the combined bottom-up and top-down proteomic analysis of SEP. Key improvements were made by the application of solid phase extraction (SPE), which supported enrichment of proteins below ca. 20 kDa, followed by 2D-LC-MS top-down analysis encompassing both HCD and EThcD ion activation. Bottom-up experiments were used to support and confirm top-down data interpretation. This strategy allowed for the top-down characterization of 36 proteoforms mapping to 12 SEP from the archaeon Methanosarcina mazei strain Gö1, with the concurrent detection and identification of several posttranslational modifications in SEP. BIOLOGICAL SIGNIFICANCE: Small or short open reading frames (sORF) have been widely neglected in genome research in the past. With their increasing discovery, the question about the presence and molecular function of their translation products, the short open reading frame-encoded peptides (SEP), arises. As these small proteins are usually below the 10 kDa range, the number of peptides identifiable by bottom-up proteomics is limited which hampers both the identification and the recognition of potential posttranslational modifications. 
The presented top-down approach allowed for the detection of full length SEP, as well as of terminally truncated proteoforms, and further enabled the identification of disulfide bonds in these small proteins. This demonstrates that this as yet widely undiscovered part of the proteome undergoes the same modifications as classical proteins, which is an essential step for future understanding of the biological functions of these molecules.
Affiliation(s)
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Andreas O Helbig
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Kathrin Weidenbach
- Institute for General Microbiology, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany
| | - Ruth A Schmitz
- Institute for General Microbiology, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany.
| |
|
22
|
Bartel J, Varadarajan AR, Sura T, Ahrens CH, Maaß S, Becher D. Optimized Proteomics Workflow for the Detection of Small Proteins. J Proteome Res 2020; 19:4004-4018. [DOI: 10.1021/acs.jproteome.0c00286] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Affiliation(s)
- Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Adithi R. Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Thomas Sura
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Christian H. Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| |
|
23
|
Kaulich PT, Cassidy L, Weidenbach K, Schmitz RA, Tholey A. Complementarity of Different SDS‐PAGE Gel Staining Methods for the Identification of Short Open Reading Frame‐Encoded Peptides. Proteomics 2020; 20:e2000084. [DOI: 10.1002/pmic.202000084] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Revised: 06/15/2020] [Indexed: 12/14/2022]
Affiliation(s)
- Philipp T. Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel 24105 Germany
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel 24105 Germany
| | - Katrin Weidenbach
- Institute for General Microbiology Christian‐Albrechts‐Universität zu Kiel Kiel 24118 Germany
| | - Ruth A. Schmitz
- Institute for General Microbiology Christian‐Albrechts‐Universität zu Kiel Kiel 24118 Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel 24105 Germany
| |
|
24
|
Yang H, Chi H, Zeng WF, Zhou WJ, He SM. pNovo 3: precise de novo peptide sequencing using a learning-to-rank framework. Bioinformatics 2020; 35:i183-i190. [PMID: 31510687 PMCID: PMC6612832 DOI: 10.1093/bioinformatics/btz366] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION: De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in the low precision of de novo sequencing. RESULTS: In order to solve this problem, we developed pNovo 3, which used a learning-to-rank framework to distinguish similar peptide candidates for each spectrum. Three metrics for measuring the similarity between each experimental spectrum and its corresponding theoretical spectrum were used as important features, in which the theoretical spectra can be precisely predicted by the pDeep algorithm using deep learning. On seven benchmark datasets from six diverse species, pNovo 3 recalled 29-102% more correct spectra, and the precision was 11-89% higher than three other state-of-the-art de novo sequencing algorithms. Furthermore, compared with the newly developed DeepNovo, which also used the deep learning approach, pNovo 3 still identified 21-50% more spectra on the nine datasets used in the study of DeepNovo. In summary, the deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such a machine learning framework is worth extending to other related research fields to distinguish similar sequences. AVAILABILITY AND IMPLEMENTATION: pNovo 3 can be freely downloaded from http://pfind.ict.ac.cn/software/pNovo/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wen-Jing Zhou
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
25
|
Petruschke H, Anders J, Stadler PF, Jehmlich N, von Bergen M. Enrichment and identification of small proteins in a simplified human gut microbiome. J Proteomics 2020; 213:103604. [DOI: 10.1016/j.jprot.2019.103604] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/02/2019] [Revised: 11/22/2019] [Accepted: 12/07/2019] [Indexed: 02/06/2023]
|
26
|
Cardon T, Hervé F, Delcourt V, Roucou X, Salzet M, Franck J, Fournier I. Optimized Sample Preparation Workflow for Improved Identification of Ghost Proteins. Anal Chem 2019; 92:1122-1129. [PMID: 31829555 DOI: 10.1021/acs.analchem.9b04188] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/30/2022]
Abstract
Large-scale proteomic strategies rely on database interrogation, so only referenced proteins can be identified. Recently, alternative proteins (AltProts) translated from nonannotated alternative open reading frames (AltORFs) were discovered using customized databases. Because their small size gives them peptide-like physicochemical properties, they are more difficult to detect using standard proteomics strategies. In this study, we tested different preparation workflows for improving the identification of AltProts in the NCH82 human glioma cell line. The highest number of identified AltProts was achieved with RIPA buffer or boiling water extraction followed by acetic acid precipitation.
Affiliation(s)
- Tristan Cardon
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
| | - Flore Hervé
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
| | - Vivian Delcourt
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| | - Xavier Roucou
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| | - Michel Salzet
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Institut Universitaire de France (IUF), Paris, France
| | - Julien Franck
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
| | - Isabelle Fournier
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Institut Universitaire de France (IUF), Paris, France
| |
|
27
|
Kubatova N, Jonker HRA, Saxena K, Richter C, Vogel V, Schreiber S, Marchfelder A, Schwalbe H. Solution Structure and Dynamics of the Small Protein HVO_2922 from Haloferax volcanii. Chembiochem 2019; 21:149-156. [DOI: 10.1002/cbic.201900085] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/07/2019] [Revised: 06/04/2019] [Indexed: 11/06/2022]
Affiliation(s)
- Nina Kubatova
- Organic Chemistry and Chemical Biology, Goethe University Frankfurt, Max von Laue Strasse 7, 60438 Frankfurt am Main, Germany
| | - Hendrik R. A. Jonker
- Organic Chemistry and Chemical Biology, Goethe University Frankfurt, Max von Laue Strasse 7, 60438 Frankfurt am Main, Germany
| | - Krishna Saxena
- Organic Chemistry and Chemical Biology, Goethe University Frankfurt, Max von Laue Strasse 7, 60438 Frankfurt am Main, Germany
| | - Christian Richter
- Organic Chemistry and Chemical Biology, Goethe University Frankfurt, Max von Laue Strasse 7, 60438 Frankfurt am Main, Germany
| | | | | | | | - Harald Schwalbe
- Organic Chemistry and Chemical Biology, Goethe University Frankfurt, Max von Laue Strasse 7, 60438 Frankfurt am Main, Germany
| |
|
28
|
Cassidy L, Kaulich PT, Tholey A. Depletion of High-Molecular-Mass Proteins for the Identification of Small Proteins and Short Open Reading Frame Encoded Peptides in Cellular Proteomes. J Proteome Res 2019; 18:1725-1734. [DOI: 10.1021/acs.jproteome.8b00948] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 01/03/2023]
Affiliation(s)
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Philipp T. Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| |
|
29
|
Abstract
Global (metabolic) regulatory networks allow microorganisms to survive periods of nitrogen starvation or general nutrient stress. Uptake and utilization of various nitrogen sources are thus commonly tightly regulated in Prokarya (Bacteria and Archaea) in response to available nitrogen sources. These well-studied regulatory mechanisms operate mainly at the transcriptional and posttranslational level. Surprisingly, and in contrast to their involvement in most other stress responses, small RNAs (sRNAs) involved in the response to environmental nitrogen fluctuations are only rarely reported. In addition to sRNAs indirectly affecting nitrogen metabolism, it was demonstrated only recently that three sRNAs are directly involved in the regulation of nitrogen metabolism in response to changes in available nitrogen sources. All three trans-acting sRNAs are under direct transcriptional control of global nitrogen regulators and affect expression of components of nitrogen metabolism (glutamine synthetase, nitrogenase, and PII-like proteins) by either masking the ribosome binding site and thus inhibiting translation initiation or stabilizing the respective target mRNAs. Most likely, there are many more sRNAs and other types of noncoding RNAs, e.g., riboswitches, involved in the regulation of nitrogen metabolism in Prokarya that remain to be uncovered. The present review summarizes the current knowledge on sRNAs involved in nitrogen metabolism and their biological functions and targets.
|
30
|
Miller SE, Rizzo AI, Waldbauer JR. Postnovo: Postprocessing Enables Accurate and FDR-Controlled de Novo Peptide Sequencing. J Proteome Res 2018; 17:3671-3680. [PMID: 30277077 DOI: 10.1021/acs.jproteome.8b00278] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
De novo sequencing offers an alternative to database search methods for peptide identification from mass spectra. Since it does not rely on a predetermined database of expected or potential sequences in the sample, de novo sequencing is particularly appropriate for samples lacking a well-defined or comprehensive reference database. However, the low accuracy of many de novo sequence predictions has prevented the widespread use of the variety of sequencing tools currently available. Here, we present a new open-source tool, Postnovo, that postprocesses de novo sequence predictions to find high-accuracy results. Postnovo uses a predictive model to rescore and rerank candidate sequences in a manner akin to database search postprocessing tools such as Percolator. Postnovo leverages the output from multiple de novo sequencing tools in its own analyses, producing many times the length of amino acid sequence information (including both full- and partial-length peptide sequences) at an equivalent false discovery rate (FDR) compared to any individual tool. We present a methodology to reliably screen the sequence predictions to a desired FDR given the Postnovo sequence score. We validate Postnovo with multiple data sets and demonstrate its ability to identify proteins that are missed by database search even in samples with paired reference databases.
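Screening predictions to a desired FDR from a score can be illustrated with a generic empirical-threshold sketch. It assumes a validation set in which each prediction carries a score and a known correct/incorrect label; the function name and data layout are hypothetical, and this is not Postnovo's actual procedure, which the abstract does not detail.

```python
def score_threshold_for_fdr(scored, target_fdr):
    """Return the lowest score cutoff whose accepted set meets the target FDR.

    scored: list of (score, is_correct) pairs from a labeled validation set
            (an assumed input format).
    Returns None if no cutoff satisfies the target.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    threshold = None
    wrong = 0
    for i, (score, is_correct) in enumerate(ranked, start=1):
        if not is_correct:
            wrong += 1
        # Tied scores are accepted or rejected together, so evaluate the
        # FDR only at the last member of each tie group.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if wrong / i <= target_fdr:
            threshold = score
    return threshold
```

Predictions scoring at or above the returned cutoff would be kept; in practice the cutoff would be estimated on one data set and applied to another.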
Affiliation(s)
- Samuel E Miller
- Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
| | - Adriana I Rizzo
- Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
| | - Jacob R Waldbauer
- Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
| |
|
31
|
Nickel L, Ulbricht A, Alkhnbashi OS, Förstner KU, Cassidy L, Weidenbach K, Backofen R, Schmitz RA. Cross-cleavage activity of Cas6b in crRNA processing of two different CRISPR-Cas systems in Methanosarcina mazei Gö1. RNA Biol 2018; 16:492-503. [PMID: 30153081 DOI: 10.1080/15476286.2018.1514234] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022] Open
Abstract
The clustered regularly interspaced short palindromic repeat (CRISPR) system is a prokaryotic adaptive defense system against foreign nucleic acids. In the methanoarchaeon Methanosarcina mazei Gö1, two types of CRISPR-Cas systems are present (type I-B and type III-C). Both loci encode a Cas6 endonuclease, Cas6b-IB and Cas6b-IIIC, typically responsible for maturation of functional short CRISPR RNAs (crRNAs). To evaluate potential cross cleavage activity, we biochemically characterized both Cas6b proteins regarding their crRNA binding behavior and their ability to process pre-crRNA from the respective CRISPR array in vivo. Maturation of crRNA was studied in the respective single deletion mutants by northern blot and RNA-Seq analysis demonstrating that in vivo primarily Cas6b-IB is responsible for crRNA processing of both CRISPR arrays. Tentative protein level evidence for the translation of both Cas6b proteins under standard growth conditions was detected, arguing for different activities or a potential non-redundant role of Cas6b-IIIC within the cell. Conservation of both Cas6 endonucleases was observed in several other M. mazei isolates, though a wide variety was displayed. In general, repeat and leader sequence conservation revealed a close correlation in the M. mazei strains. The repeat sequences from both CRISPR arrays from M. mazei Gö1 contain the same sequence motif with differences only in two nucleotides. These data stand in contrast to all other analyzed M. mazei isolates, which have at least one additional CRISPR array with repeats belonging to another sequence motif. This conforms to the finding that Cas6b-IB is the crucial and functional endonuclease in M. mazei Gö1. Abbreviations: sRNA: small RNA; crRNA: CRISPR RNA; pre-crRNAs: Precursor CRISPR RNA; CRISPR: clustered regularly interspaced short palindromic repeats; Cas: CRISPR associated; nt: nucleotide; RNP: ribonucleoprotein; RBS: ribosome binding site.
Affiliation(s)
- Lisa Nickel
- Institute of General Microbiology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Andrea Ulbricht
- Institute of General Microbiology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Omer S Alkhnbashi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
| | - Konrad U Förstner
- Core Unit Systems Medicine, Institute of Molecular Infection Biology, University of Würzburg, Würzburg, Germany
| | - Liam Cassidy
- Institute for Experimental Medicine, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Katrin Weidenbach
- Institute of General Microbiology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany; BIOSS Centre for Biological Signaling Studies, University of Freiburg, Freiburg, Germany
| | - Ruth A Schmitz
- Institute of General Microbiology, Christian-Albrechts-University of Kiel, Kiel, Germany
| |
|
32
|
Budamgunta H, Olexiouk V, Luyten W, Schildermans K, Maes E, Boonen K, Menschaert G, Baggerman G. Comprehensive Peptide Analysis of Mouse Brain Striatum Identifies Novel sORF-Encoded Polypeptides. Proteomics 2018; 18:e1700218. [DOI: 10.1002/pmic.201700218] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/21/2017] [Revised: 03/30/2018] [Indexed: 11/10/2022]
Affiliation(s)
| | - Volodimir Olexiouk
- BioBix, Lab for Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bio-informatics, Ghent University, Ghent, Belgium
| | - Walter Luyten
- Animal Physiology and Neurobiology, KULeuven, Leuven, Belgium
| | | | - Evelyne Maes
- Centre for Proteomics, UAntwerp, Antwerp, Belgium
- Proteins and Biomaterials, AgResearch, Christchurch, New Zealand
| | - Kurt Boonen
- Centre for Proteomics, UAntwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, VITO, Mol, Belgium
| | - Gerben Menschaert
- BioBix, Lab for Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bio-informatics, Ghent University, Ghent, Belgium
| | - Geert Baggerman
- Centre for Proteomics, UAntwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, VITO, Mol, Belgium
| |
|
33
|
Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017; 12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022] Open
Abstract
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
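The first filtering step, collecting open reading frames that encode at least 30 amino acids, can be sketched as a six-frame scan. This is an illustrative simplification and not the authors' pipeline: it assumes ATG as the only start codon (bacteria also use alternatives such as GTG and TTG), does not restrict the scan to intergenic regions, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=30):
    """Scan both strands and all three frames for ATG-initiated ORFs.

    Returns (strand, start, end, length_aa) tuples, where start and end are
    0-based, end-exclusive coordinates on the scanned strand and length_aa
    counts codons from the start codon up to, but excluding, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_aa:
                        orfs.append((strand, start, i + 3, length_aa))
                    start = None
    return orfs
```

Candidates found this way would still need the transcription and translation coverage checks described in the abstract before being called putative genes.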
Affiliation(s)
- Sarah M. Hücker
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Tatyana Goldberg
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Andrea Schafferhans
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Michael Bernhofer
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Gisle Vestergaard
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Chase W. Nelson
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
| | - Michael Schloter
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Burkhard Rost
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Klaus Neuhaus
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| |
|
34
|
Abstract
A large body of evidence indicates that genome annotation pipelines have biased our view of coding sequences because they generally undersample small proteins and peptides. The recent development of genome-wide translation profiling reveals the prevalence of small/short open reading frames (smORFs or sORFs), which are scattered over all classes of transcripts, including both mRNAs and presumptive long noncoding RNAs. Proteomic approaches further confirm an unexpected variety of smORF-encoded peptides (SEPs), representing an overlooked reservoir of bioactive molecules. Indeed, functional studies in a broad range of species from yeast to humans demonstrate that SEPs can harbor key activities for the control of development, differentiation, and physiology. Here we summarize recent advances in the discovery and functional characterization of smORF/SEPs and discuss why these small players can no longer be ignored with regard to genome function.
Affiliation(s)
- Serge Plaza
- Laboratoire de Recherches en Sciences Végétales, Université de Toulouse, Université Paul Sabatier, 31326 Castanet Tolosan, France; CNRS, UMR5546, Laboratoire de Recherches en Sciences Végétales, 31326 Castanet Tolosan, France
| | - Gerben Menschaert
- Department of Mathematical Modeling, Statistics and Bioinformatics, University of Ghent, 9000 Gent, Belgium
| | - François Payre
- Centre de Biologie du Développement, Centre de Biologie Intégrative, Université de Toulouse, CNRS, Université Paul Sabatier, 31062 Toulouse, France
| |
|
35
|
Abstract
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. The DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state-of-the-art methods, achieving 7.7-22.9% higher accuracy at the amino acid level and 38.1-64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5-100% coverage and 97.2-99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any source of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach to solving optimization problems by using deep learning and dynamic programming.
|