1. Venturini E, Maaß S, Bischler T, Becher D, Vogel J, Westermann AJ. Functional characterization of the DUF1127-containing small protein YjiS of Salmonella Typhimurium. MICROLIFE 2025; 6:uqae026. [PMID: 39790481 PMCID: PMC11707872 DOI: 10.1093/femsml/uqae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2024] [Revised: 11/19/2024] [Accepted: 12/30/2024] [Indexed: 01/12/2025]
Abstract
Bacterial small proteins impact diverse physiological processes; however, technical challenges posed by their small size have hampered their systematic identification and biochemical characterization. In our quest to uncover small proteins relevant for Salmonella pathogenicity, we previously identified YjiS, a 54 amino acid protein, which is strongly induced during this pathogen's intracellular infection stage. Here, we set out to further characterize the role of YjiS. Cell culture infection assays with Salmonella mutants lacking or overexpressing YjiS suggested this small protein to delay bacterial escape from macrophages. Mutant scanning of the protein's conserved, arginine-rich DUF1127 domain excluded a major effect of single amino acid substitutions on the infection phenotype. A comparative dual RNA-seq assay uncovered the molecular footprint of YjiS in the macrophage response to infection, with host effects related to oxidative stress and the cell cortex. Bacterial cell fractionation experiments demonstrated YjiS to associate with the inner membrane, and proteins interacting with YjiS in pull-down experiments were enriched for inner membrane processes. Among the YjiS interactors was the two-component system SsrA/B, the master transcriptional activator of intracellular virulence genes and a suppressor of flagellar genes. Indeed, in the absence of YjiS, we observed elevated expression of motility genes and an increased number of flagella per bacterium. Together, our study points to a role for Salmonella YjiS as a membrane-associated timer of pathogen dissemination.
Affiliation(s)
- Elisa Venturini
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, D-97080 Würzburg, Germany
- Sandra Maaß
  - Institute of Microbiology, Department of Microbial Proteomics, University of Greifswald, D-17489 Greifswald, Germany
- Thorsten Bischler
  - Core Unit Systems Medicine, University of Würzburg, D-97080 Würzburg, Germany
- Dörte Becher
  - Institute of Microbiology, Department of Microbial Proteomics, University of Greifswald, D-17489 Greifswald, Germany
- Jörg Vogel
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, D-97080 Würzburg, Germany
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), D-97080 Würzburg, Germany
- Alexander J Westermann
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), D-97080 Würzburg, Germany
  - Department of Microbiology, Biocenter, University of Würzburg, D-97074 Würzburg, Germany
2. Zhu L, Chen H, Yang S. LncSL: A Novel Stacked Ensemble Computing Tool for Subcellular Localization of lncRNA by Amino Acid-Enhanced Features and Two-Stage Automated Selection Strategy. Int J Mol Sci 2024; 25:13734. [PMID: 39769496 PMCID: PMC11678684 DOI: 10.3390/ijms252413734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 12/17/2024] [Accepted: 12/19/2024] [Indexed: 01/11/2025] Open
Abstract
Long non-coding RNA (lncRNA) is a non-coding RNA longer than 200 nucleotides, crucial for functions like cell cycle regulation and gene transcription. Accurate localization prediction from sequence information is vital for understanding lncRNA's biological roles. Computational methods offer an effective alternative to traditional experimental methods for annotating lncRNA subcellular positions. Existing machine learning-based methods are limited and often overlook regions with coding potential that affect the function of lncRNA. Therefore, we propose a new model called LncSL. For feature encoding, both lncRNA sequences and amino acid sequences from open reading frames (ORFs) are employed. We selected the most suitable features with CatBoost and integrated them into a new feature set. A voting process with seven feature selection algorithms then identified the more contributive features for training our final stacked model. In addition, an automatic model selection strategy was constructed to find a better-performing meta-model for assembling LncSL. This study specifically focuses on predicting the subcellular localization of lncRNA in the nucleus and cytoplasm. On two benchmark datasets, S1 and S2, LncSL outperformed existing methods by 6.3% to 12.3% in the Matthews correlation coefficient on a balanced test dataset. On an unbalanced independent test dataset sourced from S1, LncSL improved by 4.7% to 18.6% in the Matthews correlation coefficient, which further demonstrates that LncSL is superior to the other compared methods. In all, this study presents an effective method for predicting lncRNA subcellular localization through enhancing sequence information, which is always overlooked by traditional methods, and addressing contributive meta-model selection problems, which can offer new insights for other bioinformatics problems.
Affiliation(s)
- Sen Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China; (L.Z.); (H.C.)
3. Tufail MA, Jordan B, Hadjeras L, Gelhausen R, Cassidy L, Habenicht T, Gutt M, Hellwig L, Backofen R, Tholey A, Sharma CM, Schmitz RA. Uncovering the small proteome of Methanosarcina mazei using Ribo-seq and peptidomics under different nitrogen conditions. Nat Commun 2024; 15:8659. [PMID: 39370430 PMCID: PMC11456600 DOI: 10.1038/s41467-024-53008-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 09/25/2024] [Indexed: 10/08/2024] Open
Abstract
The mesophilic methanogenic archaeal model organism Methanosarcina mazei strain Gö1 is crucial for climate and environmental research due to its ability to produce methane. Here, we establish a Ribo-seq protocol for M. mazei strain Gö1 under two growth conditions (nitrogen sufficiency and limitation). The translation of 93 previously annotated and 314 unannotated small ORFs, coding for proteins ≤ 70 amino acids, is predicted with high confidence based on Ribo-seq data. LC-MS analysis validates the translation for 62 annotated small ORFs and 26 unannotated small ORFs. Epitope tagging followed by immunoblotting analysis confirms the translation of 13 out of 16 selected unannotated small ORFs. A comprehensive differential transcription and translation analysis reveals that 29 of 314 unannotated small ORFs are differentially regulated in response to nitrogen availability at the transcriptional and 49 at the translational level. A high number of reported small RNAs are emerging as dual-function RNAs, including sRNA154, the central regulatory small RNA of nitrogen metabolism. Several unannotated small ORFs are conserved in Methanosarcina species and overproducing several (small ORF encoded) small proteins suggests key physiological functions. Overall, the comprehensive analysis opens an avenue to elucidate the function(s) of multitudinous small proteins and dual-function RNAs in M. mazei.
Affiliation(s)
- Britta Jordan
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Lydia Hadjeras
  - Institute of Molecular Infection Biology, University of Würzburg, 97080, Würzburg, Germany
- Rick Gelhausen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110, Freiburg, Germany
- Liam Cassidy
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Kiel University, 24105, Kiel, Germany
- Tim Habenicht
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Miriam Gutt
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Lisa Hellwig
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110, Freiburg, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Kiel University, 24105, Kiel, Germany
- Cynthia M Sharma
  - Institute of Molecular Infection Biology, University of Würzburg, 97080, Würzburg, Germany
- Ruth A Schmitz
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
4. Gray J, Torres VVL, Goodall E, McKeand SA, Scales D, Collins C, Wetherall L, Lian ZJ, Bryant JA, Milner MT, Dunne KA, Icke C, Rooke JL, Schneiders T, Lund PA, Cunningham AF, Cole JA, Henderson IR. Transposon mutagenesis screen in Klebsiella pneumoniae identifies genetic determinants required for growth in human urine and serum. eLife 2024; 12:RP88971. [PMID: 39189918 PMCID: PMC11349299 DOI: 10.7554/elife.88971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/28/2024] Open
Abstract
Klebsiella pneumoniae is a global public health concern due to the rising myriad of hypervirulent and multidrug-resistant clones, both alarmingly associated with high mortality. The molecular mechanisms underpinning these recalcitrant K. pneumoniae infections, and how virulence is coupled with the emergence of lineages resistant to nearly all present-day clinically important antimicrobials, are unclear. In this study, we performed a genome-wide screen in K. pneumoniae ECL8, a member of the endemic K2-ST375 pathotype most often reported in Asia, to define genes essential for growth in a nutrient-rich laboratory medium (Luria-Bertani [LB] medium), human urine, and serum. Through transposon directed insertion-site sequencing (TraDIS), a total of 427 genes were identified as essential for growth on LB agar, whereas transposon insertions in 11 and 144 genes decreased fitness for growth in urine and serum, respectively. These studies not only provide further knowledge on the genetics of this pathogen but also provide a strong impetus for discovering new antimicrobial targets to improve current therapeutic options for K. pneumoniae infections.
Affiliation(s)
- Jessica Gray
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Von Vergel L Torres
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Emily Goodall
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Samantha A McKeand
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Danielle Scales
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Christy Collins
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Laura Wetherall
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Zheng Jie Lian
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Jack A Bryant
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Matthew T Milner
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Karl A Dunne
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Christopher Icke
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Jessica L Rooke
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Thamarai Schneiders
  - Division of Infection Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Peter A Lund
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Adam F Cunningham
  - Institute of Immunology and Immunotherapy, University of Birmingham, Birmingham, United Kingdom
- Jeff A Cole
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Ian R Henderson
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
5. Weston M, Hu H, Li X. PSPI: A deep learning approach for prokaryotic small protein identification. Front Genet 2024; 15:1439423. [PMID: 39050248 PMCID: PMC11266045 DOI: 10.3389/fgene.2024.1439423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open
Abstract
Small Proteins (SPs) are pivotal in various cellular functions such as immunity, defense, and communication. Despite their significance, identifying them is still in its infancy. Existing computational tools are tailored to specific eukaryotic species, leaving only a few options for SP identification in prokaryotes. In addition, these existing tools still have suboptimal performance in SP identification. To fill this gap, we introduce PSPI, a deep learning-based approach designed specifically for predicting prokaryotic SPs. We showed that PSPI had a high accuracy in predicting generalized sets of prokaryotic SPs and sets specific to the human metagenome. Compared with three existing tools, PSPI was faster and showed greater precision, sensitivity, and specificity not only for prokaryotic SPs but also for eukaryotic ones. We also observed that the incorporation of (n, k)-mers greatly enhances the performance of PSPI, suggesting that many SPs may contain short linear motifs. The PSPI tool, which is freely available at https://www.cs.ucf.edu/~xiaoman/tools/PSPI/, will be useful for studying SPs as a tool for identifying prokaryotic SPs and it can be trained to identify other types of SPs as well.
Affiliation(s)
- Matthew Weston
  - Department of Computer Science, University of Central Florida, Orlando, FL, United States
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL, United States
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, United States
6. Coelho LP, Santos-Júnior CD, de la Fuente-Nunez C. Challenges in computational discovery of bioactive peptides in 'omics data. Proteomics 2024; 24:e2300105. [PMID: 38458994 PMCID: PMC11537280 DOI: 10.1002/pmic.202300105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 03/10/2024]
Abstract
Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently provided a large opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Due to their size, the approaches proven successful at discovering novel proteins of canonical size cannot be naïvely applied to the discovery of peptides. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both of these peptide classes pose challenges as simple methods for their prediction result in large numbers of false positives. Similarly, functional annotation of larger proteins, traditionally based on sequence similarity to infer orthology and then transferring functions between characterized proteins and uncharacterized ones, cannot be applied for short sequences. The use of these techniques is much more limited and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods as well as the alternative methods that have recently been developed for discovering novel bioactive peptides with a focus on prokaryotic genomes and metagenomes.
Affiliation(s)
- Luis Pedro Coelho
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Woolloongabba, Queensland, Australia
  - Institute of Science and Technology for Brain-Inspired Intelligence – ISTBI, Fudan University, Shanghai, China
- Célio Dias Santos-Júnior
  - Institute of Science and Technology for Brain-Inspired Intelligence – ISTBI, Fudan University, Shanghai, China
  - Laboratory of Microbial Processes & Biodiversity – LMPB, Hydrobiology Department, Federal University of São Carlos – UFSCar, São Paulo, Brazil
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
7. Beals J, Hu H, Li X. A survey of experimental and computational identification of small proteins. Brief Bioinform 2024; 25:bbae345. [PMID: 39007598 PMCID: PMC11247407 DOI: 10.1093/bib/bbae345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 05/27/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024] Open
Abstract
Small proteins (SPs) are typically characterized as eukaryotic proteins shorter than 100 amino acids and prokaryotic proteins shorter than 50 amino acids. Historically, they were disregarded because of the arbitrary size thresholds to define proteins. However, recent research has revealed the existence of many SPs and their crucial roles. Despite this, the identification of SPs and the elucidation of their functions are still in their infancy. To pave the way for future SP studies, we briefly introduce the limitations and advancements in experimental techniques for SP identification. We then provide an overview of available computational tools for SP identification, their constraints, and their evaluation. Additionally, we highlight existing resources for SP research. This survey aims to initiate further exploration into SPs and encourage the development of more sophisticated computational tools for SP identification in prokaryotes and microbiomes.
Affiliation(s)
- Joshua Beals
  - Burnett School of Biomedical Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, United States
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, United States
- Xiaoman Li
  - Burnett School of Biomedical Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, United States
8. Sinha PR, Balasubramanian R, Hegde SR. Integrated sequence and -omic features reveal novel small proteome of Mycobacterium tuberculosis. Front Microbiol 2024; 15:1335310. [PMID: 38812687 PMCID: PMC11133741 DOI: 10.3389/fmicb.2024.1335310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 04/15/2024] [Indexed: 05/31/2024] Open
Abstract
Bioinformatic studies on small proteins are under-represented due to difficulties in annotation posed by their small size. However, recent discoveries emphasize the functional significance of small proteins in cellular processes including cell signaling, metabolism, and adaptation to stress. In this study, we utilized a Random Forest classifier trained on sequence features, RNA-Seq, and Ribo-Seq data to uncover small proteins (smORFs) in M. tuberculosis. Independent predictions for the exponential and starvation conditions resulted in 695 potential smORFs. We examined the functional implications of these smORFs using homology searches, LC-MS/MS, and ChIP-seq data, testing their expression in diverse growth conditions, and identifying protein domains. We provide evidence that some of these smORFs could be part of operons, or exist as upstream ORFs. This expanded data resource for the proteins of M. tuberculosis would aid in fine-tuning the existing protein and gene regulatory networks, thereby improving system-wide studies. The primary goal of this study was to uncover and characterize smORFs in M. tuberculosis through bioinformatic analysis, shedding light on their functional roles and genomic organization. Further investigation of these potential smORFs would provide valuable insights into the genome organization and functional diversity of the M. tuberculosis proteome.
Affiliation(s)
- Shubhada R. Hegde
  - Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, India
9. Bolay P, Dodge N, Janssen K, Jensen PE, Lindberg P. Tailoring regulatory components for metabolic engineering in cyanobacteria. PHYSIOLOGIA PLANTARUM 2024; 176:e14316. [PMID: 38686633 DOI: 10.1111/ppl.14316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 05/02/2024]
Abstract
The looming climate crisis has prompted an ever-growing interest in cyanobacteria due to their potential as sustainable production platforms for the synthesis of energy carriers and value-added chemicals from CO2 and sunlight. Nonetheless, cyanobacteria are yet to compete with heterotrophic systems in terms of space-time yields and consequently production costs. One major drawback leading to the low production performance observed in cyanobacteria is the limited ability to utilize the full capacity of the photosynthetic apparatus and its associated systems, i.e. CO2 fixation and the directly connected metabolism. In this review, novel insights into various levels of metabolic regulation of cyanobacteria are discussed, including the potential of targeting these regulatory mechanisms to create a chassis with a phenotype favorable for photoautotrophic production. Compared to conventional metabolic engineering approaches, minor perturbations of regulatory mechanisms can have wide-ranging effects.
Affiliation(s)
- Paul Bolay
  - Microbial Chemistry, Department of Chemistry - Ångström, Uppsala University, Uppsala, SE, Sweden
- Nadia Dodge
  - Plant Based Foods and Biochemistry, Food Analytics and Biotechnology, Department of Food Science, University of Copenhagen, Denmark
- Kim Janssen
  - Microbial Chemistry, Department of Chemistry - Ångström, Uppsala University, Uppsala, SE, Sweden
- Poul Erik Jensen
  - Plant Based Foods and Biochemistry, Food Analytics and Biotechnology, Department of Food Science, University of Copenhagen, Denmark
- Pia Lindberg
  - Microbial Chemistry, Department of Chemistry - Ångström, Uppsala University, Uppsala, SE, Sweden
10. Miravet-Verde S, Mazzolini R, Segura-Morales C, Broto A, Lluch-Senar M, Serrano L. ProTInSeq: transposon insertion tracking by ultra-deep DNA sequencing to identify translated large and small ORFs. Nat Commun 2024; 15:2091. [PMID: 38453908 PMCID: PMC10920889 DOI: 10.1038/s41467-024-46112-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 02/14/2024] [Indexed: 03/09/2024] Open
Abstract
Identifying open reading frames (ORFs) being translated is not a trivial task. ProTInSeq is a technique designed to characterize proteomes by sequencing transposon insertions engineered to express a selection marker when they occur in-frame within a protein-coding gene. In the bacterium Mycoplasma pneumoniae, ProTInSeq identifies 83% of its annotated proteins, along with 5 proteins and 153 small ORF-encoded proteins (SEPs; ≤100 aa) that were not previously annotated. Moreover, ProTInSeq can be utilized for detecting translational noise, as well as for relative quantification and transmembrane topology estimation of fitness and non-essential proteins. By integrating various identification approaches, the number of initially annotated SEPs in this bacterium increases from 27 to 329, with a quarter of them predicted to possess antimicrobial potential. Herein, we describe a methodology complementary to Ribo-Seq and mass spectroscopy that can identify SEPs while providing other insights in a proteome with a flexible and cost-effective DNA ultra-deep sequencing approach.
Affiliation(s)
- Samuel Miravet-Verde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Zurich, Switzerland
- Carolina Segura-Morales
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
- Alicia Broto
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
- Maria Lluch-Senar
  - Pulmobiotics, Dr Aiguader 88, 08003, Barcelona, Spain
  - Institute of Biotechnology and Biomedicine "Vicent Villar Palasi" (IBB), Universitat Autònoma de Barcelona, Barcelona, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Lluis Companys 23, 08010, Barcelona, Spain
11. Genth J, Schäfer K, Cassidy L, Graspeuntner S, Rupp J, Tholey A. Identification of proteoforms of short open reading frame-encoded peptides in Blautia producta under different cultivation conditions. Microbiol Spectr 2023; 11:e0252823. [PMID: 37782090 PMCID: PMC10715070 DOI: 10.1128/spectrum.02528-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 08/14/2023] [Indexed: 10/03/2023] Open
Abstract
IMPORTANCE The identification of short open reading frame-encoded peptides (SEP) and different proteoforms in single cultures of gut microbes offers new insights into a largely neglected part of the microbial proteome landscape. This is of particular importance as SEP provide various predicted functions, such as acting as antimicrobial peptides, maintaining cell homeostasis under stress conditions, or even contributing to the virulence pattern. They are, thus, taking a poorly understood role in structure and function of microbial networks in the human body. A better understanding of SEP in the context of human health requires a precise understanding of the abundance of SEP both in commensal microbes as well as pathogens. For the gut beneficial B. producta, we demonstrate the importance of specific environmental conditions for biosynthesis of SEP expanding previous findings about their role in microbial interactions.
Affiliation(s)
- Jerome Genth
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Kathrin Schäfer
  - Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
- Liam Cassidy
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Simon Graspeuntner
  - Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
  - German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Lübeck, Germany
- Jan Rupp
  - Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
  - German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Lübeck, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
12. Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
  - Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
  - Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
  - Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
13. Frumkin I, Laub MT. Selection of a de novo gene that can promote survival of Escherichia coli by modulating protein homeostasis pathways. Nat Ecol Evol 2023; 7:2067-2079. [PMID: 37945946 PMCID: PMC10697842 DOI: 10.1038/s41559-023-02224-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/11/2023] [Accepted: 09/12/2023] [Indexed: 11/12/2023]
Abstract
Cellular novelty can emerge when non-functional loci become functional genes in a process termed de novo gene birth. But how proteins with random amino acid sequences beneficially integrate into existing cellular pathways remains poorly understood. We screened ~10^8 genes, generated from random nucleotide sequences and devoid of homology to natural genes, for their ability to rescue growth arrest of Escherichia coli cells producing the ribonuclease toxin MazF. We identified ~2,000 genes that could promote growth, probably by reducing transcription from the promoter driving toxin expression. Additionally, one random protein, named Random antitoxin of MazF (RamF), modulated protein homeostasis by interacting with chaperones, leading to MazF proteolysis and a consequent loss of its toxicity. Finally, we demonstrate that random proteins can improve during evolution by identifying beneficial mutations that turned RamF into a more efficient inhibitor. Our work provides a mechanistic basis for how de novo gene birth can produce functional proteins that effectively benefit cells evolving under stress.
Affiliation(s)
- Idan Frumkin
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Michael T Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Howard Hughes Medical Institute, Cambridge, MA, USA
|
14
|
Mohsen JJ, Martel AA, Slavoff SA. Microproteins-Discovery, structure, and function. Proteomics 2023; 23:e2100211. [PMID: 37603371 PMCID: PMC10841188 DOI: 10.1002/pmic.202100211] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/04/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/22/2023]
Abstract
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Affiliation(s)
- Jessica J. Mohsen
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Alina A. Martel
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
|
15
|
Dimonaco NJ, Clare A, Kenobi K, Aubrey W, Creevey CJ. StORF-Reporter: finding genes between genes. Nucleic Acids Res 2023; 51:11504-11517. [PMID: 37897345 PMCID: PMC10682499 DOI: 10.1093/nar/gkad814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 09/04/2023] [Accepted: 09/27/2023] [Indexed: 10/30/2023]
Abstract
Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
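The stop-to-stop idea described in this abstract can be illustrated with a minimal sketch. This is not the StORF-Reporter implementation: the function name `find_stop_to_stop_orfs`, the `min_codons` threshold, and the restriction to the forward strand are assumptions made purely for illustration.

```python
# Illustrative sketch only: report regions delimited by two consecutive
# in-frame stop codons on the forward strand of an (unannotated) sequence.
STOPS = {"TAA", "TAG", "TGA"}

def find_stop_to_stop_orfs(seq, min_codons=10):
    """Return (start, end, frame) tuples, where seq[start:end] is the
    stretch between two consecutive in-frame stop codons."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        prev_stop = None  # index of the most recent in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if prev_stop is not None:
                    n_codons = (i - (prev_stop + 3)) // 3
                    if n_codons >= min_codons:
                        hits.append((prev_stop + 3, i, frame))
                prev_stop = i
    return hits

# Toy input (hypothetical): 12 alanine codons flanked by stop codons.
example = "TAA" + "GCT" * 12 + "TAG"
print(find_stop_to_stop_orfs(example))  # [(3, 39, 0)]
```

Because the region is bounded by stop codons rather than by a start codon, a sketch like this does not depend on recognising alternative start codons, which is the annotation weakness the abstract highlights.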
Affiliation(s)
- Nicholas J Dimonaco
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, Wales, UK
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, ON, Canada
- School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
- Amanda Clare
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Kim Kenobi
- Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, Wales, UK
- Wayne Aubrey
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Christopher J Creevey
- School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
|
16
|
Si D, Sun J, Guo L, Yang F, Tian X, He S, Li J. Hypothetical Proteins of Mycoplasma synoviae Reannotation and Expression Changes Identified via RNA-Sequencing. Microorganisms 2023; 11:2716. [PMID: 38004728 PMCID: PMC10673309 DOI: 10.3390/microorganisms11112716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 10/25/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]
Abstract
Mycoplasma synoviae infection rates in chickens are increasing worldwide. Genomic studies have considerably improved our understanding of M. synoviae biology and virulence. However, approximately 20% of the predicted proteins have unknown functions. In particular, the M. synoviae ATCC 25204 genome has 663 encoding DNA sequences, among which 155 are considered encoding hypothetical proteins (HPs). Several of these genes may encode unknown virulence factors. This study aims to reannotate all 155 proteins in M. synoviae ATCC 25204 to predict new potential virulence factors using currently available databases and bioinformatics tools. Finally, 125 proteins were reannotated, including enzymes (39%), lipoproteins (10%), DNA-binding proteins (6%), phase-variable hemagglutinin (19%), and other protein types (26%). Among 155 proteins, 28 proteins associated with virulence were detected, five of which were reannotated. Furthermore, HP expression was compared before and after the M. synoviae infection of cells to identify potential virulence-related proteins. The expression of 14 HP genes was upregulated, including that of five virulence-related genes. Our study improved the functional annotation of M. synoviae ATCC 25204 from 76% to 95% and enabled the discovery of potential virulence factors in the genome. Moreover, 14 proteins that may be involved in M. synoviae infection were identified, providing candidate proteins and facilitating the exploration of the infection mechanism of M. synoviae.
Affiliation(s)
- Shenghu He
- College of Animal Science and Technology, Clinical Veterinary Laboratory, Ningxia University, Yinchuan 750021, China; (D.S.); (J.S.); (L.G.); (F.Y.); (X.T.)
- Jidong Li
- College of Animal Science and Technology, Clinical Veterinary Laboratory, Ningxia University, Yinchuan 750021, China; (D.S.); (J.S.); (L.G.); (F.Y.); (X.T.)
|
17
|
Simoens L, Fijalkowski I, Van Damme P. Exposing the small protein load of bacterial life. FEMS Microbiol Rev 2023; 47:fuad063. [PMID: 38012116 PMCID: PMC10723866 DOI: 10.1093/femsre/fuad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 11/10/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023]
Abstract
The ever-growing repertoire of genomic techniques continues to expand our understanding of the true diversity and richness of prokaryotic genomes. Riboproteogenomics laid the foundation for dynamic studies of previously overlooked genomic elements. Most strikingly, bacterial genomes were revealed to harbor robust repertoires of small open reading frames (sORFs) encoding a diverse and broadly expressed range of small proteins, or sORF-encoded polypeptides (SEPs). In recent years, continuous efforts led to great improvements in the annotation and characterization of such proteins, yet many challenges remain to fully comprehend the pervasive nature of small proteins and their impact on bacterial biology. In this work, we review the recent developments in the dynamic field of bacterial genome reannotation, catalog the important biological roles carried out by small proteins and identify challenges obstructing the way to full understanding of these elusive proteins.
Affiliation(s)
- Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
|
18
|
Brantl S, Ul Haq I. Small proteins in Gram-positive bacteria. FEMS Microbiol Rev 2023; 47:fuad064. [PMID: 38052429 PMCID: PMC10730256 DOI: 10.1093/femsre/fuad064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 12/04/2023] [Indexed: 12/07/2023]
Abstract
Small proteins comprising less than 100 amino acids have been often ignored in bacterial genome annotations. About 10 years ago, focused efforts started to investigate whole peptidomes, which resulted in the discovery of a multitude of small proteins, but only a number of them have been characterized in detail. Generally, small proteins can be either membrane or cytosolic proteins. The latter interact with larger proteins, RNA or even metal ions. Here, we summarize our current knowledge on small proteins from Gram-positive bacteria with a special emphasis on the model organism Bacillus subtilis. Our examples include membrane-bound toxins of type I toxin-antitoxin systems, proteins that block the assembly of higher order structures, regulate sporulation or modulate the RNA degradosome. We do not consider antimicrobial peptides. Furthermore, we present methods for the identification and investigation of small proteins.
Affiliation(s)
- Sabine Brantl
- AG Bakteriengenetik, Matthias-Schleiden-Institut, Friedrich-Schiller-Universität Jena, Philosophenweg 12, Jena D-07743, Germany
- Inam Ul Haq
- AG Bakteriengenetik, Matthias-Schleiden-Institut, Friedrich-Schiller-Universität Jena, Philosophenweg 12, Jena D-07743, Germany
|
19
|
Weber M, Sogues A, Yus E, Burgos R, Gallo C, Martínez S, Lluch‐Senar M, Serrano L. Comprehensive quantitative modeling of translation efficiency in a genome-reduced bacterium. Mol Syst Biol 2023; 19:e11301. [PMID: 37642167 PMCID: PMC10568206 DOI: 10.15252/msb.202211301] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/16/2022] [Revised: 07/17/2023] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
Translation efficiency has been mainly studied by ribosome profiling, which only provides an incomplete picture of translation kinetics. Here, we integrated the absolute quantifications of tRNAs, mRNAs, RNA half-lives, proteins, and protein half-lives with ribosome densities and derived the initiation and elongation rates for 475 genes (67% of all genes), 73 with high precision, in the bacterium Mycoplasma pneumoniae (Mpn). We found that, although the initiation rate varied over 160-fold among genes, most of the known factors had little impact on translation efficiency. Local codon elongation rates could not be fully explained by the adaptation to tRNA abundances, which varied over 100-fold among tRNA isoacceptors. We provide a comprehensive quantitative view of translation efficiency, which suggests the existence of unidentified mechanisms of translational regulation in Mpn.
Affiliation(s)
- Marc Weber
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Adrià Sogues
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Eva Yus
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Raul Burgos
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Carolina Gallo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sira Martínez
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Maria Lluch‐Senar
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Barcelona, Spain
|
20
|
González-Arzola K, Díaz-Quintana A. Mitochondrial Factors in the Cell Nucleus. Int J Mol Sci 2023; 24:13656. [PMID: 37686461 PMCID: PMC10563088 DOI: 10.3390/ijms241713656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/18/2023] [Revised: 08/31/2023] [Accepted: 08/31/2023] [Indexed: 09/10/2023]
Abstract
The origin of eukaryotic organisms involved the integration of mitochondria into the ancestor cell, with a massive gene transfer from the original proteobacterium to the host nucleus. Thus, mitochondrial performance relies on a mosaic of nuclear gene products from a variety of genomes. The concerted regulation of their synthesis is necessary for metabolic housekeeping and stress response. This governance involves crosstalk between mitochondrial, cytoplasmic, and nuclear factors. While anterograde and retrograde regulation preserve mitochondrial homeostasis, the mitochondria can modulate a wide set of nuclear genes in response to an extensive variety of conditions, whose response mechanisms often merge. In this review, we summarise how mitochondrial metabolites and proteins (encoded either in the nucleus or in the organelle) target the cell nucleus and exert different actions modulating gene expression and the chromatin state, or even causing DNA fragmentation in response to common stress conditions, such as hypoxia, oxidative stress, unfolded protein stress, and DNA damage.
Affiliation(s)
- Katiuska González-Arzola
- Centro Andaluz de Biología Molecular y Medicina Regenerativa—CABIMER, Consejo Superior de Investigaciones Científicas—Universidad de Sevilla—Universidad Pablo de Olavide, 41092 Seville, Spain
- Departamento de Bioquímica Vegetal y Biología Molecular, Universidad de Sevilla, 41012 Seville, Spain
- Antonio Díaz-Quintana
- Departamento de Bioquímica Vegetal y Biología Molecular, Universidad de Sevilla, 41012 Seville, Spain
- Instituto de Investigaciones Químicas—cicCartuja, Universidad de Sevilla—C.S.I.C, 41092 Seville, Spain
|
21
|
Anders J, Stadler PF. RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions. J Integr Bioinform 2023; 20:jib-2022-0046. [PMID: 37615674 PMCID: PMC10757073 DOI: 10.1515/jib-2022-0046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 02/15/2023] [Indexed: 08/25/2023]
Abstract
The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious, and sometimes difficult, to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments that are needed as input for RNAcode. The service automatizes the entire pre- and postprocessing and thus makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.
Affiliation(s)
- John Anders
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107 Leipzig, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107 Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
22
|
Thomas KE, Gagniuc PA, Gagniuc E. Moonlighting genes harbor antisense ORFs that encode potential membrane proteins. Sci Rep 2023; 13:12591. [PMID: 37537268 PMCID: PMC10400600 DOI: 10.1038/s41598-023-39869-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 08/01/2023] [Indexed: 08/05/2023]
Abstract
Moonlighting genes encode for single polypeptide molecules that perform multiple and often unrelated functions. These genes occur across all domains of life. Their ubiquity and functional diversity raise many questions as to their origins, evolution, and role in the cell cycle. In this study, we present a simple bioinformatics probe that allows us to rank genes by antisense translation potential, and we show that this probe enriches, reliably, for moonlighting genes across a variety of organisms. We find that moonlighting genes harbor putative antisense open reading frames (ORFs) rich in codons for non-polar amino acids. We also find that moonlighting genes tend to co-locate with genes involved in cell wall, cell membrane, or cell envelope production. On the basis of this and other findings, we offer a model in which we propose that moonlighting gene products are likely to escape the cell through gaps in the cell wall and membrane, at wall/membrane construction sites; and we propose that antisense ORFs produce "membrane-sticky" protein products, effectively binding moonlighting-gene DNA to the cell membrane in porous areas where intensive cell-wall/cell-membrane construction is underway. This leads to high potential for escape of moonlighting proteins to the cell surface. Evolutionary and other implications of these findings are discussed.
Affiliation(s)
- Paul A Gagniuc
- Faculty of Engineering in Foreign Languages, University Politehnica of Bucharest, Bucharest, Romania
- Elvira Gagniuc
- Synevovet Laboratory, Bucharest, Romania
- Faculty of Veterinary Medicine, University of Agronomic Sciences and Veterinary Medicine, Bucharest, Romania
|
23
|
Hadjeras L, Bartel J, Maier LK, Maaß S, Vogel V, Svensson SL, Eggenhofer F, Gelhausen R, Müller T, Alkhnbashi OS, Backofen R, Becher D, Sharma CM, Marchfelder A. Revealing the small proteome of Haloferax volcanii by combining ribosome profiling and small-protein optimized mass spectrometry. MICROLIFE 2023; 4:uqad001. [PMID: 37223747 PMCID: PMC10117724 DOI: 10.1093/femsml/uqad001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/01/2022] [Revised: 11/29/2022] [Accepted: 01/13/2023] [Indexed: 05/25/2023]
Abstract
In contrast to extensively studied prokaryotic 'small' transcriptomes (encompassing all small noncoding RNAs), small proteomes (here defined as including proteins ≤70 aa) are only now entering the limelight. The absence of a complete small protein catalogue in most prokaryotes precludes our understanding of how these molecules affect physiology. So far, archaeal genomes have not yet been analyzed broadly with a dedicated focus on small proteins. Here, we present a combinatorial approach, integrating experimental data from small protein-optimized mass spectrometry (MS) and ribosome profiling (Ribo-seq), to generate a high confidence inventory of small proteins in the model archaeon Haloferax volcanii. We demonstrate by MS and Ribo-seq that 67% of the 317 annotated small open reading frames (sORFs) are translated under standard growth conditions. Furthermore, annotation-independent analysis of Ribo-seq data showed ribosomal engagement for 47 novel sORFs in intergenic regions. A total of seven of these were also detected by proteomics, in addition to an eighth novel small protein solely identified by MS. We also provide independent experimental evidence in vivo for the translation of 12 sORFs (annotated and novel) using epitope tagging and western blotting, underlining the validity of our identification scheme. Several novel sORFs are conserved in Haloferax species and might have important functions. Based on our findings, we conclude that the small proteome of H. volcanii is larger than previously appreciated, and that combining MS with Ribo-seq is a powerful approach for the discovery of novel small protein coding genes in archaea.
Affiliation(s)
- Lydia Hadjeras
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
- Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Verena Vogel
- Biology II, Ulm University, Albert-Einstein-Allee 11, 89081 Ulm, Germany
- Sarah L Svensson
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
- Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Teresa Müller
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Omer S Alkhnbashi
- Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
- Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Cynthia M Sharma
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
- Anita Marchfelder
- Biology II, Ulm University, Albert-Einstein-Allee 11, 89081 Ulm, Germany
|
24
|
Ardern Z, Chakraborty S, Lenk F, Kaster AK. Elucidating the functional roles of prokaryotic proteins using big data and artificial intelligence. FEMS Microbiol Rev 2023; 47:fuad003. [PMID: 36725215 PMCID: PMC9960493 DOI: 10.1093/femsre/fuad003] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/12/2022] [Revised: 01/11/2023] [Accepted: 01/31/2023] [Indexed: 02/03/2023]
Abstract
Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials, and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods, a challenge rapidly growing alongside the advent of next-generation sequencing technologies and their enormous extension of 'omics' data in public databases. These so-called hypothetical proteins (HPs) represent a huge knowledge gap and hidden potential for biotechnological applications. Opportunities for leveraging the available 'Big Data' have recently proliferated with the use of artificial intelligence (AI). Here, we review the aims and methods of protein annotation and explain the different principles behind machine and deep learning algorithms including recent research examples, in order to assist both biologists wishing to apply AI tools in developing comprehensive genome annotations and computer scientists who want to contribute to this leading edge of biological research.
Affiliation(s)
- Zachary Ardern
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Wellcome Trust Sanger Institute, Hinxton, Saffron Walden CB10 1RQ, United Kingdom
- Sagarika Chakraborty
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Florian Lenk
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
- Anne-Kristin Kaster
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
|
25
|
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Eva Borràs
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Federico Valverde
- Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
- José Tomás Matus
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
- Eduard Sabidó
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- José Luis Riechmann
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
26
|
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022]
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece
- Zoe Vance
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Kate M Duggan
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Aoife McLysaght
  - Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
|
27
|
Ventroux M, Noirot-Gros MF. Prophage-encoded small protein YqaH counteracts the activities of the replication initiator DnaA in Bacillus subtilis. MICROBIOLOGY (READING, ENGLAND) 2022; 168. [PMID: 36748575 DOI: 10.1099/mic.0.001268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Bacterial genomes harbour cryptic prophages that are mostly transcriptionally silent with many unannotated genes. Still, cryptic prophages may contribute to their host fitness and phenotypes. In Bacillus subtilis, the yqaF-yqaN operon belongs to the prophage element skin, and is tightly repressed by the Xre-like repressor SknR. This operon contains several small ORFs (smORFs) potentially encoding small-sized proteins. The smORF-encoded peptide YqaH was previously reported to bind to the replication initiator DnaA. Here, using a yeast two-hybrid assay, we found that YqaH binds to the DNA binding domain IV of DnaA and interacts with Spo0A, a master regulator of sporulation. We isolated single amino acid substitutions in YqaH that abolished the interaction with DnaA but not with Spo0A. Then, using a plasmid-based inducible system to overexpress yqaH WT and mutant derivatives, we studied in B. subtilis the phenotypes associated with the specific loss-of-interaction with DnaA (DnaA_LOI). We found that expression of yqaH carrying DnaA_LOI mutations abolished the deleterious effects of yqaH WT expression on chromosome segregation, replication initiation and DnaA-regulated transcription. When YqaH was induced after vegetative growth, DnaA_LOI mutations abolished the drastic effects of YqaH WT on sporulation and biofilm formation. Thus, YqaH inhibits replication, sporulation and biofilm formation mainly by antagonizing DnaA in a manner that is independent of the cell cycle checkpoint Sda.
Affiliation(s)
- Magali Ventroux
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
|
28
|
Probing the sORF-Encoded Peptides of Deinococcus radiodurans in Response to Extreme Stress. Mol Cell Proteomics 2022; 21:100423. [PMID: 36210010 PMCID: PMC9650054 DOI: 10.1016/j.mcpro.2022.100423] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/12/2022] [Revised: 09/27/2022] [Accepted: 10/03/2022] [Indexed: 11/09/2022]
Abstract
Organisms have developed different mechanisms to respond to stresses. However, the roles of small ORF-encoded peptides (SEPs) in these regulatory systems remain elusive, which is partially because of the lack of comprehensive knowledge regarding these biomolecules. We chose the extremophile Deinococcus radiodurans R1 as a model species and conducted large-scale profiling of the SEPs related to the stress response. The integrated workflow consisting of multiple omics approaches for SEP identification was streamlined, and an SEPome of D. radiodurans containing 109 novel and high-confidence SEPs was drafted. Forty-four percent of these SEPs were predicted to function as antimicrobial peptides. Quantitative peptidomics analysis indicated that the expression of SEP068184 was upregulated upon oxidative treatment and gamma irradiation of the bacteria. SEP068184 was conserved in Deinococcus and exhibited negative regulation of oxidative stress resistance in a comparative phenotypic assay of its mutants. Further quantitative and interactive proteomics analyses suggested that SEP068184 might function through metabolic pathways and interact with cytoplasmic proteins. Collectively, our findings demonstrate that SEPs are involved in the regulation of oxidative resistance, and the SEPome dataset provides a rich resource for research on the molecular mechanisms of the response to extreme stress in organisms.
|
29
|
Soto I, Couvillion M, Hansen KG, McShane E, Moran JC, Barrientos A, Churchman LS. Balanced mitochondrial and cytosolic translatomes underlie the biogenesis of human respiratory complexes. Genome Biol 2022; 23:170. [PMID: 35945592 PMCID: PMC9361522 DOI: 10.1186/s13059-022-02732-9] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 03/12/2022] [Accepted: 07/18/2022] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Oxidative phosphorylation (OXPHOS) complexes consist of nuclear and mitochondrial DNA-encoded subunits. Their biogenesis requires cross-compartment gene regulation to mitigate the accumulation of disproportionate subunits. To determine how human cells coordinate mitochondrial and nuclear gene expression processes, we tailored ribosome profiling for the unique features of the human mitoribosome. RESULTS: We resolve features of mitochondrial translation initiation and identify a small ORF in the 3' UTR of MT-ND5. Analysis of ribosome footprints in five cell types reveals that average mitochondrial synthesis levels correspond precisely to cytosolic levels across OXPHOS complexes, and these average rates reflect the relative abundances of the complexes. Balanced mitochondrial and cytosolic synthesis does not rely on rapid feedback between the two translation systems, and imbalance caused by mitochondrial translation deficiency is associated with the induction of proteotoxicity pathways. CONCLUSIONS: Based on our findings, we propose that human OXPHOS complexes are synthesized proportionally to each other, with mitonuclear balance relying on the regulation of OXPHOS subunit translation across cellular compartments, which may represent a proteostasis vulnerability.
Affiliation(s)
- Iliana Soto
  - Blavatnik Institute, Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Mary Couvillion
  - Blavatnik Institute, Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Katja G Hansen
  - Blavatnik Institute, Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Erik McShane
  - Blavatnik Institute, Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- J Conor Moran
  - Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- Antoni Barrientos
  - Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- L Stirling Churchman
  - Blavatnik Institute, Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
|
30
|
Elfmann C, Zhu B, Pedreira T, Hoßbach B, Lluch-Senar M, Serrano L, Stülke J. MycoWiki: Functional annotation of the minimal model organism Mycoplasma pneumoniae. Front Microbiol 2022; 13:935066. [PMID: 35958127 PMCID: PMC9358437 DOI: 10.3389/fmicb.2022.935066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2022] [Accepted: 07/01/2022] [Indexed: 11/13/2022]
Abstract
The human pathogen Mycoplasma pneumoniae is viable independently from host cells or organisms, despite its strongly reduced genome with only about 700 protein-coding genes. The investigation of M. pneumoniae can therefore help to obtain general insights concerning the basic requirements for cellular life. Accordingly, M. pneumoniae has become a model organism for systems biology in the past decade. To support the investigation of the components of this minimal bacterium, we have generated the database MycoWiki (http://mycowiki.uni-goettingen.de). MycoWiki organizes data in a relational database and provides access to curated and state-of-the-art information on the genes and proteins of M. pneumoniae. Interestingly, M. pneumoniae has undergone an evolution that resulted in the limited similarity of many proteins to proteins of model organisms. To facilitate the analysis of the functions of M. pneumoniae proteins, we have integrated structure predictions from the AlphaFold Protein Structure Database for most proteins, structural information resulting from in vivo cross-linking, and protein-protein interactions based on a global in vivo study. MycoWiki is an important tool for the systems and synthetic biology community that will support the comprehensive understanding of a minimal organism and the functional annotation of so far uncharacterized proteins.
Affiliation(s)
- Christoph Elfmann
  - Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Bingyao Zhu
  - Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Tiago Pedreira
  - Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Ben Hoßbach
  - Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Maria Lluch-Senar
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Luis Serrano
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jörg Stülke
  - Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
|
31
|
Odrzywolek K, Karwowska Z, Majta J, Byrski A, Milanowska-Zabel K, Kosciolek T. Deep embeddings to comprehend and visualize microbiome protein space. Sci Rep 2022; 12:10332. [PMID: 35725732 PMCID: PMC9209496 DOI: 10.1038/s41598-022-14055-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2022] [Accepted: 05/31/2022] [Indexed: 12/13/2022]
Abstract
Understanding the function of microbial proteins is essential to reveal the clinical potential of the microbiome. The application of high-throughput sequencing technologies allows for fast and increasingly cheaper acquisition of data from microbial communities. However, many of the inferred protein sequences are novel and not catalogued, hence the possibility of predicting their function through conventional homology-based approaches is limited, which indicates the need for further research on alignment-free methods. Here, we leverage a deep-learning-based representation of proteins to assess its utility in alignment-free analysis of microbial proteins. We trained a language model on the Unified Human Gastrointestinal Protein catalogue and validated the resulting protein representation on the bacterial part of the SwissProt database. Finally, we present a use case on proteins involved in SCFA metabolism. Results indicate that the deep learning model manages to accurately represent features related to protein structure and function, allowing for alignment-free protein analyses. Technologies that contextualize metagenomic data are a promising direction to deeply understand the microbiome.
Affiliation(s)
- Krzysztof Odrzywolek
  - Ardigen, Podole 76, 30-394, Krakow, Poland
  - Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059, Krakow, Poland
- Zuzanna Karwowska
  - Malopolska Centre of Biotechnology, Jagiellonian University, Gronostajowa 7A, 30-387, Krakow, Poland
- Jan Majta
  - Ardigen, Podole 76, 30-394, Krakow, Poland
  - Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387, Krakow, Poland
- Aleksander Byrski
  - Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059, Krakow, Poland
- Tomasz Kosciolek
  - Malopolska Centre of Biotechnology, Jagiellonian University, Gronostajowa 7A, 30-387, Krakow, Poland
|
32
|
Fijalkowski I, Willems P, Jonckheere V, Simoens L, Van Damme P. Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides. MICROLIFE 2022; 3:uqac005. [PMID: 37223358 PMCID: PMC10117744 DOI: 10.1093/femsml/uqac005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/13/2021] [Revised: 04/18/2022] [Accepted: 04/29/2022] [Indexed: 05/25/2023]
Abstract
Genomic studies of bacteria have long pointed toward widespread prevalence of small open reading frames (sORFs) encoding short proteins, <100 amino acids in length. Despite the mounting genomic evidence of their robust expression, relatively little progress has been made in their mass spectrometry-based detection and various blanket statements have been used to explain this observed discrepancy. In this study, we provide a large-scale riboproteogenomics investigation of the challenging nature of proteomic detection of such small proteins as informed by conditional translation data. A panel of physicochemical properties alongside recently developed mass spectrometry detectability metrics was interrogated to provide a comprehensive evidence-based assessment of sORF-encoded polypeptide (SEP) detectability. Moreover, a large-scale proteomics and translatomics compendium of proteins produced by Salmonella Typhimurium (S. Typhimurium), a model human pathogen, across a panel of growth conditions is presented and used in support of our in silico SEP detectability analysis. This integrative approach is used to provide a data-driven census of small proteins expressed by S. Typhimurium across growth phases and infection-relevant conditions. Taken together, our study pinpoints current limitations in proteomics-based detection of novel small proteins currently missing from bacterial genome annotations.
Affiliation(s)
- Igor Fijalkowski
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Patrick Willems
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Veronique Jonckheere
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Laure Simoens
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Petra Van Damme
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
|
33
|
Wittekind MA, Frey A, Bonsall AE, Briaud P, Keogh RA, Wiemels RE, Shaw LN, Carroll RK. The novel protein ScrA acts through the SaeRS two-component system to regulate virulence gene expression in Staphylococcus aureus. Mol Microbiol 2022; 117:1196-1212. [PMID: 35366366 PMCID: PMC9324805 DOI: 10.1111/mmi.14901] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/27/2021] [Revised: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 11/29/2022]
Abstract
Staphylococcus aureus is a Gram-positive commensal that can also cause a variety of infections in humans. S. aureus virulence factor gene expression is under tight control by a complex regulatory network, which includes sigma factors, sRNAs, and two-component systems (TCS). Previous work in our laboratory demonstrated that overexpression of the sRNA tsr37 leads to an increase in bacterial aggregation. Here, we demonstrate that the clumping phenotype is dependent on a previously unannotated 88 amino acid protein encoded within the tsr37 sRNA transcript (which we named ScrA for S. aureus clumping regulator A). To investigate the mechanism of action of ScrA, we performed proteomics and transcriptomics in a ScrA-overexpressing strain and show that a number of surface adhesins are upregulated, while secreted proteases are downregulated. Results also showed upregulation of the SaeRS TCS, suggesting that ScrA is influencing SaeRS activity. Overexpression of ScrA in a saeR mutant abrogates the clumping phenotype, confirming that ScrA functions via the Sae system. Finally, we identified the ArlRS TCS as a positive regulator of scrA expression. Collectively, our results show that ScrA is an activator of the SaeRS system and suggest that ScrA may act as an intermediary between the ArlRS and SaeRS systems.
Affiliation(s)
- Andrew Frey
  - Department of Cell Biology, Microbiology & Molecular Biology, University of South Florida, Tampa, Florida, USA
- Paul Briaud
  - Department of Biological Sciences, Ohio University, Athens, Ohio, USA
- Rebecca A. Keogh
  - Department of Biological Sciences, Ohio University, Athens, Ohio, USA
  - Present address: Department of Immunology & Microbiology, University of Colorado School of Medicine, Aurora, Colorado, USA
- Lindsey N. Shaw
  - Department of Cell Biology, Microbiology & Molecular Biology, University of South Florida, Tampa, Florida, USA
- Ronan K. Carroll
  - Department of Biological Sciences, Ohio University, Athens, Ohio, USA
  - Infectious and Tropical Disease Institute, Ohio University, Athens, Ohio, USA
|
34
|
Broto A, Gaspari E, Miravet-Verde S, Dos Santos VAPM, Isalan M. A genetic toolkit and gene switches to limit Mycoplasma growth for biosafety applications. Nat Commun 2022; 13:1910. [PMID: 35393441 PMCID: PMC8991246 DOI: 10.1038/s41467-022-29574-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/30/2021] [Accepted: 03/24/2022] [Indexed: 12/18/2022]
Abstract
Mycoplasmas have exceptionally streamlined genomes and are strongly adapted to their many hosts, which provide them with essential nutrients. Owing to their relative genomic simplicity, Mycoplasmas have been used to develop chassis for biotechnological applications. However, the dearth of robust and precise toolkits for genomic manipulation and tight regulation has hindered any substantial advance. Herein we describe the construction of a robust genetic toolkit for M. pneumoniae, and its successful deployment to engineer synthetic gene switches that control and limit Mycoplasma growth, for biosafety containment applications. We found these synthetic gene circuits to be stable and robust in the long-term, in the context of a minimal cell. With this work, we lay a foundation to develop viable and robust biosafety systems to exploit a synthetic Mycoplasma chassis for live attenuated vectors for therapeutic applications.
Affiliation(s)
- Alicia Broto
  - Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Erika Gaspari
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, the Netherlands
  - European & Developing Countries Clinical Trials Partnership (EDCTP), The Hague, The Netherlands
- Samuel Miravet-Verde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
- Vitor A P Martins Dos Santos
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, the Netherlands
  - LifeGlimmer GmbH, Berlin, Germany
- Mark Isalan
  - Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
|
35
|
Identification and characterisation of sPEPs in Cryptococcus neoformans. Fungal Genet Biol 2022; 160:103688. [PMID: 35339703 DOI: 10.1016/j.fgb.2022.103688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/02/2022] [Accepted: 03/21/2022] [Indexed: 11/24/2022]
Abstract
Short open reading frame (sORF)-encoded peptides (sPEPs) have been found across a wide range of genomic locations in a variety of species. To date, their identification, validation, and characterisation in the human fungal pathogen Cryptococcus neoformans has been limited due to a lack of standardised protocols. We have developed an enrichment process that enables sPEP detection within a protein sample from this polysaccharide-encapsulated yeast, and implemented proteogenomics to provide insights into the validity of predicted and hypothetical sORFs annotated in the C. neoformans genome. Novel sORFs were discovered within the 5' and 3' UTRs of known transcripts as well as in "non-coding" RNAs. One novel candidate, dubbed NPB1, that resided in an RNA annotated as "non-coding", was chosen for characterisation. Through the creation of both specific point mutations and a full deletion allele, the function of the new sPEP, Npb1, was shown to resemble that of the bacterial trans-translation protein SmpB.
|
36
|
Gelhausen R, Müller T, Svensson SL, Alkhnbashi OS, Sharma CM, Eggenhofer F, Backofen R. RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria. Brief Bioinform 2022; 23:bbab549. [PMID: 35037022 PMCID: PMC8921622 DOI: 10.1093/bib/bbab549] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/25/2021] [Revised: 11/22/2021] [Accepted: 11/29/2021] [Indexed: 11/19/2022]
Abstract
Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.
Affiliation(s)
- Rick Gelhausen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Teresa Müller
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Sarah L Svensson
  - Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Omer S Alkhnbashi
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Saudi Arabia
  - SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRC-AI), King Fahd University of Petroleum and Minerals, Saudi Arabia
- Cynthia M Sharma
  - Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, 79104, Freiburg, Germany
|
37
|
Saati-Santamaría Z, Selem-Mojica N, Peral-Aranega E, Rivas R, García-Fraile P. Unveiling the genomic potential of Pseudomonas type strains for discovering new natural products. Microb Genom 2022; 8:000758. [PMID: 35195510 PMCID: PMC8942027 DOI: 10.1099/mgen.0.000758] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/08/2021] [Accepted: 12/07/2021] [Indexed: 12/20/2022]
Abstract
Microbes host a huge variety of biosynthetic gene clusters that produce an immeasurable array of secondary metabolites with many different biological activities such as antimicrobial, anticarcinogenic and antiviral. Despite the complex task of isolating and characterizing novel natural products, microbial genomic strategies can be useful for carrying out these types of studies. However, although genomic-based research on secondary metabolism is on the increase, there is still a lack of reports focusing specifically on the genus Pseudomonas. In this work, we aimed (i) to unveil the main biosynthetic systems related to secondary metabolism in Pseudomonas type strains, (ii) to study the evolutionary processes that drive the diversification of their coding regions and (iii) to select Pseudomonas strains showing promising results in the search for useful natural products. We performed a comparative genomic study on 194 Pseudomonas species, paying special attention to the evolution and distribution of different classes of biosynthetic gene clusters and the coding features of antimicrobial peptides. Using EvoMining, a bioinformatic approach for studying evolutionary processes related to secondary metabolism, we sought to decipher the protein expansion of enzymes related to the lipid metabolism, which may have evolved toward the biosynthesis of novel secondary metabolites in Pseudomonas. The types of metabolites encoded in Pseudomonas type strains were predominantly non-ribosomal peptide synthetases, bacteriocins, N-acetylglutaminylglutamine amides and β-lactones. Also, the evolution of genes related to secondary metabolites was found to coincide with Pseudomonas species diversification. Interestingly, only a few Pseudomonas species encode polyketide synthases, which are related to the lipid metabolism broadly distributed among bacteria. Thus, our EvoMining-based search may help to discover new types of secondary metabolite gene clusters in which lipid-related enzymes are involved. This work provides information about uncharacterized metabolites produced by Pseudomonas type strains, whose gene clusters have evolved in a species-specific way. Our results provide novel insight into the secondary metabolism of Pseudomonas and will serve as a basis for the prioritization of the isolated strains. This article contains data hosted by Microreact.
Affiliation(s)
- Zaki Saati-Santamaría
  - Microbiology and Genetics Department, University of Salamanca, 37007 Salamanca, Spain
  - Institute for Agribiotechnology Research (CIALE), 37185 Salamanca, Spain
- Ezequiel Peral-Aranega
  - Microbiology and Genetics Department, University of Salamanca, 37007 Salamanca, Spain
  - Institute for Agribiotechnology Research (CIALE), 37185 Salamanca, Spain
- Raúl Rivas
  - Microbiology and Genetics Department, University of Salamanca, 37007 Salamanca, Spain
  - Institute for Agribiotechnology Research (CIALE), 37185 Salamanca, Spain
  - Associated Research Unit of Plant-Microorganism Interaction, University of Salamanca-IRNASA-CSIC, 37008 Salamanca, Spain
- Paula García-Fraile
  - Microbiology and Genetics Department, University of Salamanca, 37007 Salamanca, Spain
  - Institute for Agribiotechnology Research (CIALE), 37185 Salamanca, Spain
  - Associated Research Unit of Plant-Microorganism Interaction, University of Salamanca-IRNASA-CSIC, 37008 Salamanca, Spain
|
38
|
Weidenbach K, Gutt M, Cassidy L, Chibani C, Schmitz RA. Small Proteins in Archaea, a Mainly Unexplored World. J Bacteriol 2022; 204:e0031321. [PMID: 34543104 PMCID: PMC8765429 DOI: 10.1128/jb.00313-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/12/2023]
Abstract
In recent years, increasing numbers of small proteins have moved into the focus of science. Small proteins have been identified and characterized in all three domains of life, but the majority remain functionally uncharacterized, lack secondary structure, and exhibit limited evolutionary conservation. While quite a few have already been described for bacteria and eukaryotic organisms, the number of known and functionally analyzed archaeal small proteins is still very limited. In this review, we compile the current state of research, show strategies for systematic approaches for global identification of small archaeal proteins, and address selected functionally characterized examples. In addition, we document, using one archaeon as an example, the tool development and optimization needed to identify small proteins using genome-wide approaches.
Affiliation(s)
- Katrin Weidenbach
  - Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
- Miriam Gutt
  - Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
- Liam Cassidy
  - AG Proteomics & Bioanalytics, Institute for Experimental Medicine, Christian Albrechts University, Kiel, Germany
- Cynthia Chibani
  - Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
- Ruth A. Schmitz
  - Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
39
Yadavalli SS, Yuan J. Bacterial Small Membrane Proteins: the Swiss Army Knife of Regulators at the Lipid Bilayer. J Bacteriol 2022; 204:e0034421. [PMID: 34516282 PMCID: PMC8765417 DOI: 10.1128/jb.00344-21] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/20/2022] Open
Abstract
Small membrane proteins represent a subset of recently discovered small proteins (≤100 amino acids), which are a ubiquitous class of emerging regulators underlying bacterial adaptation to environmental stressors. Until relatively recently, small open reading frames encoding these proteins were not designated genes in genome annotations. Therefore, our understanding of small protein biology was primarily limited to a few candidates associated with previously characterized larger partner proteins. Following the first systematic analyses of small proteins in Escherichia coli over a decade ago, numerous small proteins across different bacteria have been uncovered. An estimated one-third of these newly discovered proteins in E. coli are localized to the cell membrane, where they may interact with distinct groups of membrane proteins, such as signal receptors, transporters, and enzymes, and affect their activities. Recently, there has been considerable progress in functionally characterizing small membrane protein regulators aided by innovative tools adapted specifically to study small proteins. Our review covers prototypical proteins that modulate a broad range of cellular processes, such as transport, signal transduction, stress response, respiration, cell division, sporulation, and membrane stability. Thus, small membrane proteins represent a versatile group of physiology regulators at the membrane and the whole cell. Additionally, small membrane proteins have the potential for clinical applications, where some of the proteins may act as antibacterial agents themselves while others serve as alternative drug targets for the development of novel antimicrobials.
Affiliation(s)
- Srujana S. Yadavalli
  - Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey, USA
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, USA
- Jing Yuan
  - Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
  - LOEWE Center for Synthetic Microbiology (SYNMIKRO), Marburg, Germany
40
Abstract
In recent years, there has been increased appreciation that a whole category of proteins, small proteins of around 50 amino acids or fewer in length, has been missed by annotation as well as by genetic and biochemical assays. With the increased recognition that small proteins are stable within cells and have regulatory functions, there has been intensified study of these proteins. As a result, important questions about small proteins in bacteria and archaea are coming to the fore. Here, we give an overview of these questions, the initial answers, and the approaches needed to address these questions more fully. More detailed discussions of how small proteins can be identified by ribosome profiling and mass spectrometry approaches are provided by two accompanying reviews (N. Vazquez-Laslop, C. M. Sharma, A. S. Mankin, and A. R. Buskirk, J Bacteriol 204:e00294-21, 2022, https://doi.org/10.1128/JB.00294-21; C. H. Ahrens, J. T. Wade, M. M. Champion, and J. D. Langer, J Bacteriol 204:e00353-21, 2022, https://doi.org/10.1128/JB.00353-21). We are excited by the prospects of new insights and possible therapeutic approaches coming from this emerging field.
Affiliation(s)
- Todd Gray
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
  - Department of Biomedical Sciences, University at Albany, Albany, New York, USA
- Gisela Storz
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
- Kai Papenfort
  - Institute of Microbiology, Friedrich Schiller University, Jena, Germany
  - Microverse Cluster, Friedrich Schiller University, Jena, Germany
41
Vajjala M, Johnson B, Kasparek L, Leuze M, Yao Q. Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning. Front Genet 2022; 13:935351. [PMID: 35938008 PMCID: PMC9354662 DOI: 10.3389/fgene.2022.935351] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2022] [Accepted: 06/17/2022] [Indexed: 11/13/2022] Open
Abstract
Small proteins, encoded by small open reading frames, are only beginning to emerge with the current advancement of omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating other protein activities, controlling cell cycles, and affecting disease physiology. In prokaryotes such as bacteria, the sequence space and functional groups of small proteins are largely unexplored. For most bacterial species from a natural community, the sample cannot be easily isolated or cultured, and the bacterial peptides must be better characterized in a metagenomic manner. The bacterial peptides identified from metagenomic samples can not only enrich the pool of small proteins but can also reveal community-specific microbial ecology information from a small protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) has been developed as a comprehensive toolkit to explore the small protein universe from metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. metaBP also integrates general protein annotation tools as well as our small protein-specific machine learning module, metaBP-ML, to construct a full landscape for bacterial peptides. metaBP-ML shows advantages for discovering functions of bacterial peptides in a microbial community and increases the yield of annotations by up to fivefold. The metaBP toolkit demonstrates its novelty in adopting protein-level assembly to discover small proteins, integrating a protein-clustering tool in a new and flexible environment of RBiotools, and presenting, for the first time, a small protein landscape via metaBP-ML. Taken together, metaBP (and metaBP-ML) can profile functional bacterial peptides from metagenomic samples with potentially diverse mutations, in order to depict a unique landscape of small proteins from a microbial community.
Affiliation(s)
- Mitra Vajjala
  - School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
- Brady Johnson
  - School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
- Lauren Kasparek
  - School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
- Qiuming Yao
  - School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
  - *Correspondence: Qiuming Yao,
42
Minigene as a Novel Regulatory Element in Toxin-Antitoxin Systems. Int J Mol Sci 2021; 22:ijms222413389. [PMID: 34948189 PMCID: PMC8708949 DOI: 10.3390/ijms222413389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/16/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 12/05/2022] Open
Abstract
The axe-txe type II toxin-antitoxin (TA) system is characterized by a complex and multilayered mode of gene expression regulation. Precise and tight control of this process is crucial to keep the toxin in an appropriate balance with the cognate antitoxin until its activation is needed for the cell. In this report, we provide evidence that a minigene encoded within the axe-txe operon influences translation of the Txe toxin. This is the first example to date of such a regulatory mechanism identified in the TA modules. Here, in a series of genetic studies, we employed translational reporter gene fusions to establish the molecular basis of this phenomenon. Our results show that translation of the two-codon mini-ORF displays an in cis mode of action, and positively affects the expression of txe, possibly by increasing its mRNA stability through protection from an endonuclease attack. Moreover, we established that the reading frame in which the two cistrons are encoded, as well as the distance between them, are critical parameters that affect the level of such regulation. In addition, by searching for two-codon ORFs we found sequences of several potential minigenes in the leader sequences of several other toxins belonging to the type II TA family. These findings suggest that this type of gene regulation may not only apply for the axe-txe cassette, but could be more widespread among other TA systems.
43
Dimonaco NJ, Aubrey W, Kenobi K, Clare A, Creevey CJ. No one tool to rule them all: prokaryotic gene prediction tool annotations are highly dependent on the organism of study. Bioinformatics 2021; 38:1198-1207. [PMID: 34875010 PMCID: PMC8825762 DOI: 10.1093/bioinformatics/btab827] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/26/2021] [Revised: 11/13/2021] [Accepted: 12/02/2021] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION: The biases in CoDing Sequence (CDS) prediction tools, which have been based on historic genomic annotations from model organisms, impact our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information as it results in predictions being biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identify the strengths and weaknesses of any CDS prediction tool and allow them to choose the right tool for their analysis.
RESULTS: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of CDS prediction tools. This makes it possible to identify which performs better for specific use-cases. We use this to assess 15 ab initio- and model-based tools representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool is dependent on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to make informed tool choices for novel genome annotations and for refining historical annotations.
AVAILABILITY AND IMPLEMENTATION: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicholas J Dimonaco
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, UK. To whom correspondence should be addressed.
- Wayne Aubrey
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
- Kim Kenobi
  - Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, UK
- Amanda Clare
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
44
Tavares BADR, Paes JA, Zaha A, Ferreira HB. Reannotation of Mycoplasma hyopneumoniae hypothetical proteins revealed novel potential virulence factors. Microb Pathog 2021; 162:105344. [PMID: 34864146 DOI: 10.1016/j.micpath.2021.105344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2021] [Revised: 11/29/2021] [Accepted: 11/30/2021] [Indexed: 01/08/2023]
Abstract
Mycoplasma hyopneumoniae is a bacterium that inhabits the swine respiratory tract, causing porcine enzootic pneumonia, which generates significant economic losses to the swine industry worldwide. Knowledge of M. hyopneumoniae biology and virulence has been significantly increased by genomics studies. However, around 30% of the predicted proteins have so far remained of unknown function. According to the original annotation, the genome of M. hyopneumoniae 7448, a Brazilian pathogenic strain, had 693 coding DNA sequences, 244 of which were annotated as coding for hypothetical or uncharacterized proteins. Among them, there may still be several genes coding for unknown virulence factors. Therefore, this study aimed to functionally reannotate the whole set of 244 M. hyopneumoniae 7448 proteins of unknown function based on currently available databases and bioinformatic tools, in order to predict novel potential virulence factors. Predictions of physicochemical properties, subcellular localization, function, overall association with virulence and antigenicity are provided. With that, 159 out of the set of 244 proteins of unknown function were assigned a putative function, allowing identification of novel enzymes, membrane transporters, lipoproteins, DNA-binding proteins and adhesins. Furthermore, 139 proteins were generally associated with virulence, 14 of which had a function assigned and were differentially expressed between pathogenic and non-pathogenic strains of M. hyopneumoniae. Moreover, all proteins predicted to be extracellular or located in the cytoplasmic membrane had putative epitopes identified. Overall, these analyses improved the functional annotation of the M. hyopneumoniae 7448 genome from 65% to 87% and allowed the identification of new potential virulence factors.
Affiliation(s)
- Bryan Augusto da Rosa Tavares
  - Laboratório de Genômica Estrutural e Funcional, Centro de Biotecnologia, Universidade Federal do Rio Grande Do Sul (UFRGS), Porto Alegre, Brazil; Programa de Pós-Graduação em Biologia Celular e Molecular, Centro de Biotecnologia, UFRGS, Porto Alegre, Brazil
- Jéssica Andrade Paes
  - Laboratório de Genômica Estrutural e Funcional, Centro de Biotecnologia, Universidade Federal do Rio Grande Do Sul (UFRGS), Porto Alegre, Brazil; Programa de Pós-Graduação em Biologia Celular e Molecular, Centro de Biotecnologia, UFRGS, Porto Alegre, Brazil
- Arnaldo Zaha
  - Laboratório de Genômica Estrutural e Funcional, Centro de Biotecnologia, Universidade Federal do Rio Grande Do Sul (UFRGS), Porto Alegre, Brazil; Programa de Pós-Graduação em Biologia Celular e Molecular, Centro de Biotecnologia, UFRGS, Porto Alegre, Brazil; Laboratório de Biologia Molecular de Cestódeos, Centro de Biotecnologia, UFRGS, Porto Alegre, Brazil; Departamento de Biologia Molecular e Biotecnologia, Instituto de Biociências, UFRGS, Porto Alegre, Brazil
- Henrique Bunselmeyer Ferreira
  - Laboratório de Genômica Estrutural e Funcional, Centro de Biotecnologia, Universidade Federal do Rio Grande Do Sul (UFRGS), Porto Alegre, Brazil; Programa de Pós-Graduação em Biologia Celular e Molecular, Centro de Biotecnologia, UFRGS, Porto Alegre, Brazil; Laboratório de Biologia Molecular de Cestódeos, Centro de Biotecnologia, UFRGS, Porto Alegre, Brazil; Departamento de Biologia Molecular e Biotecnologia, Instituto de Biociências, UFRGS, Porto Alegre, Brazil
45
Proteogenomic discovery of sORF-encoded peptides associated with bacterial virulence in Yersinia pestis. Commun Biol 2021; 4:1248. [PMID: 34728737 PMCID: PMC8563848 DOI: 10.1038/s42003-021-02759-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/16/2021] [Accepted: 10/08/2021] [Indexed: 11/08/2022] Open
Abstract
Plague caused by Yersinia pestis is one of the deadliest diseases. However, many molecular mechanisms of bacterial virulence remain unclear. This study engaged in the discovery of small open reading frame (sORF)-encoded peptides (SEPs) in Y. pestis. An integrated proteogenomic pipeline was established, and an atlas containing 76 SEPs was described. Bioinformatic analysis indicated that 20% of these SEPs were secreted or localized to the transmembrane and that 33% contained functional domains. Two SEPs, named SEPs-yp1 and -yp2 and encoded in noncoding regions, were selected by comparative peptidomics analysis under host-specific environments and high-salinity stress. They displayed important roles in the regulation of antiphagocytic capability in a thorough functional assay. Remarkable attenuation of virulence in mice was observed in the SEP-deleted mutants. Further global proteomic analysis indicated that SEPs-yp1 and -yp2 affected the bacterial metabolic pathways, and SEP-yp1 was associated with the bacterial virulence by modulating the expression of key virulence factors of the Yersinia type III secretion system. Our study provides a rich resource for research on Y. pestis and plague, and the findings on SEP-yp1 and SEP-yp2 shed light on the molecular mechanism of bacterial virulence.
Shiyang Cao, Xinyue Liu, Yin Huang, and Yanfeng Yan et al. utilized an integrated proteogenomic approach to describe an atlas of small open reading frame-encoded peptides (SEPs) in the pathogen, Yersinia pestis. They demonstrate that two of these SEPs are associated with regulation of bacterial virulence, and altogether develop a valuable resource for future research into Y. pestis physiology.
46
Shaw D, Miravet‐Verde S, Piñero‐Lambea C, Serrano L, Lluch‐Senar M. LoxTnSeq: random transposon insertions combined with cre/lox recombination and counterselection to generate large random genome reductions. Microb Biotechnol 2021; 14:2403-2419. [PMID: 33325626 PMCID: PMC8601177 DOI: 10.1111/1751-7915.13714] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/10/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/13/2022] Open
Abstract
The removal of unwanted genetic material is a key aspect in many synthetic biology efforts and often requires preliminary knowledge of which genomic regions are dispensable. Typically, these efforts are guided by transposon mutagenesis studies, coupled to deep sequencing (TnSeq) to identify insertion points and gene essentiality. However, epistatic interactions can cause unforeseen changes in essentiality after the deletion of a gene, leading to the redundancy of these essentiality maps. Here, we present LoxTnSeq, a new methodology to generate and catalogue libraries of genome reduction mutants. LoxTnSeq combines random integration of lox sites by transposon mutagenesis, and the generation of mutants via Cre recombinase, catalogued via deep sequencing. When LoxTnSeq was applied to the naturally genome-reduced bacterium Mycoplasma pneumoniae, we obtained a mutant pool containing 285 unique deletions. These deletions spanned from >50 bp to 28 kb, which represents 21% of the total genome. LoxTnSeq also highlighted large regions of non-essential genes that could be removed simultaneously, and other non-essential regions that could not, providing a guide for future genome reductions.
Affiliation(s)
- Daniel Shaw
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Samuel Miravet‐Verde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Carlos Piñero‐Lambea
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Present address: Pulmobiotics Ltd, Dr. Aiguader 88, Barcelona 08003, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08002, Spain
  - ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
- Maria Lluch‐Senar
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Basic Sciences Department, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès 08195, Spain
47
Li B, Geng H, Li Z, Peng B, Wang J, Yin X, Li N, Shi J, Zhao M, Li C, Yin F. Clinical significance of novel identified high-frequency tumor-specific peptides associated signature in predicting disease status of gastric cancer patients. Biofactors 2021; 47:1042-1052. [PMID: 34414616 DOI: 10.1002/biof.1778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2021] [Revised: 07/13/2021] [Accepted: 07/22/2021] [Indexed: 01/30/2023]
Abstract
Effective early detection and determination of disease progression in gastric cancer (GC) are still needed. Identifying novel targets associated with cancer cells remains challenging, although such profiles could serve both for early gastric tumor discovery and as potential therapeutic targets. We retrospectively analyzed GC biopsies to identify specific target peptides associated with disease progression. Polypeptides were detected by liquid mass technology using the BIO-HIGH assay technology for tumor-specific target peptide identification. We validated the accessibility and feasibility of multiple-target cytotoxic T lymphocytes for assessing potential molecular markers by comparing the frequencies of tumor peptide loci identified in 138 GC patients. Peripheral blood lymphocytes were separated by density gradient centrifugation, and specific target peptides were used in in vitro culture of the lymphocytes. The Cell Counting Kit-8 assay was used to test lymphocyte proliferation stimulated by the identified peptides. Both GC-specific and shared peptides were detected in peripheral blood, and their frequencies and quantities correlated with disease status and cancer differentiation. BHGa1510 (78%), BHGa1310 (66%), BHGa0910 (57%), BHGa0310 (54%), BHGa0210 (40%), BHGa0810 (35%), BHGa0110 (33%), and BHGa1410 (30%) were identified as high-frequency (HF) peptides that could potentially be specific tumor markers. Moreover, BHGa1410 was significantly associated with cancer progression, and BHGa0910 and BHGa0210 were significantly associated with TNM stage. IHC data showed that the HF peptides BHGa1510 and BHGa1310 were each expressed at 100%, in contrast with 40% (p < 0.05) and 33% (p < 0.05), respectively, in paracancerous tissues.
These specific peptide pools could be valuable for assessing advanced tumors and differentiation status in GC patients.
Affiliation(s)
- Bin Li
  - Hebei Bio-High Technology Co., Ltd, Shijiazhuang, China
- Huizhen Geng
  - Hebei Bio-technology Co., Ltd, Shijiazhuang, China
- Zibo Li
  - Department of Molecular Biology with Biotechnology, School of Biological Sciences, University of Bangor, Bangor, UK
- Bing Peng
  - Hebei Bio-High Technology Co., Ltd, Shijiazhuang, China
- Jinfeng Wang
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Xiaolei Yin
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Ning Li
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Jianfei Shi
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Man Zhao
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Cuizhen Li
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Fei Yin
  - The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
48
Fijalkowski I, Peeters MKR, Van Damme P. Small Protein Enrichment Improves Proteomics Detection of sORF Encoded Polypeptides. Front Genet 2021; 12:713400. [PMID: 34721520 PMCID: PMC8554064 DOI: 10.3389/fgene.2021.713400] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/22/2021] [Accepted: 10/01/2021] [Indexed: 11/13/2022] Open
Abstract
With the rapid growth in the number of sequenced genomes, genome annotation efforts became almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to their short length and often highly variable physicochemical properties. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and concomitant proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs making use of the amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.
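The length cut-off quoted above for sORFs (shorter than 300 nucleotides) can be illustrated with a small sketch. This is not the authors' pipeline: it is a minimal, hypothetical forward-strand scan that assumes ATG as the only start codon, the three standard stop codons, and an ORF length that counts the stop codon. The names `find_sorfs` and `MAX_SORF_NT` are illustrative only.

```python
from typing import List, Tuple

START = "ATG"                  # assumption: only the canonical start codon
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons
MAX_SORF_NT = 300              # cut-off taken from the abstract above


def find_sorfs(seq: str, max_nt: int = MAX_SORF_NT) -> List[Tuple[int, int, str]]:
    """Return (start, end, orf) for forward-strand ORFs shorter than max_nt.

    Coordinates are 0-based and end-exclusive; the ORF includes its stop codon.
    For each stop codon, the first in-frame ATG upstream is used as the start.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                end = i + 3
                if end - start < max_nt:       # keep only small ORFs
                    hits.append((start, end, seq[start:end]))
                start = None                   # look for the next ORF
    return sorted(hits)


if __name__ == "__main__":
    for start, end, orf in find_sorfs("CCATGAAATTTTAGGG"):
        print(start, end, orf)
```

A realistic annotation workflow would also scan the reverse strand, allow alternative start codons such as GTG and TTG, and require translation evidence (for example from ribosome profiling or proteomics) before calling a candidate an sORF.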
Affiliation(s)
- Igor Fijalkowski
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
- Marlies K. R. Peeters
  - BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Gent, Belgium
- Petra Van Damme
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
49
Fesenko I, Shabalina SA, Mamaeva A, Knyazev A, Glushkevich A, Lyapina I, Ziganshin R, Kovalchuk S, Kharlampieva D, Lazarev V, Taliansky M, Koonin EV. A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants. Nucleic Acids Res 2021; 49:10328-10346. [PMID: 34570232 DOI: 10.1093/nar/gkab816] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/03/2021] [Revised: 08/17/2021] [Accepted: 09/17/2021] [Indexed: 12/17/2022] Open
Abstract
Pervasive transcription of eukaryotic genomes results in expression of long non-coding RNAs (lncRNAs), most of which are poorly conserved in evolution and appear to be non-functional. However, some lncRNAs have been shown to perform specific functions, in particular, transcription regulation. Thousands of small open reading frames (smORFs, <100 codons) located on lncRNAs potentially might be translated into peptides or microproteins. We report a comprehensive analysis of the conservation and evolutionary trajectories of lncRNA-smORFs from the moss Physcomitrium patens across transcriptomes of 479 plant species. Although thousands of smORFs are subject to substantial purifying selection, the majority of the smORFs appear to be evolutionarily young and could represent a major pool for functional innovation. Using nanopore RNA sequencing, we show that, on average, the transcriptional level of conserved smORFs is higher than that of non-conserved smORFs. Proteomic analysis confirmed translation of 82 novel species-specific smORFs. Numerous conserved smORFs containing low complexity regions (LCRs) or transmembrane domains were identified, and the biological functions of a selected LCR-smORF were demonstrated experimentally. Thus, microproteins encoded by smORFs are a major, functionally diverse component of the plant proteome.
Affiliation(s)
- Igor Fesenko
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Svetlana A Shabalina
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Anna Mamaeva
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Andrey Knyazev
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Anna Glushkevich
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Irina Lyapina
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Rustam Ziganshin
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Sergey Kovalchuk
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Daria Kharlampieva
  - Department of Cell Biology, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow 119435, Russian Federation
- Vassili Lazarev
  - Department of Cell Biology, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow 119435, Russian Federation
  - Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow region, 141701, Russian Federation
- Michael Taliansky
  - Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
  - The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
50
|
Burgos R, Weber M, Gallo C, Lluch-Senar M, Serrano L. Widespread ribosome stalling in a genome-reduced bacterium and the need for translational quality control. iScience 2021; 24:102985. [PMID: 34485867 PMCID: PMC8403727 DOI: 10.1016/j.isci.2021.102985] [Received: 05/27/2021] [Revised: 07/22/2021] [Accepted: 08/11/2021] [Indexed: 11/21/2022]
Abstract
Trans-translation is a ubiquitous bacterial mechanism of ribosome rescue mediated by a transfer-messenger RNA (tmRNA) that adds a degradation tag to the truncated nascent polypeptide. Here, we characterize this quality control system in a genome-reduced bacterium, Mycoplasma pneumoniae (MPN), and perform a comparative analysis of protein quality control components in slow- and fast-growing prokaryotes. We show in vivo that in MPN the sole cytoplasmic quality control protease (Lon) efficiently degrades tmRNA-tagged proteins. Analysis of tmRNA mutants encoding a tag resistant to proteolysis reveals extensive tagging activity under normal growth. Unlike knockout strains, these mutants are viable, demonstrating the requirement for tmRNA-mediated ribosome recycling. Chaperone and Lon steady-state levels maintain proteostasis in these mutants, suggesting a model in which co-evolution of Lon and its substrates offers simple mechanisms of regulation without specialized degradation machineries. Finally, comparative analysis shows a relative increase in Lon/chaperone levels in slow-growing bacteria, suggesting physiological adaptation to growth demand.
- Lon efficiently degrades tmRNA-tagged proteins in a genome-reduced bacterium
- tmRNA-tag mutants are viable and reveal extensive tagging activity in M. pneumoniae
- Co-evolution of Lon and its substrates offers simple mechanisms of regulation
- Chaperone and Lon relative levels correlate with bacterial growth rates
Affiliation(s)
- Raul Burgos
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Corresponding author
- Marc Weber
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Carolina Gallo
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Maria Lluch-Senar
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Luis Serrano
- Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
- Corresponding author
|