1
Deng Z, Liu C, Wang F, Song N, Liu J, Li H, Liu S, Li T, Liu Z, Xiao F, Li W. A Versatile Thioesterase Involved in Dimerization during Cinnamoyl Lipid Biosynthesis. Angew Chem Int Ed Engl 2024; 63:e202402010. [PMID: 38462490 DOI: 10.1002/anie.202402010] [Received: 01/28/2024] [Revised: 03/07/2024] [Accepted: 03/07/2024] [Indexed: 03/12/2024]
Abstract
The cinnamoyl lipid compound youssoufene A1 (1), which features a unique dearomatized, carbon-bridged dimeric skeleton, shows stronger inhibition of multidrug-resistant Enterococcus faecalis than the monomeric youssoufenes. However, how this intriguing dearomatization/dimerization occurs has remained unknown. In this study, we functionally characterized ysfF, an unusual "gene-within-gene" thioesterase (TE) gene. The gene naturally encodes two proteins: full-length YsfF, which carries an α/β-hydrolase domain and a 4-hydroxybenzoyl-CoA thioesterase (4-HBT)-like domain, and a nested YsfFHBT (a 4-HBT-like enzyme). Using an intracellular tagged carrier-protein tracking (ITCT) strategy, in vitro reconstitution and in vivo experiments, we found that: i) both domains of YsfF displayed thioesterase activity; ii) YsfF/YsfFHBT could carry out the 6π-electrocyclic ring closure that forms the benzene ring; and iii) YsfF and the cyclase YsfX together were responsible for the ACP-tethered dearomatization/dimerization, possibly through an unprecedented Michael-type addition. Moreover, site-directed mutagenesis showed that residues N301, E483 and H566 of YsfF are critical for both the 6π-electrocyclization and the dimerization. This study expands our understanding of the multifunctionality of the TE protein family.
Affiliation(s)
- Zirong Deng
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Shaanxi Key Laboratory of Natural Products & Chemical Biology, College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Chunni Liu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Shaanxi Key Laboratory of Natural Products & Chemical Biology, College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Fang Wang
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Ni Song
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Jing Liu
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Huayue Li
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Laboratory for Marine Drugs and Bioproducts of Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266237, China
- Siyu Liu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Shaanxi Key Laboratory of Natural Products & Chemical Biology, College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Tong Li
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Zengzhi Liu
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Fei Xiao
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Wenli Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Shaanxi Key Laboratory of Natural Products & Chemical Biology, College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao, Shandong, 266003, China
- Laboratory for Marine Drugs and Bioproducts of Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266237, China
2
Fijalkowski I, Snauwaert V, Van Damme P. Proteins à la carte: riboproteogenomic exploration of bacterial N-terminal proteoform expression. mBio 2024; 15:e0033324. [PMID: 38511928 PMCID: PMC11005335 DOI: 10.1128/mbio.00333-24] [Received: 02/03/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024]
Abstract
In recent years, it has become evident that the true complexity of bacterial proteomes remains underestimated. Gene annotation tools are known to propagate biases and to overlook certain classes of truly expressed proteins, particularly proteoforms (protein isoforms arising from a single gene). Recent (re-)annotation efforts rely heavily on ribosome profiling, which provides a direct readout of translation, to describe bacterial proteomes fully. In this study, we employ a robust riboproteogenomic pipeline to conduct a systematic census of expressed N-terminal proteoform pairs in Salmonella, that is, pairs of isoforms encoded by a single gene that arise from annotated and alternative translation initiation. Intriguingly, condition-dependent changes in the relative utilization of annotated and alternative translation initiation sites (TIS) were observed in several cases. This suggests that TIS selection is subject to regulatory control, adding yet another layer of complexity to our understanding of bacterial proteomes.
IMPORTANCE With genes within genes emerging as a theme, in the form of alternative open reading frames (ORFs) generated by translation initiation at in-frame start codons, the mechanisms that control the relative utilization of annotated and alternative TIS need to be unraveled, and our molecular understanding of the resulting proteoforms broadened. Using complementary ribosome profiling strategies to map ORF boundaries, we uncovered dual-encoding ORFs generated by in-frame TIS usage in Salmonella. Beyond showing that alternative TIS usage may generate proteoforms with different characteristics, such as differential localization and specialized function, this work shows that the quantitative aspects of conditional retapamulin-assisted ribosome profiling (Ribo-RET) translation initiation maps offer unprecedented insights into the relative utilization of annotated and alternative TIS, enabling the exploration of gene regulatory mechanisms that control TIS usage and, consequently, the translation of N-terminal proteoform pairs.
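The idea of an N-terminal proteoform pair can be illustrated with a short scan. The sketch below is not the authors' riboproteogenomic pipeline; it assumes a CDS given 5'→3' that begins with its annotated start codon, treats ATG, GTG and TTG as candidate bacterial start codons, and uses the standard stop codons. It only lists in-frame candidates and the length of the N-terminally truncated proteoform each one would produce; deciding which candidates are actually used requires initiation evidence such as Ribo-RET. The function names are hypothetical.

```python
# Sketch: enumerate candidate in-frame alternative TIS downstream of an
# annotated start codon. Assumptions: standard stop codons; ATG/GTG/TTG as
# candidate bacterial starts. Real TIS calls need Ribo-seq/Ribo-RET evidence.

START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds: str) -> int:
    """Return the offset of the first in-frame stop codon, or the end of
    the last complete codon if no stop is found."""
    cds = cds.upper()
    for pos in range(0, len(cds) - 2, 3):
        if cds[pos:pos + 3] in STOP_CODONS:
            return pos
    return len(cds) - len(cds) % 3


def in_frame_alt_starts(cds: str) -> list[tuple[int, str, int]]:
    """List (nt_offset, codon, proteoform_length_aa) for every in-frame
    candidate start downstream of the annotated start at offset 0."""
    cds = cds.upper()
    stop = first_in_frame_stop(cds)
    hits = []
    for pos in range(3, stop, 3):  # offset 0 is the annotated start
        codon = cds[pos:pos + 3]
        if codon in START_CODONS:
            # Residues from this start up to, but not including, the stop.
            hits.append((pos, codon, (stop - pos) // 3))
    return hits


if __name__ == "__main__":
    toy_cds = "ATGAAAGTGCCCATGTAA"  # toy sequence, not from Salmonella
    print("annotated proteoform:", first_in_frame_stop(toy_cds) // 3, "aa")
    for offset, codon, length in in_frame_alt_starts(toy_cds):
        print(f"alternative TIS {codon} at nt {offset}: {length} aa")
```

A GTG or TTG initiator is still decoded as (formyl)methionine, so the lengths reported here include the initiator residue.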
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Valdes Snauwaert
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
3
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
4
Zhong A, Jiang X, Hickman AB, Klier K, Teodoro GIC, Dyda F, Laub MT, Storz G. Toxic antiphage defense proteins inhibited by intragenic antitoxin proteins. Proc Natl Acad Sci U S A 2023; 120:e2307382120. [PMID: 37487082 PMCID: PMC10400941 DOI: 10.1073/pnas.2307382120] [Received: 05/02/2023] [Accepted: 06/21/2023] [Indexed: 07/26/2023]
Abstract
Recombination-promoting nuclease (Rpn) proteins are broadly distributed across bacterial phyla, yet their functions remain unclear. Here, we report that these proteins are toxin-antitoxin systems, composed of genes-within-genes, that combat phage infection. We show that the small, highly variable Rpn C-terminal domains (RpnS), which are translated separately from the full-length proteins (RpnL), directly block the activities of the toxic RpnL. The crystal structure of RpnAS revealed a dimerization interface encompassing an α helix that can contain four-amino-acid repeats whose number varies widely among strains of the same species. Consistent with strong selection for this variation, we document that plasmid-encoded RpnP2L protects Escherichia coli against certain phages. We propose that many more intragenic-encoded proteins that serve regulatory roles remain to be discovered in all organisms.
Affiliation(s)
- Aoshu Zhong
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892
- Xiaofang Jiang
- Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD 20894
- Alison B. Hickman
- Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892
- Katherine Klier
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892
- Fred Dyda
- Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892
- Michael T. Laub
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- HHMI, Massachusetts Institute of Technology, Cambridge, MA 02139
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892
5
Chlebek JL, Leonard SP, Kang-Yun C, Yung MC, Ricci DP, Jiao Y, Park DM. Prolonging genetic circuit stability through adaptive evolution of overlapping genes. Nucleic Acids Res 2023; 51:7094-7108. [PMID: 37260076 PMCID: PMC10359631 DOI: 10.1093/nar/gkad484] [Received: 03/01/2023] [Revised: 05/12/2023] [Accepted: 05/23/2023] [Indexed: 06/02/2023]
Abstract
The development of synthetic biological circuits that maintain functionality over application-relevant time scales remains a significant challenge. Here, we employed synthetic overlapping sequences in which one gene is encoded or 'entangled' entirely within an alternative reading frame of another gene. In this design, the toxin-encoding relE was entangled within ilvA, which encodes threonine deaminase, an enzyme essential for isoleucine biosynthesis. A functional entanglement construct was obtained upon modification of the ribosome-binding site of the internal relE gene. Using this optimized design, we found that the selection pressure to maintain functional IlvA stabilized the production of burdensome RelE for >130 generations, which compares favorably with the most stable kill-switch circuits developed to date. This stabilizing effect was achieved through a complete alteration of the allowable landscape of mutations such that mutations inactivating the entangled genes were disfavored. Instead, the majority of lineages accumulated mutations within the regulatory region of ilvA. By reducing baseline relE expression, these more 'benign' mutations lowered circuit burden, which suppressed the accumulation of relE-inactivating mutations, thereby prolonging kill-switch function. Overall, this work demonstrates the utility of sequence entanglement paired with an adaptive laboratory evolution campaign to increase the evolutionary stability of burdensome synthetic circuits.
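The entanglement design places one coding sequence inside an alternative reading frame of another, so every nucleotide in the overlap is read twice. A minimal sketch, using a toy sequence and the standard genetic code rather than the actual relE/ilvA construct, translates the same DNA in two frames and reports how a point substitution affects each protein. The helper names (`translate`, `substitution_effects`) are hypothetical.

```python
# Sketch: read one toy DNA sequence in two frames, as in an entangled gene
# pair, and report how a single substitution changes each translation.
# Assumptions: standard genetic code; toy sequence, not relE/ilvA.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from offset `frame`; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def substitution_effects(
    seq: str, pos: int, new_base: str, frames: tuple[int, ...] = (0, 1)
) -> dict[int, list[tuple[int, str, str]]]:
    """Map each frame to its (codon_index, old_aa, new_aa) changes."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    effects = {}
    for frame in frames:
        before, after = translate(seq, frame), translate(mutant, frame)
        effects[frame] = [
            (i, old, new)
            for i, (old, new) in enumerate(zip(before, after))
            if old != new
        ]
    return effects


if __name__ == "__main__":
    entangled = "ATGGCTGCTGCTTAA"  # frame 0: outer gene; frame +1: inner gene
    print(translate(entangled, 0), translate(entangled, 1))
    # T->A at nt 5 is silent in frame 0 (GCT->GCA) but not in frame +1.
    print(substitution_effects(entangled, 5, "A"))
```

Because both frames share nucleotides, a substitution tolerated by one protein can still alter the other, which is the coupling the entanglement design relies on.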
Affiliation(s)
- Jennifer L Chlebek
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Sean P Leonard
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Christina Kang-Yun
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Mimi C Yung
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Dante P Ricci
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Yongqin Jiao
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Dan M Park
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
6
Kienzle L, Bettinazzi S, Choquette T, Brunet M, Khorami HH, Jacques JF, Moreau M, Roucou X, Landry CR, Angers A, Breton S. A small protein coded within the mitochondrial canonical gene nd4 regulates mitochondrial bioenergetics. BMC Biol 2023; 21:111. [PMID: 37198654 DOI: 10.1186/s12915-023-01609-y] [Received: 09/17/2022] [Accepted: 05/03/2023] [Indexed: 05/19/2023]
Abstract
BACKGROUND Mitochondria have a central role in cellular functions, aging, and certain diseases. They possess their own genome, a vestige of their bacterial ancestor. Over the course of evolution, most of the ancestor's genes have been lost or transferred to the nucleus. In humans, the mtDNA is a very small circular molecule with a functional repertoire limited to only 37 genes. Its extremely compact organization, with genes arranged one after the other and separated by short non-coding regions, suggests that there is little room for evolutionary novelties. This is radically different from bacterial genomes, which are also circular but much larger, and in which genes can be found inside other genes. These sequences, distinct from the reference coding sequences, are called alternative open reading frames (altORFs), and they are involved in key biological functions. However, whether altORFs exist in mitochondrial protein-coding genes or elsewhere in the human mitogenome has not been fully addressed.
RESULTS We found a downstream alternative ATG initiation codon in the +3 reading frame of the human mitochondrial nd4 gene. This newly characterized altORF encodes a 99-amino-acid polypeptide, MTALTND4, which is conserved in primates. Our custom antibody, but not the pre-immune serum, immunoprecipitated MTALTND4 from HeLa cell lysates, confirming the existence of an endogenous MTALTND4 peptide. The protein localizes to mitochondria and the cytoplasm, is also found in plasma, and affects cell and mitochondrial physiology.
CONCLUSIONS Many translated ORFs in human mitochondria may so far have gone unnoticed. By ignoring mtaltORFs, we have underestimated the coding potential of the mitogenome. Alternative mitochondrial peptides such as MTALTND4 may offer a new framework for investigating mitochondrial functions and diseases.
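A simple way to look for altORFs such as MTALTND4 is to scan the non-reference frames of a host gene for ATG-initiated ORFs. The sketch below assumes the vertebrate mitochondrial code, in which NCBI translation table 2 lists TAA, TAG, AGA and AGG as stops and TGA encodes tryptophan. It scans offsets 1 and 2 relative to the annotated frame and does not attempt to reproduce the paper's "+3" frame convention. The minimum length and the function name `find_alt_orfs` are illustrative assumptions, not the authors' method.

```python
# Sketch: find ATG-initiated ORFs in the two non-reference frames of a
# host coding sequence. Assumptions: vertebrate mitochondrial stop codons
# (NCBI table 2, where TGA is Trp); ORFs that run off the end without a
# stop codon are ignored.

VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def find_alt_orfs(
    seq: str, min_codons: int = 30, stops: frozenset[str] = VERT_MITO_STOPS
) -> list[tuple[int, int, int, int]]:
    """Return (frame_offset, start_nt, end_nt, n_codons) for each ORF.

    end_nt is exclusive and includes the stop codon; n_codons counts the
    sense codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in (1, 2):  # offset 0 is the annotated reading frame
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG opens the ORF; later ATGs are nested
            elif codon in stops:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3, n_codons))
                start = None
    return orfs
```

Candidates from such a scan are only sequence-level predictions; the study above relied on antibody-based detection to show that MTALTND4 is actually produced.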
Affiliation(s)
- Laura Kienzle
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Stefano Bettinazzi
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Thierry Choquette
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Marie Brunet
- Service de génétique médicale, Département de pédiatrie, Université de Sherbrooke, Sherbrooke, Canada
- Centre de recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Canada
- Jean-François Jacques
- Département de biochimie et génomique fonctionnelle, Université de Sherbrooke, Sherbrooke, Canada
- Mathilde Moreau
- Département de biochimie et génomique fonctionnelle, Université de Sherbrooke, Sherbrooke, Canada
- Xavier Roucou
- Centre de recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Canada
- Département de biochimie et génomique fonctionnelle, Université de Sherbrooke, Sherbrooke, Canada
- Christian R Landry
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Annie Angers
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Sophie Breton
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
7
Zhong A, Jiang X, Hickman AB, Klier K, Teodoro GIC, Dyda F, Laub MT, Storz G. Toxic anti-phage defense proteins inhibited by intragenic antitoxin proteins. bioRxiv [Preprint] 2023:2023.05.02.539157. [PMID: 37425788 PMCID: PMC10327210 DOI: 10.1101/2023.05.02.539157] [Indexed: 07/11/2023]
Abstract
Recombination-promoting nuclease (Rpn) proteins are broadly distributed across bacterial phyla, yet their functions remain unclear. Here we report that these proteins are new toxin-antitoxin systems, composed of genes-within-genes, that combat phage infection. We show that the small, highly variable Rpn C-terminal domains (RpnS), which are translated separately from the full-length proteins (RpnL), directly block the activities of the toxic full-length proteins. The crystal structure of RpnAS revealed a dimerization interface encompassing an α helix that can contain four-amino-acid repeats whose number varies widely among strains of the same species. Consistent with strong selection for the variation, we document that plasmid-encoded RpnP2L protects Escherichia coli against certain phages. We propose that many more intragenic-encoded proteins that serve regulatory roles remain to be discovered in all organisms.
Significance Here we document the function of small genes-within-genes, showing that they encode antitoxin proteins that block the functions of the toxic DNA endonuclease proteins encoded by the longer rpn genes. Intriguingly, a sequence present in both the long and the short protein shows extensive variation in the number of four-amino-acid repeats. Consistent with strong selection for the variation, we provide evidence that the Rpn proteins represent a phage defense system.
8
Smith C, Canestrari JG, Wang AJ, Champion MM, Derbyshire KM, Gray TA, Wade JT. Pervasive translation in Mycobacterium tuberculosis. eLife 2022; 11:e73980. [PMID: 35343439 PMCID: PMC9094748 DOI: 10.7554/elife.73980] [Received: 09/17/2021] [Accepted: 03/25/2022] [Indexed: 11/13/2022]
Abstract
Most bacterial ORFs are identified by automated prediction algorithms. However, these algorithms often fail to identify ORFs lacking canonical features such as a length of >50 codons or the presence of an upstream Shine-Dalgarno sequence. Here, we use ribosome profiling approaches to identify actively translated ORFs in Mycobacterium tuberculosis. Most of the ORFs we identify have not been previously described, indicating that the M. tuberculosis transcriptome is pervasively translated. The newly described ORFs are predominantly short, with many encoding proteins of ≤50 amino acids. Codon usage of the newly discovered ORFs suggests that most have not been subject to purifying selection, and hence are unlikely to contribute to cell fitness. Nevertheless, we identify 90 new ORFs (median length of 52 codons) that bear the hallmarks of purifying selection. Thus, our data suggest that pervasive translation of short ORFs in Mycobacterium tuberculosis serves as a rich source for the evolution of new functional proteins.
Affiliation(s)
- Carol Smith
- Wadsworth Center, Division of Genetics, New York State Department of Health, Albany, United States
- Jill G Canestrari
- Wadsworth Center, Division of Genetics, New York State Department of Health, Albany, United States
- Archer J Wang
- Wadsworth Center, Division of Genetics, New York State Department of Health, Albany, United States
- Matthew M Champion
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, United States
- Keith M Derbyshire
- Wadsworth Center, Division of Genetics, New York State Department of Health, Albany, United States
- Department of Biomedical Sciences, School of Public Health, University at Albany, New York, United States
- Todd A Gray
- Wadsworth Center, Division of Genetics, New York State Department of Health, Albany, United States
- Department of Biomedical Sciences, School of Public Health, University at Albany, New York, United States
- Joseph T Wade
- Wadsworth Center, Division of Genetics, New York State Department of Health, Albany, United States
- Department of Biomedical Sciences, School of Public Health, University at Albany, New York, United States
9
Gelhausen R, Müller T, Svensson SL, Alkhnbashi OS, Sharma CM, Eggenhofer F, Backofen R. RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria. Brief Bioinform 2022; 23:6509045. [PMID: 35037022 PMCID: PMC8921622 DOI: 10.1093/bib/bbab549] [Received: 07/25/2021] [Revised: 11/22/2021] [Accepted: 11/29/2021] [Indexed: 11/19/2022]
Abstract
Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.
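Benchmarks of this kind reduce to comparing each tool's calls against the manually labeled set. The sketch below computes sensitivity, precision and specificity from sets of ORF identifiers, plus a helper that restricts the comparison to sORFs of 50 codons or fewer. The identifiers, function names and choice of metrics are assumptions for illustration and are not taken from the RiboReport code.

```python
# Sketch: score an ORF predictor against a labeled reference set.
# Assumptions: ORFs are identified by string IDs; `all_orfs` contains every
# labeled ORF and `translated` the subset labeled as translated.


def score_predictions(
    predicted: set[str], translated: set[str], all_orfs: set[str]
) -> dict[str, float]:
    """Return sensitivity, precision and specificity for one tool."""
    predicted = predicted & all_orfs  # ignore calls outside the labeled set
    tp = len(predicted & translated)
    fp = len(predicted - translated)
    fn = len(translated - predicted)
    tn = len(all_orfs - predicted - translated)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def small_orfs(lengths_in_codons: dict[str, int], max_codons: int = 50) -> set[str]:
    """IDs of sORFs, using the 50-codon cut-off mentioned in the abstract."""
    return {orf for orf, n in lengths_in_codons.items() if n <= max_codons}


if __name__ == "__main__":
    lengths = {"a": 30, "b": 200, "c": 45, "d": 120, "e": 20}  # toy data
    labeled_translated = {"a", "b", "c"}
    tool_calls = {"a", "b", "d"}
    print(score_predictions(tool_calls, labeled_translated, set(lengths)))
    sorfs = small_orfs(lengths)
    print(score_predictions(tool_calls & sorfs, labeled_translated & sorfs, sorfs))
```

Scoring the sORF subset separately, as in the second call, makes it visible when a tool that looks good overall misses most short ORFs.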
Affiliation(s)
- Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Teresa Müller
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Sarah L Svensson
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Omer S Alkhnbashi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Cynthia M Sharma
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, 79104, Freiburg, Germany
10
Abstract
Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.
11
Wichmann S, Scherer S, Ardern Z. Biological factors in the synthetic construction of overlapping genes. BMC Genomics 2021; 22:888. [PMID: 34895142 PMCID: PMC8665328 DOI: 10.1186/s12864-021-08181-1] [Received: 11/08/2020] [Accepted: 11/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs outside of viruses. Recently, however, they have been discovered in archaea, diverse bacteria and mammals. The biological factors underlying life's ability to create overlapping genes require more study and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from cellular organisms; in this study, we assessed this claim in order to discover what might underlie taxonomic differences in the creation of gene overlaps.
RESULTS After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models (HMMs), we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences in terms of identity and secondary structure. Surprisingly, contrary to the previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids.
CONCLUSIONS Synthetic overlapping genes that closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered here, such as intragenic interactions that affect protein folding. While the analysis here is not sufficient to guarantee functionally folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and open up exciting possibilities for synthetic biology.
Affiliation(s)
- Stefan Wichmann
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Siegfried Scherer
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Zachary Ardern
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
12
Nelson CW, Ardern Z, Wei X. OLGenie: Estimating Natural Selection to Predict Functional Overlapping Genes. Mol Biol Evol 2020; 37:2440-2449. [PMID: 32243542 PMCID: PMC7531306 DOI: 10.1093/molbev/msaa087] [Indexed: 12/20/2022]
Abstract
Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous in the other, perturbing dN/dS. Thus, scalable methods are needed to estimate functional constraint specifically for overlapping genes (OLGs). We propose OLGenie, which implements a modification of the Wei–Zhang method. Assessment with simulations and controls from viral genomes (58 OLGs and 176 non-OLGs) demonstrates low false-positive rates and good discriminatory ability in differentiating true OLGs from non-OLGs. We also apply OLGenie to the unresolved case of HIV-1’s putative antisense protein gene, showing significant purifying selection. OLGenie can be used to study known OLGs and to predict new OLGs in genome annotation. Software and example data are freely available at https://github.com/chasewnelson/OLGenie (last accessed April 10, 2020).
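The frame dependence described in this abstract can be made concrete with a minimal Python sketch. It is not OLGenie or the Wei–Zhang method; it only classifies single-nucleotide substitutions under the standard genetic code (an assumption) in two sense-strand reading frames, with frame 1 standing in for a hypothetical +1 overlapping gene. The toy sequence and the helper names `translate`, `effect` and `count_conflicts` are illustrative, not taken from the paper or its software.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate every complete codon of an uppercase DNA string from offset `frame`."""
    return "".join(CODE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def effect(seq: str, pos: int, new_base: str, frame: int) -> str:
    """Classify one point substitution as synonymous or nonsynonymous in one frame."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    same = translate(seq, frame) == translate(mutant, frame)
    return "synonymous" if same else "nonsynonymous"


def count_conflicts(seq: str, frame_a: int = 0, frame_b: int = 1) -> int:
    """Count point mutations that are synonymous in frame_a but nonsynonymous in frame_b."""
    n = 0
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            if (effect(seq, pos, alt, frame_a) == "synonymous"
                    and effect(seq, pos, alt, frame_b) == "nonsynonymous"):
                n += 1
    return n


toy = "GCTGAAGGT"
print(translate(toy, 0), translate(toy, 1))  # AEG LK
print(effect(toy, 2, "C", 0))  # synonymous: GCT -> GCC (Ala -> Ala)
print(effect(toy, 2, "C", 1))  # nonsynonymous: CTG -> CCG (Leu -> Pro)
print(count_conflicts(toy))
```

The T-to-C change at position 2 is silent in frame 0 but replaces Leu with Pro in frame 1, which is the kind of site at which a dN/dS estimate computed for one gene alone would be distorted by the constraint acting on the other.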
Affiliation(s)
- Chase W Nelson
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY; Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Zachary Ardern
- Microbial Ecology, ZIEL-Institute for Food & Health, Technische Universität München, Freising, Germany
- Xinzhu Wei
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI; Department of Integrative Biology and Statistics, University of California, Berkeley, CA
13
McBride TM, Schwartz EA, Kumar A, Taylor DW, Fineran PC, Fagerlund RD. Diverse CRISPR-Cas Complexes Require Independent Translation of Small and Large Subunits from a Single Gene. Mol Cell 2020; 80:971-979.e7. [PMID: 33248026 DOI: 10.1016/j.molcel.2020.11.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/17/2020] [Revised: 10/22/2020] [Accepted: 10/29/2020] [Indexed: 12/26/2022]
Abstract
CRISPR-Cas adaptive immune systems provide prokaryotes with defense against viruses by degradation of specific invading nucleic acids. Despite advances in the biotechnological exploitation of select systems, multiple CRISPR-Cas types remain uncharacterized. Here, we investigated the previously uncharacterized type I-D interference complex and revealed that it is a genetic and structural hybrid with similarity to both type I and type III systems. Surprisingly, formation of the functional complex required internal in-frame translation of small subunits from within the large subunit gene. We further show that internal translation to generate small subunits is widespread across diverse type I-D, I-B, and I-C systems, which account for roughly one quarter of CRISPR-Cas systems. Our work reveals the unexpected expansion of protein coding potential from within single cas genes, which has important implications for understanding CRISPR-Cas function and evolution.
Affiliation(s)
- Tess M McBride
- Department of Microbiology and Immunology, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Evan A Schwartz
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712-1597, USA; Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712-1597, USA
- Abhishek Kumar
- Centre for Protein Research, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- David W Taylor
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712-1597, USA; Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712-1597, USA; Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX 78712-1597, USA; LIVESTRONG Cancer Institutes, Dell Medical School, Austin, TX 78712-1597, USA
- Peter C Fineran
- Department of Microbiology and Immunology, University of Otago, PO Box 56, Dunedin 9054, New Zealand; Bio-Protection Research Centre, University of Otago, PO Box 56, Dunedin 9054, New Zealand; Genetics Otago, University of Otago, Dunedin, New Zealand
- Robert D Fagerlund
- Department of Microbiology and Immunology, University of Otago, PO Box 56, Dunedin 9054, New Zealand; Genetics Otago, University of Otago, Dunedin, New Zealand
14
Orr MW, Mao Y, Storz G, Qian SB. Alternative ORFs and small ORFs: shedding light on the dark proteome. Nucleic Acids Res 2020; 48:1029-1042. [PMID: 31504789 DOI: 10.1093/nar/gkz734] [Citation(s) in RCA: 146] [Impact Index Per Article: 36.5] [Received: 07/04/2019] [Revised: 08/03/2019] [Accepted: 08/15/2019] [Indexed: 02/06/2023]
Abstract
Traditional annotation of protein-encoding genes relied on assumptions, such as that one open reading frame (ORF) encodes one protein and that translated proteins have a minimal length. With the serendipitous discoveries of translated ORFs encoded upstream and downstream of annotated ORFs, from alternative start sites nested within annotated ORFs, and from RNAs previously considered noncoding, it is becoming clear that these initial assumptions are incorrect. The findings have led to the realization that genetic information is more densely coded and that the proteome is more complex than previously anticipated. As such, interest in the identification and characterization of the previously ignored 'dark proteome' is increasing, though we note that research in eukaryotes and bacteria has largely progressed in isolation. To bridge this gap and illustrate exciting findings emerging from studies of the dark proteome, we highlight recent advances in both eukaryotic and bacterial cells. We discuss progress in the detection of alternative ORFs, as well as in understanding their functions and the regulation of their expression, and pose questions for future work.
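How the "one ORF, one protein" and minimal-length assumptions can hide coding potential is easy to see in a naive ORF scan. The following is a minimal Python sketch under stated assumptions: forward strand only, ATG-only starts by default (bacteria also use GTG and TTG, which can be supplied through the illustrative `starts` parameter), and a made-up toy sequence. `find_orfs` is a hypothetical helper, not a real annotation tool; it ignores the reverse strand, ribosome-binding or Kozak context, and ribosome profiling evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1, starts=("ATG",)):
    """Scan the forward strand of `seq` for ORFs in all three frames.

    Returns sorted (start, end, frame) tuples, 0-based and half-open, with the
    stop codon included in [start, end). Every in-frame start codon upstream of
    a stop is reported, so a nested ORF that shares its stop codon with a longer
    ORF (an alternative internal start) appears as its own entry.
    `min_codons` is the minimum number of sense codons (stop excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        pending_starts = []  # start positions still waiting for a stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in starts:
                pending_starts.append(i)
            elif codon in STOP_CODONS:
                for s in pending_starts:
                    sense_codons = (i - s) // 3
                    if sense_codons >= min_codons:
                        orfs.append((s, i + 3, frame))
                pending_starts = []  # a stop closes every open ORF in this frame
    return sorted(orfs)


toy = "ATGAAAATGCCCTAAATGTGA"
print(find_orfs(toy))                # [(0, 15, 0), (6, 15, 0), (15, 21, 0)]
print(find_orfs(toy, min_codons=3))  # [(0, 15, 0)]
```

With no length cutoff, the scan reports the nested ORF starting at position 6 (in frame with, and sharing its stop codon with, the ORF at position 0) and the tiny ORF at position 15. Raising `min_codons` to 3 leaves only the longest ORF, mirroring how a length threshold plus "one ORF, one protein" annotation discards nested and small ORFs.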
Affiliation(s)
- Mona Wu Orr
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA
- Yuanhui Mao
- Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA
- Shu-Bing Qian
- Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
15
D'Agostino PM, Al-Sinawi B, Mazmouz R, Muenchhoff J, Neilan BA, Moffitt MC. Identification of promoter elements in the Dolichospermum circinale AWQC131C saxitoxin gene cluster and the experimental analysis of their use for heterologous expression. BMC Microbiol 2020; 20:35. [PMID: 32070286 PMCID: PMC7027233 DOI: 10.1186/s12866-020-1720-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/23/2019] [Accepted: 02/03/2020] [Indexed: 01/06/2023]
Abstract
Background Dolichospermum circinale is a filamentous bloom-forming cyanobacterium responsible for biosynthesis of the paralytic shellfish toxins (PSTs), including saxitoxin. PSTs are neurotoxins and, in their purified form, are important analytical standards for monitoring water and seafood quality and biomedical research tools for studying neuronal sodium channels. More recently, PSTs have been recognised for their utility as local anaesthetics. Characterisation of the transcriptional elements within the saxitoxin (sxt) biosynthetic gene cluster (BGC) is a first step towards accessing these molecules for biotechnology. Results In D. circinale AWQC131C the sxt BGC is transcribed from two bidirectional promoter regions encoding five individual promoters. These promoters were identified experimentally using 5′ RACE, and their activity was assessed via coupling to a lux reporter system in E. coli and Synechocystis sp. PCC 6803. Transcription of the predicted drug/metabolite transporter (DMT) encoded by sxtPER was found to initiate from two promoters, PsxtPER1 and PsxtPER2. In E. coli, strong expression of lux from PsxtP, PsxtD and PsxtPER1 was observed, while expression from Porf24 and PsxtPER2 was markedly weaker. In contrast, heterologous expression in Synechocystis sp. PCC 6803 showed that expression of lux from the PsxtP, PsxtPER1 and Porf24 promoters was significantly higher than that of the non-promoter control, while PsxtD showed poor activity under the described conditions. Conclusions Both heterologous hosts investigated in this study exhibited high expression levels from three of the five sxt promoters. These results indicate that most of the native sxt promoters appear to be active in different heterologous hosts, simplifying initial cloning efforts. Therefore, heterologous expression of the sxt BGC in either E. coli or Synechocystis could be a viable first option for producing PSTs for industrial or biomedical purposes.
Affiliation(s)
- Paul M D'Agostino
- School of Science, Western Sydney University, Sydney, NSW, Australia; School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; Biosystems Chemistry, Department of Chemistry, Technische Universität München, Garching, Germany; Technical Biochemistry, Faculty of Chemistry and Food Chemistry, Technische Universität Dresden, Dresden, Germany
- Bakir Al-Sinawi
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Rabia Mazmouz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Julia Muenchhoff
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Brett A Neilan
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; School of Environmental and Life Sciences, University of Newcastle, Callaghan, Australia
16
Clauwaert J, Menschaert G, Waegeman W. DeepRibo: a neural network for precise gene annotation of prokaryotes by combining ribosome profiling signal and binding site patterns. Nucleic Acids Res 2019; 47:e36. [PMID: 30753697 PMCID: PMC6451124 DOI: 10.1093/nar/gkz061] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 09/26/2018] [Revised: 01/02/2019] [Accepted: 01/30/2019] [Indexed: 12/13/2022]
Abstract
Gene annotations in prokaryotes are often corrected because of small variations in the annotated gene regions observed between different (sub)species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network that uses features extracted from ribosome profiling information and binding site sequence patterns and proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, integrating the information gained from both the high-throughput ribosome profiling data and the ribosome binding and translation initiation sequence region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments and is used to identify open reading frames in prokaryotes without a priori knowledge of the translational landscape. Extensive validation of the model, trained on various data sets, against multi-species sequence similarity and against proteins verified by mass spectrometry and Edman degradation highlights the effectiveness of DeepRibo.
Affiliation(s)
- Jim Clauwaert
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium
- Gerben Menschaert
- Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium
- Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium
17
Abstract
Although bacterial genomes are usually densely protein-coding, genome-wide approaches mapping transcriptional start sites have revealed that a significant fraction of the identified promoters drive the transcription of noncoding RNAs. These can be trans-acting RNAs, mainly originating from intergenic regions and, in many studied examples, possessing regulatory functions. However, a significant fraction of these noncoding RNAs consist of natural antisense transcripts (asRNAs), which overlap other transcriptional units. Naturally occurring asRNAs were first observed to play a role in bacterial plasmid replication and in bacteriophage λ more than 30 years ago. Today's view is that asRNAs abound in all three domains of life. There are several examples of asRNAs in bacteria with clearly defined functions. Nevertheless, many asRNAs appear to result from pervasive initiation of transcription, and some data point toward global functions of such widespread transcriptional activity, explaining why the search for a specific regulatory role is sometimes futile. In this review, we give an overview of the occurrence of antisense transcription in bacteria, highlight particular examples of functionally characterized asRNAs, and discuss recent evidence pointing to global relevance in RNA processing and transcription-coupled DNA repair.