1
Ruiz-Orera J, Miller DC, Greiner J, Genehr C, Grammatikaki A, Blachut S, Mbebi J, Patone G, Myronova A, Adami E, Dewani N, Liang N, Hummel O, Muecke MB, Hildebrandt TB, Fritsch G, Schrade L, Zimmermann WH, Kondova I, Diecke S, van Heesch S, Hübner N. Evolution of translational control and the emergence of genes and open reading frames in human and non-human primate hearts. NATURE CARDIOVASCULAR RESEARCH 2024:10.1038/s44161-024-00544-7. [PMID: 39317836 DOI: 10.1038/s44161-024-00544-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 08/28/2024] [Indexed: 09/26/2024]
Abstract
Evolutionary innovations can be driven by changes in the rates of RNA translation and the emergence of new genes and small open reading frames (sORFs). In this study, we characterized the transcriptional and translational landscape of the hearts of four primate and two rodent species through integrative ribosome and transcriptomic profiling, including adult left ventricle tissues and induced pluripotent stem cell-derived cardiomyocyte cell cultures. We show here that the translational efficiencies of subunits of the mitochondrial oxidative phosphorylation chain complexes IV and V evolved rapidly across mammalian evolution. Moreover, we discovered hundreds of species-specific and lineage-specific genomic innovations that emerged during primate evolution in the heart, including 551 genes, 504 sORFs and 76 evolutionarily conserved genes displaying human-specific cardiac-enriched expression. Overall, our work describes the evolutionary processes and mechanisms that have shaped cardiac transcription and translation in recent primate evolution and sheds light on how these can contribute to cardiac development and disease.
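Translational efficiency is commonly estimated as the ratio of ribosome-footprint abundance to mRNA abundance for the same gene. The sketch below illustrates only that general definition; it is not the authors' pipeline, and the function name, the pseudocount, and the example abundances are assumptions made for illustration.

```python
import math


def translational_efficiency(ribo_tpm, rna_tpm, pseudocount=1.0):
    """Return log2((ribo + pseudocount) / (rna + pseudocount)).

    ribo_tpm and rna_tpm are normalized abundances for one gene.
    The pseudocount (an assumption) avoids division by zero.
    """
    return math.log2((ribo_tpm + pseudocount) / (rna_tpm + pseudocount))


# Hypothetical abundances for the same gene in two species.
te_species_a = translational_efficiency(ribo_tpm=31.0, rna_tpm=7.0)
te_species_b = translational_efficiency(ribo_tpm=7.0, rna_tpm=7.0)

# A positive difference means the gene is translated more efficiently
# in species A than in species B at comparable mRNA levels.
delta_te = te_species_a - te_species_b
```

Working in log2 space makes differences between species symmetric around zero, which is why the ratio is usually log-transformed before comparison.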
Affiliation(s)
- Jorge Ruiz-Orera
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Duncan C Miller
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
- Johannes Greiner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Carolin Genehr
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
- Aliki Grammatikaki
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Susanne Blachut
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Jeanne Mbebi
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Giannino Patone
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Anna Myronova
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Eleonora Adami
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Nikita Dewani
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Ning Liang
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Oliver Hummel
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michael B Muecke
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Thomas B Hildebrandt
  - Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
  - Freie Universitaet Berlin, Berlin, Germany
- Guido Fritsch
  - Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
- Lisa Schrade
  - Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
- Wolfram H Zimmermann
  - Institute of Pharmacology and Toxicology, University Medical Center Göttingen, Göttingen, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Lower Saxony, Göttingen, Germany
  - DZNE (German Center for Neurodegenerative Diseases), Göttingen, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Göttingen, Germany
- Ivanela Kondova
  - Biomedical Primate Research Centre (BPRC), Rijswijk, The Netherlands
- Sebastian Diecke
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Sebastiaan van Heesch
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Norbert Hübner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
  - Charité-Universitätsmedizin, Berlin, Germany
  - Helmholtz Institute for Translational AngioCardioScience (HI-TAC) of the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) at Heidelberg University, Heidelberg, Germany
2
Engel SR, Aleksander S, Nash RS, Wong ED, Weng S, Miyasato SR, Sherlock G, Cherry JM. Saccharomyces Genome Database: Advances in Genome Annotation, Expanded Biochemical Pathways, and Other Key Enhancements. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.16.613348. [PMID: 39345624 PMCID: PMC11430078 DOI: 10.1101/2024.09.16.613348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Budding yeast (Saccharomyces cerevisiae) is the most extensively characterized eukaryotic model organism and has long been used to gain insight into the fundamentals of genetics, cellular biology, and the functions of specific genes and proteins. The Saccharomyces Genome Database (SGD) is a scientific resource that provides information about the genome and biology of S. cerevisiae. For more than 30 years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation for budding yeast along with search and analysis tools to explore these data. Here we describe recent updates at SGD, including the two most recent reference genome annotation updates, expanded biochemical pathways representation, changes to SGD search and data files, and other enhancements to the SGD website and user interface. These activities are part of our continuing effort to promote insights gained from yeast to enable the discovery of functional relationships between sequence and gene products in fungi and higher eukaryotes.
Affiliation(s)
- Stacia R Engel
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- Suzi Aleksander
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- Robert S Nash
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- Edith D Wong
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- Shuai Weng
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- Stuart R Miyasato
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- Gavin Sherlock
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
- J Michael Cherry
  - Department of Genetics, Stanford University, Palo Alto, CA 94304, USA
3
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. BIOPHYSICAL REPORTS 2024; 4:100167. [PMID: 38909903 PMCID: PMC11305224 DOI: 10.1016/j.bpr.2024.100167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/09/2024] [Accepted: 06/20/2024] [Indexed: 06/25/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode functional proteins, known as smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association, of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. In addition, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from noncoding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado
- Irwin Jungreis
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- Jeffre Allen
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- John L Rinn
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Loren E Hough
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Physics, University of Colorado Boulder, Boulder, Colorado
4
Houghton CJ, Coelho NC, Chiang A, Hedayati S, Parikh SB, Ozbaki-Yagan N, Wacholder A, Iannotta J, Berger A, Carvunis AR, O'Donnell AF. Cellular processing of beneficial de novo emerging proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.28.610198. [PMID: 39257767 PMCID: PMC11384008 DOI: 10.1101/2024.08.28.610198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate into pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins' C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns.
Affiliation(s)
- Carly J Houghton
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Nelson Castilho Coelho
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Annette Chiang
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Stefanie Hedayati
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Saurin B Parikh
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Nejla Ozbaki-Yagan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Aaron Wacholder
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- John Iannotta
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Alexis Berger
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Anne-Ruxandra Carvunis
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Allyson F O'Donnell
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
5
Vakirlis N, Acar O, Cherupally V, Carvunis AR. Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence. Genome Biol Evol 2024; 16:evae151. [PMID: 39004885 PMCID: PMC11299112 DOI: 10.1093/gbe/evae151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 06/17/2024] [Accepted: 07/09/2024] [Indexed: 07/16/2024]
Abstract
New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has likely occurred throughout evolution and across the tree of life. Yet, confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo or not, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well-suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters and by introducing a formal criterion that makes it possible to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases we cannot rule out different evolutionary scenarios including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.
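The randomization-based empirical P-value mentioned in the abstract follows a general pattern that can be sketched as below. This is an illustration of the generic technique, not the authors' workflow: the positional-identity statistic, the choice to shuffle the ancestral sequence, and all names and example sequences are assumptions.

```python
import random


def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def empirical_p_value(extant, ancestral, n_shuffles=1000, seed=0):
    """Estimate how often shuffled ancestral sequences look at least as
    conserved as the real one (hypothetical null model)."""
    rng = random.Random(seed)
    observed = identity(extant, ancestral)
    letters = list(ancestral)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # destroys order, preserves composition
        if identity(extant, letters) >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical extant and reconstructed ancestral peptides.
p = empirical_p_value("MKTLLVAGSCE", "MKTLLVAGSCD")
```

A small P-value under this sketch would indicate that the observed similarity is unlikely to arise from sequence composition alone; the smallest value it can report is 1 / (n_shuffles + 1).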
Affiliation(s)
- Nikolaos Vakirlis
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Omer Acar
  - Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Vijay Cherupally
  - Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
  - Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
6
Roginski P, Grandchamp A, Quignot C, Lopes A. De Novo Emerged Gene Search in Eukaryotes with DENSE. Genome Biol Evol 2024; 16:evae159. [PMID: 39212967 PMCID: PMC11363675 DOI: 10.1093/gbe/evae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/07/2024] [Indexed: 09/04/2024]
Abstract
The discovery of de novo emerged genes, originating from previously noncoding DNA regions, challenges traditional views of species evolution. Indeed, the hypothesis of neutrally evolving sequences giving rise to functional proteins is highly unlikely. This conundrum has sparked numerous studies to quantify and characterize these genes, aiming to understand their functional roles and contributions to genome evolution. Yet, no fully automated pipeline for their identification is available. Therefore, we introduce DENSE (DE Novo emerged gene SEarch), an automated Nextflow pipeline based on two distinct steps: detection of taxonomically restricted genes (TRGs) through phylostratigraphy, and filtering of TRGs for de novo emerged genes via genome comparisons and synteny search. DENSE is available as a user-friendly command-line tool, while the second step is accessible through a web server upon providing a list of TRGs. Highly flexible, DENSE provides various strategy and parameter combinations, enabling users to adapt to specific configurations or define their own strategy through a rational framework, facilitating protocol communication, and study interoperability. We apply DENSE to seven model organisms, exploring the impact of its strategies and parameters on de novo gene predictions. This thorough analysis across species with different evolutionary rates reveals useful metrics for users to define input datasets, identify favorable/unfavorable conditions for de novo gene detection, and control potential biases in genome annotations. Additionally, predictions made for the seven model organisms are compiled into a requestable database, which we hope will serve as a reference for de novo emerged gene lists generated with specific criteria combinations.
Affiliation(s)
- Paul Roginski
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Anna Grandchamp
  - Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Chloé Quignot
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Anne Lopes
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
7
Vakirlis N, Kupczok A. Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins. Genome Res 2024; 34:888-903. [PMID: 38977308 PMCID: PMC11293555 DOI: 10.1101/gr.278977.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 06/12/2024] [Indexed: 07/10/2024]
Abstract
Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, contrasting an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.
Affiliation(s)
- Nikolaos Vakirlis
  - Institute For Fundamental Biomedical Research, B.S.R.C. "Alexander Fleming," Vari 166 72, Greece
  - Institute for General Microbiology, Kiel University, 24118 Kiel, Germany
- Anne Kupczok
  - Bioinformatics Group, Wageningen University, 6700 PB Wageningen, The Netherlands
8
Iyengar BR, Grandchamp A, Bornberg-Bauer E. How antisense transcripts can evolve to encode novel proteins. Nat Commun 2024; 15:6187. [PMID: 39043684 PMCID: PMC11266595 DOI: 10.1038/s41467-024-50550-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 07/12/2024] [Indexed: 07/25/2024]
Abstract
Protein-coding features can emerge de novo in non-coding transcripts, resulting in the emergence of new protein-coding genes. Studies across many species show that a large fraction of evolutionarily novel non-coding RNAs have an antisense overlap with protein-coding genes. The open reading frames (ORFs) in these antisense RNAs could also overlap with existing ORFs. In this study, we investigate how the evolution of an ORF could be constrained by its overlap with an existing ORF in three different reading frames. Using a combination of mathematical modeling and genome/transcriptome data analysis in two different model organisms, we show that antisense overlap can increase the likelihood of ORF emergence and reduce the likelihood of ORF loss, especially in one of the three reading frames. In addition to rationalising the repeatedly reported prevalence of de novo emerged genes in antisense transcripts, our work also provides a generic modeling and analytical framework that can be used to understand the evolution of antisense genes.
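To make the idea of antisense ORFs in three reading frames concrete, the sketch below scans the reverse complement of a sense-strand sequence for ATG-to-stop ORFs in each frame. It is a minimal illustration, not the authors' model: the function names, the `min_codons` threshold, and the example sequence are assumptions, and only unambiguous A/C/G/T input is handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for each ATG...stop ORF in the three
    forward frames of seq; end is exclusive and includes the stop codon.
    min_codons counts the stop codon as well."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def antisense_orfs(sense_seq, min_codons=2):
    """ORFs on the strand opposite to sense_seq, in antisense coordinates."""
    return find_orfs(reverse_complement(sense_seq), min_codons)


# Hypothetical sense-strand fragment whose antisense strand reads ATGAAATAA.
example_hits = antisense_orfs("TTATTTCAT")
```

Comparing the frame of each hit with the frame of an annotated sense ORF at the same locus would give the three overlap configurations that the abstract refers to.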
Affiliation(s)
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
- Anna Grandchamp
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
  - Aix-Marseille Université, INSERM, TAGC, Marseille, France
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, Germany
9
Rich A, Acar O, Carvunis AR. Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome. Genome Biol 2024; 25:183. [PMID: 38978079 PMCID: PMC11232214 DOI: 10.1186/s13059-024-03287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 05/20/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae.
RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface (https://carvunislab.csb.pitt.edu/shiny/coexpression/) to efficiently query, visualize, and download our coexpression inferences.
CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.
Affiliation(s)
- April Rich
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Omer Acar
  - Joint Carnegie Mellon University-University of Pittsburgh, University of Pittsburgh Computational Biology PhD Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), University of Pittsburgh, Pittsburgh, PA, USA
10
Lebherz MK, Iyengar BR, Bornberg-Bauer E. Modeling Length Changes in De Novo Open Reading Frames during Neutral Evolution. Genome Biol Evol 2024; 16:evae129. [PMID: 38879874 PMCID: PMC11339603 DOI: 10.1093/gbe/evae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/06/2024] [Indexed: 07/06/2024]
Abstract
For protein-coding genes to emerge de novo from non-genic DNA, the DNA sequence must gain an open reading frame (ORF) and the ability to be transcribed. The newborn de novo gene can further evolve to accumulate changes in its sequence. Consequently, it can also elongate or shrink with time. Existing literature shows that older de novo genes have longer ORFs, but it is not clear whether they elongated with time or have kept the same length since their inception. To address this question we developed a mathematical model of ORF elongation as a Markov-jump process, and show that ORFs tend to keep their length over short evolutionary timescales. We also show that if a change occurs, it is likely to be a truncation. Our genomics and transcriptomics data analyses of seven Drosophila melanogaster populations are also in agreement with the model's prediction. We conclude that selection could facilitate ORF length extension, which may explain why longer ORFs were observed in old de novo genes in studies analysing longer evolutionary time scales. Alternatively, shorter ORFs may be purged because they may be less likely to yield functional proteins.
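A Markov-jump process of the kind described can be simulated with a standard Gillespie-style loop: draw an exponential waiting time, then choose which jump occurs. The sketch below is a toy version under stated assumptions, not the published model: the constant truncation and elongation rates, the uniform jump size, and every parameter name are hypothetical.

```python
import random


def simulate_orf_length(initial_codons, rate_truncate, rate_elongate,
                        t_max, max_step=10, seed=0):
    """Simulate ORF length (in codons) as a continuous-time jump process.

    Returns a list of (time, length) pairs, starting at time 0.
    Rates are per unit time and assumed independent of current length.
    """
    rng = random.Random(seed)
    t, length = 0.0, initial_codons
    trajectory = [(t, length)]
    total_rate = rate_truncate + rate_elongate
    if total_rate <= 0:
        return trajectory  # no jumps possible: length is kept forever
    while length > 0:
        t += rng.expovariate(total_rate)  # waiting time to the next jump
        if t > t_max:
            break
        step = rng.randint(1, max_step)   # hypothetical jump size in codons
        if rng.random() < rate_truncate / total_rate:
            length = max(0, length - step)  # truncation (ORF lost at 0)
        else:
            length += step                  # elongation
        trajectory.append((t, length))
    return trajectory


# Hypothetical parameters with truncation ten times likelier than elongation.
trajectory = simulate_orf_length(50, rate_truncate=0.10,
                                 rate_elongate=0.01, t_max=5.0)
```

With low total rates relative to `t_max`, most runs record few or no jumps, which mirrors the qualitative statement that ORFs tend to keep their length over short timescales; the rate ratio controls how often a change, when it happens, is a truncation.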
Affiliation(s)
- Marie Kristin Lebherz
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen 72076, Germany
| |
|
11
|
Vara C, Montañés JC, Albà MM. High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population. Genome Biol Evol 2024; 16:evae126. [PMID: 38934859 PMCID: PMC11221430 DOI: 10.1093/gbe/evae126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024] Open
Abstract
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variations in the level of translation of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Affiliation(s)
- Covadonga Vara
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
| | - José Carlos Montañés
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
| | - M Mar Albà
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
| |
|
12
|
Mohsen JJ, Mohsen MG, Jiang K, Landajuela A, Quinto L, Isaacs FJ, Karatekin E, Slavoff SA. Cellular function of the GndA small open reading frame-encoded polypeptide during heat shock. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.29.601336. [PMID: 38979229 PMCID: PMC11230408 DOI: 10.1101/2024.06.29.601336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Over the past 15 years, hundreds of previously undiscovered bacterial small open reading frame (sORF)-encoded polypeptides (SEPs) of fewer than fifty amino acids have been identified, and biological functions have been ascribed to an increasing number of SEPs from intergenic regions and small RNAs. However, despite numbering in the dozens in Escherichia coli, and hundreds to thousands in humans, same-strand nested sORFs that overlap protein-coding genes in alternative reading frames remain understudied. In order to provide insight into this enigmatic class of unannotated genes, we characterized GndA, a 36-amino acid, heat shock-regulated SEP encoded within the +2 reading frame of the gnd gene in E. coli K-12 MG1655. We show that GndA pulls down components of respiratory complex I (RCI) and is required for proper localization of an RCI subunit during heat shock. At high temperature, GndA deletion (ΔGndA) cells exhibit perturbations in cell growth, NADH/NAD+ ratio, and expression of a number of genes, including several associated with oxidative stress. These findings suggest that GndA may function in maintenance of homeostasis during heat shock. Characterization of GndA therefore supports the nascent but growing consensus that functional, overlapping genes occur in genomes from viruses to humans.
Affiliation(s)
- Jessica J. Mohsen
- Department of Chemistry, Yale University, New Haven, CT 06511
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT 06516
| | - Michael G. Mohsen
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511
- Howard Hughes Medical Institute, Yale University, New Haven, CT 06511
| | - Kevin Jiang
- Department of Chemistry, Yale University, New Haven, CT 06511
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT 06516
| | - Ane Landajuela
- Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06510
- Nanobiology Institute, Yale University, West Haven, CT 06516
| | - Laura Quinto
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511
- Systems Biology Institute, Yale University, West Haven, CT 06516
| | - Farren J. Isaacs
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511
- Systems Biology Institute, Yale University, West Haven, CT 06516
| | - Erdem Karatekin
- Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06510
- Nanobiology Institute, Yale University, West Haven, CT 06516
- Wu Tsai Institute, Yale University, New Haven, CT 06511
- Université de Paris, Saints-Pères Paris Institute for the Neurosciences (SPPIN), Centre National de la Recherche Scientifique (CNRS), 75006 Paris, France
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511
| | - Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT 06511
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT 06516
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511
| |
|
13
|
Andjus S, Szachnowski U, Vogt N, Gioftsidi S, Hatin I, Cornu D, Papadopoulos C, Lopes A, Namy O, Wery M, Morillon A. Pervasive translation of Xrn1-sensitive unstable long noncoding RNAs in yeast. RNA (NEW YORK, N.Y.) 2024; 30:662-679. [PMID: 38443115 PMCID: PMC11098462 DOI: 10.1261/rna.079903.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/15/2024] [Indexed: 03/07/2024]
Abstract
Despite being predicted to lack coding potential, cytoplasmic long noncoding (lnc)RNAs can associate with ribosomes. However, the landscape and biological relevance of lncRNA translation remain poorly studied. In yeast, cytoplasmic Xrn1-sensitive unstable transcripts (XUTs) are targeted by nonsense-mediated mRNA decay (NMD), suggesting a translation-dependent degradation process. Here, we report that XUTs are pervasively translated, which impacts their decay. We show that XUTs globally accumulate upon translation elongation inhibition, but not when initial ribosome loading is impaired. Ribo-seq confirmed ribosomes binding to XUTs and identified ribosome-associated 5'-proximal small ORFs. Mechanistically, the NMD-sensitivity of XUTs mainly depends on the 3'-untranslated region length. Finally, we show that the peptide resulting from the translation of an NMD-sensitive XUT reporter exists in NMD-competent cells. Our work highlights the role of translation in the posttranscriptional metabolism of XUTs. We propose that XUT-derived peptides could be exposed to natural selection, while NMD restricts XUT levels.
Affiliation(s)
- Sara Andjus
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
| | - Ugo Szachnowski
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
| | - Nicolas Vogt
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
| | - Stamatia Gioftsidi
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
| | - Isabelle Hatin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - David Cornu
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Chris Papadopoulos
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Anne Lopes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Olivier Namy
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| | - Maxime Wery
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
| | - Antonin Morillon
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, F-75248 Paris Cedex 05, France
| |
|
14
|
Tierney JAS, Świrski M, Tjeldnes H, Mudge JM, Kufel J, Whiffin N, Valen E, Baranov PV. Ribosome decision graphs for the representation of eukaryotic RNA translation complexity. Genome Res 2024; 34:530-538. [PMID: 38719470 PMCID: PMC11146595 DOI: 10.1101/gr.278810.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 04/01/2024] [Indexed: 05/21/2024]
Abstract
The application of ribosome profiling has revealed an unexpected abundance of translation in addition to that responsible for the synthesis of previously annotated protein-coding regions. Multiple short sequences have been found to be translated within single RNA molecules, within both annotated protein-coding and noncoding regions. The biological significance of this translation is a matter of intensive investigation. However, current schematic or annotation-based representations of mRNA translation generally do not account for the apparent multitude of translated regions within the same molecules. They also do not take into account the stochasticity of the process that allows alternative translations of the same RNA molecules by different ribosomes. There is a need for formal representations of mRNA complexity that would enable the analysis of quantitative information on translation and more accurate models for predicting the phenotypic effects of genetic variants affecting translation. To address this, we developed a conceptually novel abstraction that we term ribosome decision graphs (RDGs). RDGs represent translation as multiple ribosome paths through untranslated and translated mRNA segments. We termed the latter "translons." Nondeterministic events, such as initiation, reinitiation, selenocysteine insertion, or ribosomal frameshifting, are then represented as branching points. This representation allows for an adequate representation of eukaryotic translation complexity and focuses on locations critical for translation regulation. We show how RDGs can be used for depicting translated regions and for analyzing genetic variation and quantitative genome-wide data on translation for characterization of regulatory modulators of translation.
Affiliation(s)
- Jack A S Tierney
- School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
- SFI Centre for Research Training in Genomics Data Science, University College Cork, Cork T12 K8AF, Ireland
| | - Michał Świrski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106 Warsaw, Poland
| | - Håkon Tjeldnes
- School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, Cambridge, United Kingdom
| | - Joanna Kufel
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106 Warsaw, Poland
| | - Nicola Whiffin
- The Big Data Institute and Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
| | - Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
| | - Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
| |
|
15
|
Kohram M, Sanderson AE, Loui A, Thompson PV, Vashistha H, Shomar A, Oltvai ZN, Salman H. Nonlethal deleterious mutation-induced stress accelerates bacterial aging. Proc Natl Acad Sci U S A 2024; 121:e2316271121. [PMID: 38709929 PMCID: PMC11098108 DOI: 10.1073/pnas.2316271121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/29/2024] [Indexed: 05/08/2024] Open
Abstract
Random mutagenesis, including when it leads to loss of gene function, is a key mechanism enabling microorganisms' long-term adaptation to new environments. However, loss-of-function mutations are often deleterious, triggering, in turn, cellular stress and complex homeostatic stress responses, called "allostasis," to promote cell survival. Here, we characterize the differential impacts of 65 nonlethal, deleterious single-gene deletions on Escherichia coli growth in three different growth environments. Further assessments of select mutants, namely, those bearing single adenosine triphosphate (ATP) synthase subunit deletions, reveal that mutants display reorganized transcriptome profiles that reflect both the environment and the specific gene deletion. We also find that ATP synthase α-subunit deleted (ΔatpA) cells exhibit elevated metabolic rates while having slower growth compared to wild-type (wt) E. coli cells. At the single-cell level, compared to wt cells, individual ΔatpA cells display near normal proliferation profiles but enter a postreplicative state earlier and exhibit a distinct senescence phenotype. These results highlight the complex interplay between genomic diversity, adaptation, and stress response and uncover an "aging cost" to individual bacterial cells for maintaining population-level resilience to environmental and genetic stress; they also suggest potential bacteriostatic antibiotic targets and, as select human genetic diseases display highly similar phenotypes, a bacterial origin of some human diseases.
Affiliation(s)
- Maryam Kohram
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA 15260
| | - Amy E. Sanderson
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA 15260
| | - Alicia Loui
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA 15260
| | | | - Harsh Vashistha
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA 15260
| | - Aseel Shomar
- Department of Chemical Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel
| | - Zoltán N. Oltvai
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15260
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
- Department of Pathology and Laboratory Medicine, University of Rochester, Rochester, NY 14627
| | - Hanna Salman
- Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA 15260
| |
|
16
|
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.12.589296. [PMID: 38659920 PMCID: PMC11042228 DOI: 10.1101/2024.04.12.589296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode functional proteins, producing smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association, of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. Additionally, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from non-coding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
| | - Irwin Jungreis
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
| | - Jeffre Allen
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Biochemistry, University of Colorado Boulder, CO, USA
| | | | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Manolis Kellis
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
| | - John L Rinn
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Biochemistry, University of Colorado Boulder, CO, USA
| | - Loren E Hough
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Physics, University of Colorado Boulder, CO, USA
| |
|
17
|
Turcan A, Lee J, Wacholder A, Carvunis AR. Integrative detection of genome-wide translation using iRibo. STAR Protoc 2024; 5:102826. [PMID: 38217852 PMCID: PMC10826316 DOI: 10.1016/j.xpro.2023.102826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/21/2023] [Accepted: 12/21/2023] [Indexed: 01/15/2024] Open
Abstract
Ribosome profiling is a sequencing technique that provides a global picture of translation across a genome. Here, we present iRibo, a software program for integrating any number of ribosome profiling samples to obtain sensitive inference of annotated or unannotated translated open reading frames. We describe the process of using iRibo to generate a species' translatome from a set of ribosome profiling samples using S. cerevisiae as an example. For complete details on the use and execution of this protocol, please refer to Wacholder et al. (2023).
Affiliation(s)
- Alistair Turcan
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt Ph.D. Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
| | - Jiwon Lee
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt Ph.D. Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
| |
|
18
|
Fesenko I, Sahakyan H, Shabalina SA, Koonin EV. The Cryptic Bacterial Microproteome. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.17.580829. [PMID: 38903115 PMCID: PMC11188072 DOI: 10.1101/2024.02.17.580829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Microproteins encoded by small open reading frames (smORFs) comprise the "dark matter" of proteomes. Although functional microproteins were identified in diverse organisms from all three domains of life, bacterial smORFs remain poorly characterized. In this comprehensive study of intergenic smORFs (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identified 67,297 clusters of ismORFs subject to purifying selection. The ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins. Using AlphaFold Multimer, we predicted interactions of some of the predicted microproteins encoded by transcribed ismORFs with proteins encoded by neighboring genes, revealing the potential of microproteins to regulate the activity of various proteins, particularly, under stress. We compiled a catalog of predicted microprotein families with different levels of evidence from synteny analysis, structure prediction, and transcription and translation data. This study offers a resource for investigation of biological functions of microproteins.
Affiliation(s)
- Igor Fesenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Harutyun Sahakyan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Svetlana A. Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
19
|
Hlouchova K. Toxin rescue by a random sequence. Nat Ecol Evol 2023; 7:1963-1964. [PMID: 37945945 DOI: 10.1038/s41559-023-02252-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Affiliation(s)
- Klara Hlouchova
- Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague, Czech Republic.
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic.
| |
|
20
|
Mohsen JJ, Martel AA, Slavoff SA. Microproteins-Discovery, structure, and function. Proteomics 2023; 23:e2100211. [PMID: 37603371 PMCID: PMC10841188 DOI: 10.1002/pmic.202100211] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/04/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/22/2023]
Abstract
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Affiliation(s)
- Jessica J. Mohsen
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
| | - Alina A. Martel
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
| | - Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| |
|
21
|
Tierney JAS, Świrski M, Tjeldnes H, Mudge JM, Kufel J, Whiffin N, Valen E, Baranov PV. Ribosome Decision Graphs for the Representation of Eukaryotic RNA Translation Complexity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.10.566564. [PMID: 37986835 PMCID: PMC10659439 DOI: 10.1101/2023.11.10.566564] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/22/2023]
Abstract
The application of ribosome profiling has revealed an unexpected abundance of translation in addition to that responsible for the synthesis of previously annotated protein-coding regions. Multiple short sequences have been found to be translated within single RNA molecules, both within annotated protein-coding and non-coding regions. The biological significance of this translation is a matter of intensive investigation. However, current schematic or annotation-based representations of mRNA translation generally do not account for the apparent multitude of translated regions within the same molecules. They also do not take into account the stochasticity of the process that allows alternative translations of the same RNA molecules by different ribosomes. There is a need for formal representations of mRNA complexity that would enable the analysis of quantitative information on translation and more accurate models for predicting the phenotypic effects of genetic variants affecting translation. To address this, we developed a conceptually novel abstraction that we term Ribosome Decision Graphs (RDGs). RDGs represent translation as multiple ribosome paths through untranslated and translated mRNA segments. We termed the latter 'translons'. Non-deterministic events, such as initiation, re-initiation, selenocysteine insertion or ribosomal frameshifting, are then represented as branching points. This representation allows for an adequate representation of eukaryotic translation complexity and focuses on locations critical for translation regulation. We show how RDGs can be used for depicting translated regions, analysis of genetic variation and quantitative genome-wide data on translation for characterisation of regulatory modulators of translation.
Affiliation(s)
- Jack A S Tierney
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- SFI Centre for Research Training in Genomics Data Science, University College Cork, Cork, Ireland
| | - Michał Świrski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
| | - Håkon Tjeldnes
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Joanna Kufel
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
| | - Nicola Whiffin
- The Big Data Institute and Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
| | - Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Department of Biosciences, University of Oslo, Oslo, Norway
| | - Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
| |
|
22
|
Musalgaonkar S, Yelland J, Chitale R, Rao S, Ozadam H, Cenik C, Taylor D, Johnson A. The Ribosome Assembly Factor Reh1 is Released from the Polypeptide Exit Tunnel in the Pioneer Round of Translation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.23.563604. [PMID: 37961559 PMCID: PMC10634756 DOI: 10.1101/2023.10.23.563604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Assembly of functional ribosomal subunits and their successful delivery to the translating pool are prerequisites for protein synthesis and cell growth. In S. cerevisiae, the ribosome assembly factor Reh1 binds to pre-60S subunits at a late stage during their cytoplasmic maturation. Previous work shows that the C-terminus of Reh1 inserts into the polypeptide exit tunnel (PET) of the pre-60S subunit. Unlike canonical assembly factors, which associate exclusively with pre-60S subunits, we observed that Reh1 sediments with polysomes in addition to free 60S subunits. We therefore investigated the intriguing possibility that Reh1 remains associated with 60S subunits after the release of the anti-association factor Tif6 and after subunit joining. Here, we show that Reh1-bound nascent 60S subunits associate with 40S subunits to form actively translating ribosomes. Using selective ribosome profiling, we found that Reh1-bound ribosomes populate open reading frames near start codons. Reh1-bound ribosomes are also strongly enriched for initiator tRNA, indicating they are associated with early elongation events. Using single particle cryo-electron microscopy to image cycloheximide-arrested Reh1-bound 80S ribosomes, we found that Reh1-bound 80S ribosomes contain A site peptidyl tRNA, P site tRNA and eIF5A, indicating that Reh1 does not dissociate from 60S until early stages of translation elongation. We propose that Reh1 is displaced by the elongating peptide chain. These results identify Reh1 as the last assembly factor released from the nascent 60S subunit during its pioneer round of translation.
|
23
|
Ardern Z, Uz-Zaman MH. Between noise and function: Toward a taxonomy of the non-canonical translatome. Cell Syst 2023; 14:343-345. [PMID: 37201506 DOI: 10.1016/j.cels.2023.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 04/17/2023] [Indexed: 05/20/2023]
Abstract
Eukaryotic genomes are pervasively translated, but the properties of translated sequences outside of canonical genes are poorly understood. A new study in Cell Systems reveals a large translatome that is not under significant evolutionary constraint but is still an active part of diverse cellular systems.
Affiliation(s)
- Zachary Ardern
- Parasites and Microbes Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK.
| | - Md Hassan Uz-Zaman
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA.
| |
|