1
Aparicio B, Theunissen P, Hervas-Stubbs S, Fortes P, Sarobe P. Relevance of mutation-derived neoantigens and non-classical antigens for anticancer therapies. Hum Vaccin Immunother 2024; 20:2303799. [PMID: 38346926 PMCID: PMC10863374 DOI: 10.1080/21645515.2024.2303799] [Received: 09/29/2023] [Accepted: 01/06/2024] [Indexed: 02/15/2024] Open
Abstract
The efficacy of cancer immunotherapies relies on correct recognition of tumor antigens by lymphocytes, thus eliciting functional responses capable of eliminating tumor cells. Therefore, important efforts have been made in antigen identification, with the aim of understanding mechanisms of response to immunotherapy and designing safer and more efficient strategies. In addition to classical tumor-associated antigens identified during the last decades, implementation of next-generation sequencing methodologies is enabling the identification of neoantigens (neoAgs) arising from mutations, leading to the development of new neoAg-directed therapies. Moreover, there are numerous non-classical tumor antigens that originate from other sources and are identified by new methodologies. Here, we review the relevance of neoAgs in different immunotherapies and the results obtained by applying neoAg-based strategies. In addition, the different types of non-classical tumor antigens and the best approaches for their identification are described. This will help to increase the spectrum of targetable molecules useful in cancer immunotherapies.
Affiliation(s)
- Belen Aparicio
  - Program of Immunology and Immunotherapy, Center for Applied Medical Research (CIMA) University of Navarra, Pamplona, Spain
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
- Patrick Theunissen
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
  - DNA and RNA Medicine Division, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
- Sandra Hervas-Stubbs
  - Program of Immunology and Immunotherapy, Center for Applied Medical Research (CIMA) University of Navarra, Pamplona, Spain
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
- Puri Fortes
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
  - DNA and RNA Medicine Division, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
  - Spanish Network for Advanced Therapies (TERAV ISCIII), Spain
- Pablo Sarobe
  - Program of Immunology and Immunotherapy, Center for Applied Medical Research (CIMA) University of Navarra, Pamplona, Spain
  - Cancer Center Clinica Universidad de Navarra (CCUN), Pamplona, Spain
  - Navarra Institute for Health Research (IDISNA), Pamplona, Spain
  - CIBERehd, Pamplona, Spain
2
Hofman DA, Prensner JR, van Heesch S. Microproteins in cancer: identification, biological functions, and clinical implications. Trends Genet 2024:S0168-9525(24)00211-7. [PMID: 39379206 DOI: 10.1016/j.tig.2024.09.002] [Received: 06/06/2024] [Revised: 08/19/2024] [Accepted: 09/17/2024] [Indexed: 10/10/2024]
Abstract
Cancer continues to be a major global health challenge, accounting for 10 million deaths annually worldwide. Since the inception of genome-wide cancer sequencing studies 20 years ago, a core set of ~700 oncogenes and tumor suppressor genes has become the basis for cancer research. However, this research has been based largely on an understanding that the human genome encodes ~19 500 protein-coding genes. Complementing this genomic landscape, recent advances have described numerous microproteins which are now poised to redefine our understanding of oncogenic processes and open new avenues for therapeutic intervention. This review explores the emerging evidence for microprotein involvement in cancer mechanisms and discusses potential therapeutic applications, with an emphasis on highlighting recent advances in the field.
Affiliation(s)
- Damon A Hofman
  - Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584, CS, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- John R Prensner
  - Department of Pediatrics, Division of Pediatric Hematology/Oncology and Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Sebastiaan van Heesch
  - Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584, CS, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
3
Ruiz-Orera J, Miller DC, Greiner J, Genehr C, Grammatikaki A, Blachut S, Mbebi J, Patone G, Myronova A, Adami E, Dewani N, Liang N, Hummel O, Muecke MB, Hildebrandt TB, Fritsch G, Schrade L, Zimmermann WH, Kondova I, Diecke S, van Heesch S, Hübner N. Evolution of translational control and the emergence of genes and open reading frames in human and non-human primate hearts. Nature Cardiovascular Research 2024; 3:1217-1235. [PMID: 39317836 PMCID: PMC11473369 DOI: 10.1038/s44161-024-00544-7] [Received: 01/04/2024] [Accepted: 08/28/2024] [Indexed: 09/26/2024]
Abstract
Evolutionary innovations can be driven by changes in the rates of RNA translation and the emergence of new genes and small open reading frames (sORFs). In this study, we characterized the transcriptional and translational landscape of the hearts of four primate and two rodent species through integrative ribosome and transcriptomic profiling, including adult left ventricle tissues and induced pluripotent stem cell-derived cardiomyocyte cell cultures. We show here that the translational efficiencies of subunits of the mitochondrial oxidative phosphorylation chain complexes IV and V evolved rapidly across mammalian evolution. Moreover, we discovered hundreds of species-specific and lineage-specific genomic innovations that emerged during primate evolution in the heart, including 551 genes, 504 sORFs and 76 evolutionarily conserved genes displaying human-specific cardiac-enriched expression. Overall, our work describes the evolutionary processes and mechanisms that have shaped cardiac transcription and translation in recent primate evolution and sheds light on how these can contribute to cardiac development and disease.
Affiliation(s)
- Jorge Ruiz-Orera
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Duncan C Miller
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
- Johannes Greiner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Carolin Genehr
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
- Aliki Grammatikaki
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Susanne Blachut
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Jeanne Mbebi
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Giannino Patone
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Anna Myronova
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Eleonora Adami
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Nikita Dewani
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Ning Liang
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Oliver Hummel
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michael B Muecke
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Thomas B Hildebrandt
  - Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
  - Freie Universitaet Berlin, Berlin, Germany
- Guido Fritsch
  - Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
- Lisa Schrade
  - Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
- Wolfram H Zimmermann
  - Institute of Pharmacology and Toxicology, University Medical Center Göttingen, Göttingen, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Lower Saxony, Göttingen, Germany
  - DZNE (German Center for Neurodegenerative Diseases), Göttingen, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Göttingen, Germany
- Ivanela Kondova
  - Biomedical Primate Research Centre (BPRC), Rijswijk, The Netherlands
- Sebastian Diecke
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Technology Platform Pluripotent Stem Cells, Berlin, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Sebastiaan van Heesch
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Norbert Hübner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
  - Charité-Universitätsmedizin, Berlin, Germany
  - Helmholtz Institute for Translational AngioCardioScience (HI-TAC) of the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) at Heidelberg University, Heidelberg, Germany
4
Leushkin E, Kaessmann H. Identification of old coding regions disproves the hominoid de novo status of genes. Nat Ecol Evol 2024; 8:1826-1830. [PMID: 39187607 DOI: 10.1038/s41559-024-02513-6] [Received: 10/25/2023] [Accepted: 07/22/2024] [Indexed: 08/28/2024]
Affiliation(s)
- Evgeny Leushkin
  - Center for Molecular Biology, DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Henrik Kaessmann
  - Center for Molecular Biology, DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg, Germany
5
Deutsch EW, Kok LW, Mudge JM, Ruiz-Orera J, Fierro-Monti I, Sun Z, Abelin JG, Alba MM, Aspden JL, Bazzini AA, Bruford EA, Brunet MA, Calviello L, Carr SA, Carvunis AR, Chothani S, Clauwaert J, Dean K, Faridi P, Frankish A, Hubner N, Ingolia NT, Magrane M, Martin MJ, Martinez TF, Menschaert G, Ohler U, Orchard S, Rackham O, Roucou X, Slavoff SA, Valen E, Wacholder A, Weissman JS, Wu W, Xie Z, Choudhary J, Bassani-Sternberg M, Vizcaíno JA, Ternette N, Moritz RL, Prensner JR, van Heesch S. High-quality peptide evidence for annotating non-canonical open reading frames as human proteins. bioRxiv 2024:2024.09.09.612016. [PMID: 39314370 PMCID: PMC11419116 DOI: 10.1101/2024.09.09.612016] [Indexed: 09/25/2024]
Abstract
A major scientific drive is to characterize the protein-coding genome as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation, to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.
Affiliation(s)
- Eric W Deutsch
  - Institute for Systems Biology (ISB), Seattle, WA, 98109, USA
- Leron W Kok
  - Princess Máxima Center for Pediatric Oncology, Utrecht, 3584 CS, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Jorge Ruiz-Orera
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
- Ivo Fierro-Monti
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Zhi Sun
  - Institute for Systems Biology (ISB), Seattle, WA, 98109, USA
- M Mar Alba
  - Hospital del Mar Research Institute, Barcelona, Spain
  - Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
- Julie L Aspden
  - School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Ariel A Bazzini
  - Stowers Institute for Medical Research, Kansas City, MO, 64110, USA
  - Department of Molecular and Integrative Physiology, University of Kansas Medical Center, Kansas City, KS, 66160, USA
- Elspeth A Bruford
  - HUGO Gene Nomenclature Committee (HGNC), Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Marie A Brunet
  - Pediatrics Department, University of Sherbrooke, Sherbrooke, Québec, Canada
  - Centre de Recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Québec, Canada
- Steven A Carr
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Sonia Chothani
  - Centre for Computational Biology and Program in Cardiovascular and Metabolic Disorders, Duke-NUS (National University of Singapore) Medical School, Singapore
- Jim Clauwaert
  - Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Kellie Dean
  - School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Pouya Faridi
  - Centre for Cancer Research, Hudson Institute of Medical Research, Clayton, VIC, Australia
  - Monash Proteomics & Metabolomics Platform, Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC, Australia
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Norbert Hubner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
  - Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany
  - Helmholtz-Institute for Translational AngioCardioScience (HI-TAC) of the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) at Heidelberg University, Heidelberg, 69117, Germany
  - DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, 13347, Germany
- Nicholas T Ingolia
  - Department of Molecular and Cell Biology, Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720-3202, USA
- Michele Magrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Maria Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Thomas F Martinez
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, 92617, USA
  - Department of Biological Chemistry, University of California, Irvine, Irvine, CA, 92617, USA
  - Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, 92617, USA
- Gerben Menschaert
  - Biobix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
- Uwe Ohler
  - Department of Biology, Humboldt University Berlin, Berlin, 10117, Germany
  - Berlin Institute of Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, 10115, Germany
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Xavier Roucou
  - Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Sarah A Slavoff
  - Department of Chemistry, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
  - Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, 06516, USA
- Eivind Valen
  - Department of Biosciences, University of Oslo, Oslo, Norway
- Aaron Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Jonathan S Weissman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, 02142, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02138, USA
  - David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Wei Wu
  - Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore
  - Department of Pharmacy & Pharmaceutical sciences, National University of Singapore (NUS), Singapore
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jyoti Choudhary
  - Functional Proteomics Group, Institute of Cancer Research, Chester Betty Labs, London, SW3 6JB, UK
- Michal Bassani-Sternberg
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, 1005, Switzerland
  - Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Lausanne, 1005, Switzerland
  - Agora Cancer Research Centre, Lausanne, 1011, Switzerland
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Nicola Ternette
  - School of Life Sciences, Division Cell Signalling and Immunology, University of Dundee, Dundee, DD1 5EH, UK
  - Centre for Immuno-Oncology, University of Oxford, Oxford, OX37DQ, UK
- Robert L Moritz
  - Institute for Systems Biology (ISB), Seattle, WA, 98109, USA
- John R Prensner
  - Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Sebastiaan van Heesch
  - Princess Máxima Center for Pediatric Oncology, Utrecht, 3584 CS, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
6
Houghton CJ, Coelho NC, Chiang A, Hedayati S, Parikh SB, Ozbaki-Yagan N, Wacholder A, Iannotta J, Berger A, Carvunis AR, O'Donnell AF. Cellular processing of beneficial de novo emerging proteins. bioRxiv 2024:2024.08.28.610198. [PMID: 39257767 PMCID: PMC11384008 DOI: 10.1101/2024.08.28.610198] [Indexed: 09/12/2024]
Abstract
Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins' C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns.
Affiliation(s)
- Carly J Houghton
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Nelson Castilho Coelho
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Annette Chiang
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Stefanie Hedayati
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Saurin B Parikh
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Nejla Ozbaki-Yagan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Aaron Wacholder
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- John Iannotta
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Alexis Berger
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Anne-Ruxandra Carvunis
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
- Allyson F O'Donnell
  - Pittsburgh Center for Evolutionary Biology and Medicine (CEBaM), Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
7
Vara C, Montañés JC, Albà MM. High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population. Genome Biol Evol 2024; 16:evae126. [PMID: 38934859 PMCID: PMC11221430 DOI: 10.1093/gbe/evae126] [Received: 02/13/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024] Open
Abstract
During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, their distribution in the population is not known. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variations in the level of translation of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
Affiliation(s)
- Covadonga Vara
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- José Carlos Montañés
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
- M Mar Albà
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute, Barcelona, Spain
  - Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
8
Bonnet C, Dian AL, Espie-Caullet T, Fabbri L, Lagadec L, Pivron T, Dutertre M, Luco R, Navickas A, Vagner S, Verga D, Uguen P. Post-transcriptional gene regulation: From mechanisms to RNA chemistry and therapeutics. Bull Cancer 2024; 111:782-790. [PMID: 38824069 DOI: 10.1016/j.bulcan.2024.04.005] [Received: 02/13/2024] [Revised: 03/22/2024] [Accepted: 04/03/2024] [Indexed: 06/03/2024]
Abstract
A better understanding of RNA biology and chemistry is necessary to develop new RNA therapeutic strategies. This review synthesizes a series of conferences that took place during the 6th international course on post-transcriptional gene regulation at Institut Curie. This year, the course placed a special focus on RNA chemistry.
Affiliation(s)
- Clara Bonnet
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Ana Luisa Dian
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Tristan Espie-Caullet
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Lucilla Fabbri
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Lucie Lagadec
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Thibaud Pivron
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Martin Dutertre
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Reini Luco
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Albertas Navickas
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Stephan Vagner
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
- Daniela Verga
  - CNRS UMR9187, Inserm U1196, Chemistry and Modelling for the Biology of Cancer, Institut Curie, université Paris-Saclay, 91405 Orsay, France
- Patricia Uguen
  - CNRS UMR3348 Genome integrity, RNA and Cancer, Institut Curie, University Paris-Saclay, 91401 Orsay, France
9
Sanejouand YH. Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs? J Mol Evol 2024:10.1007/s00239-024-10174-z. [PMID: 38916610 DOI: 10.1007/s00239-024-10174-z] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 06/26/2024]
Abstract
By looking for a lack of homologs in a reference database of 27 well-annotated proteomes of primates and 52 well-annotated proteomes of other mammals, 170 putative human-specific proteins were identified. While most of them are deemed uncertain, 2 are known at the protein level and 23 at the transcript level, according to UniProt. Interestingly, 23 of these 25 proteins are found to be encoded or to have close homologs in an open reading frame of a long noncoding human RNA. However, half of them are predicted to be at least 80% globular, with a single structural domain, according to IUPred, and with at least 80% of ordered residues, according to flDPnn. Strikingly, there is a near-complete lack of structural knowledge about these proteins, with no tertiary structure presently available in the Protein Data Bank and a fair prediction for one of them in the AlphaFold Protein Structure Database. Moreover, knowledge about the function of these possibly key proteins remains scarce.
Affiliation(s)
- Yves-Henri Sanejouand
  - US2B, UMR 6286 of CNRS, Nantes University, 2 rue de la Houssinière, Nantes, 44322, Pays de la Loire, France.
10
Chen J, Li Q, Xia S, Arsala D, Sosa D, Wang D, Long M. The Rapid Evolution of De Novo Proteins in Structure and Complex. Genome Biol Evol 2024; 16:evae107. [PMID: 38753069 PMCID: PMC11149777 DOI: 10.1093/gbe/evae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/10/2024] [Indexed: 06/06/2024] Open
Abstract
Recent genome-wide studies in rice have established that de novo genes, evolving from noncoding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their evolution in protein structure over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes) with comparative approaches to gene duplicates. We found that de novo genes evolve faster than gene duplicates in the intrinsically disordered regions (such as random coils), secondary structure elements (such as α helix and β strand), hydrophobicity, and molecular recognition features. In de novo proteins, specifically, we observed an 8% to 14% decay in random coils and intrinsically disordered region lengths and a 2.3% to 6.5% increase in structured elements, hydrophobicity, and molecular recognition features, per million years on average. These patterns of structural evolution align with changes in amino acid composition over time as well. We also revealed higher positive charges but smaller molecular weights for de novo proteins than duplicates. Tertiary structure predictions showed that most de novo proteins, though not typically well folded on their own, readily form low-energy and compact complexes with other proteins facilitated by extensive residue contacts and conformational flexibility, suggesting a faster-binding scenario in de novo proteins to promote interaction. These analyses illuminate a rapid evolution of protein structure in de novo genes in rice genomes, originating from noncoding sequences, highlighting their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.
Affiliation(s)
- Jianhai Chen
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Qingrong Li
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dong Wang
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
11
Mouhand A, Nakatani K, Kono F, Hippo Y, Matsuo T, Barthe P, Peters J, Suenaga Y, Tamada T, Roumestand C. 1H, 13C and 15N backbone and side-chain resonance assignments of the human oncogenic protein NCYM. Biomol NMR Assign 2024; 18:65-70. [PMID: 38526839 DOI: 10.1007/s12104-024-10169-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 03/13/2024] [Indexed: 03/27/2024]
Abstract
NCYM is a cis-antisense gene of the MYCN oncogene and encodes an oncogenic protein that stabilizes MYCN via inhibition of GSK3β. High NCYM expression levels are associated with poor clinical outcomes in human neuroblastomas, and NCYM overexpression promotes distant metastasis in animal models of neuroblastoma. Using vacuum-ultraviolet circular dichroism and small-angle X-ray scattering, we previously showed that NCYM has high flexibility with partially folded structures; however, further structural characterization is required for the design of anti-cancer agents targeting NCYM. Here we report the 1H, 15N and 13C nuclear magnetic resonance assignments of NCYM. Secondary structure prediction using Secondary Chemical Shifts and TALOS-N analysis demonstrates that the structure of NCYM is essentially disordered, even though residues in the central region of the peptide clearly present a propensity to adopt a dynamic helical structure. This preliminary study provides foundations for further analysis of the interaction between NCYM and potential partners.
Affiliation(s)
- Assia Mouhand
  - Centre de Biologie Structurale (CBS), CNRS, INSERM, Univ Montpellier, Montpellier, France
- Kazuma Nakatani
  - Laboratory of Evolutionary Oncology, Chiba Cancer Center Research Institute, Chiba, Japan
  - Graduate School of Medical and Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Fumiaki Kono
  - Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Yoshitaka Hippo
  - Laboratory of Evolutionary Oncology, Chiba Cancer Center Research Institute, Chiba, Japan
  - Laboratory of Precision Tumor Model Systems, Chiba Cancer Center Research Institute, Chiba, Japan
- Tatsuhito Matsuo
  - Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan
- Philippe Barthe
  - Centre de Biologie Structurale (CBS), CNRS, INSERM, Univ Montpellier, Montpellier, France
- Judith Peters
  - Institut Laue Langevin, 38042, Grenoble, France
  - Université Grenoble Alpes, CNRS, LiPhy, 38400, Grenoble, France
- Yusuke Suenaga
  - Laboratory of Evolutionary Oncology, Chiba Cancer Center Research Institute, Chiba, Japan.
- Taro Tamada
  - Institute for Quantum Life Science, National Institutes for Quantum Science and Technology, Chiba, Japan.
  - Department of Quantum Life Science, Graduate School of Science, Chiba University, Chiba, Japan.
- Christian Roumestand
  - Centre de Biologie Structurale (CBS), CNRS, INSERM, Univ Montpellier, Montpellier, France.
12
Tong G, Hah N, Martinez TF. Comparison of software packages for detecting unannotated translated small open reading frames by Ribo-seq. Brief Bioinform 2024; 25:bbae268. [PMID: 38842510 PMCID: PMC11155197 DOI: 10.1093/bib/bbae268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/12/2024] [Accepted: 05/21/2024] [Indexed: 06/07/2024] Open
Abstract
Accurate and comprehensive annotation of microprotein-coding small open reading frames (smORFs) is critical to our understanding of normal physiology and disease. Empirical identification of translated smORFs is carried out primarily using ribosome profiling (Ribo-seq). While effective, published Ribo-seq datasets can vary drastically in quality and different analysis tools are frequently employed. Here, we examine the impact of these factors on identifying translated smORFs. We compared five commonly used software tools that assess open reading frame translation from Ribo-seq (RibORFv0.1, RibORFv1.0, RiboCode, ORFquant, and Ribo-TISH) and found surprisingly low agreement across all tools. Only ~2% of smORFs were called translated by all five tools, and ~15% by three or more tools when assessing the same high-resolution Ribo-seq dataset. For larger annotated genes, the same analysis showed ~74% agreement across all five tools. We also found that some tools are strongly biased against low-resolution Ribo-seq data, while others are more tolerant. Analyzing Ribo-seq coverage revealed that smORFs detected by more than one tool tend to have higher translation levels and higher fractions of in-frame reads, consistent with what was observed for annotated genes. Together these results support employing multiple tools to identify the most confident microprotein-coding smORFs and choosing the tools based on the quality of the dataset and the planned downstream characterization experiments of the predicted smORFs.
Affiliation(s)
- Gregory Tong
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92617, United States
- Nasun Hah
  - Chapman Charitable Foundations Genomic Sequencing Core, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Thomas F Martinez
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92617, United States
  - Department of Biological Chemistry, University of California, Irvine, Irvine, CA 92617, United States
  - Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA 92617, United States
13
Imamura M, Nakai R, Ohnuki M, Hamazaki Y, Tanabe H, Sato M, Harishima Y, Horikawa M, Watanabe M, Oota H, Nakagawa M, Suzuki S, Enard W. Generation of chimpanzee induced pluripotent stem cell lines for cross-species comparisons. In Vitro Cell Dev Biol Anim 2024; 60:544-554. [PMID: 38386235 DOI: 10.1007/s11626-024-00853-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 01/04/2024] [Indexed: 02/23/2024]
Abstract
As humans' closest living relatives, chimpanzees offer valuable insights into human evolution. However, technical and ethical limitations hinder investigations into the molecular and cellular foundations that distinguish chimpanzee and human traits. Recently, induced pluripotent stem cells (iPSCs) have emerged as a novel model for functional comparative studies and provided a non-invasive alternative for studying embryonic phenomena. In this study, we generated five new chimpanzee iPSC lines from peripheral blood cells and skin fibroblasts with SeV vectors carrying four reprogramming factors (human OCT3/4, SOX2, KLF4, and L-MYC) and characterized their pluripotency and differentiation potential. We also examined the expression of a human-specific non-coding RNA, HSTR1, which is predicted to be involved in human brain development. Our results show that the chimpanzee iPSCs possess pluripotent characteristics and can differentiate into various cell lineages. Moreover, we found that HSTR1 is expressed in human iPSCs and their neural derivatives but not in chimpanzee counterparts, supporting its possible role in human-specific brain development. As iPSCs are inherently variable due to genetic and epigenetic differences in donor cells or reprogramming procedures, it is essential to expand the number of chimpanzee iPSC lines to comprehensively capture the molecular and cellular properties representative of chimpanzees. Hence, our cells provide a valuable resource for investigating the function and regulation of human-specific transcripts such as HSTR1 and for understanding human evolution more generally.
Affiliation(s)
- Masanori Imamura
  - Molecular Biology Section, Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi, 484-8506, Japan.
- Risako Nakai
  - Molecular Biology Section, Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi, 484-8506, Japan
  - iPSC-Based Drug Discovery and Development Team, RIKEN BioResource Research Center, Soraku, Kyoto, 619-0237, Japan
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, 606-8507, Japan
- Mari Ohnuki
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, 606-8507, Japan
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8501, Japan
  - Hakubi Center, Kyoto University, Kyoto, 606-8501, Japan
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, München, Germany
- Yusuke Hamazaki
  - Molecular Biology Section, Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Hideyuki Tanabe
  - Research Center for Integrative Evolutionary Science, SOKENDAI (The Graduate University for Advanced Studies), Hayama, 240-0193, Japan
- Momoka Sato
  - Department of Agricultural and Life Sciences, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, 399-4598, Japan
- Yu Harishima
  - Department of Bioengineering, University of California, Berkeley, CA, 94704, USA
- Musashi Horikawa
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, 113-0033, Japan
- Mao Watanabe
  - Department of Agricultural and Life Sciences, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, 399-4598, Japan
- Hiroki Oota
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, 113-0033, Japan
- Masato Nakagawa
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, 606-8507, Japan
- Shunsuke Suzuki
  - Department of Agricultural and Life Sciences, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, 399-4598, Japan
- Wolfgang Enard
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, München, Germany
14
Aubel M, Buchel F, Heames B, Jones A, Honc O, Bornberg-Bauer E, Hlouchova K. High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential. Genome Biol Evol 2024; 16:evae069. [PMID: 38597156 PMCID: PMC11024478 DOI: 10.1093/gbe/evae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/23/2024] [Indexed: 04/11/2024] Open
Abstract
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to be similar to random sequences and, accordingly, with no stable tertiary fold and high predicted disorder. However, structural properties of de novo proteins and whether they differ during the stages of emergence and fixation have not been studied in depth and rely heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with Fluorescence-activated cell sorting, we were able to screen the library for most compact protein structures, as well as most elongated and flexible structures. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios and their implications underlying the age-dependencies of compactness and structural content of putative de novo proteins.
Affiliation(s)
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Filip Buchel
  - Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
  - Department of Biochemistry, Faculty of Science, Charles University, Prague, Czech Republic
- Brennen Heames
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Alun Jones
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Ondrej Honc
  - Imaging Methods Core Facility, BIOCEV, Prague, Czech Republic
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
  - Department of Protein Evolution, Max Planck-Institute for Biology Tuebingen, Tuebingen, Germany
- Klara Hlouchova
  - Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
15
Delihas N. Evolution of a Human-Specific De Novo Open Reading Frame and Its Linked Transcriptional Silencer. Int J Mol Sci 2024; 25:3924. [PMID: 38612733 PMCID: PMC11011693 DOI: 10.3390/ijms25073924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/23/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024] Open
Abstract
In the human genome, two short open reading frames (ORFs) separated by a transcriptional silencer and a small intervening sequence stem from the gene SMIM45. The two ORFs show different translational characteristics, and they also show divergent patterns of evolutionary development. The studies presented here describe the evolution of the components of SMIM45. One ORF consists of an ultra-conserved 68 amino acid (aa) sequence, whose origins can be traced beyond the evolutionary age of divergence of the elephant shark, ~462 MYA. The silencer also has ancient origins, but it has a complex and divergent pattern of evolutionary formation, as it overlaps both at the 68 aa ORF and the intervening sequence. The other ORF consists of 107 aa. It develops during primate evolution but is found to originate de novo from an ancestral non-coding genomic region with root origins within the Afrothere clade of placental mammals, whose evolutionary age of divergence is ~99 MYA. The formation of the complete 107 aa ORF during primate evolution is outlined, whereby sequence development is found to occur through biased mutations, with disruptive random mutations that also occur but lead to a dead-end. The 107 aa ORF is of particular significance, as there is evidence to suggest it is a protein that may function in human brain development. Its evolutionary formation presents a view of a human-specific ORF and its linked silencer that were predetermined in non-primate ancestral species. The genomic position of the silencer offers interesting possibilities for the regulation of transcription of the 107 aa ORF. A hypothesis is presented with respect to possible spatiotemporal expression of the 107 aa ORF in embryonic tissues.
Affiliation(s)
- Nicholas Delihas
  - Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, USA
16
Domazet-Lošo M, Široki T, Šimičević K, Domazet-Lošo T. Macroevolutionary dynamics of gene family gain and loss along multicellular eukaryotic lineages. Nat Commun 2024; 15:2663. [PMID: 38531970 DOI: 10.1038/s41467-024-47017-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
The gain and loss of genes fluctuate over evolutionary time in major eukaryotic clades. However, the full profile of these macroevolutionary trajectories is still missing. To give a more inclusive view on the changes in genome complexity across the tree of life, here we recovered the evolutionary dynamics of gene family gain and loss ranging from the ancestor of cellular organisms to 352 eukaryotic species. We show that in all considered lineages the gene family content follows a common evolutionary pattern, where the number of gene families reaches the highest value at a major evolutionary and ecological transition, and then gradually decreases towards extant organisms. This supports theoretical predictions and suggests that the genome complexity is often decoupled from commonly perceived organismal complexity. We conclude that simplification by gene family loss is a dominant force in Phanerozoic genomes of various lineages, probably underpinned by intense ecological specializations and functional outsourcing.
Affiliation(s)
- Mirjana Domazet-Lošo
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia.
- Tin Široki
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Korina Šimičević
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
  - School of Medicine, Catholic University of Croatia, Ilica 242, HR-10000, Zagreb, Croatia.
17
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdiscip Rev RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
  - Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
  - Chinese Institute for Brain Research, Beijing, China
- Ni A An
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
  - Southwest United Graduate School, Kunming, China
18
Hannon Bozorgmehr J. Four classic "de novo" genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences. Mol Genet Genomics 2024; 299:6. [PMID: 38315248 DOI: 10.1007/s00438-023-02090-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 10/15/2023] [Indexed: 02/07/2024]
Abstract
Despite being previously regarded as extremely unlikely, the idea that entirely novel protein-coding genes can emerge from non-coding sequences has gradually become accepted over the past two decades. Examples of "de novo origination", resulting in lineage-specific "orphan" genes, lacking coding orthologs, are now produced every year. However, many are likely cases of duplicates that are difficult to recognize. Here, I re-examine the claims and show that four very well-known examples of genes alleged to have emerged completely "from scratch" (FLJ33706 in humans, Goddard in fruit flies, BSC4 in baker's yeast and AFGP2 in codfish) may have plausible evolutionary ancestors in pre-existing genes. The first two are likely highly diverged retrogenes coding for regulatory proteins that have been misidentified as orphans. The antifreeze glycoprotein, moreover, may not have evolved from repetitive non-genic sequences but, as in several other related cases, from an apolipoprotein that could have become pseudogenized before later being reactivated. These findings detract from various claims made about de novo gene birth and show there has been a tendency not to invest the necessary effort in searching for homologs outside of a very limited syntenic or phylostratigraphic methodology. A robust approach is used for improving detection that draws upon similarities, not just in terms of statistical sequence analysis, but also relating to biochemistry and function, to obviate notable failures to identify homologs.
19
Mönttinen HAM, Frilander MJ, Löytynoja A. Generation of de novo miRNAs from template switching during DNA replication. Proc Natl Acad Sci U S A 2023; 120:e2310752120. [PMID: 38019864 PMCID: PMC10710096 DOI: 10.1073/pnas.2310752120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023] Open
Abstract
The mechanisms generating novel genes and genetic information are poorly known, even for microRNA (miRNA) genes with an extremely constrained design. All miRNA primary transcripts need to fold into a stem-loop structure to yield short gene products (~22 nt) that bind and repress their mRNA targets. While a substantial number of miRNA genes are ancient and highly conserved, short secondary structures coding for entirely novel miRNA genes have been shown to emerge in a lineage-specific manner. Template switching is a DNA-replication-related mutation mechanism that can introduce complex changes and generate perfect base pairing for entire hairpin structures in a single event. Here, we show that the template-switching mutations (TSMs) have participated in the emergence of over 6,000 suitable hairpin structures in the primate lineage to yield at least 18 new human miRNA genes, that is 26% of the miRNAs inferred to have arisen since the origin of primates. While the mechanism appears random, the TSM-generated miRNAs are enriched in introns where they can be expressed with their host genes. The high frequency of TSM events provides raw material for evolution. Being orders of magnitude faster than other mechanisms proposed for de novo creation of genes, TSM-generated miRNAs enable near-instant rewiring of genetic information and rapid adaptation to changing environments.
Affiliation(s)
- Heli A. M. Mönttinen
  - Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
- Mikko J. Frilander
  - Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
- Ari Löytynoja
  - Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
20
Kore H, Datta KK, Nagaraj SH, Gowda H. Protein-coding potential of non-canonical open reading frames in human transcriptome. Biochem Biophys Res Commun 2023; 684:149040. [PMID: 37897910 DOI: 10.1016/j.bbrc.2023.09.068] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2023] [Revised: 09/09/2023] [Accepted: 09/23/2023] [Indexed: 10/30/2023]
Abstract
In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by noncoding regions in the human genome. They are encoded by small open reading frames (sORFs) in the untranslated regions (UTRs) of mRNAs and long non-coding RNAs (lncRNAs). These sORF encoded proteins (SEPs) are often <150AA and show poor evolutionary conservation. A subset of them have been functionally characterized and shown to play an important role in fundamental biological processes including cardiac and muscle function, DNA repair, embryonic development and various human diseases. How many novel protein-coding regions exist in the human genome and what fraction of them are functionally important remains a mystery. In this review, we discuss current progress in unraveling SEPs, approaches used for their identification, their limitations and reliability of these identifications. We also discuss functionally characterized SEPs and their involvement in various biological processes and diseases. Lastly, we provide insights into their distinctive features compared to canonical proteins and challenges associated with annotating these in protein reference databases.
Collapse
Affiliation(s)
- Hitesh Kore
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia.
- Keshava K Datta
- Proteomics and Metabolomics Platform, La Trobe University, Melbourne, VIC, 3083, Australia
- Shivashankar H Nagaraj
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia
- Harsha Gowda
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Medicine, The University of Queensland, Queensland, 4072, Australia.
21
Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Moritz RL, Deutsch EW, van Heesch S. What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us About the Noncanonical Proteome? Mol Cell Proteomics 2023; 22:100631. [PMID: 37572790 PMCID: PMC10506109 DOI: 10.1016/j.mcpro.2023.100631] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 05/14/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/14/2023] Open
Abstract
Ribosome profiling (Ribo-Seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of noncanonical sites of ribosome translation outside the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7000 noncanonical ORFs are translated, which, at first glance, has the potential to expand the number of human protein CDSs by 30%, from ∼19,500 annotated CDSs to over 26,000 annotated CDSs. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of noncanonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome but searching for guidance on how to proceed. Here, we discuss the current state of noncanonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein coding."
Affiliation(s)
- John R Prensner
- Division of Pediatric Hematology/Oncology, Department of Pediatrics, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan, USA.
- Leron W Kok
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Karl R Clauser
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, Agora Center Bugnon 25A, University of Lausanne, Lausanne, Switzerland; Department of Oncology, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; Agora Cancer Research Centre, Lausanne, Switzerland
- Robert L Moritz
- Institute for Systems Biology (ISB), Seattle, Washington, USA
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington, USA
22
Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Deutsch EW, van Heesch S. What can Ribo-seq and proteomics tell us about the non-canonical proteome? bioRxiv 2023:2023.05.16.541049. [PMID: 37292611 PMCID: PMC10245706 DOI: 10.1101/2023.05.16.541049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Ribosome profiling (Ribo-seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of non-canonical sites of ribosome translation outside of the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7,000 non-canonical open reading frames (ORFs) are translated, which, at first glance, has the potential to expand the number of human protein-coding sequences by 30%, from ∼19,500 annotated CDSs to over 26,000. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of non-canonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome, but searching for guidance on how to proceed. Here, we discuss the current state of non-canonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein-coding".
In brief
The human genome encodes thousands of non-canonical open reading frames (ORFs) in addition to protein-coding genes. As a nascent field, many questions remain regarding non-canonical ORFs. How many exist? Do they encode proteins? What level of evidence is needed for their verification? Central to these debates has been the advent of ribosome profiling (Ribo-seq) as a method to discern genome-wide ribosome occupancy, and immunopeptidomics as a method to detect peptides that are processed and presented by MHC molecules and not observed in traditional proteomics experiments. This article provides a synthesis of the current state of non-canonical ORF research and proposes standards for their future investigation and reporting.
Highlights
- Combined use of Ribo-seq and proteomics-based methods enables optimal confidence in detecting non-canonical ORFs and their protein products.
- Ribo-seq can provide more sensitive detection of non-canonical ORFs, but data quality and analytical pipelines will impact results.
- Non-canonical ORF catalogs are diverse and span both high-stringency and low-stringency ORF nominations.
- A framework for standardized non-canonical ORF evidence will advance the research field.
Affiliation(s)
- John R. Prensner
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Leron W. Kok
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
- Karl R. Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center Bugnon 25A, 1005 Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1005 Lausanne, Switzerland
- Agora Cancer Research Centre, 1011 Lausanne, Switzerland
- Eric W. Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands