1
Translation and natural selection of micropeptides from long non-canonical RNAs. Nat Commun 2022; 13:6515. [PMID: 36316320; PMCID: PMC9622821; DOI: 10.1038/s41467-022-34094-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/26/2022] [Accepted: 10/13/2022] [Indexed: 12/25/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides but lacking canonical coding sequences. Apparently unable to produce peptides, lncRNAs seem to function only through RNA expression, sequence and structure. Here, we exhaustively detect in vivo translation of small open reading frames (small ORFs) within lncRNAs using ribosome profiling during Drosophila melanogaster embryogenesis. We show that around 30% of lncRNAs contain small ORFs engaged by ribosomes, leading to regulated translation of 100 to 300 micropeptides. We identify lncRNA features that favour translation, such as cistronicity, Kozak sequences, and conservation. For the latter, we develop a bioinformatics pipeline to detect small ORF homologues, and reveal evidence of natural selection favouring the conservation of micropeptide sequence and function across evolution. Our results expand the repertoire of lncRNA biochemical functions, and suggest that lncRNAs give rise to novel coding genes throughout evolution. Since most lncRNAs contain small ORFs with as yet unknown translation potential, we propose to rename them "long non-canonical RNAs".
2
Jimenez J. Protein-coding tRNA sequences? Gene 2022; 814:146154. [PMID: 34995735; DOI: 10.1016/j.gene.2021.146154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 12/14/2021] [Accepted: 12/20/2021] [Indexed: 11/17/2022]
Abstract
Transfer RNAs (tRNAs) are ancient molecules likely predating the translation machinery. These extremely conserved RNA molecules transfer amino acids to the ribosome for the synthesis of proteins encoded by mRNAs, but canonical tRNAs are not protein-coding RNAs. Surprisingly, when tRNA sequences were virtually translated, I observed that the derived peptides match thousands of protein entries in databases. The analysis of these sequences indicates that the vast majority of these tRNA-derived proteins are annotated as small hypothetical peptides, likely arising from sequencing, prediction and/or annotation errors. But life often surpasses fiction. Importantly, tRNA-encoded amino acid domains were also found embedded in large functional proteins. Phylogenetic analysis of representative tRNA-derived protein domains may provide new insights into the origin, plasticity, and evolution of protein-coding genes.
Affiliation(s)
- Juan Jimenez
- Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/Consejo Superior de Investigaciones Científicas, Carretera de Utrera, km1, 41013 Sevilla, Spain.
3
Rubio A, Jimenez J, Pérez-Pulido AJ. Assessment of selection pressure exerted on genes from complete pangenomes helps to improve the accuracy in the prediction of new genes. Brief Bioinform 2022; 23:6519794. [PMID: 35108356; DOI: 10.1093/bib/bbac010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Revised: 12/21/2021] [Accepted: 01/10/2022] [Indexed: 11/14/2022] Open
Abstract
Bacterial genomes are being sequenced on a massive scale, and they provide valuable data for defining the complete set of genes of a species. The analysis of thousands of bacterial strains can identify both shared genes and those appearing only in pathogenic strains. Current computational gene finders facilitate this task but often miss some existing genes. However, the availability of multiple genomes from the same species makes it possible to estimate the selective pressure acting on the genes of complete pangenomes. This estimate may assist in evaluating gene predictions, either by checking the certainty of a new gene or by annotating it as a gene under positive selection. Here, we estimated the selective pressure on 19 271 genes that are part of the pangenome of the human opportunistic pathogen Acinetobacter baumannii and found that most genes in this bacterium are subject to negative selection. However, 23% of them showed values compatible with positive selection. The latter were mainly uncharacterized proteins or genes required to evade the host defence system, including genes related to resistance and virulence, whose changes may be favoured to acquire new functions. Finally, we evaluated the utility of measuring selection pressure in the detection of sequencing errors and the validation of gene prediction.
Affiliation(s)
- Alejandro Rubio
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA), Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Sevilla 41013, Spain
- Juan Jimenez
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA), Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Sevilla 41013, Spain
- Antonio J Pérez-Pulido
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA), Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Sevilla 41013, Spain
4
Abstract
The number of complete genome sequences grows rapidly with each passing year. Thus, methods for genome annotation need to be honed constantly to handle the deluge of information. Annotation of pseudogenes (i.e., gene copies that appear not to make a functional protein) in genomes is a persistent problem; here, we overview pseudogene annotation methods that are based on the detection of sequence homology in genomic DNA.
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Montreal, QC, Canada.
5
Casimiro-Soriguer CS, Rigual MM, Brokate-Llanos AM, Muñoz MJ, Garzón A, Pérez-Pulido AJ, Jimenez J. Using AnABlast for intergenic sORF prediction in the Caenorhabditis elegans genome. Bioinformatics 2020; 36:4827-4832. [PMID: 32614398; PMCID: PMC7723330; DOI: 10.1093/bioinformatics/btaa608] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2020] [Revised: 06/21/2020] [Accepted: 06/23/2020] [Indexed: 11/29/2022] Open
Abstract
Motivation: Short bioactive peptides encoded by small open reading frames (sORFs) play important roles in eukaryotes. Bioinformatics prediction of ORFs is an early step in genome sequence analysis, but sORFs encoding short peptides, often using non-AUG initiation codons, are not easily discriminated from false ORFs occurring by chance. Results: AnABlast is a computational tool designed to highlight putative protein-coding regions in genomic DNA sequences. This protein-coding finder is independent of ORF length and reading frame shifts, which makes AnABlast a potentially useful tool for predicting sORFs. Using this algorithm, we report here the identification of 82 putative new intergenic sORFs in the Caenorhabditis elegans genome. Sequence similarity, motif presence, expression data and RNA interference experiments support that these sORFs likely encode functional peptides, encouraging the use of AnABlast as a new approach for the accurate prediction of intergenic sORFs in annotated eukaryotic genomes. Availability and implementation: AnABlast is freely available at http://www.bioinfocabd.upo.es/ab/. The C. elegans genome browser with AnABlast results, annotated genes and all data used in this study is available at http://www.bioinfocabd.upo.es/celegans. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- C S Casimiro-Soriguer
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
- M M Rigual
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
- A M Brokate-Llanos
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
- M J Muñoz
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
- A Garzón
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
- A J Pérez-Pulido
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
- J Jimenez
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, 41013 Sevilla, Spain
6
Huraiova B, Kanovits J, Polakova SB, Cipak L, Benko Z, Sevcovicova A, Anrather D, Ammerer G, Duncan CDS, Mata J, Gregan J. Proteomic analysis of meiosis and characterization of novel short open reading frames in the fission yeast Schizosaccharomyces pombe. Cell Cycle 2020; 19:1777-1785. [PMID: 32594847; PMCID: PMC7469465; DOI: 10.1080/15384101.2020.1779470] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/06/2020] [Revised: 05/05/2020] [Accepted: 05/11/2020] [Indexed: 01/10/2023] Open
Abstract
Meiosis is the process by which haploid gametes are produced from diploid precursor cells. We used stable isotope labeling by amino acids in cell culture (SILAC) to characterize the meiotic proteome in the fission yeast Schizosaccharomyces pombe. We compared relative levels of proteins extracted from cells harvested around meiosis I with those of meiosis II, and proteins from premeiotic S phase with the interval between meiotic divisions, when S phase is absent. Our proteome datasets revealed peptides corresponding to short open reading frames (sORFs) that have been previously identified by ribosome profiling as new translated regions. We verified expression of selected sORFs by Western blotting and analyzed the phenotype of deletion mutants. Our data provide a resource for studying meiosis that may help understand differences between meiosis I and meiosis II and how S phase is suppressed between the two meiotic divisions.
Affiliation(s)
- Barbora Huraiova
- Department of Genetics, Faculty of Natural Sciences, Comenius University in Bratislava, Bratislava, Slovakia
- Judit Kanovits
- Department of Genetics, Faculty of Natural Sciences, Comenius University in Bratislava, Bratislava, Slovakia
- Silvia Bagelova Polakova
- Department of Genetics, Faculty of Natural Sciences, Comenius University in Bratislava, Bratislava, Slovakia
- Department of Membrane Biochemistry, Institute of Animal Biochemistry and Genetics, Centre of Biosciences, Slovak Academy of Sciences, Bratislava, Slovakia
- Lubos Cipak
- Department of Genetics, Cancer Research Institute, Biomedical Research Center, Slovak Academy of Sciences, Bratislava, Slovakia
- Zsigmond Benko
- Department of Membrane Biochemistry, Institute of Animal Biochemistry and Genetics, Centre of Biosciences, Slovak Academy of Sciences, Bratislava, Slovakia
- Department of Molecular Biotechnology and Microbiology, Institute of Biotechnology, Faculty of Science and Technology, University of Debrecen, Debrecen, Hungary
- Andrea Sevcovicova
- Department of Genetics, Faculty of Natural Sciences, Comenius University in Bratislava, Bratislava, Slovakia
- Dorothea Anrather
- Mass Spectrometry Facility and Department of Biochemistry, Max Perutz Labs, University of Vienna, Vienna Biocenter, Austria
- Gustav Ammerer
- Mass Spectrometry Facility and Department of Biochemistry, Max Perutz Labs, University of Vienna, Vienna Biocenter, Austria
- Juan Mata
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Juraj Gregan
- Department of Chromosome Biology, Max Perutz Labs, Vienna Biocenter, University of Vienna, Vienna, Austria
- Advanced Microscopy Facility, Vienna Biocenter Core Facilities, Vienna, Austria
7
Casimiro-Soriguer CS, Rubio A, Jimenez J, Pérez-Pulido AJ. Ancient evolutionary signals of protein-coding sequences allow the discovery of new genes in the Drosophila melanogaster genome. BMC Genomics 2020; 21:210. [PMID: 32138644; PMCID: PMC7059364; DOI: 10.1186/s12864-020-6632-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Accepted: 02/28/2020] [Indexed: 12/20/2022] Open
Abstract
Background: The current growth in DNA sequencing techniques makes genome annotation a crucial task in the genomic era. Traditional gene finders focus on protein-coding sequences, but they are far from being exhaustive. The number of genes of this kind continuously increases due to new experimental data and the development of improved bioinformatics algorithms. Results: In this context, AnABlast represents a novel in silico strategy, based on the accumulation of short evolutionary signals identified by low-scoring protein sequence alignments. This strategy potentially highlights protein-coding regions in genomic sequences regardless of traditional homology or translation signatures. Here, we analyze the evolutionary information contained in the accumulation of these short signals. Using the Drosophila melanogaster genome, we establish optimal parameters for accurate gene prediction with AnABlast and show that this new strategy contributes significantly to adding gene, exon and pseudogene regions yet to be discovered in both already annotated and new genomes. Conclusions: AnABlast can be freely used to analyze genomic regions of whole genomes, where it helps to complete the previous annotation.
Affiliation(s)
- Carlos S Casimiro-Soriguer
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
- Alejandro Rubio
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
- Juan Jimenez
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
- Antonio J Pérez-Pulido
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
8
Rubio A, Mier P, Andrade-Navarro MA, Garzón A, Jiménez J, Pérez-Pulido AJ. CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats. Database (Oxford) 2020; 2020:baaa088. [PMID: 33206958; PMCID: PMC7673337; DOI: 10.1093/database/baaa088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/21/2020] [Revised: 09/07/2020] [Accepted: 09/10/2020] [Indexed: 12/20/2022]
Abstract
The genomics era is resulting in the generation of a plethora of biological sequences that are usually stored in public databases. There are many computational tools that facilitate the annotation of these sequences, but sometimes they produce mistakes that enter the databases and can be propagated when erroneous data are used for secondary analyses, such as gene prediction or homology searching. While developing a computational gene finder based on protein-coding sequences, we discovered that the reference UniProtKB protein database is contaminated with some spurious sequences translated from DNA containing clustered regularly interspaced short palindromic repeats. We therefore encourage developers of prokaryotic computational gene finders and protein database curators to consider this source of error.
Affiliation(s)
- Alejandro Rubio
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128, Mainz, Germany
- Andrés Garzón
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
- Juan Jiménez
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
- Antonio J Pérez-Pulido
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
9
Rubio A, Casimiro-Soriguer CS, Mier P, Andrade-Navarro MA, Garzón A, Jimenez J, Pérez-Pulido AJ. AnABlast: Re-searching for Protein-Coding Sequences in Genomic Regions. Methods in Molecular Biology (Clifton, N.J.) 2019; 1962:207-214. [PMID: 31020562; DOI: 10.1007/978-1-4939-9173-0_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Abstract
AnABlast is a computational tool that highlights protein-coding regions within intergenic and intronic DNA sequences which escape detection by standard gene prediction algorithms. DNA sequences with small protein-coding genes or exons, complex intron-containing genes, or degenerated DNA fragments are efficiently targeted by AnABlast. Furthermore, this algorithm is particularly useful in detecting protein-coding sequences with nonsignificant homologs to sequences in databases. AnABlast can be executed online at http://www.bioinfocabd.upo.es/anablast/.
Affiliation(s)
- Alejandro Rubio
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Carlos S Casimiro-Soriguer
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany
- Andrés Garzón
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Juan Jimenez
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Antonio J Pérez-Pulido
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
10
Tang J, Lin J, Li H, Li X, Yang Q, Cheng ZM, Chang Y. Characterization of CIPK Family in Asian Pear (Pyrus bretschneideri Rehd) and Co-expression Analysis Related to Salt and Osmotic Stress Responses. Frontiers in Plant Science 2016; 7:1361. [PMID: 27656193; PMCID: PMC5013074; DOI: 10.3389/fpls.2016.01361] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/29/2016] [Accepted: 08/26/2016] [Indexed: 05/24/2023]
Abstract
Asian pear (Pyrus bretschneideri) is one of the most important fruit crops in the world, and its growth and productivity are frequently affected by abiotic stresses. Calcineurin B-like interacting protein kinases (CIPKs), as calcium-sensor protein kinases, interact with Ca²⁺-binding CBLs to extensively mediate abiotic stress responses in plants. Although the pear genome sequence has been released, little information is available about the CIPK genes in pear, especially in response to salt and osmotic stresses. In this study, we systematically identified 28 CIPK family members from the sequenced pear genome and analyzed their organization, phylogeny, gene structure, protein motifs, and synteny duplication divergences. Most duplicated PbCIPKs underwent purifying selection, and their evolutionary divergence accompanied the pear whole-genome duplication. We also investigated stress-responsive expression patterns and co-expression networks of the CIPK family under salt and osmotic stresses, and the distribution of stress-related cis-regulatory elements in promoter regions. Our results suggest that most PbCIPKs could play important roles in abiotic stress responses. Some PbCIPKs, such as PbCIPK22, -19, -18, -15, -8, and -6, can serve as core regulators in response to salt and osmotic stresses based on co-expression networks of PbCIPKs. Some sets of genes that were involved in the response to salt did not overlap with those involved in the response to osmotic stress, suggesting the sub-functionalization of CIPK genes in stress responses. This study revealed some candidate genes that play roles in early responses to salt and osmotic stress for further characterization of abiotic stress responses mediated by CIPKs in pear.
Affiliation(s)
- Jun Tang
- Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Institute of Horticulture, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Department of Plant Sciences, University of Tennessee at Knoxville, Knoxville, TN, USA
- Jing Lin
- Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Institute of Horticulture, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Hui Li
- Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Institute of Horticulture, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Xiaogang Li
- Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Institute of Horticulture, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Qingsong Yang
- Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Institute of Horticulture, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Zong-Ming Cheng
- Department of Plant Sciences, University of Tennessee at Knoxville, Knoxville, TN, USA
- Youhong Chang
- Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Institute of Horticulture, Jiangsu Academy of Agricultural Sciences, Nanjing, China