Reference Citation Analysis

For: Harrison PM, Hegyi H, Balasubramanian S, Luscombe NM, Bertone P, Echols N, Johnson T, Gerstein M. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res 2002;12:272-80. [PMID: 11827946 PMCID: PMC155275 DOI: 10.1101/gr.207102] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Indexed: 11/25/2022]

Cited by Other Article(s)

Engineering Ribosomes to Alleviate Abiotic Stress in Plants: A Perspective. PLANTS 2022;11:plants11162097. [PMID: 36015400 PMCID: PMC9415564 DOI: 10.3390/plants11162097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/10/2022] [Accepted: 08/10/2022] [Indexed: 11/16/2022]

Pseudogene MSTO2P Interacts with miR-128-3p to Regulate Coptisine Sensitivity of Non-Small-Cell Lung Cancer (NSCLC) through TGF-β Signaling and VEGFC. JOURNAL OF ONCOLOGY 2022;2022:9864411. [PMID: 35794983 PMCID: PMC9251142 DOI: 10.1155/2022/9864411] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2022] [Revised: 06/06/2022] [Accepted: 06/08/2022] [Indexed: 12/02/2022]

Schultz JA, Hebert PDN. Do pseudogenes pose a problem for metabarcoding marine animal communities? Mol Ecol Resour 2022;22:2897-2914. [PMID: 35700118 DOI: 10.1111/1755-0998.13667] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/04/2021] [Accepted: 06/01/2022] [Indexed: 11/30/2022]

Oh J, Lee SG, Park C. PIC-Me: paralogs and isoforms classifier based on machine-learning approaches. BMC Bioinformatics 2021;22:311. [PMID: 34674638 PMCID: PMC8529730 DOI: 10.1186/s12859-021-04229-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2021] [Accepted: 06/01/2021] [Indexed: 11/10/2022]

Abstract

Background

Gene duplication, which produces paralogs, and alternative splicing, which produces isoforms, have been important processes for increasing protein diversity and maintaining cellular homeostasis. Despite their recognized importance and the advent of large-scale genomic and transcriptomic analyses, the annotation of gene loci needed to identify paralogs and isoforms remains surprisingly incomplete. Global transcriptome analysis is especially challenging for non-model organisms that lack a reference genome.

Results

To reliably discriminate between paralogs and isoforms in RNA-seq data, we redefined pre-existing sequence features previously derived from full-length cDNAs and EST sequences (sequence similarity, inverse count of consecutive identical or non-identical blocks, and match-mismatch fraction) and described newly discovered genomic and transcriptomic features (twilight zone of protein sequence alignment and expression level difference). In addition, the effectiveness and relevance of the proposed features were verified with two widely used models, support vector machine (SVM) and random forest (RF). Across nine RNA-seq datasets, all AUC (area under the curve) scores of ROC (receiver operating characteristic) curves were over 0.9 for the RF model and significantly higher than those for the SVM model.
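The AUC values above summarize how well a classifier's scores separate the two classes. As a minimal sketch (not PIC-Me's code), the ROC AUC can be computed directly as the Mann-Whitney probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the convention that label 1 marks a paralog pair and label 0 an isoform pair are assumptions for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic.

    Returns the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties add 0.5.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical classifier scores; label 1 = paralog pair, 0 = isoform pair.
    print(roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is O(n·m); for large evaluation sets a rank-based formulation, or scikit-learn's `roc_auc_score`, yields the same value.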

Conclusions

In this study, we implemented our method, Paralogs and Isoforms Classifier based on Machine-learning approaches (PIC-Me), as an RF model using the five proposed RNA-seq features, and showed that it outperformed an existing method. We envision that our tool will be a valuable computational resource for the genomics community to help with gene annotation and that it will aid comparative transcriptomics and evolutionary genomics studies, especially those on non-model organisms.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04229-x.

Garewal N, Goyal N, Pathania S, Kaur J, Singh K. Gauging the trends of pseudogenes in plants. Crit Rev Biotechnol 2021;41:1114-1129. [PMID: 33993808 DOI: 10.1080/07388551.2021.1901648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]

Harrison PM. Computational Methods for Pseudogene Annotation Based on Sequence Homology. Methods Mol Biol 2021;2324:35-48. [PMID: 34165707 DOI: 10.1007/978-1-0716-1503-4_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]

Characterization and molecular evolution of claudin genes in the Pungitius sinensis. J Comp Physiol B 2020;190:749-759. [PMID: 32778926 DOI: 10.1007/s00360-020-01301-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2020] [Revised: 07/20/2020] [Accepted: 08/04/2020] [Indexed: 10/23/2022]

Pseudogene MSTO2P enhances hypoxia-induced osteosarcoma malignancy by upregulating PD-L1. Biochem Biophys Res Commun 2020;530:673-679. [PMID: 32768186 DOI: 10.1016/j.bbrc.2020.07.113] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/25/2020] [Revised: 07/22/2020] [Accepted: 07/23/2020] [Indexed: 11/20/2022]

Blommaert J, Riss S, Hecox-Lea B, Mark Welch DB, Stelzer CP. Small, but surprisingly repetitive genomes: transposon expansion and not polyploidy has driven a doubling in genome size in a metazoan species complex. BMC Genomics 2019;20:466. [PMID: 31174483 PMCID: PMC6555955 DOI: 10.1186/s12864-019-5859-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/22/2019] [Accepted: 05/29/2019] [Indexed: 02/01/2023]

Abstract

BACKGROUND

The causes and consequences of genome size variation across eukaryotes, which spans five orders of magnitude, have been hotly debated since before the advent of genome sequencing. Previous studies have mostly examined variation among larger taxonomic units (e.g., orders or genera), while comparisons among closely related species are rare. Rotifers of the Brachionus plicatilis species complex exhibit a seven-fold variation in genome size and thus represent a unique opportunity to study such changes on a relatively short evolutionary timescale. Here, we sequenced and analysed the genomes of four species of this complex with nuclear DNA contents spanning 110-422 Mbp. To establish the likely mechanisms of genome size change, we analysed both sequencing read libraries and assemblies for signatures of polyploidy and repetitive element content. We also compared these genomes to that of B. calyciflorus, the closest relative with a sequenced genome (293 Mbp nuclear DNA content).

RESULTS

Despite the very large differences in genome size, we saw no evidence of ploidy level changes across the B. plicatilis complex. However, repetitive element content explained a large portion of genome size variation (at least 54%). The species with the largest genome, B. asplanchnoidis, has a strikingly high 44% repetitive element content, while the smaller B. plicatilis genomes contain between 14 and 25% repetitive elements. According to our analyses, the B. calyciflorus genome contains 39% repetitive elements, substantially higher than previously reported (21%), suggesting that high repetitive element load could be widespread in monogonont rotifers.
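Repetitive element percentages such as these are often read off a repeat-masked assembly. A minimal sketch, assuming the assembly has been soft-masked (repeats in lowercase, as RepeatMasker produces with its `-xsmall` option) and that N bases are gaps to exclude from the denominator: the repeat fraction is the count of lowercase bases divided by all non-N bases. The helpers `read_fasta` and `repeat_fraction` are hypothetical, and this is not the authors' pipeline, which analysed both read libraries and assemblies.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}, preserving base case."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]  # ID = first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def repeat_fraction(sequences: Iterable[str]) -> float:
    """Fraction of non-gap bases that are soft-masked (lowercase) across all sequences."""
    masked = called = 0
    for seq in sequences:
        for base in seq:
            if base in "Nn":
                continue  # gap or unknown base: not counted in the denominator
            called += 1
            if base.islower():
                masked += 1
    return masked / called if called else 0.0


if __name__ == "__main__":
    genome = read_fasta([">contig1", "ACGTacgt", "NNNN", ">contig2 note", "aaTT"])
    print(f"{repeat_fraction(genome.values()):.1%}")  # 50.0%
```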

CONCLUSIONS

Even though the genome sizes of these species are at the low end of the metazoan spectrum, their genomes contain substantial amounts of repetitive elements. Polyploidy does not appear to play a role in genome size variations in these species, and these variations can be mostly explained by changes in repetitive element content. This contradicts the naïve expectation that small genomes are streamlined, or less complex, and that large variations in nuclear DNA content between closely related species are due to polyploidy.

Ershov NI, Mordvinov VA, Prokhortchouk EB, Pakharukova MY, Gunbin KV, Ustyantsev K, Genaev MA, Blinov AG, Mazur A, Boulygina E, Tsygankova S, Khrameeva E, Chekanov N, Fan G, Xiao A, Zhang H, Xu X, Yang H, Solovyev V, Lee SMY, Liu X, Afonnikov DA, Skryabin KG. New insights from Opisthorchis felineus genome: update on genomics of the epidemiologically important liver flukes. BMC Genomics 2019;20:399. [PMID: 31117933 PMCID: PMC6530080 DOI: 10.1186/s12864-019-5752-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/25/2018] [Accepted: 04/29/2019] [Indexed: 01/25/2023]

Abstract

Background

The three epidemiologically important Opisthorchiidae liver flukes, Opisthorchis felineus, O. viverrini, and Clonorchis sinensis, are believed to have a similar potential to cause hepatobiliary diseases in their definitive hosts, although their populations differ substantially in ecogeographical aspects, including habitat, preferred hosts, and population structure. The lack of O. felineus genomic data is an obstacle to developing the comparative molecular biological approaches needed to gain new knowledge about the biology of Opisthorchiidae trematodes, to identify essential pathways linked to parasite-host interaction, to predict genes that contribute to liver fluke pathogenesis, and to support effective prevention and control of the disease.

Results

Here we present the first draft genome assembly of O. felineus and its gene repertoire, accompanied by a comparative analysis with those of O. viverrini and C. sinensis. We observed both noticeably high heterozygosity in the sequenced individual and substantial genetic diversity in a pooled sample. This indicates that the O. felineus population has a greater potential than O. viverrini and C. sinensis for a rapid adaptive response to opisthorchiasis control and prevention measures. We also found that all three species show more intensive involvement of trans-splicing in RNA processing than other trematodes.
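Heterozygosity of a single sequenced individual is commonly summarized as the fraction of callable sites at which its two alleles differ. A minimal sketch, assuming VCF-style diploid genotype strings (`0/0`, `0/1`, `1|1`, `./.`) and that invariant callable sites are included as homozygous calls; dividing by variant sites alone would overstate the rate. The function name is hypothetical, and the paper's own estimate may have been obtained differently.

```python
from typing import Iterable, Tuple


def observed_heterozygosity(genotypes: Iterable[str]) -> Tuple[int, int, float]:
    """Return (heterozygous calls, called sites, heterozygosity) for one diploid sample.

    Genotypes are VCF-style strings such as "0/1" or "1|1"; calls with a
    missing allele (".") or a ploidy other than two are skipped.
    """
    het = called = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # missing or non-diploid call
        called += 1
        if alleles[0] != alleles[1]:
            het += 1
    rate = het / called if called else 0.0
    return het, called, rate


if __name__ == "__main__":
    # Hypothetical calls: six sites, one missing, invariant (0/0) sites included.
    calls = ["0/0", "0/1", "1|1", "./.", "0|2", "0/0"]
    print(observed_heterozygosity(calls))  # (2, 5, 0.4)
```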

Conclusion

All of the revealed peculiarities in the structural organization of these genomes are of great importance for properly describing genes and their products in these parasitic species, and they should be taken into account in both academic and applied research on epidemiologically important liver flukes. Further comparative genomic studies of liver flukes and non-carcinogenic flatworms will allow the generation of well-grounded hypotheses about the mechanisms underlying the development of cholangiocarcinoma associated with opisthorchiasis and clonorchiasis, as well as species-specific mechanisms of these diseases.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5752-8) contains supplementary material, which is available to authorized users.

Affiliation(s)

Nikita I Ershov Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
Viatcheslav A Mordvinov Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
Egor B Prokhortchouk Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia; ZAO Genoanalytica, 1 Leninskie Gory street, Moscow, 119234, Russia
Mariya Y Pakharukova Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia; Novosibirsk State University, 2 Pirogova Str, Novosibirsk, 630090, Russia
Konstantin V Gunbin Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
Kirill Ustyantsev Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
Mikhail A Genaev Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
Alexander G Blinov Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
Alexander Mazur Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia
Eugenia Boulygina Federal Research Center Kurchatov Institute, Moscow, Russia
Svetlana Tsygankova Federal Research Center Kurchatov Institute, Moscow, Russia
Ekaterina Khrameeva ZAO Genoanalytica, 1 Leninskie Gory street, Moscow, 119234, Russia
Nikolay Chekanov Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia
Guangyi Fan BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China; State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China
An Xiao BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
He Zhang BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
Xun Xu BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
Huanming Yang BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
Victor Solovyev Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
Simon Ming-Yuen Lee State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China
Xin Liu BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
Dmitry A Afonnikov Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia; Novosibirsk State University, 2 Pirogova Str, Novosibirsk, 630090, Russia
Konstantin G Skryabin Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia; Federal Research Center Kurchatov Institute, Moscow, Russia

Zhao X, Hao S, Wang M, Xing D, Wang C. Knockdown of pseudogene DUXAP8 expression in glioma suppresses tumor cell proliferation. Oncol Lett 2019;17:3511-3516. [PMID: 30867791 DOI: 10.3892/ol.2019.9994] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/25/2018] [Accepted: 12/03/2018] [Indexed: 12/17/2022]

Emadi-Baygi M, Sedighi R, Nourbakhsh N, Nikpour P. Pseudogenes in gastric cancer pathogenesis: a review article. Brief Funct Genomics 2018;16:348-360. [PMID: 28459995 DOI: 10.1093/bfgp/elx004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]

Wei Y, Chang Z, Wu C, Zhu Y, Li K, Xu Y. Identification of potential cancer-related pseudogenes in lung adenocarcinoma based on ceRNA hypothesis. Oncotarget 2017;8:59036-59047. [PMID: 28938616 PMCID: PMC5601712 DOI: 10.18632/oncotarget.19933] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/05/2017] [Accepted: 07/26/2017] [Indexed: 01/01/2023]

Casola C, Betrán E. The Genomic Impact of Gene Retrocopies: What Have We Learned from Comparative Genomics, Population Genomics, and Transcriptomic Analyses? Genome Biol Evol 2017;9:1351-1373. [PMID: 28605529 PMCID: PMC5470649 DOI: 10.1093/gbe/evx081] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Accepted: 05/18/2017] [Indexed: 02/07/2023]

Abstract

Gene duplication is a major driver of organismal evolution. Gene retroposition is a mechanism of gene duplication whereby a gene's transcript is used as a template to generate retroposed gene copies, or retrocopies. Intriguingly, the formation of retrocopies depends upon the enzymatic machinery encoded by retrotransposable elements, genomic parasites occurring in the majority of eukaryotes. Most retrocopies are depleted of the regulatory regions found upstream of their parental genes; therefore, they were initially considered transcriptionally incompetent gene copies, or retropseudogenes. However, examples of functional retrocopies, or retrogenes, have accumulated since the 1980s. Here, we review what we have learned about retrocopies in animals, plants and other eukaryotic organisms, with a particular emphasis on comparative and population genomic analyses complemented with transcriptomic datasets. In addition, these data have provided information about the dynamics of the different "life cycle" stages of retrocopies (i.e., polymorphic retrocopy number variants, fixed retropseudogenes and retrogenes) and have provided key insights into the retroduplication mechanisms, the patterns and evolutionary forces at work during the fixation process and the biological function of retrogenes. Functional genomic and transcriptomic data have also revealed that many retropseudogenes are transcriptionally active and a biological role has been experimentally determined for many. Finally, we have learned that not only non-long terminal repeat retroelements but also long terminal repeat retroelements play a role in the emergence of retrocopies across eukaryotes. This body of work has shown that mRNA-mediated duplication represents a widespread phenomenon that produces an array of new genes that contribute to organismal diversity and adaptation.

Xiao J, Sekhwal MK, Li P, Ragupathy R, Cloutier S, Wang X, You FM. Pseudogenes and Their Genome-Wide Prediction in Plants. Int J Mol Sci 2016;17:E1991. [PMID: 27916797 PMCID: PMC5187791 DOI: 10.3390/ijms17121991] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/15/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 11/17/2022]

Tine M. Evolutionary significance and diversification of the phosphoglucose isomerase genes in vertebrates. BMC Res Notes 2015;8:799. [PMID: 26682538 PMCID: PMC4684624 DOI: 10.1186/s13104-015-1683-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/17/2015] [Accepted: 11/09/2015] [Indexed: 01/20/2024]

Abstract

Background

Phosphoglucose isomerase (PGI) genes encode important multifunctional proteins whose evolution has, until now, not been well elucidated because of the limited number of completely sequenced genomes. Although multifunctionality has been considered an original and innate characteristic of this gene family, PGI genes may have acquired novel functions through changes in coding sequences and exon/intron structure, which are known to lead to functional divergence after gene duplication. A whole-genome comparative approach was used to estimate the rates of molecular evolution of this protein family.

Results

The results confirm the presence of two isoforms in teleost fishes and only one variant in all other vertebrates. Phylogenetic reconstructions grouped the PGI genes into five main groups: lungfishes/coelacanth/cartilaginous fishes, teleost fishes, amphibians, reptiles/birds and mammals, with the teleost group being subdivided into two subclades comprising PGI1 and PGI2. This partitioning of PGI into groups is consistent with the synteny and molecular evolution results based on the estimation of the ratios of nonsynonymous to synonymous changes (Ka/Ks) and divergence rates between both PGI paralogs and orthologs. Teleost PGI2 shares more similarity with the variant found in all other vertebrates, suggesting that it has evolved less than PGI1 relative to the PGI of the common vertebrate ancestor.

Conclusions

The diversification of PGI genes into PGI1 and PGI2 is consistent with a teleost-specific duplication that occurred before the radiation of this lineage and after its split from the other infraclasses of ray-finned fishes. The low average Ka/Ks ratios within teleost and mammalian lineages suggest that both PGI1 and PGI2 are functionally constrained by purifying selection and may, therefore, have the same functions. By contrast, the high average Ka/Ks ratios and divergence rates within reptiles and birds indicate that PGI may be involved in different functions. The synteny analyses show that the genomic region harbouring PGI genes has undergone independent genomic rearrangements, in particular in mammals versus the reptile/bird lineage, which may have contributed to the present-day functional diversification of this gene family.

Electronic supplementary material

The online version of this article (doi:10.1186/s13104-015-1683-x) contains supplementary material, which is available to authorized users.

Esposito F, De Martino M, Forzati F, Fusco A. HMGA1-pseudogene overexpression contributes to cancer progression. Cell Cycle 2015;13:3636-9. [PMID: 25483074 DOI: 10.4161/15384101.2014.974440] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 01/12/2023]

Esposito F, De Martino M, Petti MG, Forzati F, Tornincasa M, Federico A, Arra C, Pierantoni GM, Fusco A. HMGA1 pseudogenes as candidate proto-oncogenic competitive endogenous RNAs. Oncotarget 2015;5:8341-54. [PMID: 25268743 PMCID: PMC4226687 DOI: 10.18632/oncotarget.2202] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 02/07/2023]

Affiliation(s)

Francesco Esposito Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Marco De Martino Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Maria Grazia Petti Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Floriana Forzati Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Mara Tornincasa Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Antonella Federico Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Claudio Arra Istituto Nazionale dei Tumori, Fondazione Pascale, Naples, Italy
Giovanna Maria Pierantoni Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
Alfredo Fusco Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy

Comparative analysis of pseudogenes across three phyla. Proc Natl Acad Sci U S A 2014;111:13361-6. [PMID: 25157146 DOI: 10.1073/pnas.1407293111] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022]

Roberts TC, Morris KV. Not so pseudo anymore: pseudogenes as therapeutic targets. Pharmacogenomics 2014;14:2023-34. [PMID: 24279857 DOI: 10.2217/pgs.13.172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/21/2022]

Harrison PM. Computational methods for pseudogene annotation based on sequence homology. Methods Mol Biol 2014;1167:27-39. [PMID: 24823769 DOI: 10.1007/978-1-4939-0835-6_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]

Balakirev ES, Chechetkin VR, Lobzin VV, Ayala FJ. Computational methods of identification of pseudogenes based on functionality: entropy and GC content. Methods Mol Biol 2014;1167:41-62. [PMID: 24823770 DOI: 10.1007/978-1-4939-0835-6_4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]

Lu Y, Zhang Y, Hang X, Qu W, Lubec G, Chen C, Zhang C. Genome-wide computational identification of bicistronic mRNA in humans. Amino Acids 2012;44:597-606. [PMID: 22945903 DOI: 10.1007/s00726-012-1380-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/23/2012] [Accepted: 07/26/2012] [Indexed: 11/30/2022]

Detecting transcription of ribosomal protein pseudogenes in diverse human tissues from RNA-seq data. BMC Genomics 2012;13:412. [PMID: 22908858 PMCID: PMC3478165 DOI: 10.1186/1471-2164-13-412] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 04/12/2012] [Accepted: 08/10/2012] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Ribosomal proteins (RPs) have about 2000 pseudogenes in the human genome. While anecdotal reports of RP pseudogene transcription exist, it is unclear to what extent these pseudogenes are transcribed. RP pseudogene transcription is difficult to identify with microarrays because of potential cross-hybridization between transcripts from the parent genes and the pseudogenes. Transcriptome sequencing (RNA-seq) now provides an opportunity to ascertain the transcription of pseudogenes. A challenge for discovering pseudogene expression in RNA-seq data lies in the difficulty of uniquely identifying reads mapped to pseudogene regions, which are typically also similar to the parent genes.

RESULTS

Here we developed a specialized pipeline for pseudogene transcription discovery. We first constructed a "composite genome" that includes the entire human genome sequence as well as mRNA sequences of real ribosomal protein genes. We then mapped all sequence reads to the composite genome and retained only exact matches. Moreover, we restricted our analysis to strictly defined mappable regions and calculated RPKM values as a measure of pseudogene transcription levels. We report evidence for the transcription of RP pseudogenes in 16 human tissues. By analyzing the Human Body Map 2.0 RNA-seq data with our pipeline, we found that one RP pseudogene (PGOHUM-249508) is transcribed in thyroid with an RPKM of 170. Three other RP pseudogenes are transcribed with RPKM > 10, a level similar to that of the normal RP genes, in white blood cells, kidney, and testes, respectively. An additional thirteen RP pseudogenes have RPKM > 5, corresponding to the 20-30 percentile among all genes. Unlike ribosomal protein genes, which are constitutively expressed in almost all tissues, RP pseudogenes are differentially expressed, suggesting that they may contribute to tissue-specific biological processes.
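RPKM divides a region's read count by its length in kilobases and by the sequencing depth in millions of mapped reads: RPKM = reads × 10^9 / (length_bp × total_mapped_reads). A minimal sketch of the counting and normalization steps, assuming, as the abstract describes, that only exact-match reads are kept and that the length used is the mappable length of the pseudogene region. The `Alignment` record, the per-base mappability mask, and assigning a read by its start position are hypothetical simplifications, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Alignment:
    """Hypothetical minimal alignment record: 0-based start and mismatch count (e.g. a SAM NM tag)."""
    start: int
    mismatches: int


def count_exact_reads(alignments: Iterable[Alignment], region_start: int,
                      mappable: Sequence[bool]) -> int:
    """Count perfect-match reads that start at a mappable position of the region.

    mappable[i] says whether position region_start + i is uniquely mappable.
    """
    region_end = region_start + len(mappable)
    count = 0
    for aln in alignments:
        if aln.mismatches != 0:
            continue  # keep exact matches only
        if region_start <= aln.start < region_end and mappable[aln.start - region_start]:
            count += 1
    return count


def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of (mappable) length per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


if __name__ == "__main__":
    mask = [True] * 5 + [False] * 5  # positions 100-104 mappable, 105-109 not
    reads = [Alignment(100, 0), Alignment(104, 0), Alignment(105, 0),
             Alignment(101, 1), Alignment(99, 0)]
    n = count_exact_reads(reads, 100, mask)
    print(n, rpkm(n, sum(mask), 1_000_000))  # 2 400.0
```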

CONCLUSIONS

Using a specialized bioinformatics method, we identified the transcription of ribosomal protein pseudogenes in human tissues using RNA-seq data.

Cardoso-Moreira M, Long M. The origin and evolution of new genes. Methods Mol Biol 2012;856:161-86. [PMID: 22399459 DOI: 10.1007/978-1-61779-585-5_7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 02/11/2023]

Han YJ, Ma SF, Yourek G, Park YD, Garcia JGN. A transcribed pseudogene of MYLK promotes cell proliferation. FASEB J 2011;25:2305-12. [PMID: 21441351 DOI: 10.1096/fj.10-177808] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 01/16/2023]

Chen SM, Ma KY, Zeng J. Pseudogene: lessons from PCR bias, identification and resurrection. Mol Biol Rep 2010;38:3709-15. [PMID: 21116863 DOI: 10.1007/s11033-010-0485-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/12/2010] [Accepted: 11/09/2010] [Indexed: 11/26/2022]

Pseudogene-mediated posttranscriptional silencing of HMGA1 can result in insulin resistance and type 2 diabetes. Nat Commun 2010;1:40. [PMID: 20975707 DOI: 10.1038/ncomms1040] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Received: 02/08/2010] [Accepted: 06/25/2010] [Indexed: 11/08/2022]

Hung MS, Lin YC, Mao JH, Kim IJ, Xu Z, Yang CT, Jablons DM, You L. Functional polymorphism of the CK2alpha intronless gene plays oncogenic roles in lung cancer. PLoS One 2010;5:e11418. [PMID: 20625391 PMCID: PMC2896393 DOI: 10.1371/journal.pone.0011418] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 12/17/2009] [Accepted: 06/05/2010] [Indexed: 01/22/2023]

Evolvability and Speed of Evolutionary Algorithms in Light of Recent Developments in Biology. ACTA ACUST UNITED AC 2010. [DOI: 10.1155/2010/568375] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/17/2022]

Harrison PM, Khachane A, Kumar M. Genomic assessment of the evolution of the prion protein gene family in vertebrates. Genomics 2010;95:268-77. [DOI: 10.1016/j.ygeno.2010.02.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/22/2009] [Revised: 02/16/2010] [Accepted: 02/24/2010] [Indexed: 02/09/2023]

Liu YJ, Zheng D, Balasubramanian S, Carriero N, Khurana E, Robilotto R, Gerstein MB. Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotrans-positional activity. BMC Genomics 2009;10:480. [PMID: 19835609 PMCID: PMC2770531 DOI: 10.1186/1471-2164-10-480] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 03/25/2009] [Accepted: 10/16/2009] [Indexed: 11/30/2022]

Abstract

BACKGROUND

Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins.

RESULTS

We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively.

CONCLUSION

Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
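The abstract above dates pseudogenes with Kimura's two-parameter (K2P) model of nucleotide substitution. As a minimal sketch, and not the authors' pipeline, the K2P distance between a pseudogene and its parent gene can be computed from the proportions of transitions (P) and transversions (Q) over the comparable columns of an alignment, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). Turning d into an age requires an assumed neutral substitution rate; the helper `estimate_age_myr`, its `rate_per_site_per_myr` parameter, the example sequences, and the rate value used below are all hypothetical placeholders, not values from the study.

```python
import math

# Purines (A, G) and pyrimidines (C, T); a change within a class is a transition.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def estimate_age_myr(distance: float, rate_per_site_per_myr: float) -> float:
    """Hypothetical helper: age = distance / assumed neutral rate (per site per Myr)."""
    return distance / rate_per_site_per_myr


if __name__ == "__main__":
    parent = "ATGGCTCGTACGATCGATCG"  # invented parent-gene fragment
    pseudo = "ATGGCTCATACGTTCGATCG"  # invented pseudogene copy
    d = k2p_distance(parent, pseudo)
    print(f"K2P distance: {d:.4f}")
    # 0.002 substitutions/site/Myr is an illustrative placeholder, not a value from the study.
    print(f"Estimated age: {estimate_age_myr(d, 0.002):.1f} Myr")
```

Dividing d by a single per-lineage rate is only one simple convention; a real analysis would also need to account for how much of the divergence accrued on the constrained parent-gene lineage.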

Khachane AN, Harrison PM. Strong association between pseudogenization mechanisms and gene sequence length. Biol Direct 2009;4:38. [PMID: 19807910 PMCID: PMC2768697 DOI: 10.1186/1745-6150-4-38] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 09/21/2009] [Accepted: 10/06/2009] [Indexed: 11/20/2022]

Khachane AN, Harrison PM. Assessing the genomic evidence for conserved transcribed pseudogenes under selection. BMC Genomics 2009;10:435. [PMID: 19754956 PMCID: PMC2753554 DOI: 10.1186/1471-2164-10-435] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/11/2009] [Accepted: 09/15/2009] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Transcribed pseudogenes are copies of protein-coding genes that have accumulated indicators of coding-sequence decay (such as frameshifts and premature stop codons), but nonetheless remain transcribed. Recent experimental evidence indicates that transcribed pseudogenes may regulate the expression of homologous genes, through antisense interference, or generation of small interfering RNAs (siRNAs). Here, we assessed the genomic evidence for such transcribed pseudogenes of potential functional importance, in the human genome. The most obvious indicators of such functional importance are significant evidence of conservation and selection pressure.

RESULTS

A variety of pseudogene annotations from multiple sources were pooled and filtered to obtain a subset of sequences that have significant mid-sequence disablements (frameshifts and premature stop codons), and that have clear evidence of full-length mRNA transcription. We found 1750 such transcribed pseudogene annotations (TPAs) in the human genome (corresponding to approximately 11.5% of human pseudogene annotations). We checked for syntenic conservation of TPAs in other mammals (rhesus monkey, mouse, rat, dog and cow). About half of the human TPAs are conserved in rhesus monkey, but strikingly, very few in mouse (approximately 3%). The TPAs conserved in rhesus monkey show evidence of selection pressure (relative to surrounding intergenic DNA) on: (i) their GC content, and (ii) their rate of nucleotide substitution. This is in spite of distributions of Ka/Ks (ratios of non-synonymous to synonymous substitution rates), congruent with a lack of protein-coding ability. Furthermore, we have identified 68 human TPAs that are syntenically conserved in at least two other mammals. Interestingly, we observe three TPA sequences conserved in dog that have intermediate character (i.e., evidence of both protein-coding ability and pseudogenicity), and discuss the implications of this.

CONCLUSION

Through evolutionary analysis, we have identified candidate sequences for functional human transcribed pseudogenes, and have pinpointed 68 strong candidates for further investigation as potentially functional transcribed pseudogenes across multiple mammal species.
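The TPA analysis above relies on Ka/Ks ratios. As a hedged sketch of only the final step of a counting method in the style of Nei and Gojobori, and not the authors' code, Ka and Ks can be obtained from already-tallied non-synonymous and synonymous differences and sites, each corrected for multiple hits with the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3). Counting the sites and differences themselves (which requires walking codon substitution pathways) is assumed to happen upstream, and the counts in the usage block are invented for illustration. A ratio near 1 is consistent with neutral evolution, as expected for a sequence that has lost protein-coding ability, while a ratio well below 1 suggests purifying selection on the coding sequence.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of an observed proportion of differing sites."""
    w = 1 - 4 * p / 3
    if w <= 0:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * math.log(w)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> tuple[float, float, float]:
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites.

    Counts may be fractional, as produced by pathway-averaging methods.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ZeroDivisionError("no synonymous divergence; Ka/Ks is undefined")
    return ka, ks, ka / ks


if __name__ == "__main__":
    # Invented counts for a hypothetical 400-codon (1200-site) alignment.
    ka, ks, ratio = ka_ks(nonsyn_diffs=12.0, nonsyn_sites=900.0,
                          syn_diffs=30.0, syn_sites=300.0)
    print(f"Ka={ka:.4f}  Ks={ks:.4f}  Ka/Ks={ratio:.3f}")
```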

Zou C, Lehti-Shiu MD, Thibaud-Nissen F, Prakash T, Buell CR, Shiu SH. Evolutionary and expression signatures of pseudogenes in Arabidopsis and rice. PLANT PHYSIOLOGY 2009;151:3-15. [PMID: 19641029 PMCID: PMC2736005 DOI: 10.1104/pp.109.140632] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Received: 04/29/2009] [Accepted: 07/18/2009] [Indexed: 05/18/2023]

Morais DD, Harrison PM. Genomic evidence for non-random endemic populations of decaying exons from mammalian genes. BMC Genomics 2009;10:309. [PMID: 19594905 PMCID: PMC2718932 DOI: 10.1186/1471-2164-10-309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/16/2009] [Accepted: 07/13/2009] [Indexed: 11/13/2022]

Abstract

Background

Functional diversification of genes in mammalian genomes is engendered by a number of processes, e.g., gene duplication and alternative splicing. Gene duplication is classically discussed as leading to neofunctionalization (generation of new functions), subfunctionalization (generation of a varied function), or pseudogenization (loss of the gene and its function).

Results

Here, we focus on the process of pseudogenization, but specifically for individual exons from genes. It is at present unclear to what extent pseudogenization of individual exon duplications affects gene evolution, i.e., is it a random phenomenon, or is it associated with specific types of genes and encoded proteins, and positions in gene structures? We gathered genomic evidence for pseudogenic exons (ΨEs, i.e., exons disabled by frameshifts and premature stop codons), to examine for significant trends in their distribution across four mammalian genomes (specifically human, cow, mouse and rat). Across these four genomes, we observed a consistent population of ΨEs, associated with 0.4–1.0% of genes. These ΨE populations exhibit codon substitution patterns that are typical of an endemic population of decaying sequences. In human, ΨEs have significant over-representation for functional categories related to 'ion binding' and 'nucleic-acid binding', compared to duplicated exons in general. Also, ΨEs tend to be associated with some protein domains that are abundant generally, e.g., Zinc-finger and immunoglobulin protein domains, but not others, e.g., EGF-like domains. Positionally, ΨEs are also significantly associated with the 5' end of genes, but despite this, individual stop codons are positioned so that there is significant avoidance of potential targeting to nonsense-mediated decay. In human, ΨEs are often associated with alternative splicing (in 22 out of 284 genes with ΨEs in their milieu), and can have different parts of their sequence differentially spliced in alternative transcripts. Some unusual cases of ΨEs embedded within 5' and 3' non-coding exons are observed.

Conclusion

Our results indicate the types of genes that harbour ΨEs, and demonstrate that ΨEs have non-random distribution within gene structures. These ΨEs may function in gene regulation through generation of transcribed pseudogenes, or regulatory alternate transcripts.

Identification of a new rice blast resistance gene, Pid3, by genomewide comparison of paired nucleotide-binding site--leucine-rich repeat genes and their pseudogene alleles between the two sequenced rice genomes. Genetics 2009;182:1303-11. [PMID: 19506306 DOI: 10.1534/genetics.109.102871] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Indexed: 11/18/2022]

Kojima KK, Okada N. mRNA retrotransposition coupled with 5' inversion as a possible source of new genes. Mol Biol Evol 2009;26:1405-20. [PMID: 19289598 DOI: 10.1093/molbev/msp050] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]

Ortutay C, Vihinen M. PseudoGeneQuest - service for identification of different pseudogene types in the human genome. BMC Bioinformatics 2008;9:299. [PMID: 18597685 PMCID: PMC2453144 DOI: 10.1186/1471-2105-9-299] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/18/2008] [Accepted: 07/02/2008] [Indexed: 01/29/2023]

Moon S, Cho S, Kim H. Organization and evolution of mitochondrial gene clusters in human. Genomics 2008;92:85-93. [PMID: 18559289 DOI: 10.1016/j.ygeno.2008.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/25/2007] [Revised: 01/07/2008] [Accepted: 01/08/2008] [Indexed: 11/29/2022]

Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol 2008;3:e247. [PMID: 18085818 PMCID: PMC2134963 DOI: 10.1371/journal.pcbi.0030247] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Received: 07/31/2007] [Accepted: 10/30/2007] [Indexed: 02/01/2023]

Abstract

Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis is able to differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. Therefore, it enables us to find genes that were inactivated long after their birth, which were likely to have evolved nonredundant biological functions before being inactivated. The method was used to look for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome that were all lost at least 50 My after their birth. Many of them were previously characterized pseudogenes in the human genome, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, allowing us to find a few well-known pseudogenes in the human genome missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, an ancient enzyme present in archaea, bacteria, and eukaryotes, but lost approximately 6 to 8 Mya in the ancestor of humans and chimps. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy to use when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, since any genetic evidence of the selective sweep indicative of such an event has been erased.

Hu G, Yang Q, Cui X, Yue G, Azaro MA, Wang HY, Li H. A highly sensitive and specific system for large-scale gene expression profiling. BMC Genomics 2008;9:9. [PMID: 18186939 PMCID: PMC2267712 DOI: 10.1186/1471-2164-9-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/18/2007] [Accepted: 01/10/2008] [Indexed: 12/02/2022]

Harrison P, Yu Z. Frame disruptions in human mRNA transcripts, and their relationship with splicing and protein structures. BMC Genomics 2007;8:371. [PMID: 17937804 PMCID: PMC2194788 DOI: 10.1186/1471-2164-8-371] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/03/2007] [Accepted: 10/15/2007] [Indexed: 11/24/2022]

Yu Z, Morais D, Ivanga M, Harrison PM. Analysis of the role of retrotransposition in gene evolution in vertebrates. BMC Bioinformatics 2007;8:308. [PMID: 17718914 PMCID: PMC2048973 DOI: 10.1186/1471-2105-8-308] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 04/07/2007] [Accepted: 08/24/2007] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The dynamics of gene evolution are influenced by several genomic processes. One such process is retrotransposition, where an mRNA transcript is reverse-transcribed and reintegrated into the genomic DNA.

RESULTS

We have surveyed eight vertebrate genomes (human, chimp, dog, cow, rat, mouse, chicken and the puffer-fish T. nigroviridis), for putatively retrotransposed copies of genes. To gain a complete picture of the role of retrotransposition, a robust strategy to identify putative retrogenes (PRs) was derived, in tandem with an adaptation of previous procedures to annotate processed pseudogenes, also called retropseudogenes (RΨGs). Mammalian genomes are estimated to contain 400-800 PRs (corresponding to approximately 3% of genes), with fewer PRs and RΨGs in the non-mammalian vertebrates. Focussing on human and mouse, we aged the PRs, analysed for evidence of transcription and selection pressures, and assigned functional categories. The PRs have significantly less transcription evidence mappable to them, are significantly less likely to arise from alternatively-spliced genes, and are statistically overrepresented for ribosomal-protein genes, when compared to the proteome in general. We find evidence for spurts of gene retrotransposition in human and mouse, since the lineage of either species split from the dog lineage, with >200 PRs formed in mouse since its divergence from rat. To examine for selection, we calculated: (i) Ka/Ks values (ratios of non-synonymous and synonymous substitutions in codons), and (ii) the significance of conservation of reading frames in PRs. We found >50 PRs in both human and mouse formed since divergence from dog, that are under pressure to maintain the integrity of their coding sequences. For different subsets of PRs formed at different stages of mammalian evolution, we find some evidence for non-neutral evolution, despite significantly less expression evidence for these sequences.

CONCLUSION

These results indicate that retrotranspositions are a significant source of novel coding sequences in mammalian gene evolution.

Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, Srinivasan K, Yao F, Choo CY, Liu J, Ariyaratne P, Bin WG, Kuznetsov VA, Shahab A, Sung WK, Bourque G, Palanisamy N, Wei CL. Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res 2007;17:828-38. [PMID: 17568001 PMCID: PMC1891342 DOI: 10.1101/gr.6018607] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Indexed: 11/25/2022]

Affiliation(s)

Yijun Ruan Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore (corresponding author; fax 65-64789059)
Hong Sain Ooi Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Siew Woh Choo Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Kuo Ping Chiu Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Xiao Dong Zhao Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
K.G. Srinivasan Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
Fei Yao Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
Chiou Yu Choo Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
Jun Liu Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
Pramila Ariyaratne Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Wilson G.W. Bin Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Vladimir A. Kuznetsov Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Atif Shahab Bioinformatics Institute, Singapore 138671, Singapore
Wing-Kin Sung Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore; School of Computing, National University of Singapore, Singapore 117543, Singapore
Guillaume Bourque Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
Nallasivam Palanisamy Cancer Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
Chia-Lin Wei Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore (corresponding author; fax 65-64789059)

Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigó R, Harrow J, Gerstein MB. Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 2007;17:839-51. [PMID: 17568002 PMCID: PMC1891343 DOI: 10.1101/gr.5586307] [Citation(s) in RCA: 152] [Impact Index Per Article: 8.9] [Indexed: 10/23/2022]

Abstract

Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

Affiliation(s)

Deyou Zheng Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA (corresponding author; fax (360) 838-7861)
Adam Frankish Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, United Kingdom
Robert Baertsch Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA
Philipp Kapranov Affymetrix, Inc., Santa Clara, California 92024, USA
Alexandre Reymond Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
Siew Woh Choo Genome Institute of Singapore, Singapore 138672, Singapore
Yontao Lu Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA
France Denoeud Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain
Stylianos E. Antonarakis Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
Michael Snyder Molecular, Cellular & Developmental Biology Department, Yale University, New Haven, Connecticut 06520, USA
Yijun Ruan Genome Institute of Singapore, Singapore 138672, Singapore
Chia-Lin Wei Genome Institute of Singapore, Singapore 138672, Singapore
Thomas R. Gingeras Affymetrix, Inc., Santa Clara, California 92024, USA
Roderic Guigó Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain; Center for Genomic Regulation, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain
Jennifer Harrow Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, United Kingdom
Mark B. Gerstein Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA; Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA (corresponding author; fax (360) 838-7861)

Tecle E, Zielinski L, Kass DH. Recent integrations of mammalian Hmg retropseudogenes. J Genet 2007;85:179-85. [PMID: 17406091 DOI: 10.1007/bf02935328] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]

Zheng D, Gerstein MB. The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends Genet 2007;23:219-24. [PMID: 17382428 DOI: 10.1016/j.tig.2007.03.003] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Received: 10/17/2006] [Revised: 02/06/2007] [Accepted: 03/09/2007] [Indexed: 10/23/2022]

Ortutay C, Siermala M, Vihinen M. Molecular characterization of the immune system: emergence of proteins, processes, and domains. Immunogenetics 2007;59:333-48. [PMID: 17294181 DOI: 10.1007/s00251-007-0191-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 09/27/2006] [Accepted: 01/08/2007] [Indexed: 12/27/2022]

Prasanth KV, Spector DL. Eukaryotic regulatory RNAs: an answer to the 'genome complexity' conundrum. Genes Dev 2007;21:11-42. [PMID: 17210785 DOI: 10.1101/gad.1484207] [Citation(s) in RCA: 301] [Impact Index Per Article: 17.7] [Indexed: 11/24/2022]