1
Engineering Ribosomes to Alleviate Abiotic Stress in Plants: A Perspective. Plants 2022; 11:plants11162097. [PMID: 36015400 PMCID: PMC9415564 DOI: 10.3390/plants11162097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/10/2022] [Accepted: 08/10/2022] [Indexed: 11/16/2022]
Abstract
As the centerpiece of the biomass production process, ribosome activity is highly coordinated with environmental cues. Findings revealing ribosome subgroups responsive to adverse conditions suggest this tight coordination may be grounded in the induction of variant ribosome compositions and the differential translation outcomes they might produce. In this perspective, we review the literature linking ribosome heterogeneity to plants' abiotic stress response. Once unraveled, this crosstalk may serve as the foundation of novel strategies to customize cultivars that tolerate challenging environments without a yield penalty.
2
Pseudogene MSTO2P Interacts with miR-128-3p to Regulate Coptisine Sensitivity of Non-Small-Cell Lung Cancer (NSCLC) through TGF-β Signaling and VEGFC. Journal of Oncology 2022; 2022:9864411. [PMID: 35794983 PMCID: PMC9251142 DOI: 10.1155/2022/9864411] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2022] [Revised: 06/06/2022] [Accepted: 06/08/2022] [Indexed: 12/02/2022]
Abstract
Background Coptisine has been widely used for treating a variety of cancer types. To date, whether pseudogenes are implicated in coptisine resistance of NSCLC remains unknown. Methods We performed MTT assays to assess the viability of A549 and Calu-1 cells. The transwell assay was used to examine cell invasion. TUNEL was used to determine apoptosis. Results Our data showed that coptisine treatment suppressed the viability and invasion of NSCLC cells while promoting apoptosis. miR-128-3p negatively regulated MSTO2P. miR-128-3p reverted MSTO2P knockdown-attenuated cell viability and invasion, as well as promoted cell apoptosis of A549 cells. Moreover, we identified TGF-β signaling and VEGFC as key downstream effectors for MSTO2P and miR-128-3p in A549 cells. miR-128-3p mimic inhibited TGF-β pathway-associated genes (TGFBR1, Smad2, Smad5, and Smad9), whereas miR-128-3p inhibitor exerted the opposite effect. MSTO2P knockdown led to attenuated expression levels of TGFBR1, Smad2, Smad5, and Smad9. VEGFC overexpression greatly rescued miR-128-3p-modulated cell viability, invasion, and apoptosis of A549 cells. Conclusion MSTO2P plays a role in coptisine therapy of NSCLC through miR-128-3p. The findings will advance our understanding of NSCLC treatment.
3
Schultz JA, Hebert PDN. Do pseudogenes pose a problem for metabarcoding marine animal communities? Mol Ecol Resour 2022; 22:2897-2914. [PMID: 35700118 DOI: 10.1111/1755-0998.13667] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/04/2021] [Accepted: 06/01/2022] [Indexed: 11/30/2022]
Abstract
Because DNA metabarcoding typically employs sequence diversity among mitochondrial amplicons to estimate species composition, nuclear mitochondrial pseudogenes (NUMTs) can inflate diversity. This study quantifies the incidence and attributes of NUMTs derived from the 658 bp barcode region of cytochrome c oxidase I (COI) in 156 marine animal genomes. NUMTs were examined to ascertain if they could be recognized by their possession of indels or stop codons. In total, 309 NUMTs ≥ 150 bp were detected, with an average of 1.98 per species (range = 0-33) and a mean length of 391 bp ± 200 bp. Among this total, 75 (24.3%) lacked indels or stop codons. NUMTs appear to pose the greatest interpretational risk when short (< 313 bp) amplicons are used, such as in eDNA studies, dietary analyses, or processed fish identification. Employing the standard amplicon length (313 bp) for marine metabarcoding, NUMTs could potentially inflate the OTU count by 21% above the true species count while also raising intraspecific variation at COI by 15%. However, when both amplicon length and position are considered, inflation in OTU counts and in barcode variation was just 9% and 10%, respectively, suggesting NUMTs will not seriously distort biodiversity assessments. There was a weak positive correlation between genome size and NUMT count but no variation among phyla or trophic groups. Until bioinformatic advances improve NUMT detection, the best defense involves targeting long amplicons and developing reference databases that include both mitochondrial sequences and their NUMT derivatives.
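The recognition criterion mentioned in this abstract, flagging a NUMT by an indel or an in-frame stop codon, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' pipeline: the function names are hypothetical, only TAA and TAG are treated as stops (both are stop codons in the vertebrate and invertebrate mitochondrial codes), and a length difference that is not a multiple of three is used as a crude proxy for a frameshifting indel.

```python
# Hypothetical helper names; a sketch, not the method used in the study.
STOP_CODONS = {"TAA", "TAG"}  # assumption: stops shared by common animal mitochondrial codes


def has_in_frame_stop(seq: str, frame: int = 0) -> bool:
    """Return True if any complete codon in the given frame is a stop codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def looks_like_numt(seq: str, expected_len: int, frame: int = 0) -> bool:
    """Flag a putative COI amplicon that shows a frameshift-sized length
    difference from the expected amplicon length or an in-frame stop codon."""
    frameshift = (len(seq) - expected_len) % 3 != 0
    return frameshift or has_in_frame_stop(seq, frame)


if __name__ == "__main__":
    print(looks_like_numt("ATGTAAGGG", expected_len=9))  # in-frame TAA
    print(looks_like_numt("ATGGGATT", expected_len=9))   # one-base deletion
    print(looks_like_numt("ATGGGATTT", expected_len=9))  # passes both checks
```

As the abstract reports, 75 of the 309 detected NUMTs lacked both features, so a screen of this kind would let such sequences through.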
Affiliation(s)
- Jessica A Schultz
  - Department of Integrative Biology, University of Guelph, Guelph, ON, Canada; Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
  - Department of Integrative Biology, University of Guelph, Guelph, ON, Canada; Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
4
Oh J, Lee SG, Park C. PIC-Me: paralogs and isoforms classifier based on machine-learning approaches. BMC Bioinformatics 2021; 22:311. [PMID: 34674638 PMCID: PMC8529730 DOI: 10.1186/s12859-021-04229-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2021] [Accepted: 06/01/2021] [Indexed: 11/10/2022]
Abstract
Background Paralogs formed through gene duplication and isoforms formed through alternative splicing have been important for increasing protein diversity and maintaining cellular homeostasis. Despite their recognized importance and the advent of large-scale genomic and transcriptomic analyses, accurate annotations of all gene loci to allow the identification of paralogs and isoforms remain surprisingly incomplete. In particular, the global analysis of the transcriptome of a non-model organism for which there is no reference genome is especially challenging. Results To reliably discriminate between the paralogs and isoforms in RNA-seq data, we redefined the pre-existing sequence features (sequence similarity, inverse count of consecutive identical or non-identical blocks, and match-mismatch fraction) previously derived from full-length cDNAs and EST sequences and described newly discovered genomic and transcriptomic features (twilight zone of protein sequence alignment and expression level difference). In addition, the effectiveness and relevance of the proposed features were verified with two widely used models, support vector machine (SVM) and random forest (RF). From nine RNA-seq datasets, all AUC (area under the curve) scores of ROC (receiver operating characteristic) curves were over 0.9 in the RF model and significantly higher than those in the SVM model. Conclusions In this study, using an RF model with five proposed RNA-seq features, we implemented our method called Paralogs and Isoforms Classifier based on Machine-learning approaches (PIC-Me) and showed that it outperformed an existing method. Finally, we envision that our tool will be a valuable computational resource for the genomics community to help with gene annotation and will aid in comparative transcriptomics and evolutionary genomics studies, especially those on non-model organisms.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04229-x.
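For readers unfamiliar with the AUC scores quoted in this abstract, the following sketch shows one standard way to compute the area under a ROC curve from classifier scores, via the pairwise (Mann-Whitney) formulation. It is an illustration only and not PIC-Me code: the function name, the label encoding (1 for one class, 0 for the other), and the toy scores are all assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): label 1 could stand for "paralog pair",
    # label 0 for "isoform pair"; scores are classifier outputs.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values above 0.9 should be read.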
Affiliation(s)
- Jooseong Oh
  - School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Sung-Gwon Lee
  - School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Chungoo Park
  - School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
5
Garewal N, Goyal N, Pathania S, Kaur J, Singh K. Gauging the trends of pseudogenes in plants. Crit Rev Biotechnol 2021; 41:1114-1129. [PMID: 33993808 DOI: 10.1080/07388551.2021.1901648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
Pseudogenes, the debilitated remnants of ancient genes, were previously dismissed as junk or discarded genes with no functional significance. They have since come under scrutiny, as recent studies have unveiled their importance in regulating their corresponding parent genes and various biological mechanisms. Despite the abundance of pseudogenes in plants, the lack of experimental validation has left their roles in gene regulation unresolved. In contrast, most of the studies associated with gene regulation have been reported for humans, mice, and other mammalian genomes. Consequently, to present a cumulative report on plant pseudogene research, an attempt has been made to assemble multiple studies covering pseudogene classification, prediction, the comparative accuracy of various computational pipelines, and recent trends in analyzing their biological functions and regulatory mechanisms. This review presents both classical and recent advances in pseudogene identification and their potential roles in transcriptional regulation, which could improve the quality of genome annotation and evolutionary analysis and clarify the complexity surrounding regulatory pathways in plants. Thus, once the ambiguity surrounding pseudogenes recedes as their regulatory role becomes explicit, research on plants need no longer lag behind research on animals.
Affiliation(s)
- Naina Garewal
  - Department of Biotechnology, Panjab University, Chandigarh, India
- Neetu Goyal
  - Department of Biotechnology, Panjab University, Chandigarh, India
- Jagdeep Kaur
  - Department of Biotechnology, Panjab University, Chandigarh, India
- Kashmir Singh
  - Department of Biotechnology, Panjab University, Chandigarh, India
6
Abstract
The number of complete genome sequences grows rapidly with each passing year. Thus, methods for genome annotation need to be honed constantly to handle the deluge of information. Annotation of pseudogenes (i.e., gene copies that appear not to make a functional protein) in genomes is a persistent problem; here, we give an overview of pseudogene annotation methods that are based on the detection of sequence homology in genomic DNA.
Affiliation(s)
- Paul M Harrison
  - Department of Biology, McGill University, Montreal, QC, Canada
7
Characterization and molecular evolution of claudin genes in the Pungitius sinensis. J Comp Physiol B 2020; 190:749-759. [PMID: 32778926 DOI: 10.1007/s00360-020-01301-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2020] [Revised: 07/20/2020] [Accepted: 08/04/2020] [Indexed: 10/23/2022]
Abstract
Claudins are a family of integral membrane proteins involved in paracellular tightness, barrier formation, ion permeability, and substrate selection at tight junctions of chordate epithelial and endothelial cells. Here, 39 putative claudin genes were identified in Pungitius sinensis based on high-throughput RNA-seq. Conserved motif distribution in each group suggested functional relevance. Divergence of duplicated genes implied the species' adaptation to the environment. In addition, selective pressure analyses identified one site, which may accelerate functional divergence in this protein family. Pesticides cause environmental pollution and have a serious impact on aquatic organisms when they enter the water. The expression pattern of most claudin genes was affected by organophosphorus pesticide, indicating that they may be involved in the immune regulation of organisms and the detoxification of xenobiotics. Protein-protein network analyses also revealed 439 interactions, implying functional diversity. These results provide a reference for functional studies of claudin genes.
8
Pseudogene MSTO2P enhances hypoxia-induced osteosarcoma malignancy by upregulating PD-L1. Biochem Biophys Res Commun 2020; 530:673-679. [PMID: 32768186 DOI: 10.1016/j.bbrc.2020.07.113] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/25/2020] [Revised: 07/22/2020] [Accepted: 07/23/2020] [Indexed: 11/20/2022]
Abstract
Hypoxia has been shown to be related to osteosarcoma development and progression. Pseudogene MSTO2P was reported to be dysregulated in hepatocellular carcinoma and lung cancer. However, the mechanism by which MSTO2P modulates osteosarcoma remains unclear. MSTO2P and PD-L1 expression levels were examined by RT-qPCR and western blot. Tumor cell invasion was determined by transwell assay. The EMT process was probed by determining E-cadherin and Vimentin levels. Soft agar assay was used to examine anchorage-independent growth of osteosarcoma cells. In vivo tumor growth was measured in a tumor xenograft experiment. Hypoxia treatment promoted cell growth, invasion, and EMT of osteosarcoma cells. MSTO2P knockdown led to attenuated cell growth, invasion, and EMT of osteosarcoma cells under hypoxic conditions. More interestingly, our data revealed that MSTO2P was positively associated with tumor growth in immunodeficient mice and human clinical tissues. PD-L1 was shown to act as a key effector for MSTO2P-regulated osteosarcoma progression under hypoxic conditions. In conclusion, we unravel a novel mechanism explaining MSTO2P-involved osteosarcoma progression under hypoxic conditions, which will facilitate the development of potential diagnostic and therapeutic strategies for osteosarcoma.
9
Blommaert J, Riss S, Hecox-Lea B, Mark Welch DB, Stelzer CP. Small, but surprisingly repetitive genomes: transposon expansion and not polyploidy has driven a doubling in genome size in a metazoan species complex. BMC Genomics 2019; 20:466. [PMID: 31174483 PMCID: PMC6555955 DOI: 10.1186/s12864-019-5859-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/22/2019] [Accepted: 05/29/2019] [Indexed: 02/01/2023]
Abstract
BACKGROUND The causes and consequences of genome size variation across Eukaryotes, which spans five orders of magnitude, have been hotly debated since before the advent of genome sequencing. Previous studies have mostly examined variation among larger taxonomic units (e.g., orders, or genera), while comparisons among closely related species are rare. Rotifers of the Brachionus plicatilis species complex exhibit a seven-fold variation in genome size and thus represent a unique opportunity to study such changes on a relatively short evolutionary timescale. Here, we sequenced and analysed the genomes of four species of this complex with nuclear DNA contents spanning 110-422 Mbp. To establish the likely mechanisms of genome size change, we analysed both sequencing read libraries and assemblies for signatures of polyploidy and repetitive element content. We also compared these genomes to that of B. calyciflorus, the closest relative with a sequenced genome (293 Mbp nuclear DNA content). RESULTS Despite the very large differences in genome size, we saw no evidence of ploidy level changes across the B. plicatilis complex. However, repetitive element content explained a large portion of genome size variation (at least 54%). The species with the largest genome, B. asplanchnoidis, has a strikingly high 44% repetitive element content, while the smaller B. plicatilis genomes contain between 14 and 25% repetitive elements. According to our analyses, the B. calyciflorus genome contains 39% repetitive elements, which is substantially higher than previously reported (21%), and suggests that high repetitive element load could be widespread in monogonont rotifers. CONCLUSIONS Even though the genome sizes of these species are at the low end of the metazoan spectrum, their genomes contain substantial amounts of repetitive elements. 
Polyploidy does not appear to play a role in genome size variations in these species, and these variations can be mostly explained by changes in repetitive element content. This contradicts the naïve expectation that small genomes are streamlined, or less complex, and that large variations in nuclear DNA content between closely related species are due to polyploidy.
Affiliation(s)
- J. Blommaert
  - Research Department for Limnology, University of Innsbruck, Mondsee, Austria
- S. Riss
  - Research Department for Limnology, University of Innsbruck, Mondsee, Austria
- B. Hecox-Lea
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, USA
- D. B. Mark Welch
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, USA
- C. P. Stelzer
  - Research Department for Limnology, University of Innsbruck, Mondsee, Austria
10
Ershov NI, Mordvinov VA, Prokhortchouk EB, Pakharukova MY, Gunbin KV, Ustyantsev K, Genaev MA, Blinov AG, Mazur A, Boulygina E, Tsygankova S, Khrameeva E, Chekanov N, Fan G, Xiao A, Zhang H, Xu X, Yang H, Solovyev V, Lee SMY, Liu X, Afonnikov DA, Skryabin KG. New insights from Opisthorchis felineus genome: update on genomics of the epidemiologically important liver flukes. BMC Genomics 2019; 20:399. [PMID: 31117933 PMCID: PMC6530080 DOI: 10.1186/s12864-019-5752-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/25/2018] [Accepted: 04/29/2019] [Indexed: 01/25/2023]
Abstract
Background The three epidemiologically important Opisthorchiidae liver flukes, Opisthorchis felineus, O. viverrini, and Clonorchis sinensis, are believed to have similar potential to provoke hepatobiliary diseases in their definitive hosts, although their populations differ substantially in ecogeographical aspects, including habitat, preferred hosts, and population structure. The lack of O. felineus genomic data is an obstacle to the development of comparative molecular biological approaches necessary to obtain new knowledge about the biology of Opisthorchiidae trematodes, to identify essential pathways linked to parasite-host interaction, to predict genes that contribute to liver fluke pathogenesis, and for the effective prevention and control of the disease. Results Here we present the first draft genome assembly of O. felineus and its gene repertoire, accompanied by a comparative analysis with those of O. viverrini and Clonorchis sinensis. We observed both noticeably high heterozygosity of the sequenced individual and substantial genetic diversity in a pooled sample. This indicates that the potential of the O. felineus population for a rapid adaptive response to control and preventive measures against opisthorchiasis is higher than in O. viverrini and C. sinensis. We also found that all three species are characterized by more intensive involvement of trans-splicing in RNA processing compared to other trematodes. Conclusion All revealed peculiarities of the structural organization of these genomes are of extreme importance for a proper description of genes and their products in these parasitic species. This should be taken into account in both academic and applied research on epidemiologically important liver flukes.
Further comparative genomics studies of liver flukes and non-carcinogenic flatworms will allow the generation of well-grounded hypotheses on the mechanisms underlying the development of cholangiocarcinoma associated with opisthorchiasis and clonorchiasis, as well as species-specific mechanisms of these diseases. Electronic supplementary material The online version of this article (10.1186/s12864-019-5752-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nikita I Ershov
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
- Egor B Prokhortchouk
  - Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia; ZAO Genoanalytica, 1 Leninskie Gory street, Moscow, 119234, Russia
- Mariya Y Pakharukova
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia; Novosibirsk State University, 2 Pirogova Str, Novosibirsk, 630090, Russia
- Konstantin V Gunbin
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
- Kirill Ustyantsev
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
- Mikhail A Genaev
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
- Alexander G Blinov
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia
- Alexander Mazur
  - Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia
- Nikolay Chekanov
  - Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia
- Guangyi Fan
  - BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China; State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China
- An Xiao
  - BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- He Zhang
  - BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Xun Xu
  - BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Huanming Yang
  - BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Victor Solovyev
  - Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
- Simon Ming-Yuen Lee
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China
- Xin Liu
  - BGI-Shenzhen, 11 Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Dmitry A Afonnikov
  - Institute of Cytology and Genetics SB RAS, 10 Lavrentiev Ave, Novosibirsk, 630090, Russia; Novosibirsk State University, 2 Pirogova Str, Novosibirsk, 630090, Russia
- Konstantin G Skryabin
  - Russian Federal Research Center for Biotechnology, 33/2 Leninsky prospect, Moscow, 119071, Russia; Federal Research Center Kurchatov Institute, Moscow, Russia
11
Zhao X, Hao S, Wang M, Xing D, Wang C. Knockdown of pseudogene DUXAP8 expression in glioma suppresses tumor cell proliferation. Oncol Lett 2019; 17:3511-3516. [PMID: 30867791 DOI: 10.3892/ol.2019.9994] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/25/2018] [Accepted: 12/03/2018] [Indexed: 12/17/2022]
Abstract
A large number of pseudogenes as well as long non-coding RNAs (lncRNAs) have been identified as important regulators in human tumors. However, the clinical role and potential functional effects of the double homeobox A pseudogene 8 (DUXAP8) in glioma remain unknown. In the present study, it was revealed that pseudogene DUXAP8 is significantly upregulated in glioma tissues compared with adjacent normal tissues. Increased DUXAP8 expression was associated with higher Karnofsky Performance Status, advanced World Health Organization grade, and poor disease-free and overall survival rates in patients with glioma. Furthermore, in vitro Cell Counting Kit-8 cell viability and colony forming assays demonstrated that reduced DUXAP8 expression significantly suppressed proliferation capacity. Therefore, the results of the present study indicate that pseudogene DUXAP8 is an oncogenic lncRNA and may serve as a potential prognostic biomarker and novel target for glioma treatment.
Affiliation(s)
- Xu Zhao
  - Department of Neurosurgery, The Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Shuai Hao
  - Department of Neurosurgery, People's Hospital of Juye County, Juye, Shandong 274900, P.R. China
- Minqing Wang
  - Department of Neurosurgery, The Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Deguang Xing
  - Department of Neurosurgery, The Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Chengwei Wang
  - Department of Neurosurgery, The Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
12
Emadi-Baygi M, Sedighi R, Nourbakhsh N, Nikpour P. Pseudogenes in gastric cancer pathogenesis: a review article. Brief Funct Genomics 2018; 16:348-360. [PMID: 28459995 DOI: 10.1093/bfgp/elx004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Cancer burden rises globally at an alarming pace. According to GLOBOCAN 2012, gastric cancer (GC) is regarded as the fifth most common malignancy in the world. Being twice as high in men as in women, GC is the third leading cause of cancer mortality in both sexes globally. Labeled as 'junk DNA', pseudogenes were considered nonfunctional 'trash' that contributes nothing to the survival of the organism; therefore, a number of strategies have been developed to circumvent their accidental detection. Recent progress has confirmed that pseudogenes can have a broad and multifaceted spectrum of activities in human cancers in general and GC in particular. Furthermore, these functions are parental gene-dependent and/or -independent. Therefore, pseudogenes can be regarded as an emerging class of elaborate modulators of gene expression involved in the pathogenesis of human cancers, including gastric adenocarcinoma.
13
Wei Y, Chang Z, Wu C, Zhu Y, Li K, Xu Y. Identification of potential cancer-related pseudogenes in lung adenocarcinoma based on ceRNA hypothesis. Oncotarget 2017; 8:59036-59047. [PMID: 28938616 PMCID: PMC5601712 DOI: 10.18632/oncotarget.19933] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/05/2017] [Accepted: 07/26/2017] [Indexed: 01/01/2023]
Abstract
Pseudogenes were initially regarded as non-functional genomic fossils resulting from inactivating gene mutations during evolution. Far from being silent, pseudogenes have been shown to regulate the expression of protein-coding genes by functioning as microRNA sponges in vivo. The aim of our study was to propose an integrative systems biology approach to identify disease pseudogenes based on the competitive endogenous RNA (ceRNA) hypothesis. Here, we applied our method to lung adenocarcinoma (LUAD) RNA-Seq data from TCGA and identified 33 candidate pseudogenes. We described the characteristics of the candidate pseudogenes and performed functional enrichment. By analyzing neighboring genes, we found that these pseudogenes were surrounded by tumor genes and may be involved in tumor pathways. Furthermore, the DNA methylation analysis indicated that 21 pseudogenes were co-methylated with their competitive mRNAs. In the co-methylated network, we discovered 6 differentially expressed pseudogenes, which we termed potential LUAD-associated pseudogenes. We further revealed that 3 ceRNA triples (miR-21-5p-NKAPP1-PRDM11, miR-29c-3p-MSTO2P-EZH2, and miR-29c-3p-RPLP0P2-EZH2), whose high-risk groups were associated with poor prognosis of LUAD, may be considered potential prognostic signatures. Moreover, by integrating microRNA target information, we also provided a new perspective for the discovery of potential small-molecule drugs. This work may facilitate cancer research and serve as the basis for future efforts to understand the role of pseudogenes, develop novel biomarkers, and improve knowledge of tumor biology.
Affiliation(s)
- Yunzhen Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Zhiqiang Chang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Cheng Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yinling Zhu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Kun Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
14
Casola C, Betrán E. The Genomic Impact of Gene Retrocopies: What Have We Learned from Comparative Genomics, Population Genomics, and Transcriptomic Analyses? Genome Biol Evol 2017; 9:1351-1373. [PMID: 28605529 PMCID: PMC5470649 DOI: 10.1093/gbe/evx081] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Accepted: 05/18/2017] [Indexed: 02/07/2023]
Abstract
Gene duplication is a major driver of organismal evolution. Gene retroposition is a mechanism of gene duplication whereby a gene's transcript is used as a template to generate retroposed gene copies, or retrocopies. Intriguingly, the formation of retrocopies depends upon the enzymatic machinery encoded by retrotransposable elements, genomic parasites occurring in the majority of eukaryotes. Most retrocopies are depleted of the regulatory regions found upstream of their parental genes; therefore, they were initially considered transcriptionally incompetent gene copies, or retropseudogenes. However, examples of functional retrocopies, or retrogenes, have accumulated since the 1980s. Here, we review what we have learned about retrocopies in animals, plants and other eukaryotic organisms, with a particular emphasis on comparative and population genomic analyses complemented with transcriptomic datasets. In addition, these data have provided information about the dynamics of the different "life cycle" stages of retrocopies (i.e., polymorphic retrocopy number variants, fixed retropseudogenes and retrogenes) and have provided key insights into the retroduplication mechanisms, the patterns and evolutionary forces at work during the fixation process and the biological function of retrogenes. Functional genomic and transcriptomic data have also revealed that many retropseudogenes are transcriptionally active and a biological role has been experimentally determined for many. Finally, we have learned that not only non-long terminal repeat retroelements but also long terminal repeat retroelements play a role in the emergence of retrocopies across eukaryotes. This body of work has shown that mRNA-mediated duplication represents a widespread phenomenon that produces an array of new genes that contribute to organismal diversity and adaptation.
Affiliation(s)
- Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University, TX
| | - Esther Betrán
- Department of Biology, University of Texas at Arlington, Arlington, TX
|
15
|
Xiao J, Sekhwal MK, Li P, Ragupathy R, Cloutier S, Wang X, You FM. Pseudogenes and Their Genome-Wide Prediction in Plants. Int J Mol Sci 2016; 17:E1991. [PMID: 27916797 PMCID: PMC5187791 DOI: 10.3390/ijms17121991] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 11/17/2022] Open
Abstract
Pseudogenes are paralogs generated from ancestral functional genes (parents) during genome evolution, which contain critical defects in their sequences, such as lacking a promoter, having a premature stop codon or frameshift mutations. Generally, pseudogenes are functionless, but recent evidence demonstrates that some of them have potential roles in regulation. The majority of pseudogenes are generated from functional progenitor genes either by gene duplication (duplicated pseudogenes) or retro-transposition (processed pseudogenes). Pseudogenes are primarily identified by comparison to their parent genes. Bioinformatics tools for pseudogene prediction have been developed, among which PseudoPipe, PSF and Shiu's pipeline are publicly available. We compared these three tools using the well-annotated Arabidopsis thaliana genome and its known 924 pseudogenes as a test data set. PseudoPipe and Shiu's pipeline identified ~80% of A. thaliana pseudogenes, of which 94% were shared, while PSF failed to generate adequate results. A need for improvement of the bioinformatics tools for pseudogene prediction accuracy in plant genomes was thus identified, with the ultimate goal of improving the quality of genome annotation in plants.
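The tool comparison described in this abstract comes down to two set computations: how much of a known reference set each tool recovers, and how much the tools' correct calls overlap. The following is a minimal Python sketch of that arithmetic, not the authors' evaluation code. The function names, the identifiers in the toy data, and the choice to define "shared" over the union of both tools' correct calls are assumptions made for illustration.

```python
def recall(predicted: set[str], reference: set[str]) -> float:
    """Fraction of the reference pseudogenes that a tool recovered."""
    if not reference:
        raise ValueError("reference set is empty")
    return len(predicted & reference) / len(reference)


def shared_fraction(hits_a: set[str], hits_b: set[str]) -> float:
    """Fraction of correctly identified pseudogenes found by both tools.

    Assumption: the denominator is the union of both tools' correct calls.
    """
    union = hits_a | hits_b
    if not union:
        return 0.0
    return len(hits_a & hits_b) / len(union)


# Toy data with hypothetical identifiers (not real annotation IDs).
reference = {f"pg{i}" for i in range(1, 11)}             # 10 known pseudogenes
tool_a = {f"pg{i}" for i in range(1, 9)} | {"extra1"}    # pg1..pg8 plus one false call
tool_b = {f"pg{i}" for i in range(2, 10)}                # pg2..pg9

recall_a = recall(tool_a, reference)
recall_b = recall(tool_b, reference)
shared = shared_fraction(tool_a & reference, tool_b & reference)
```

Restricting each prediction set to the reference before measuring overlap keeps false calls from inflating or deflating the shared fraction.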
Affiliation(s)
- Jin Xiao
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
- Department of Agronomy, Nanjing Agricultural University, Nanjing 210095, China.
| | - Manoj Kumar Sekhwal
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
- Department of Soil Science, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada.
| | - Pingchuan Li
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
| | - Raja Ragupathy
- Department of Plant Science, University of Saskatchewan, Saskatoon, SK S7N 5A2, Canada.
| | - Sylvie Cloutier
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada.
| | - Xiue Wang
- Department of Agronomy, Nanjing Agricultural University, Nanjing 210095, China.
| | - Frank M You
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
|
16
|
Tine M. Evolutionary significance and diversification of the phosphoglucose isomerase genes in vertebrates. BMC Res Notes 2015; 8:799. [PMID: 26682538 PMCID: PMC4684624 DOI: 10.1186/s13104-015-1683-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2015] [Accepted: 11/09/2015] [Indexed: 01/20/2024] Open
Abstract
Background Phosphoglucose isomerase (PGI) genes encode important multifunctional proteins whose evolution has, until now, not been well elucidated because of the limited number of completely sequenced genomes. Although the multifunctionality of this gene family has been considered an original and innate characteristic, PGI genes may have acquired novel functions through changes in coding sequences and exon/intron structure, which are known to lead to functional divergence after gene duplication. A whole-genome comparative approach was used to estimate the rates of molecular evolution of this protein family. Results The results confirm the presence of two isoforms in teleost fishes and only one variant in all other vertebrates. Phylogenetic reconstructions grouped the PGI genes into five main groups: lungfishes/coelacanth/cartilaginous fishes, teleost fishes, amphibians, reptiles/birds and mammals, with the teleost group being subdivided into two subclades comprising PGI1 and PGI2. This PGI partitioning into groups is consistent with the synteny and molecular evolution results based on the estimation of the ratios of nonsynonymous to synonymous changes (Ka/Ks) and divergence rates between both PGI paralogs and orthologs. Teleost PGI2 shares more similarity with the variant found in all other vertebrates, suggesting that it has evolved less than PGI1 relative to the PGI of the common vertebrate ancestor. Conclusions The diversification of PGI genes into PGI1 and PGI2 is consistent with a teleost-specific duplication before the radiation of this lineage, and after its split from the other infraclasses of ray-finned fishes. The low average Ka/Ks ratios within teleost and mammalian lineages suggest that both PGI1 and PGI2 are functionally constrained by purifying selection and may, therefore, have the same functions. By contrast, the high average Ka/Ks ratios and divergence rates within reptiles and birds indicate that PGI may be involved in different functions.
The synteny analyses show that the genomic region harbouring PGI genes has independently undergone genomic rearrangements in mammals versus the reptile/bird lineage in particular, which may have contributed to the actual functional diversification of this gene family. Electronic supplementary material The online version of this article (doi:10.1186/s13104-015-1683-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mbaye Tine
- Molecular Zoology Laboratory, Department of Zoology, University of Johannesburg, Auckland Park, 2006, South Africa.
- Genome Centre Cologne at MPI for Plant Breeding Research, 22 Carl-von-Linné-Weg 10, 50829, Cologne, Germany.
|
17
|
Esposito F, De Martino M, Forzati F, Fusco A. HMGA1-pseudogene overexpression contributes to cancer progression. Cell Cycle 2015; 13:3636-9. [PMID: 25483074 DOI: 10.4161/15384101.2014.974440] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open
Abstract
Two pseudogenes for HMGA1, whose overexpression has a critical role in cancer progression, have been identified. They act as decoys for miRNAs that are able to target the HMGA1 gene, thereby enhancing cell proliferation and migration. Moreover, these pseudogenes contain sequences that are potential target sites for cancer-related miRNAs. Interestingly, HMGA1 pseudogenes are highly expressed in human anaplastic thyroid carcinomas, which are among the most aggressive human tumors, but are almost undetectable in well-differentiated thyroid carcinomas.
Affiliation(s)
- Francesco Esposito
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
|
18
|
Esposito F, De Martino M, Petti MG, Forzati F, Tornincasa M, Federico A, Arra C, Pierantoni GM, Fusco A. HMGA1 pseudogenes as candidate proto-oncogenic competitive endogenous RNAs. Oncotarget 2015; 5:8341-54. [PMID: 25268743 PMCID: PMC4226687 DOI: 10.18632/oncotarget.2202] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
The High Mobility Group A (HMGA) proteins are nuclear proteins that participate in the organization of nucleoprotein complexes involved in chromatin structure, replication and gene transcription. HMGA overexpression is a feature of human cancer and plays a causal role in cell transformation. Since non-coding RNAs and pseudogenes are now recognized to be important in physiology and disease, we investigated HMGA1 pseudogenes in cancer settings using bioinformatics analysis. Here we report the identification and characterization of two HMGA1 non-coding pseudogenes, HMGA1P6 and HMGA1P7. We show that their overexpression increases the levels of HMGA1 and other cancer-related proteins by inhibiting the suppression of their synthesis mediated by microRNAs. Consistently, embryonic fibroblasts from HMGA1P7-overexpressing transgenic mice displayed a higher growth rate and reduced susceptibility to senescence. Moreover, HMGA1P6 and HMGA1P7 were overexpressed in human anaplastic thyroid carcinomas, which are highly aggressive, but not in differentiated papillary carcinomas, which are less aggressive. Lastly, the expression of the HMGA1 pseudogenes was significantly correlated with HMGA1 protein levels, thereby implicating HMGA1P overexpression in cancer progression. In conclusion, HMGA1P6 and HMGA1P7 are potential proto-oncogenic competitive endogenous RNAs.
Affiliation(s)
- Francesco Esposito
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Marco De Martino
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Maria Grazia Petti
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Floriana Forzati
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Mara Tornincasa
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Antonella Federico
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Claudio Arra
- Istituto Nazionale dei Tumori, Fondazione Pascale, Naples, Italy
| | - Giovanna Maria Pierantoni
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
| | - Alfredo Fusco
- Istituto di Endocrinologia ed Oncologia Sperimentale del CNR c/o Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Scuola di Medicina e Chirurgia di Napoli, Università degli Studi di Napoli "Federico II", Naples, Italy
|
19
|
Abstract
Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.
|
20
|
Roberts TC, Morris KV. Not so pseudo anymore: pseudogenes as therapeutic targets. Pharmacogenomics 2014; 14:2023-34. [PMID: 24279857 DOI: 10.2217/pgs.13.172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Pseudogenes are junk DNA gene remnants generated by inactivating mutations or the loss of regulatory sequences, often following gene duplication or retrotransposition events. These pseudogenes have previously been considered to be molecular fossils derived from once-coding genes. In many cases, pseudogenes confer no observable selective advantage to the host organism and may be on a path towards removal from the genome. However, pseudogenes can also serve as raw material for the exaptation of novel functions, particularly in relation to the regulation of gene expression. Many pseudogenes are resurrected as noncoding RNA genes, which function in RNA-based gene regulatory circuits. As such, functional pseudogenes might simply be considered as 'genes'. Here, we discuss the role of these pseudogene-derived RNAs as regulators of gene expression in the context of human disease. In particular, we consider the manipulation of pseudogene transcripts through the use of antisense oligonucleotides, siRNAs, aptamers or classical gene therapy approaches as novel pharmacological strategies.
Affiliation(s)
- Thomas C Roberts
- Department of Molecular & Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA
|
21
|
Abstract
The number of complete genome sequences grows rapidly with each passing year, so methods for genome annotation need to be honed constantly to handle the deluge of information. Annotation of pseudogenes (i.e., gene copies that appear not to make a functional protein) in genomes is a persistent problem; here, we give an overview of pseudogene annotation methods that are based on the detection of sequence homology in genomic DNA.
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Stewart Biology Building, 1205 Doctor Penfield Avenue, Montreal, QC, Canada, H3A 1B1
|
22
|
Balakirev ES, Chechetkin VR, Lobzin VV, Ayala FJ. Computational methods of identification of pseudogenes based on functionality: entropy and GC content. Methods Mol Biol 2014; 1167:41-62. [PMID: 24823770 DOI: 10.1007/978-1-4939-0835-6_4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Spectral entropy and GC content analyses reveal comprehensive structural features of DNA sequences. To illustrate the significance of these features, we analyze the β-esterase gene cluster, including the Est-6 gene and the ψEst-6 putative pseudogene, in seven species of the Drosophila melanogaster subgroup. The spectral entropies show distinctly lower structural ordering for ψEst-6 than for Est-6 in all species studied. However, entropy accumulation is not a completely random process for either gene and it shows to be nucleotide dependent. Furthermore, GC content in synonymous positions is uniformly higher in Est-6 than in ψEst-6, in agreement with the reduced GC content generally observed in pseudogenes and nonfunctional sequences. The observed differences in entropy and GC content reflect an evolutionary shift associated with the process of pseudogenization and subsequent functional divergence of ψEst-6 and Est-6 after the duplication event. The data obtained show the relevance and significance of entropy and GC content analyses for pseudogene identification and for the comparative study of gene-pseudogene evolution.
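The GC-content comparison in this abstract can be illustrated with a short Python sketch. It computes overall GC content and GC at third codon positions (often called GC3), which is used here only as a common proxy for synonymous positions; the abstract does not say how synonymous positions were selected, so that choice is an assumption. The spectral entropy measure is not reproduced, and the function names and toy sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def gc3(cds: str) -> float:
    """GC content at third codon positions of an in-frame coding sequence.

    Assumption: third positions stand in for synonymous sites; trailing
    bases that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return gc_content(third_positions)


# Toy in-frame sequence (hypothetical): codons ATG, GCC, TAA.
toy_cds = "ATGGCCTAA"
overall = gc_content(toy_cds)
third = gc3(toy_cds)
```

Applying both functions to a gene and to its putative pseudogene gives the kind of side-by-side comparison the abstract describes, with GC3 expected to be lower in the pseudogene if the reported trend holds.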
Affiliation(s)
- Evgeniy S Balakirev
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
|
23
|
Lu Y, Zhang Y, Hang X, Qu W, Lubec G, Chen C, Zhang C. Genome-wide computational identification of bicistronic mRNA in humans. Amino Acids 2012; 44:597-606. [PMID: 22945903 DOI: 10.1007/s00726-012-1380-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2012] [Accepted: 07/26/2012] [Indexed: 11/30/2022]
Abstract
Mammalian bicistronic mRNA is a recently discovered mammalian gene structure. Several reported cases of mammalian bicistronic mRNA indicated that genes of this structure play roles in some important biological processes. However, a genome-wide computational identification of bicistronic mRNA in a mammalian genome, such as the human genome, is still lacking. Here we used a comparative genomics approach to identify the frequency of human bicistronic mRNA. We then validated the result by using a new support vector machine (SVM) model. We identified 43 human bicistronic mRNAs in 30 distinct genes. Our literature analysis shows that our method recovered 100% (6/6) of the previously known bicistronic mRNAs which had been experimentally confirmed by other groups. Our graph theory-based analysis and GO analysis indicated that human bicistronic mRNAs are prone to produce different yet closely functionally related proteins. In addition, we also described and analyzed three different mechanisms of ORF fusion. Our method of identifying bicistronic mRNAs in the human genome provides a model for the computational identification of characteristic gene structures in mammalian genomes. We anticipate that our data will facilitate further molecular characterization and functional study of human bicistronic mRNA.
Affiliation(s)
- Yiming Lu
- Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Cognitive and Mental Health Research Center, Beijing 100850, China
|
24
|
Detecting transcription of ribosomal protein pseudogenes in diverse human tissues from RNA-seq data. BMC Genomics 2012; 13:412. [PMID: 22908858 PMCID: PMC3478165 DOI: 10.1186/1471-2164-13-412] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2012] [Accepted: 08/10/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Ribosomal proteins (RPs) have about 2000 pseudogenes in the human genome. While anecdotal reports of RP pseudogene transcription exist, it is unclear to what extent these pseudogenes are transcribed. RP pseudogene transcription is difficult to identify in microarrays due to potential cross-hybridization between transcripts from the parent genes and pseudogenes. Recently, transcriptome sequencing (RNA-seq) has provided an opportunity to ascertain the transcription of pseudogenes. A challenge for pseudogene expression discovery in RNA-seq data lies in the difficulty of uniquely identifying reads mapped to pseudogene regions, which are typically also similar to the parent genes. RESULTS Here we developed a specialized pipeline for pseudogene transcription discovery. We first construct a "composite genome" that includes the entire human genome sequence as well as mRNA sequences of real ribosomal protein genes. We then map all sequence reads to the composite genome and retain only exact matches. Moreover, we restrict our analysis to strictly defined mappable regions and calculate RPKM values as a measurement of pseudogene transcription levels. We report evidence for the transcription of RP pseudogenes in 16 human tissues. By analyzing the Human Body Map 2.0 study RNA-sequencing data using our pipeline, we identified one ribosomal protein (RP) pseudogene (PGOHUM-249508) that is transcribed with RPKM 170 in thyroid. Moreover, three other RP pseudogenes are transcribed with RPKM > 10, a level similar to that of the normal RP genes, in white blood cell, kidney, and testes, respectively. Furthermore, an additional thirteen RP pseudogenes have RPKM > 5, corresponding to the 20-30 percentile among all genes. Unlike ribosomal protein genes that are constitutively expressed in almost all tissues, RP pseudogenes are differentially expressed, suggesting that they may contribute to tissue-specific biological processes.
CONCLUSIONS Using a specialized bioinformatics method, we identified the transcription of ribosomal protein pseudogenes in human tissues using RNA-seq data.
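The RPKM values quoted in this abstract follow the standard definition: reads per kilobase of region per million mapped reads. The Python sketch below shows only that arithmetic and is not the authors' pipeline. The function name, the use of a mappable-region length as the denominator, and the example counts are assumptions chosen for illustration.

```python
def rpkm(read_count: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads.

    RPKM = read_count * 1e9 / (region_length_bp * total_mapped_reads)
    """
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


# Hypothetical numbers: 170 exact-match reads over a 1,000 bp mappable
# region in a library with 1,000,000 mapped reads.
example = rpkm(read_count=170, region_length_bp=1_000, total_mapped_reads=1_000_000)
```

Using the mappable length rather than the full annotated length matters for pseudogenes, because positions that cannot be distinguished from the parent gene contribute no reads under an exact-match, unique-mapping rule.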
|
25
|
Abstract
New genes are a major source of genetic innovation in genomes. However, until recently, understanding how new genes originate and how they evolve was hampered by the lack of appropriate genetic datasets. The advent of the genomic era brought about a revolution in the amount of data available to study new genes. For the first time, decades-old theoretical principles could be tested empirically and novel and unexpected avenues of research opened up. This chapter explores how genomic data can and is being used to study both the origin and evolution of new genes and the surprising discoveries made thus far.
|
26
|
Han YJ, Ma SF, Yourek G, Park YD, Garcia JGN. A transcribed pseudogene of MYLK promotes cell proliferation. FASEB J 2011; 25:2305-12. [PMID: 21441351 DOI: 10.1096/fj.10-177808] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023]
Abstract
Pseudogenes are considered nonfunctional genomic artifacts of catastrophic pathways. Recent evidence, however, indicates novel roles for pseudogenes as regulators of gene expression. We tested the functionality of myosin light chain kinase pseudogene (MYLKP1) in human cells and tissues by RT-PCR, promoter activity, and cell proliferation assays. MYLKP1 is partially duplicated from the original MYLK gene that encodes nonmuscle and smooth muscle myosin light chain kinase (smMLCK) isoforms and regulates cell contractility and cytokinesis. Despite strong homology with the smMLCK promoter (∼ 89.9%), the MYLKP1 promoter is minimally active in normal bronchial epithelial cells but highly active in lung adenocarcinoma cells. Moreover, MYLKP1 and smMLCK exhibit negatively correlated transcriptional patterns in normal and cancer cells with MYLKP1 strongly expressed in cancer cells and smMLCK highly expressed in non-neoplastic cells. For instance, expression of smMLCK decreased (19.5 ± 4.7 fold) in colon carcinoma tissues compared to normal colon tissues. Mechanistically, MYLKP1 overexpression inhibits smMLCK expression in cancer cells by decreasing RNA stability, leading to increased cell proliferation. These studies provide strong evidence for the functional involvement of pseudogenes in carcinogenesis and suggest MYLKP1 as a potential novel diagnostic or therapeutic target in human cancers.
Affiliation(s)
- Yoo Jeong Han
- Department of Medicine, University of Illinois at Chicago, Chicago, Illinois 60612-7227, USA
|
27
|
Chen SM, Ma KY, Zeng J. Pseudogene: lessons from PCR bias, identification and resurrection. Mol Biol Rep 2010; 38:3709-15. [PMID: 21116863 DOI: 10.1007/s11033-010-0485-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2010] [Accepted: 11/09/2010] [Indexed: 11/26/2022]
Abstract
Pseudogenes are fragments of non-functional genomic DNA with high sequence similarity to normal functional genes. They are a kind of non-coding DNA produced by gene duplications or retrotranspositions. Pseudogenes exist in the human genome in large numbers, nearly as many as normal functional genes. They can cause PCR bias in molecular biology experiments and confuse related analyses. On the other hand, pseudogenes are important elements in genomics studies for obtaining an integral picture of genome annotation. They provide diverse information about evolutionary history and are regarded as genome fossils. The worldwide research project "Encyclopedia of DNA Elements" (ENCODE), founded in recent years, has enhanced our understanding of pseudogenes. Approaches established to identify pseudogenes include PseudoPipe, the HAVANA method, PseudoFinder, RetroFinder, the GIS-PET method and the consensus method. This paper discusses pseudogenes with respect to their formation mechanisms, distribution, problems for PCR, importance and identification. Furthermore, the potential resurrection of pseudogenes and their potential function are discussed.
Affiliation(s)
- Shan-Min Chen
- School of Life Science and Food Engineering, Yibin University, Yibin, Sichuan, China
|
28
|
Pseudogene-mediated posttranscriptional silencing of HMGA1 can result in insulin resistance and type 2 diabetes. Nat Commun 2010; 1:40. [PMID: 20975707 DOI: 10.1038/ncomms1040] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2010] [Accepted: 06/25/2010] [Indexed: 11/08/2022] Open
Abstract
Processed pseudogenes are non-functional copies of normal genes that arise by a process of mRNA retrotransposition. The human genome contains thousands of pseudogenes; however, knowledge regarding their biological role is limited. Previously, we demonstrated that high mobility group A1 (HMGA1) protein regulates the insulin receptor (INSR) gene and that two diabetic patients demonstrated a marked destabilization of HMGA1 mRNA. In this paper we report that this destabilization of HMGA1 mRNA is triggered by enhanced expression of RNA from an HMGA1 pseudogene, HMGA1-p. Targeted knockdown of HMGA1-p mRNA in patient cells results in a reciprocal increase in HMGA1 mRNA stability and expression levels with a parallel correction in cell-surface INSR expression and insulin binding. These data provide evidence for a regulatory role of an expressed pseudogene in humans and establish a novel mechanistic linkage between pseudogene HMGA1-p expression and type 2 diabetes mellitus.
|
29
|
Hung MS, Lin YC, Mao JH, Kim IJ, Xu Z, Yang CT, Jablons DM, You L. Functional polymorphism of the CK2alpha intronless gene plays oncogenic roles in lung cancer. PLoS One 2010; 5:e11418. [PMID: 20625391 PMCID: PMC2896393 DOI: 10.1371/journal.pone.0011418] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2009] [Accepted: 06/05/2010] [Indexed: 01/22/2023] Open
Abstract
Protein kinase CK2 is frequently up-regulated in human cancers, although the mechanism of CK2 activation in cancer remains unknown. In this study, we investigated the role of the CK2α intronless gene (CSNK2A1P, a presumed CK2α pseudogene) in the pathogenesis of human cancers. We found evidence of amplification and over-expression of the CSNK2A1P gene in non-small cell lung cancer and leukemia cell lines and 25% of the lung cancer tissues studied. The mRNA expression levels correlated with the copy numbers of the CSNK2A1P gene. We also identified a novel polymorphic variant (398T/C, I133T) of the CSNK2A1P gene and showed that the 398T allele is selectively amplified over the 398C allele in 101 non-small cell lung cancer tissue samples compared to those in 48 normal controls (p = 0.013<0.05). We show for the first time CSNK2A1P protein expression in transfected human embryonic kidney 293T and mouse embryonic fibroblast NIH-3T3 cell lines. Both alleles are transforming in these cell lines, and the 398T allele appears to be more transforming than the 398C allele. Moreover, the 398T allele degrades PML tumor suppressor protein more efficiently than the 398C allele and shows a relatively stronger binding to PML. Knockdown of the CSNK2A1P gene expression with specific siRNA increased the PML protein level in lung cancer cells. We report, for the first time, that the CSNK2A1P gene is a functional proto-oncogene in human cancers and its functional polymorphism appears to degrade PML differentially in cancer cells. These results are consistent with an important role for the 398T allele of the CSNK2A1P in human lung cancer susceptibility.
MESH Headings
- Animals
- Blotting, Western
- Casein Kinase II/genetics
- Casein Kinase II/metabolism
- Cell Line
- Cell Line, Tumor
- Gene Expression Regulation, Neoplastic/genetics
- Gene Expression Regulation, Neoplastic/physiology
- Humans
- Immunoprecipitation
- In Situ Hybridization, Fluorescence
- In Vitro Techniques
- Lung Neoplasms/genetics
- Mice
- NIH 3T3 Cells
- Nuclear Proteins/genetics
- Nuclear Proteins/metabolism
- Polymorphism, Genetic/genetics
- Polymorphism, Genetic/physiology
- Promyelocytic Leukemia Protein
- Protein Binding
- Proto-Oncogene Mas
- RNA, Small Interfering/genetics
- RNA, Small Interfering/physiology
- Reverse Transcriptase Polymerase Chain Reaction
- Sequence Analysis, DNA
- Transcription Factors/genetics
- Transcription Factors/metabolism
- Tumor Suppressor Proteins/genetics
- Tumor Suppressor Proteins/metabolism
Affiliation(s)
- Ming-Szu Hung
- Thoracic Oncology Laboratory, Department of Surgery, Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, United States of America
- Division of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi, Taiwan
- Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Yu-Ching Lin
- Division of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi, Taiwan
| | - Jian-Hua Mao
- Life Sciences Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, California, United States of America
| | - Il-Jin Kim
- Thoracic Oncology Laboratory, Department of Surgery, Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, United States of America
| | - Zhidong Xu
- Thoracic Oncology Laboratory, Department of Surgery, Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, United States of America
| | - Cheng-Ta Yang
- Division of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi, Taiwan
- Department of Respiratory Care, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - David M. Jablons
- Thoracic Oncology Laboratory, Department of Surgery, Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, United States of America
- * E-mail: (DMJ); (LY)
| | - Liang You
- Thoracic Oncology Laboratory, Department of Surgery, Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, United States of America
- * E-mail: (DMJ); (LY)
| |
Collapse
|
30
|
Evolvability and Speed of Evolutionary Algorithms in Light of Recent Developments in Biology. Journal of Artificial Evolution and Applications 2010. [DOI: 10.1155/2010/568375] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/17/2022]
Abstract
Biological and artificial evolutionary systems exhibit varying degrees of evolvability and different rates of evolution. Such quantities can be affected by various factors. Here, we review some evolutionary mechanisms and discuss new developments in biology that can potentially improve evolvability or accelerate evolution in artificial systems. Biological notions are discussed to the degree they correspond to notions in Evolutionary Computation. We hope that the findings put forward here can be used to design computational models of evolution that produce significant gains in evolvability and evolutionary speed.
|
31
|
Harrison PM, Khachane A, Kumar M. Genomic assessment of the evolution of the prion protein gene family in vertebrates. Genomics 2010; 95:268-77. [DOI: 10.1016/j.ygeno.2010.02.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/22/2009] [Revised: 02/16/2010] [Accepted: 02/24/2010] [Indexed: 02/09/2023]
|
32
|
Liu YJ, Zheng D, Balasubramanian S, Carriero N, Khurana E, Robilotto R, Gerstein MB. Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotranspositional activity. BMC Genomics 2009; 10:480. [PMID: 19835609 PMCID: PMC2770531 DOI: 10.1186/1471-2164-10-480] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 03/25/2009] [Accepted: 10/16/2009] [Indexed: 11/30/2022]
Abstract
BACKGROUND Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins. RESULTS We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively. 
CONCLUSION Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
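The age estimates mentioned in this abstract rest on Kimura's two-parameter (K2P) distance. As a minimal sketch, assuming two pre-aligned, equal-length nucleotide sequences and skipping gapped or ambiguous columns (the function name `k2p_distance` is illustrative and not taken from the paper), the distance could be computed as follows:

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitional and transversional differences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip gaps and ambiguity codes (an assumption of this sketch).
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Converting such a distance into an age additionally requires a substitution rate, which the abstract does not state, so that step is left out here.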
Affiliation(s)
- Yuen-Jong Liu
  - Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, 110 Francis Street, Boston, MA, USA
  - Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Deyou Zheng
  - Albert Einstein College of Medicine of Yeshiva University, Department of Neurology, Rose F. Kennedy Center, 1410 Pelham Parkway South, Room 915B, Bronx, NY 10461, USA
- Suganthi Balasubramanian
  - Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Nicholas Carriero
  - Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Ekta Khurana
  - Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Rebecca Robilotto
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Mark B Gerstein
  - Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
|
33
|
Khachane AN, Harrison PM. Strong association between pseudogenization mechanisms and gene sequence length. Biol Direct 2009; 4:38. [PMID: 19807910 PMCID: PMC2768697 DOI: 10.1186/1745-6150-4-38] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 09/21/2009] [Accepted: 10/06/2009] [Indexed: 11/20/2022]
Abstract
Pseudogenes arise from the decay of gene copies following either RNA-mediated duplication (processed pseudogenes) or DNA-mediated duplication (nonprocessed pseudogenes). Here, we show that long protein-coding genes tend to produce more nonprocessed pseudogenes than short genes, whereas the opposite is true for processed pseudogenes. Protein-coding genes longer than 3000 bp are 6 times more likely to produce nonprocessed pseudogenes than processed ones. Reviewers: This article was reviewed by Dr. Dan Graur and Dr. Craig Nelson (nominated by Dr. J Peter Gogarten).
Affiliation(s)
- Amit N Khachane
  - Department of Biology, McGill University, Stewart Biology Building, Montreal, QC, H3A 1B1, Canada.
|
34
|
Khachane AN, Harrison PM. Assessing the genomic evidence for conserved transcribed pseudogenes under selection. BMC Genomics 2009; 10:435. [PMID: 19754956 PMCID: PMC2753554 DOI: 10.1186/1471-2164-10-435] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/11/2009] [Accepted: 09/15/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Transcribed pseudogenes are copies of protein-coding genes that have accumulated indicators of coding-sequence decay (such as frameshifts and premature stop codons), but nonetheless remain transcribed. Recent experimental evidence indicates that transcribed pseudogenes may regulate the expression of homologous genes, through antisense interference, or generation of small interfering RNAs (siRNAs). Here, we assessed the genomic evidence for such transcribed pseudogenes of potential functional importance, in the human genome. The most obvious indicators of such functional importance are significant evidence of conservation and selection pressure. RESULTS A variety of pseudogene annotations from multiple sources were pooled and filtered to obtain a subset of sequences that have significant mid-sequence disablements (frameshifts and premature stop codons), and that have clear evidence of full-length mRNA transcription. We found 1750 such transcribed pseudogene annotations (TPAs) in the human genome (corresponding to approximately 11.5% of human pseudogene annotations). We checked for syntenic conservation of TPAs in other mammals (rhesus monkey, mouse, rat, dog and cow). About half of the human TPAs are conserved in rhesus monkey, but strikingly, very few in mouse (approximately 3%). The TPAs conserved in rhesus monkey show evidence of selection pressure (relative to surrounding intergenic DNA) on: (i) their GC content, and (ii) their rate of nucleotide substitution. This is in spite of distributions of Ka/Ks (ratios of non-synonymous to synonymous substitution rates), congruent with a lack of protein-coding ability. Furthermore, we have identified 68 human TPAs that are syntenically conserved in at least two other mammals. Interestingly, we observe three TPA sequences conserved in dog that have intermediate character (i.e., evidence of both protein-coding ability and pseudogenicity), and discuss the implications of this. 
CONCLUSION Through evolutionary analysis, we have identified candidate sequences for functional human transcribed pseudogenes, and have pinpointed 68 strong candidates for further investigation as potentially functional transcribed pseudogenes across multiple mammal species.
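The "mid-sequence disablements" used as a filter in this abstract can be illustrated with a simple scan. This is a minimal sketch only, assuming the candidate sequence is already in the parent gene's reading frame; annotation pipelines typically infer disablements from alignments to a homologous protein instead, and the helper names below are hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons that occur
    before the final codon of the sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is treated as the legitimate terminator and is excluded.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def length_breaks_frame(seq: str) -> bool:
    """Crude frameshift hint: the length is not a multiple of three."""
    return len(seq) % 3 != 0


# Codons: ATG TAA GGG TGA, so the TAA at codon index 1 is premature.
print(premature_stops("ATGTAAGGGTGA"))  # → [1]
```

A length check like `length_breaks_frame` only hints at a frameshift; locating the actual insertion or deletion would need an alignment against the parent sequence.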
Affiliation(s)
- Amit N Khachane
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave, Montreal, QC, H3A 1B1 Canada.
|
35
|
Zou C, Lehti-Shiu MD, Thibaud-Nissen F, Prakash T, Buell CR, Shiu SH. Evolutionary and expression signatures of pseudogenes in Arabidopsis and rice. PLANT PHYSIOLOGY 2009; 151:3-15. [PMID: 19641029 PMCID: PMC2736005 DOI: 10.1104/pp.109.140632] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Received: 04/29/2009] [Accepted: 07/18/2009] [Indexed: 05/18/2023]
Abstract
Pseudogenes (Psi) are nonfunctional genomic sequences resembling functional genes. Knowledge of Psis can improve genome annotation and our understanding of genome evolution. However, there has been relatively little systemic study of Psis in plants. In this study, we characterized the evolution and expression patterns of Psis in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). In contrast to animal Psis, many plant Psis experienced much stronger purifying selection. In addition, plant Psis experiencing stronger selective constraints tend to be derived from relatively ancient duplicates, suggesting that they were functional for a relatively long time but became Psis recently. Interestingly, the regions 5' to the first stops in the Psis have experienced stronger selective constraints compared with 3' regions, suggesting that the 5' regions were functional for a longer period of time after the premature stops appeared. We found that few Psis have expression evidence, and their expression levels tend to be lower compared with annotated genes. Furthermore, Psis with expressed sequence tags tend to be derived from relatively recent duplication events, indicating that Psi expression may be due to insufficient time for complete degeneration of regulatory signals. Finally, larger protein domain families have significantly more Psis in general. However, while families involved in environmental stress responses have a significant excess of Psis, transcription factors and receptor-like kinases have lower than expected numbers of Psis, consistent with their elevated retention rate in plant genomes. Our findings illustrate peculiar properties of plant Psis, providing additional insight into the evolution of duplicate genes and benefiting future genome annotation.
Affiliation(s)
- Cheng Zou
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
|
36
|
Morais DD, Harrison PM. Genomic evidence for non-random endemic populations of decaying exons from mammalian genes. BMC Genomics 2009; 10:309. [PMID: 19594905 PMCID: PMC2718932 DOI: 10.1186/1471-2164-10-309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/16/2009] [Accepted: 07/13/2009] [Indexed: 11/13/2022]
Abstract
Background Functional diversification of genes in mammalian genomes is engendered by a number of processes, e.g., gene duplication and alternative splicing. Gene duplication is classically discussed as leading to neofunctionalization (generation of new functions), subfunctionalization (generation of a varied function), or pseudogenization (loss of the gene and its function). Results Here, we focus on the process of pseudogenization, but specifically for individual exons from genes. It is at present unclear to what extent pseudogenization of individual exon duplications affects gene evolution, i.e., is it a random phenomenon, or is it associated with specific types of genes and encoded proteins, and positions in gene structures? We gathered genomic evidence for pseudogenic exons (ΨEs, i.e., exons disabled by frameshifts and premature stop codons), to examine for significant trends in their distribution across four mammalian genomes (specifically human, cow, mouse and rat). Across these four genomes, we observed a consistent population of ΨEs, associated with 0.4–1.0% of genes. These ΨE populations exhibit codon substitution patterns that are typical of an endemic population of decaying sequences. In human, ΨEs have significant over-representation for functional categories related to 'ion binding' and 'nucleic-acid binding', compared to duplicated exons in general. Also, ΨEs tend to be associated with some protein domains that are abundant generally, e.g., Zinc-finger and immunoglobulin protein domains, but not others, e.g., EGF-like domains. Positionally, ΨEs are also significantly associated with the 5' end of genes, but despite this, individual stop codons are positioned so that there is significant avoidance of potential targeting to nonsense-mediated decay. 
In human, ΨEs are often associated with alternative splicing (in 22 out of 284 genes with ΨEs in their milieu), and can have different parts of their sequence differentially spliced in alternative transcripts. Some unusual cases of ΨEs embedded within 5' and 3' non-coding exons are observed. Conclusion Our results indicate the types of genes that harbour ΨEs, and demonstrate that ΨEs have non-random distribution within gene structures. These ΨEs may function in gene regulation through generation of transcribed pseudogenes, or regulatory alternate transcripts.
Affiliation(s)
- David Delima Morais
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave, Montreal, QC, H3A 1B1, Canada.
|
37
|
Identification of a new rice blast resistance gene, Pid3, by genomewide comparison of paired nucleotide-binding site-leucine-rich repeat genes and their pseudogene alleles between the two sequenced rice genomes. Genetics 2009; 182:1303-11. [PMID: 19506306 DOI: 10.1534/genetics.109.102871] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Indexed: 11/18/2022]
Abstract
Rice blast, caused by Magnaporthe oryzae, is one of the most devastating diseases. The two major subspecies of Asian cultivated rice (Oryza sativa L.), indica and japonica, have shown obvious differences in rice blast resistance, but the genomic basis that underlies the difference is not clear. We performed a genomewide comparison of the major class of resistant gene family, the nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family, between 93-11 (indica) and Nipponbare (japonica) with a focus on their pseudogene members. We found great differences in either constitution or distribution of pseudogenes between the two genomes. According to this comparison, we designed the PCR-based molecular markers specific to the Nipponbare NBS-LRR pseudogene alleles and used them as cosegregation markers for blast susceptibility in a segregation population from a cross between a rice blast-resistant indica variety and a susceptible japonica variety. Through this approach, we identified a new blast resistance gene, Pid3, in the indica variety, Digu. The allelic Pid3 loci in most of the tested japonica varieties were identified as pseudogenes due to a nonsense mutation at the nucleotide position 2208 starting from the translation initiation site. However, this mutation was not found in any of the tested indica varieties, African cultivated rice varieties, or AA genome-containing wild rice species. These results suggest that the pseudogenization of Pid3 in japonica occurred after the divergence of indica and japonica.
|
38
|
Kojima KK, Okada N. mRNA retrotransposition coupled with 5' inversion as a possible source of new genes. Mol Biol Evol 2009; 26:1405-20. [PMID: 19289598 DOI: 10.1093/molbev/msp050] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Human long interspersed nuclear element-1 (L1) occupies one-sixth of our genome and has contributed to genome evolution in various ways. Approximately 10% of human L1 copies are composed of two L1 segments; the 5' segment and 3' segment are in head-to-head (i.e., 5'-inverted) orientation. Besides mediating their own retrotransposition, L1 has the ability to mobilize mRNA "in trans," and the number of retrotransposed mRNA sequences (retrocopies) is estimated to be >6,000. In this study, we identified 48 human-specific retrocopies and 95 chimpanzee-specific retrocopies by comparing the human and chimpanzee genomes. Among these retrocopies, 12 were 5'-inverted. The characteristics of these 5'-inverted retrocopies were similar to those of 5'-inverted L1 copies, indicating that the 5' inversion is generated by the same mechanism. With these findings, we examined the possibility that 5' inversion of the retrocopy generates a new gene that codes for a peptide with a different N terminus. We identified several potential 5'-inverted retrogenes, including those of thymopoietin beta (TMPO) and eukaryotic translation initiation factor 3 subunit 5 (EIF3F). The most interesting candidate was the 5'-inverted retrocopy of small nuclear ribonucleoprotein polypeptide N (SNRPN). This retrocopy was transcribed in the reverse orientation in several organs, had multiple transcript variants, and encoded a protein containing a peptide fragment derived from the N-terminal portion of SNRPN. Our results suggest that mRNA retrotransposition coupled with 5' inversion may be a mechanism to generate new genes distinct from parental genes.
Affiliation(s)
- Kenji K Kojima
  - Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan
|
39
|
Ortutay C, Vihinen M. PseudoGeneQuest - service for identification of different pseudogene types in the human genome. BMC Bioinformatics 2008; 9:299. [PMID: 18597685 PMCID: PMC2453144 DOI: 10.1186/1471-2105-9-299] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/18/2008] [Accepted: 07/02/2008] [Indexed: 01/29/2023]
Abstract
Background Pseudogenes, nonfunctional copies of genes, evolve fast due to the lack of evolutionary pressures and thus appear in several different forms. PseudoGeneQuest is an online tool to search the human genome for a given query sequence and to identify different types of pseudogenes as well as novel genes and gene fragments. Description The service can detect pseudogenes that have arisen either by retrotransposition or segmental genome duplication, many of which are not listed in the public pseudogene databases. The service has a user-friendly web interface and uses a powerful computer cluster in order to perform parallel searches and provide relatively fast runtimes despite exhaustive database searches and analyses. Conclusion PseudoGeneQuest is a versatile tool for detecting novel pseudogene candidates from the human genome. The service searches human genome sequences for five types of pseudogenes and provides an output that allows easy further analysis of observations. In addition to the result file, the system provides visualization of the results linked to Ensembl Genome Browser. PseudoGeneQuest service is freely available.
Affiliation(s)
- Csaba Ortutay
  - Institute of Medical Technology, University of Tampere, FI-33014 Tampere, Finland.
|
40
|
Moon S, Cho S, Kim H. Organization and evolution of mitochondrial gene clusters in human. Genomics 2008; 92:85-93. [PMID: 18559289 DOI: 10.1016/j.ygeno.2008.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/25/2007] [Revised: 01/07/2008] [Accepted: 01/08/2008] [Indexed: 11/29/2022]
Abstract
Currently, the spatial patterns of mitochondrial genes and how the genomic localization of (pseudo)genes originated from mitochondrial DNA remain largely unexplained. The aim of this study was to elucidate the organization of mitochondrial (pseudo)genes given their evolutionary origin. We used a keyword finding method and a bootstrapping method to estimate parameter values that represent the distribution pattern of mitochondrial genes in the nuclear genome. Almost half of mitochondrial genes showing physical clusters were located in the pericentromeric and subtelomeric regions of the chromosome. Most interestingly, the size of these clusters ranged from 0.085 to 3.2 Mb (average ± SD 1.3 ± 0.73 Mb), which coincides with the size of the evolutionary pocket, or the average size of evolutionary breakpoint regions. Our findings imply that the localization of mitochondrial genes in the human genome is determined independent of adaptation.
Affiliation(s)
- Sunjin Moon
  - Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul 151-742, Korea
|
41
|
Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol 2008; 3:e247. [PMID: 18085818 PMCID: PMC2134963 DOI: 10.1371/journal.pcbi.0030247] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Received: 07/31/2007] [Accepted: 10/30/2007] [Indexed: 02/01/2023]
Abstract
Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis is able to differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. Therefore, it enables us to find genes that were inactivated long after their birth, which were likely to have evolved nonredundant biological functions before being inactivated. The method was used to look for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome that were all lost at least 50 My after their birth. Many of them were previously characterized pseudogenes in the human genome, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, allowing us to find a few well-known pseudogenes in the human genome missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, an ancient enzyme present in archaea, bacteria, and eukaryotes, but lost approximately 6 to 8 Mya in the ancestor of humans and chimps. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy to use when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, since any genetic evidence of the selective sweep indicative of such an event has been erased.
|
42
|
Hu G, Yang Q, Cui X, Yue G, Azaro MA, Wang HY, Li H. A highly sensitive and specific system for large-scale gene expression profiling. BMC Genomics 2008; 9:9. [PMID: 18186939 PMCID: PMC2267712 DOI: 10.1186/1471-2164-9-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/18/2007] [Accepted: 01/10/2008] [Indexed: 12/02/2022]
Abstract
Background Rapid progress in the field of gene expression-based molecular network integration has generated strong demand on enhancing the sensitivity and data accuracy of experimental systems. To meet the need, a high-throughput gene profiling system of high specificity and sensitivity has been developed. Results By using specially designed primers, the new system amplifies sequences in neighboring exons separated by big introns so that mRNA sequences may be effectively discriminated from other highly related sequences including their genes, unprocessed transcripts, pseudogenes and pseudogene transcripts. Probes used for microarray detection consist of sequences in the two neighboring exons amplified by the primers. In conjunction with a newly developed high-throughput multiplex amplification system and highly simplified experimental procedures, the system can be used to analyze >1,000 mRNA species in a single assay. It may also be used for gene expression profiling of very few (n = 100) or single cells. Highly reproducible results were obtained from duplicate samples with the same number of cells, and from those with a small number (100) and a large number (10,000) of cells. The specificity of the system was demonstrated by comparing results from a breast cancer cell line, MCF-7, and an ovarian cancer cell line, NCI/ADR-RES, and by using genomic DNA as starting material. Conclusion Our approach may greatly facilitate the analysis of combinatorial expression of known genes in many important applications, especially when the amount of RNA is limited.
Affiliation(s)
- Guohong Hu
  - Department of Molecular Genetics, Microbiology and Immunology/The Cancer Institute of New Jersey, University of Medicine and Dentistry of New Jersey Robert Wood Johnson Medical School, Piscataway, New Jersey 08854, USA.
|
43
|
Harrison P, Yu Z. Frame disruptions in human mRNA transcripts, and their relationship with splicing and protein structures. BMC Genomics 2007; 8:371. [PMID: 17937804 PMCID: PMC2194788 DOI: 10.1186/1471-2164-8-371] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/03/2007] [Accepted: 10/15/2007] [Indexed: 11/24/2022]
Abstract
Background Efforts to gather genomic evidence for the processes of gene evolution are ongoing, and are closely coupled to improved gene annotation methods. Such annotation is complicated by the occurrence of disrupted mRNAs (dmRNAs), harbouring frameshifts and premature stop codons, which can be considered indicators of decay into pseudogenes. Results We have derived a procedure to annotate dmRNAs, and have applied it to human data. Subsequences are generated from parsing at key frame-disruption positions and are required to align significantly within any original protein homology. We find 419 high-quality human dmRNAs (3% of total). Significant dmRNA subpopulations include: zinc-finger-containing transcription factors with long disrupted exons, and antisense homologies to distal genes. We analysed the distribution of initial frame disruptions in dmRNAs with respect to positions of: (i) protein domains, (ii) alternatively-spliced exons, and (iii) regions susceptible to nonsense-mediated decay (NMD). We find significant avoidance of protein-domain disruption (indicating a selection pressure for this), and highly significant overrepresentation of disruptions in alternatively-spliced exons, and 'non-NMD' regions. We do not find any evidence for evolution of novelty in protein structures through frameshifting. Conclusion Our results indicate largely negative selection pressures related to frame disruption during gene evolution.
Affiliation(s)
- Paul Harrison
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave., Montreal, QC, H3A 1B1 Canada.
|
44
|
Yu Z, Morais D, Ivanga M, Harrison PM. Analysis of the role of retrotransposition in gene evolution in vertebrates. BMC Bioinformatics 2007; 8:308. [PMID: 17718914 PMCID: PMC2048973 DOI: 10.1186/1471-2105-8-308] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 04/07/2007] [Accepted: 08/24/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND The dynamics of gene evolution are influenced by several genomic processes. One such process is retrotransposition, where an mRNA transcript is reverse-transcribed and reintegrated into the genomic DNA. RESULTS We have surveyed eight vertebrate genomes (human, chimp, dog, cow, rat, mouse, chicken and the puffer-fish T. nigroviridis), for putatively retrotransposed copies of genes. To gain a complete picture of the role of retrotransposition, a robust strategy to identify putative retrogenes (PRs) was derived, in tandem with an adaptation of previous procedures to annotate processed pseudogenes, also called retropseudogenes (RpsiGs). Mammalian genomes are estimated to contain 400-800 PRs (corresponding to approximately 3% of genes), with fewer PRs and RpsiGs in the non-mammalian vertebrates. Focussing on human and mouse, we aged the PRs, analysed for evidence of transcription and selection pressures, and assigned functional categories. The PRs have significantly less transcription evidence mappable to them, are significantly less likely to arise from alternatively-spliced genes, and are statistically overrepresented for ribosomal-protein genes, when compared to the proteome in general. We find evidence for spurts of gene retrotransposition in human and mouse, since the lineage of either species split from the dog lineage, with >200 PRs formed in mouse since its divergence from rat. To examine for selection, we calculated: (i) Ka/Ks values (ratios of non-synonymous and synonymous substitutions in codons), and (ii) the significance of conservation of reading frames in PRs. We found >50 PRs in both human and mouse formed since divergence from dog, that are under pressure to maintain the integrity of their coding sequences. For different subsets of PRs formed at different stages of mammalian evolution, we find some evidence for non-neutral evolution, despite significantly less expression evidence for these sequences.
CONCLUSION These results indicate that retrotranspositions are a significant source of novel coding sequences in mammalian gene evolution.
Affiliation(s)
- Zhan Yu
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave., Montreal, QC, H3A 1B1 Canada
- David Morais
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave., Montreal, QC, H3A 1B1 Canada
- Mahine Ivanga
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave., Montreal, QC, H3A 1B1 Canada
- Paul M Harrison
  - Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave., Montreal, QC, H3A 1B1 Canada
|
45
|
Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, Srinivasan K, Yao F, Choo CY, Liu J, Ariyaratne P, Bin WG, Kuznetsov VA, Shahab A, Sung WK, Bourque G, Palanisamy N, Wei CL. Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res 2007; 17:828-38. [PMID: 17568001 PMCID: PMC1891342 DOI: 10.1101/gr.6018607] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Indexed: 11/25/2022]
Abstract
Identification of unconventional functional features such as fusion transcripts is a challenging task in the effort to annotate all functional DNA elements in the human genome. Paired-End diTag (PET) analysis possesses a unique capability to accurately and efficiently characterize the two ends of DNA fragments, which may have either normal or unusual compositions. This unique nature of PET analysis makes it an ideal tool for uncovering unconventional features residing in the human genome. Using the PET approach for comprehensive transcriptome analysis, we were able to identify fusion transcripts derived from genome rearrangements and actively expressed retrotransposed pseudogenes, which would be difficult to capture by other means. Here, we demonstrate this unique capability through the analysis of 865,000 individual transcripts in two types of cancer cells. In addition to the characterization of a large number of differentially expressed alternative 5' and 3' transcript variants and novel transcriptional units, we identified 70 fusion transcript candidates in this study. One was validated as the product of a fusion gene between BCAS4 and BCAS3 resulting from an amplification followed by a translocation event between the two loci, chr20q13 and chr17q23. Through an examination of PETs that mapped to multiple genomic locations, we identified 4055 retrotransposed loci in the human genome, of which at least three were found to be transcriptionally active. The PET mapping strategy presented here promises to be a useful tool in annotating the human genome, especially aberrations in human cancer genomes.
Affiliation(s)
- Yijun Ruan
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
  - Corresponding author; fax 65-64789059
- Hong Sain Ooi
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Siew Woh Choo
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Kuo Ping Chiu
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Xiao Dong Zhao
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- K.G. Srinivasan
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Fei Yao
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Chiou Yu Choo
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Jun Liu
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Pramila Ariyaratne
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Wilson G.W. Bin
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Vladimir A. Kuznetsov
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Atif Shahab
  - Bioinformatics Institute, Singapore 138671, Singapore
- Wing-Kin Sung
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
  - School of Computing, National University of Singapore, Singapore 117543, Singapore
- Guillaume Bourque
  - Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Chia-Lin Wei
  - Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
  - Corresponding author; fax 65-64789059
|
46
|
Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigó R, Harrow J, Gerstein MB. Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 2007; 17:839-51. [PMID: 17568002 PMCID: PMC1891343 DOI: 10.1101/gr.5586307] [Citation(s) in RCA: 152] [Impact Index Per Article: 8.9] [Indexed: 10/23/2022]
Abstract
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
Affiliation(s)
- Deyou Zheng
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Corresponding author; fax (360) 838-7861
- Adam Frankish
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, United Kingdom
- Robert Baertsch
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA
- Alexandre Reymond
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Siew Woh Choo
  - Genome Institute of Singapore, Singapore 138672, Singapore
- Yontao Lu
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA
- France Denoeud
  - Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain
- Stylianos E. Antonarakis
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Michael Snyder
  - Molecular, Cellular & Developmental Biology Department, Yale University, New Haven, Connecticut 06520, USA
- Yijun Ruan
  - Genome Institute of Singapore, Singapore 138672, Singapore
- Chia-Lin Wei
  - Genome Institute of Singapore, Singapore 138672, Singapore
- Roderic Guigó
  - Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain
  - Center for Genomic Regulation, Passeig Marítim de la Barceloneta, 37-49, 08003, Barcelona, Catalonia, Spain
- Jennifer Harrow
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1HH, United Kingdom
- Mark B. Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
  - Corresponding author; fax (360) 838-7861
|
47
|
Abstract
We propose that select retropseudogenes of the high mobility group nonhistone chromosomal protein genes have recently integrated into mammalian genomes on the basis of the high sequence identity of the copies to the cDNA sequences derived from the original genes. These include the Hmg1 gene family in mice and the Hmgn2 family in humans. We investigated orthologous loci of several strains and species of Mus for presence or absence of apparently young Hmg1 retropseudogenes. Three of four analysed elements were specific to Mus musculus, two of which were not fixed, indicative of recent evolutionary origins. Additionally, we datamined a presumptive subfamily (Hmgz) of mouse Hmg1, but only identified one true element in the GenBank database, which is not consistent with a separate subfamily status. Two of four analysed Hmgn2 retropseudogenes were specific for the human genome, whereas a third was identified in human, chimpanzee and gorilla genomes, and a fourth additionally found in orangutan but absent in African green monkey. Flanking target-site duplications were consistent with LINE integration sites supporting LINE machinery for their mechanism of amplification. The human Hmgn2 retropseudogenes were full length, whereas the mouse Hmg1 elements were either full length or 3'-truncated at specific positions, most plausibly the result of use of alternative polyadenylation sites. The nature of their recent amplification success in relation to other retropseudogenes is unclear, although availability of a large number of transcripts during gametogenesis may be a reason. It is apparent that retropseudogenes continue to shape mammalian genomes, and may provide insight into the process of retrotransposition, as well as offer potential use as phylogenetic markers.
Affiliation(s)
- Eillen Tecle
  - Department of Biology, Eastern Michigan University, Ypsilanti, MI 48197, USA
|
48
|
Zheng D, Gerstein MB. The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends Genet 2007; 23:219-24. [PMID: 17382428 DOI: 10.1016/j.tig.2007.03.003] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Received: 10/17/2006] [Revised: 02/06/2007] [Accepted: 03/09/2007] [Indexed: 10/23/2022]
Abstract
Pseudogenes have long been considered to be 'dead', nonfunctional by-products of genome evolution. However, several lines of evidence now show that some pseudogenes are transcriptionally 'alive', and a few might even have biochemical roles. Therefore, the boundary between genes (often considered to be 'living') and pseudogenes (often considered to be 'dead') might be ambiguous and difficult to define. Here, we examine the evidence for and against pseudogene functionality, and we argue that the time is ripe for revising the definition of a pseudogene. Furthermore, we suggest a classification system to accommodate pseudogenes with various levels of functionality.
Affiliation(s)
- Deyou Zheng
  - Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
|
49
|
Ortutay C, Siermala M, Vihinen M. Molecular characterization of the immune system: emergence of proteins, processes, and domains. Immunogenetics 2007; 59:333-48. [PMID: 17294181 DOI: 10.1007/s00251-007-0191-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 09/27/2006] [Accepted: 01/08/2007] [Indexed: 12/27/2022]
Abstract
Many genes and proteins are required to carry out the processes of innate and adaptive immunity. For many studies, including systems biology, it is necessary to have a clear and comprehensive definition of the immune system, including the genes and proteins that take part in immunological processes. We have identified and cataloged a large portion of the human immunology-related genes, which we call the essential immunome. The 847 identified genes and proteins were annotated, and their chromosomal localizations were compared to the mouse genome. Relation to disease was also taken into account. We identified numerous pseudogenes, many of which are expressed, and found two putative new genes. We also carried out an evolutionary analysis of immune processes based on gene orthologs to gain an overview of the evolutionary past and molecular present of the human immune system. A list of genes and proteins was compiled. A comprehensive characterization of the member genes and proteins, including the corresponding pseudogenes, is presented. Immunome genes were found to have three types of emergence in independent studies of their ontologies, domains, and functions.
Affiliation(s)
- Csaba Ortutay
  - Institute of Medical Technology, University of Tampere, 33014, Tampere, Finland
|
50
|
Prasanth KV, Spector DL. Eukaryotic regulatory RNAs: an answer to the 'genome complexity' conundrum. Genes Dev 2007; 21:11-42. [PMID: 17210785 DOI: 10.1101/gad.1484207] [Citation(s) in RCA: 301] [Impact Index Per Article: 17.7] [Indexed: 11/24/2022]
Abstract
A large portion of the eukaryotic genome is transcribed as noncoding RNAs (ncRNAs). While once thought of primarily as "junk," recent studies indicate that a large number of these RNAs play central roles in regulating gene expression at multiple levels. The increasing diversity of ncRNAs identified in the eukaryotic genome suggests a critical nexus between the regulatory potential of ncRNAs and the complexity of genome organization. We provide an overview of recent advances in the identification and function of eukaryotic ncRNAs and the roles played by these RNAs in chromatin organization, gene expression, and disease etiology.
|