101
|
Huda A, Mariño-Ramírez L, Jordan IK. Epigenetic histone modifications of human transposable elements: genome defense versus exaptation. Mob DNA 2010; 1:2. [PMID: 20226072 PMCID: PMC2836006 DOI: 10.1186/1759-8753-1-2] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 06/19/2009] [Accepted: 01/25/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transposition is disruptive in nature and, thus, it is imperative for host genomes to evolve mechanisms that suppress the activity of transposable elements (TEs). At the same time, transposition also provides diverse sequences that can be exapted by host genomes as functional elements. These notions form the basis of two competing hypotheses pertaining to the role of epigenetic modifications of TEs in eukaryotic genomes: the genome defense hypothesis and the exaptation hypothesis. To date, all available evidence points to the genome defense hypothesis as the best explanation for the biological role of TE epigenetic modifications. RESULTS We evaluated several predictions generated by the genome defense hypothesis versus the exaptation hypothesis using recently characterized epigenetic histone modification data for the human genome. To this end, we mapped chromatin immunoprecipitation sequence tags from 38 histone modifications, characterized in CD4+ T cells, to the human genome and calculated their enrichment and depletion in all families of human TEs. We found that several of these families are significantly enriched or depleted for various histone modifications, both active and repressive. The enrichment of human TE families with active histone modifications is consistent with the exaptation hypothesis and stands in contrast to previous analyses that have found mammalian TEs to be exclusively repressively modified. Comparisons between TE families revealed that older families carry more histone modifications than younger ones, another observation consistent with the exaptation hypothesis. However, data from within family analyses on the relative ages of epigenetically modified elements are consistent with both the genome defense and exaptation hypotheses. 
Finally, TEs located proximal to genes carry more histone modifications than the ones that are distal to genes, as may be expected if epigenetically modified TEs help to regulate the expression of nearby host genes. CONCLUSIONS With a few exceptions, most of our findings support the exaptation hypothesis for the role of TE epigenetic modifications when vetted against the genome defense hypothesis. The recruitment of epigenetic modifications may represent an additional mechanism by which TEs can contribute to the regulatory functions of their host genomes.
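The enrichment and depletion calculation mentioned in the abstract can be illustrated with a minimal sketch. It assumes enrichment is scored as the log2 ratio of observed to expected tag density; the abstract does not state the exact statistic, so this is one plausible formulation and not the authors' method. The function name, parameters and counts are hypothetical.

```python
from math import log2


def log2_enrichment(tags_in_family, family_bp, tags_total, genome_bp):
    """log2 of observed over expected tag density (> 0 enriched, < 0 depleted).

    Assumed scoring scheme for illustration; not taken from the paper.
    """
    # Cross-multiplied form of (tags_in_family / family_bp) / (tags_total / genome_bp).
    ratio = (tags_in_family * genome_bp) / (tags_total * family_bp)
    return log2(ratio)  # raises ValueError if the family has zero tags


# Hypothetical counts for illustration only (not data from the study).
example = log2_enrichment(
    tags_in_family=2_000,
    family_bp=1_000_000,
    tags_total=1_000_000,
    genome_bp=1_000_000_000,
)
print(f"log2 enrichment = {example:.2f}")
```

In this toy case the family carries twice the genome-wide tag density. A real analysis would also need a significance test, which is omitted here.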
|
102
|
Huda A, Jordan IK. Epigenetic Regulation of Mammalian Genomes by Transposable Elements. Ann N Y Acad Sci 2009; 1178:276-84. [DOI: 10.1111/j.1749-6632.2009.05007.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 01/22/2023]
|
103
|
Wang J, Bowen NJ, Mariño-Ramírez L, Jordan IK. A c-Myc regulatory subnetwork from human transposable element sequences. Mol Biosyst 2009; 5:1831-9. [PMID: 19763338 DOI: 10.1039/b908494k] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Abstract
Transposable elements (TEs) can donate regulatory sequences that help to control the expression of human genes. The oncogene c-Myc is a promiscuous transcription factor that is thought to regulate the expression of hundreds of genes. We evaluated the contribution of TEs to the c-Myc regulatory network by searching for c-Myc binding sites derived from TEs and by analyzing the expression and function of target genes with nearby TE-derived c-Myc binding sites. There are thousands of TE sequences in the human genome that are bound by c-Myc. A conservative analysis indicated that 816-4564 of these TEs contain canonical c-Myc binding site motifs. c-Myc binding sites are over-represented among sequences derived from the ancient TE families L2 and MIR, consistent with their preservation by purifying selection. Genes associated with TE-derived c-Myc binding sites are co-expressed with each other and with c-Myc. A number of these putative TE-derived c-Myc target genes are differentially expressed between Burkitt's lymphoma cells and normal B cells and encode proteins with cancer-related functions. Despite several lines of evidence pointing to their regulation by c-Myc and relevance to cancer, the set of genes identified as TE-derived c-Myc targets does not significantly overlap with two previously characterized c-Myc target gene sets. These data point to a substantial contribution of TEs to the regulation of human genes by c-Myc. Genes that are regulated by TE-derived c-Myc binding sites appear to form a distinct c-Myc regulatory subnetwork.
|
104
|
Katz LS, Bolen CR, Harcourt BH, Schmink S, Wang X, Kislyuk A, Taylor RT, Mayer LW, Jordan IK. Meningococcus genome informatics platform: a system for analyzing multilocus sequence typing data. Nucleic Acids Res 2009; 37:W606-11. [PMID: 19468047 PMCID: PMC2703879 DOI: 10.1093/nar/gkp288] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
The Meningococcus Genome Informatics Platform (MGIP) is a suite of computational tools for the analysis of multilocus sequence typing (MLST) data, available at http://mgip.biology.gatech.edu. MLST is used to generate allelic profiles to characterize strains of Neisseria meningitidis, a major cause of bacterial meningitis worldwide. Neisseria meningitidis strains are characterized with MLST as specific sequence types (ST) and clonal complexes (CC) based on the DNA sequences at defined loci. These data are vital to molecular epidemiology studies of N. meningitidis, including outbreak investigations and population biology. MGIP analyzes DNA sequence trace files, returns individual allele calls and characterizes the STs and CCs. MGIP represents a substantial advance over existing software in several respects: (i) ease of use--MGIP is user friendly, intuitive and thoroughly documented; (ii) flexibility--because MGIP is a website, it is compatible with any computer with an internet connection, can be used from any geographic location, and there is no installation; (iii) speed--MGIP takes just over one minute to process a set of 96 trace files; and (iv) expandability--MGIP has the potential to expand to more loci than those used in MLST and even to other bacterial species.
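The final step described above, turning per-locus allele calls into a sequence type and clonal complex, amounts to a profile lookup. The sketch below is a minimal illustration and does not reproduce MGIP's code. The locus names are those of the standard seven-locus N. meningitidis MLST scheme, which the abstract does not list, and the profile table, ST labels and CC labels are invented placeholders.

```python
# Loci of the standard seven-locus N. meningitidis MLST scheme
# (assumed here; the abstract does not enumerate them).
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table: allele numbers per locus -> (ST, CC).
# All values are invented for illustration.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): ("ST-example-1", "CC-example-A"),
    (1, 1, 2, 1, 1, 1, 1): ("ST-example-2", "CC-example-A"),
}


def assign_sequence_type(allele_calls, profiles=PROFILES):
    """Return (ST, CC) for a complete allelic profile, or None if unknown."""
    missing = [locus for locus in LOCI if locus not in allele_calls]
    if missing:
        raise ValueError(f"missing allele calls for: {', '.join(missing)}")
    key = tuple(allele_calls[locus] for locus in LOCI)
    return profiles.get(key)  # None when the profile is not in the table


calls = dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1)))
print(assign_sequence_type(calls))
```

A profile that is absent from the table returns None. In practice such a profile would be a candidate new sequence type and would not be treated as an error.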
|
105
|
Huda A, Mariño-Ramírez L, Landsman D, Jordan IK. Repetitive DNA elements, nucleosome binding and human gene expression. Gene 2009; 436:12-22. [PMID: 19393174 DOI: 10.1016/j.gene.2009.01.013] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 01/22/2009] [Accepted: 01/23/2009] [Indexed: 11/26/2022]
Abstract
We evaluated the epigenetic contributions of repetitive DNA elements to human gene regulation. Human proximal promoter sequences show distinct distributions of transposable elements (TEs) and simple sequence repeats (SSRs). TEs are enriched distal from transcriptional start sites (TSSs) and their frequency decreases closer to TSSs, being largely absent from the core promoter region. SSRs, on the other hand, are found at low frequency distal to the TSS and then increase in frequency starting approximately 150 bp upstream of the TSS. The peak of SSR density is centered around the -35 bp position where the basal transcriptional machinery assembles. These trends in repetitive sequence distribution are strongly correlated, positively for TEs and negatively for SSRs, with relative nucleosome binding affinities along the promoters. Nucleosomes bind with highest probability distal from the TSS and the nucleosome binding affinity steadily decreases reaching its nadir just upstream of the TSS at the same point where SSR frequency is at its highest. Promoters that are enriched for TEs are more highly and broadly expressed, on average, than promoters that are devoid of TEs. In addition, promoters that have similar repetitive DNA profiles regulate genes that have more similar expression patterns and encode proteins with more similar functions than promoters that differ with respect to their repetitive DNA. Furthermore, distinct repetitive DNA promoter profiles are correlated with tissue-specific patterns of expression. These observations indicate that repetitive DNA elements mediate chromatin accessibility in proximal promoter regions and the repeat content of promoters is relevant to both gene expression and function.
|
106
|
Jordan IK, Katz LS, Denver DR, Streelman JT. Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans. BMC Syst Biol 2008; 2:96. [PMID: 19014554 PMCID: PMC2596099 DOI: 10.1186/1752-0509-2-96] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/22/2008] [Accepted: 11/13/2008] [Indexed: 11/13/2022]
Abstract
Background Large-scale evaluation of gene expression variation among Caenorhabditis elegans lines that have diverged from a common ancestor allows for the analysis of a novel class of biological networks – evolutionary gene coexpression networks. Comparative analysis of these evolutionary networks has the potential to uncover the effects of natural selection in shaping coexpression network topologies since C. elegans mutation accumulation (MA) lines evolve essentially free from the effects of natural selection, whereas natural isolate (NI) populations are subject to selective constraints. Results We compared evolutionary gene coexpression networks for C. elegans MA lines versus NI populations to evaluate the role that natural selection plays in shaping the evolution of network topologies. MA and NI evolutionary gene coexpression networks were found to have very similar global topological properties as measured by a number of network topological parameters. Observed MA and NI networks show node degree distributions and average values for node degree, clustering coefficient, path length, eccentricity and betweenness that are statistically indistinguishable from one another yet highly distinct from randomly simulated networks. On the other hand, at the local level the MA and NI coexpression networks are highly divergent; pairs of genes coexpressed in the MA versus NI lines are almost entirely different as are the connectivity and clustering properties of individual genes. Conclusion It appears that selective forces shape how local patterns of coexpression change over time but do not control the global topology of C. elegans evolutionary gene coexpression networks. These results have implications for the evolutionary significance of global network topologies, which are known to be conserved across disparate complex systems.
|
107
|
Conley AB, Piriyapongsa J, Jordan IK. Retroviral promoters in the human genome. Bioinformatics 2008; 24:1563-7. [PMID: 18535086 DOI: 10.1093/bioinformatics/btn243] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.8] [Indexed: 12/22/2022]
Abstract
MOTIVATION Endogenous retrovirus (ERV) elements have been shown to contribute promoter sequences that can initiate transcription of adjacent human genes. However, the extent to which retroviral sequences initiate transcription within the human genome is currently unknown. We analyzed genome sequence and high-throughput expression data to systematically evaluate the presence of retroviral promoters in the human genome. RESULTS We report the existence of 51,197 ERV-derived promoter sequences that initiate transcription within the human genome, including 1743 cases where transcription is initiated from ERV sequences that are located in gene proximal promoter or 5' untranslated regions (UTRs). A total of 114 of the ERV-derived transcription start sites can be demonstrated to drive transcription of 97 human genes, producing chimeric transcripts that are initiated within ERV long terminal repeat (LTR) sequences and read-through into known gene sequences. ERV promoters drive tissue-specific and lineage-specific patterns of gene expression and contribute to expression divergence between paralogs. These data illustrate the potential of retroviral sequences to regulate human transcription on a large scale consistent with a substantial effect of ERVs on the function and evolution of the human genome.
|
108
|
Antezana MA, Jordan IK. Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes. PLoS One 2008; 3:e2145. [PMID: 18478116 PMCID: PMC2366069 DOI: 10.1371/journal.pone.0002145] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/20/2007] [Accepted: 03/17/2008] [Indexed: 01/01/2023] Open
Abstract
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation of which central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R(2)s reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones. 
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
|
109
|
Piriyapongsa J, Jordan IK. Dual coding of siRNAs and miRNAs by plant transposable elements. RNA 2008; 14:814-21. [PMID: 18367716 PMCID: PMC2327354 DOI: 10.1261/rna.916708] [Citation(s) in RCA: 154] [Impact Index Per Article: 9.6] [Received: 11/10/2007] [Accepted: 02/15/2008] [Indexed: 05/18/2023]
Abstract
We recently proposed a specific model whereby miRNAs encoded from short nonautonomous DNA-type TEs known as MITEs evolved from corresponding ancestral full-length (autonomous) elements that originally encoded short interfering RNAs (siRNAs). Our miRNA-origins model predicts that evolutionary intermediates may exist as TEs that encode both siRNAs and miRNAs, and we analyzed Arabidopsis thaliana and Oryza sativa (rice) genomic sequence and expression data to test this prediction. We found a number of examples of individual plant TE insertions that encode both siRNAs and miRNAs. We show evidence that these dual coding TEs can be expressed as readthrough transcripts from the intronic regions of spliced RNA messages. These TE transcripts can fold to form the hairpin (stem-loop) structures characteristic of miRNA genes along with longer double-stranded RNA regions that typically are processed as siRNAs. Taken together with a recent study showing Drosha-independent processing of miRNAs from Drosophila introns, our results indicate that ancestral miRNAs could have evolved from TEs prior to the full elaboration of the miRNA biogenesis pathway. Later, as the specific miRNA biogenesis pathway evolved, and numerous other expressed inverted repeat regions came to be recognized by the miRNA processing endonucleases, the host gene-related regulatory functions of miRNAs emerged. In this way, host genomes were afforded an additional level of regulatory complexity as a by-product of TE defense mechanisms. The siRNA-to-miRNA evolutionary transition is representative of a number of other regulatory mechanisms that evolved to silence TEs and were later co-opted to serve as regulators of host gene expression.
MESH Headings
- Arabidopsis/genetics
- Base Sequence
- Computational Biology
- DNA Transposable Elements/genetics
- DNA, Plant/chemistry
- DNA, Plant/genetics
- Evolution, Molecular
- Genes, Plant
- MicroRNAs/chemistry
- MicroRNAs/genetics
- Models, Genetic
- Models, Molecular
- Molecular Sequence Data
- Nucleic Acid Conformation
- Oryza/genetics
- Plants/genetics
- RNA, Plant/chemistry
- RNA, Plant/genetics
- RNA, Small Interfering/chemistry
- RNA, Small Interfering/genetics
|
110
|
Piriyapongsa J, Jordan IK. Dual coding of siRNAs and miRNAs by plant transposable elements. RNA 2008. [PMID: 18367716 DOI: 10.1261/rna.916708.ferred] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/09/2023]
|
111
|
Huda A, Polavarapu N, Jordan IK, McDonald JF. Endogenous retroviruses of the chicken genome. Biol Direct 2008; 3:9. [PMID: 18361801 PMCID: PMC2329609 DOI: 10.1186/1745-6150-3-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 03/06/2008] [Accepted: 03/24/2008] [Indexed: 11/17/2022] Open
Abstract
We analyzed the chicken (Gallus gallus) genome sequence to search for previously uncharacterized endogenous retrovirus (ERV) sequences using ab initio and combined evidence approaches. We discovered 11 novel families of ERVs that occupy more than 21 million base pairs, approximately 2%, of the chicken genome. These novel families include a number of recently active full-length elements possessing identical long terminal repeats (LTRs) as well as intact gag and pol open reading frames. The abundance and diversity of chicken ERVs we discovered underscore the utility of an approach that combines multiple methods for the identification of interspersed repeats in vertebrate genomes. This article was reviewed by Igor Zhulin and Itai Yanai.
|
112
|
Piriyapongsa J, Rutledge MT, Patel S, Borodovsky M, Jordan IK. Evaluating the protein coding potential of exonized transposable element sequences. Biol Direct 2007; 2:31. [PMID: 18036258 PMCID: PMC2203978 DOI: 10.1186/1745-6150-2-31] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 10/26/2007] [Accepted: 11/26/2007] [Indexed: 11/10/2022] Open
Abstract
Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analyses of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double-stranded RNA-related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.).
|
113
|
Mariño-Ramírez L, Jordan IK, Landsman D. Multiple independent evolutionary solutions to core histone gene regulation. Genome Biol 2007; 7:R122. [PMID: 17184543 PMCID: PMC1794435 DOI: 10.1186/gb-2006-7-12-r122] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 08/08/2006] [Revised: 10/20/2006] [Accepted: 12/21/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Core histone genes are periodically expressed along the cell cycle and peak during S phase. Core histone gene expression is deeply evolutionarily conserved from the yeast Saccharomyces cerevisiae to human. RESULTS We evaluated the evolutionary dynamics of the specific regulatory mechanisms that give rise to the conserved histone regulatory phenotype. In contrast to the conservation of core histone gene expression patterns, the core histone regulatory machinery is highly divergent between species. There has been substantial evolutionary turnover of cis-regulatory sequence motifs along with the transcription factors that bind them. The regulatory mechanisms employed by members of the four core histone families are more similar within species than within gene families. The presence of species-specific histone regulatory mechanisms is opposite to what is seen at the protein sequence level. Core histone proteins are more similar within families, irrespective of their species of origin, than between families, which is consistent with the shared common ancestry of the members of individual histone families. Structure and sequence comparisons between histone families reveal that H2A and H2B form one related group whereas H3 and H4 form a distinct group, which is consistent with the nucleosome assembly dynamics. CONCLUSION The dissonance between the evolutionary conservation of the core histone gene regulatory phenotypes and the divergence of their regulatory mechanisms indicates a highly dynamic mode of regulatory evolution. This distinct mode of regulatory evolution is probably facilitated by a solution space for promoter sequences, in terms of functionally viable cis-regulatory sites, that is substantially greater than that of protein sequences.
|
114
|
Piriyapongsa J, Mariño-Ramírez L, Jordan IK. Origin and evolution of human microRNAs from transposable elements. Genetics 2007; 176:1323-37. [PMID: 17435244 PMCID: PMC1894593 DOI: 10.1534/genetics.107.072553] [Citation(s) in RCA: 254] [Impact Index Per Article: 14.9] [Indexed: 12/19/2022] Open
Abstract
We sought to evaluate the extent of the contribution of transposable elements (TEs) to human microRNA (miRNA) genes along with the evolutionary dynamics of TE-derived human miRNAs. We found 55 experimentally characterized human miRNA genes that are derived from TEs, and these TE-derived miRNAs have the potential to regulate thousands of human genes. Sequence comparisons revealed that TE-derived human miRNAs are less conserved, on average, than non-TE-derived miRNAs. However, there are 18 TE-derived miRNAs that are relatively conserved, and 14 of these are related to the ancient L2 and MIR families. Comparison of miRNA vs. mRNA expression patterns for TE-derived miRNAs and their putative target genes showed numerous cases of anti-correlated expression that are consistent with regulation via mRNA degradation. In addition to the known human miRNAs that we show to be derived from TE sequences, we predict an additional 85 novel TE-derived miRNA genes. TE sequences are typically disregarded in genomic surveys for miRNA genes and target sites; this is a mistake. Our results indicate that TEs provide a natural mechanism for the origination of miRNAs that can contribute to regulatory divergence between species as well as a rich source for the discovery of as yet unknown miRNA genes.
|
115
|
Mariño-Ramírez L, Bodenreider O, Kantz N, Jordan IK. Co-evolutionary rates of functionally related yeast genes. Evol Bioinform Online 2007; 2:271-6. [PMID: 18345352 PMCID: PMC2674680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022] Open
Abstract
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN and ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
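The comparison of pairwise rate differences with functional similarity can be sketched as follows. This is a minimal illustration and assumes a Pearson correlation over all gene pairs; the abstract does not name the statistic used. The gene names, dN values and sGO scores are invented, so the sign and size of the result describe the toy data only.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rate_difference_vs_similarity(rates, similarity):
    """Collect |rate difference| and functional similarity for every gene pair."""
    deltas, sims = [], []
    for a, b in combinations(sorted(rates), 2):
        deltas.append(abs(rates[a] - rates[b]))
        sims.append(similarity[(a, b)])
    return deltas, sims


# Hypothetical dN values and sGO scores (invented for illustration).
dn = {"g1": 0.10, "g2": 0.12, "g3": 0.30, "g4": 0.33}
sgo = {
    ("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g1", "g4"): 0.1,
    ("g2", "g3"): 0.3, ("g2", "g4"): 0.2, ("g3", "g4"): 0.8,
}

delta_dn, sims = rate_difference_vs_similarity(dn, sgo)
r = pearson(delta_dn, sims)
print(f"Pearson r between delta-dN and sGO: {r:.3f}")
```

Gene pairs are not independent observations, so any significance test on such a correlation would need to account for the shared genes. That step is left out of this sketch.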
Collapse
|
116
|
Piriyapongsa J, Jordan IK. A family of human microRNA genes from miniature inverted-repeat transposable elements. PLoS One 2007; 2:e203. [PMID: 17301878 PMCID: PMC1784062 DOI: 10.1371/journal.pone.0000203] [Citation(s) in RCA: 247] [Impact Index Per Article: 14.5] [Received: 10/26/2006] [Accepted: 01/21/2007] [Indexed: 12/26/2022] Open
Abstract
While hundreds of novel microRNA (miRNA) genes have been discovered in the last few years alone, the origin and evolution of these non-coding regulatory sequences remain largely obscure. In this report, we demonstrate that members of a recently discovered family of human miRNA genes, hsa-mir-548, are derived from Made1 transposable elements. Made1 elements are short miniature inverted-repeat transposable elements (MITEs), which consist of two 37 base pair (bp) terminal inverted repeats that flank 6 bp of internal sequence. Thus, Made1 elements are nearly perfect palindromes, and when expressed as RNA they form highly stable hairpin loops. Apparently, these Made1-related structures are recognized by the RNA interference enzymatic machinery and processed to form 22 bp mature miRNA sequences. Consistent with their origin from MITEs, hsa-mir-548 genes are primate-specific and have many potential paralogs in the human genome. There are more than 3,500 putative hsa-mir-548 target genes; analysis of their expression profiles and functional affinities suggests cancer-related regulatory roles for hsa-mir-548. Taken together, the characteristics of Made1 elements, and MITEs in general, point to a specific mechanism for the generation of numerous small regulatory RNAs and target sites throughout the genome. The evolutionary lineage-specific nature of MITEs could also provide for the generation of novel regulatory phenotypes related to species diversification. Finally, we propose that MITEs may represent an evolutionary link between siRNAs and miRNAs.
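The structural argument in this abstract (two 37 bp terminal inverted repeats flanking 6 bp of internal sequence, giving a near-perfect palindrome that folds into a hairpin) can be illustrated with a minimal sketch. The repeat and spacer sequences below are invented placeholders, not the real Made1 sequence, and the helper names are hypothetical; DNA letters are used for simplicity even though the hairpin forms in the RNA transcript.

```python
# Toy illustration only: TIR and SPACER are made-up sequences, not Made1.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def build_mite(tir: str, spacer: str) -> str:
    """Left repeat + internal sequence + inverted copy of the repeat."""
    return tir + spacer + reverse_complement(tir)

def stem_pairs(element: str, arm_len: int) -> int:
    """Count Watson-Crick pairs formed when the two arms fold onto each other."""
    left = element[:arm_len]
    right = element[-arm_len:]
    return sum(a == b for a, b in zip(left, reverse_complement(right)))

TIR = "GTACCTAGGATCCAGTTCGAACTGGTCAATGCCTAGA"  # hypothetical 37-nt repeat
SPACER = "ACGTTA"                              # hypothetical 6-nt internal sequence

element = build_mite(TIR, SPACER)
print(len(element))                   # total element length: 2 * 37 + 6
print(stem_pairs(element, len(TIR)))  # paired positions in the hairpin stem
```

With perfect inverted repeats every position in the 37 bp arm pairs, leaving the 6 bp internal sequence as the loop; a real element would be expected to show some mismatches.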
|
117
|
Marino‐Ramirez L, Jordan IK, Landsman D. Multiple Evolutionary Solutions to Core Histone Gene Regulation. FASEB J 2007. [DOI: 10.1096/fasebj.21.6.a1033-a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
118
|
Bowen NJ, Jordan IK. Exaptation of protein coding sequences from transposable elements. GENOME DYNAMICS 2007; 3:147-162. [PMID: 18753790 DOI: 10.1159/000107609] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
The activity of transposable elements (TEs) has had a profound impact on the evolution of eukaryotic genomes. Once thought to be purely selfish genomic entities, TEs are now recognized to occupy a continuum of relationships, ranging from parasitic to mutualistic, with their host genomes. One of the many ways that TEs contribute to the function and evolution of the genomes in which they reside is through the donation of host protein coding sequences (CDSs). In this chapter, we will describe several notable examples of eukaryotic host CDSs that are derived from TEs. Despite the existence of a number of such well-established cases, the overall extent and significance of this phenomenon remains a matter of controversy. Genome-scale computational analyses have yielded vastly different estimates for the fraction of host CDSs that are derived from TEs. We explain how these seemingly contradictory findings are the result of specific ascertainment biases introduced by the different methods used to detect TE-related sequences. In light of this problem, we propose a comprehensive and systematic framework for definitively characterizing the contribution of TEs to eukaryotic CDSs.
|
119
|
Tsaparas P, Mariño-Ramírez L, Bodenreider O, Koonin EV, Jordan IK. Global similarity and local divergence in human and mouse gene co-expression networks. BMC Evol Biol 2006; 6:70. [PMID: 16968540 PMCID: PMC1601971 DOI: 10.1186/1471-2148-6-70] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Received: 04/22/2006] [Accepted: 09/12/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A genome-wide comparative analysis of human and mouse gene expression patterns was performed in order to evaluate the evolutionary divergence of mammalian gene expression. Tissue-specific expression profiles were analyzed for 9,105 human-mouse orthologous gene pairs across 28 tissues. Expression profiles were resolved into species-specific coexpression networks, and the topological properties of the networks were compared between species. RESULTS At the global level, the topological properties of the human and mouse gene coexpression networks are, essentially, identical. For instance, both networks have topologies with small-world and scale-free properties as well as closely similar average node degrees, clustering coefficients, and path lengths. However, the human and mouse coexpression networks are highly divergent at the local level: only a small fraction (<10%) of coexpressed gene pair relationships are conserved between the two species. A series of controls for experimental and biological variance show that most of this divergence does not result from experimental noise. We further show that, while the expression divergence between species is genuinely rapid, expression does not evolve free from selective (functional) constraint. Indeed, the coexpression networks analyzed here are demonstrably functionally coherent as indicated by the functional similarity of coexpressed gene pairs, and this pattern is most pronounced in the conserved human-mouse intersection network. Numerous dense network clusters show evidence of dedicated functions, such as spermatogenesis and immune response, that are clearly consistent with the coherence of the expression patterns of their constituent gene members. 
CONCLUSION The dissonance between global versus local network divergence suggests that the interspecies similarity of the global network properties is of limited biological significance, at best, and that the biologically relevant aspects of the architectures of gene coexpression are specific and particular, rather than universal. Nevertheless, there is substantial evolutionary conservation of the local network structure which is compatible with the notion that gene coexpression networks are subject to purifying selection.
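The local comparison described in this abstract (which coexpressed gene pairs are shared between the two species-specific networks) can be sketched as follows. This is a toy illustration, not the authors' pipeline: the expression profiles, gene names, correlation threshold, and the choice of Pearson correlation are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold):
    """Link two genes when their profiles correlate at or above the threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    }

# Hypothetical profiles across five tissues for four orthologous genes.
human = {"geneA": [1, 2, 3, 4, 5], "geneB": [2, 4, 6, 8, 10],
         "geneC": [5, 4, 3, 2, 1], "geneD": [1, 3, 2, 5, 4]}
mouse = {"geneA": [1, 2, 3, 4, 5], "geneB": [2, 4, 6, 8, 11],
         "geneC": [2, 1, 4, 3, 5], "geneD": [1, 2, 3, 4, 6]}

THRESHOLD = 0.9  # assumed cutoff, chosen only for this example
human_edges = coexpression_edges(human, THRESHOLD)
mouse_edges = coexpression_edges(mouse, THRESHOLD)

conserved = human_edges & mouse_edges  # the intersection network
union = human_edges | mouse_edges
print(sorted(conserved), len(conserved) / len(union))
```

Global properties such as average node degree could look similar in two networks even when the intersection is small, which is the contrast the abstract draws between global similarity and local divergence.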
|
120
|
Mariño-Ramírez L, Jordan IK. Transposable element derived DNaseI-hypersensitive sites in the human genome. Biol Direct 2006; 1:20. [PMID: 16857058 PMCID: PMC1538576 DOI: 10.1186/1745-6150-1-20] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 06/28/2006] [Accepted: 07/20/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transposable elements (TEs) are abundant genomic sequences that have been found to contribute to genome evolution in unexpected ways. Here, we characterize the evolutionary and functional characteristics of TE-derived human genome regulatory sequences uncovered by the high throughput mapping of DNaseI-hypersensitive (HS) sites. CONCLUSION The results reported here support the notion that TEs provide a specific genome-wide mechanism for generating functionally relevant gene regulatory divergence between evolutionary lineages. REVIEWERS This article was reviewed by Wolfgang J. Miller (nominated by Jerzy Jurka), Itai Yanai and Mikhail S. Gelfand.
|
121
|
|
122
|
Jordan IK. Abstracts. Brief Bioinform 2006. [DOI: 10.1093/bib/bbk009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
123
|
Mariño-Ramírez L, Lewis KC, Landsman D, Jordan IK. Transposable elements donate lineage-specific regulatory sequences to host genomes. Cytogenet Genome Res 2005; 110:333-41. [PMID: 16093685 PMCID: PMC1803082 DOI: 10.1159/000084965] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Received: 10/05/2003] [Accepted: 01/13/2004] [Indexed: 12/11/2022] Open
Abstract
The evolutionary implications of transposable element (TE) influences on gene regulation are explored here. An historical perspective is presented to underscore the importance of TE influences on gene regulation with respect to both the discovery of TEs and the early conceptualization of their potential impact on host genome evolution. Evidence that points to a role for TEs in host gene regulation is reviewed, and comparisons between genome sequences are used to demonstrate the fact that TEs are particularly lineage-specific components of their host genomes. Consistent with these two properties of TEs, regulatory effects and evolutionary specificity, human-mouse genome wide sequence comparisons reveal that the regulatory sequences that are contributed by TEs are exceptionally lineage specific. This suggests a particular mechanism by which TEs may drive the diversification of gene regulation between evolutionary lineages.
|
124
|
Rogozin IB, Basu MK, Jordan IK, Pavlov YI, Koonin EV. APOBEC4, a new member of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases predicted by computational analysis. Cell Cycle 2005; 4:1281-5. [PMID: 16082223 DOI: 10.4161/cc.4.9.1994] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.5] [Indexed: 11/19/2022] Open
Abstract
Using iterative database searches, we identified a new subfamily of the AID/APOBEC family of RNA/DNA editing cytidine deaminases. The new subfamily, which is represented by readily identifiable orthologs in mammals, chicken, and frog, but not fishes, was designated APOBEC4. The zinc-coordinating motifs involved in catalysis and the secondary structure of the APOBEC4 deaminase domain are evolutionarily conserved, suggesting that APOBEC4 proteins are active polynucleotide (deoxy)cytidine deaminases. In reconstructed maximum likelihood phylogenetic trees, APOBEC4 forms a distinct clade with a high statistical support. APOBEC4 and APOBEC1 are joined in a moderately supported cluster clearly separated from AID, APOBEC2 and APOBEC3 subfamilies. In mammals, APOBEC4 is expressed primarily in testis which suggests the possibility that it is an editing enzyme for mRNAs involved in spermatogenesis.
|
125
|
Jordan IK, Kondrashov FA, Adzhubei IA, Wolf YI, Koonin EV, Kondrashov AS, Sunyaev S. Erratum: A universal trend of amino acid gain and loss in protein evolution. Nature 2005. [DOI: 10.1038/nature03656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|