Bush EC, Lahn BT. The evolution of word composition in metazoan promoter sequence. PLoS Comput Biol 2006; 2:e150. [PMID: 17083273 PMCID: PMC1630712 DOI: 10.1371/journal.pcbi.0020150]
[Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/04/2006] [Accepted: 09/27/2006] [Indexed: 12/02/2022] Open
Abstract
The field of molecular evolution provides many examples of the principle that molecular differences between species contain information about evolutionary history. One surprising case can be found in the frequency of short words in DNA: more closely related species have more similar word compositions. Interest in this has often focused on its utility in deducing phylogenetic relationships. However, it is also of interest because of the opportunity it provides for studying the evolution of genome function. Word-frequency differences between species change too slowly to be purely the result of random mutational drift. Rather, their slow pattern of change reflects the direct or indirect action of purifying selection and the presence of functional constraints. Many such constraints are likely to exist, and an important challenge is to distinguish them. Here we develop a method to do so by isolating the effects acting at different word sizes. We apply our method to 2-, 4-, and 8-base-pair (bp) words across several classes of noncoding sequence. Our major result is that similarities in 8-bp word frequencies scale with evolutionary time for regions immediately upstream of genes. This association is present although weaker in intronic sequence, but cannot be detected in intergenic sequence using our method. In contrast, 2-bp and 4-bp word frequencies scale with time in all classes of noncoding sequence. These results suggest that different genomic processes are involved at different word sizes. The pattern in 2-bp and 4-bp words may be due to evolutionary changes in processes such as DNA replication and repair, as has been suggested before. The pattern in 8-bp words may reflect evolutionary changes in gene-regulatory machinery, such as changes in the frequencies of transcription-factor binding sites, or in the affinity of transcription factors for particular sequences.
One of the foundations of molecular evolution is the idea that more closely related species are more similar on the molecular level. One example that has been known for several years is the genomic composition of short words (i.e., short segments) of DNA. Given a sample of genome sequence, one can count the occurrences of all words of a certain length. It turns out that closely related species have more similar word frequencies. The pattern of how these frequencies change over evolutionary time is likely to be influenced by the many functions of the genome (coding for proteins, controlling gene expression, etc.). Bush and Lahn investigated the influence of genomic function on word-frequency variation in 13 animal genomes. Using a method designed to isolate the effects acting at particular word sizes, the authors examined how word frequencies vary in different categories of noncoding sequence. They found that interspecies patterns of word-frequency variation change depending on word size and sequence category. These results suggest that noncoding sequence is subject to different functional constraints depending on its location in the genome. An especially interesting possibility is that the patterns in longer words may reflect evolutionary changes in gene regulatory machinery.
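The word-counting step described above can be illustrated with a minimal sketch. This is not the authors' method, which additionally isolates the effects acting at each word size. It only counts overlapping words of length k and compares two compositions. The function names, the Euclidean distance, and the toy sequences are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product
import math


def word_frequencies(seq, k):
    """Return relative frequencies of overlapping k-bp words in seq.

    Words containing anything other than A, C, G, or T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def composition_distance(freq_a, freq_b, k):
    """Euclidean distance between two word-frequency profiles.

    The distance measure is an illustrative assumption, not the
    statistic used in the paper.
    """
    all_words = ("".join(p) for p in product("ACGT", repeat=k))
    return math.sqrt(
        sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in all_words)
    )


# Toy sequences (hypothetical, not real genomic data).
species_a = "ACGTACGTTGCAACGT"
species_b = "ACGTACGATGCAACGA"
d = composition_distance(
    word_frequencies(species_a, 2), word_frequencies(species_b, 2), 2
)
```

Under this sketch, a smaller distance corresponds to more similar word compositions. On real data the profiles would be computed per sequence category (upstream, intronic, intergenic) and per word size.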