1
Loewenthal G, Wygoda E, Nagar N, Glick L, Mayrose I, Pupko T. The evolutionary dynamics that retain long neutral genomic sequences in face of indel deletion bias: a model and its application to human introns. Open Biol 2022; 12:220223. [PMID: 36514983 PMCID: PMC9748784 DOI: 10.1098/rsob.220223]
Abstract
Insertions and deletions (indels) of short DNA segments are common evolutionary events. Numerous studies have shown that deletions occur more often than insertions in both prokaryotes and eukaryotes. This raises the question of why neutral sequences are not eradicated from the genome. We suggest that this is due to a phenomenon we term border-induced selection. According to this view, a neutral sequence is bordered by conserved regions. Deletions occurring near the borders occasionally protrude into the conserved regions and are thereby subject to strong purifying selection. Thus, for short neutral sequences, an insertion bias is expected. Here, we develop a set of increasingly complex models of indel dynamics that incorporate border-induced selection. Furthermore, we show that short conserved sequences within the neutrally evolving sequence help explain: (i) the presence of very long sequences; (ii) the high variance of sequence lengths; and (iii) the possible emergence of multimodality in sequence length distributions. Finally, we fitted our models to the human intron length distribution, as introns are thought to be mostly neutral and are bordered by conserved exons. We show that when the occurrence of short conserved sequences within introns is accounted for, we reproduce the main features of this distribution, including the presence of long introns and its multimodality.
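A minimal sketch of the border-induced selection argument. This is a toy, not the authors' model: the function names, the per-site event rates, the truncated geometric indel-length law and the rule that any deletion protruding past a border is lethal are all illustrative assumptions. It computes the expected change in length of a neutral segment of length L per unit time; because deletions longer than the available room are rejected, short segments gain length on average even when deletions outnumber insertions.

```python
"""Toy expected length change of a neutral segment flanked by conserved borders.

Hypothetical assumptions (illustrative only):
- insertions occur at any of the L + 1 junctions and are always tolerated;
- a deletion starts at any of the L positions and is tolerated only if it lies
  entirely inside the segment; deletions protruding into a border are purged;
- insertion and deletion lengths follow the same truncated geometric law.
"""


def truncated_geometric(p: float, k_max: int) -> list[float]:
    """Return P(k) for k = 1..k_max, proportional to (1 - p)**(k - 1) * p."""
    weights = [(1 - p) ** (k - 1) * p for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_length_change(L: int, r_ins: float, r_del: float,
                           p: float = 0.5, k_max: int = 50) -> float:
    """Expected change in segment length per unit time under the toy model."""
    probs = truncated_geometric(p, k_max)
    mean_len = sum(k * pk for k, pk in enumerate(probs, start=1))
    gain = r_ins * (L + 1) * mean_len
    # Only deletions of length k <= L fit; each has L - k + 1 tolerated start sites.
    loss = r_del * sum(k * pk * (L - k + 1)
                       for k, pk in enumerate(probs, start=1) if k <= L)
    return gain - loss


def crossover_length(r_ins: float, r_del: float, L_max: int = 10_000) -> int:
    """Smallest L at which the expected change becomes negative (net shrinkage)."""
    for L in range(1, L_max + 1):
        if expected_length_change(L, r_ins, r_del) < 0:
            return L
    raise ValueError("no crossover found below L_max")


if __name__ == "__main__":
    # Deletion-biased rates: deletions 1.5 times as frequent as insertions.
    for L in (2, 5, 20, 1000):
        print(L, round(expected_length_change(L, r_ins=1.0, r_del=1.5), 3))
    print("crossover at L =", crossover_length(1.0, 1.5))
```

With these made-up rates the expected change is positive for very short segments and negative for long ones, so in this toy a segment is pushed toward an intermediate length instead of being eliminated. Producing very long or multimodal length distributions, as the abstract notes, needs further ingredients such as short conserved sequences inside the neutral region.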
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
- Natan Nagar
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
- Lior Glick
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel

2
Barton HJ, Zeng K. New Methods for Inferring the Distribution of Fitness Effects for INDELs and SNPs. Mol Biol Evol 2018; 35:1536-1546. [PMID: 29635416 PMCID: PMC5967470 DOI: 10.1093/molbev/msy054]
Abstract
Small insertions and deletions (INDELs; ≤50 bp) are the most common type of variability after single nucleotide polymorphism (SNP). However, compared with SNPs, we know little about the distribution of fitness effects (DFE) of new INDEL mutations and how prevalent adaptive INDEL substitutions are. Studying INDELs has been difficult partly because identifying ancestral states at these sites is error-prone and misidentification can lead to severely biased estimates of the strength of selection. To solve these problems, we develop new maximum likelihood methods, which use polymorphism data to simultaneously estimate the DFE, the mutation rate, and the misidentification rate. These methods are applicable to both INDELs and SNPs. Simulations show that they can provide highly accurate results. We applied the methods to an INDEL polymorphism data set in Drosophila melanogaster. We found that the DFE for polymorphic INDELs in protein-coding regions is bimodal, with the variants being either nearly neutral or strongly deleterious. Based on the DFE, we estimated that 71.5–83.7% of the INDEL substitutions that took place along the D. melanogaster lineage were fixed by positive selection, which is comparable with the prevalence of adaptive substitutions at nonsynonymous sites. The new methods have been implemented in the software package anavar.
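The misidentification problem can be illustrated with a generic sketch. A common way to model ancestral-state error, assumed here and not necessarily the exact parameterisation used in anavar, is that with probability eps a site whose derived allele is at count i in a sample of n chromosomes is recorded at count n - i. The function name and example values are hypothetical.

```python
def observed_unfolded_sfs(true_sfs: list[float], eps: float) -> list[float]:
    """Expected unfolded site frequency spectrum after polarisation errors.

    true_sfs[j] is the expected number of sites whose derived allele is carried
    by i = j + 1 of the n sampled chromosomes (so len(true_sfs) == n - 1).
    With probability eps the ancestral state is misidentified and a site at
    derived count i is recorded at count n - i instead.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    m = len(true_sfs)
    # Derived count n - i corresponds to index (n - i) - 1 = m - 1 - j.
    return [(1 - eps) * true_sfs[j] + eps * true_sfs[m - 1 - j] for j in range(m)]


if __name__ == "__main__":
    n = 10
    neutral = [1 / i for i in range(1, n)]  # standard neutral expectation, up to theta
    observed = observed_unfolded_sfs(neutral, eps=0.05)
    for i, (t, o) in enumerate(zip(neutral, observed), start=1):
        print(f"i={i}: true={t:.3f} observed={o:.3f}")
```

Even a small eps moves sites from the abundant low-frequency classes into the sparse high-frequency classes, which can mimic the excess of high-frequency derived variants expected under positive selection; this is one reason to estimate the error rate jointly with the DFE.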
Affiliation(s)
- Henry J Barton
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Kai Zeng
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom

3
Tidball AM, Dang LT, Glenn TW, Kilbane EG, Klarr DJ, Margolis JL, Uhler MD, Parent JM. Rapid Generation of Human Genetic Loss-of-Function iPSC Lines by Simultaneous Reprogramming and Gene Editing. Stem Cell Reports 2017; 9:725-731. [PMID: 28781079 PMCID: PMC5599229 DOI: 10.1016/j.stemcr.2017.07.003] [Received: 02/03/2017] [Revised: 07/03/2017] [Accepted: 07/04/2017]
Abstract
Specifically ablating genes in human induced pluripotent stem cells (iPSCs) allows for studies of gene function as well as disease mechanisms in disorders caused by loss-of-function (LOF) mutations. While techniques exist for engineering such lines, we have developed and rigorously validated a method of simultaneous iPSC reprogramming while generating CRISPR/Cas9-dependent insertions/deletions (indels). This approach allows for the efficient and rapid formation of genetic LOF human disease cell models with isogenic controls. The rate of mutagenized lines was strikingly consistent across experiments targeting four different human epileptic encephalopathy genes and a metabolic enzyme-encoding gene, and was more efficient and consistent than using CRISPR gene editing of established iPSC lines. The ability of our streamlined method to reproducibly generate heterozygous and homozygous LOF iPSC lines with passage-matched isogenic controls in a single step provides for the rapid development of LOF disease models with ideal control lines, even in the absence of patient tissue.
Affiliation(s)
- Andrew M Tidball
- Department of Neurology, University of Michigan Medical School, 5021 BSRB, 109 Zina Pitcher Place, Ann Arbor, MI 48109-2200, USA
- Louis T Dang
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Trevor W Glenn
- Department of Neurology, University of Michigan Medical School, 5021 BSRB, 109 Zina Pitcher Place, Ann Arbor, MI 48109-2200, USA
- Emma G Kilbane
- Department of Neurology, University of Michigan Medical School, 5021 BSRB, 109 Zina Pitcher Place, Ann Arbor, MI 48109-2200, USA
- Daniel J Klarr
- Department of Neurology, University of Michigan Medical School, 5021 BSRB, 109 Zina Pitcher Place, Ann Arbor, MI 48109-2200, USA
- Joshua L Margolis
- Department of Neurology, University of Michigan Medical School, 5021 BSRB, 109 Zina Pitcher Place, Ann Arbor, MI 48109-2200, USA
- Michael D Uhler
- Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Molecular and Behavioral Neuroscience Institute, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jack M Parent
- Department of Neurology, University of Michigan Medical School, 5021 BSRB, 109 Zina Pitcher Place, Ann Arbor, MI 48109-2200, USA; VA Ann Arbor HealthCare System, Ann Arbor, MI 48105, USA.

4
Charlesworth B. Stabilizing selection, purifying selection, and mutational bias in finite populations. Genetics 2013; 194:955-71. [PMID: 23709636 PMCID: PMC3730922 DOI: 10.1534/genetics.113.151555] [Received: 03/22/2013] [Accepted: 05/18/2013]
Abstract
Genomic traits such as codon usage and the lengths of noncoding sequences may be subject to stabilizing selection rather than purifying selection. Mutations affecting these traits are often biased in one direction. To investigate the potential role of stabilizing selection on genomic traits, the effects of mutational bias on the equilibrium value of a trait under stabilizing selection in a finite population were investigated, using two different mutational models. Numerical results were generated using a matrix method for calculating the probability distribution of variant frequencies at sites affecting the trait, as well as by Monte Carlo simulations. Analytical approximations were also derived, which provided useful insights into the numerical results. A novel conclusion is that the scaled intensity of selection acting on individual variants is nearly independent of the effective population size over a wide range of parameter space and is strongly determined by the logarithm of the mutational bias parameter. This is true even when there is a very small departure of the mean from the optimum, as is usually the case. This implies that studies of the frequency spectra of DNA sequence variants may be unable to distinguish between stabilizing and purifying selection. A similar investigation of purifying selection against deleterious mutations was also carried out. Contrary to previous suggestions, the scaled intensity of purifying selection with synergistic fitness effects is sensitive to population size, which is inconsistent with the general lack of sensitivity of codon usage to effective population size.
Affiliation(s)
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.

5
Rao YS, Wang ZF, Chai XW, Wu GZ, Nie QH, Zhang XQ. Indel segregating within introns in the chicken genome are positively correlated with the recombination rates. Hereditas 2010. [DOI: 10.1111/j.1601-5223.2009.2141.x]

6
Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol 2010; 27:1226-34. [PMID: 20150340 DOI: 10.1093/molbev/msq046]
Abstract
The detection of selection, both positive and negative, acting on a DNA sequence or class of nucleotide sites requires comparison with a reference sequence that is unaffected by selection. In Drosophila, recent findings of widespread selective constraint, as well as adaptive evolution, in both coding and noncoding regions highlight the difficulties in choosing such a reference sequence. Here, we investigate the utility of short intron sequences as a reference for the detection of selection. For a set of 119 Drosophila melanogaster genes containing 195 short introns (≤120 bp), we analyzed polymorphism and divergence at 1) 4-fold synonymous sites, 2) all sites of introns ≤120 bp, 3) all sites of introns ≤65 bp, 4) bases 8-30 of introns ≤120 bp, and 5) bases 8-30 of introns ≤65 bp. The last class of sites shows the highest levels of both interspecific divergence and intraspecific polymorphism, suggesting that these sites are under the least selective constraint. Bases 8-30 of introns ≤65 bp also have the lowest ratio of divergence to polymorphism, which may indicate that a small proportion of substitutions in the other classes of sites are the result of adaptive evolution. Although there is little signal of selection on the primary sequence of short introns, patterns of insertion-deletion polymorphism and divergence suggest that both positive and negative selection act to maintain an optimal intron length.
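The reasoning in the last sentences, comparing a class of sites with a putatively neutral reference through divergence-to-polymorphism ratios, is commonly summarised by a McDonald–Kreitman-style estimate of the proportion of adaptive substitutions, alpha = 1 - (D_ref / P_ref) * (P_test / D_test). The sketch below assumes that estimator; the function name and counts are hypothetical and do not come from the study.

```python
def proportion_adaptive(d_test: int, p_test: int, d_ref: int, p_ref: int) -> float:
    """McDonald-Kreitman-style alpha using a putatively neutral reference class.

    d_* are fixed differences (divergence) and p_* are polymorphisms.
    alpha > 0 means the test class has more divergence per polymorphism than
    the reference, which is attributed to adaptive substitutions.
    """
    if min(d_test, p_test, d_ref, p_ref) <= 0:
        raise ValueError("all counts must be positive")
    return 1.0 - (d_ref / p_ref) * (p_test / d_test)


if __name__ == "__main__":
    # Hypothetical counts for a test class and a short-intron reference class.
    print(proportion_adaptive(d_test=100, p_test=40, d_ref=200, p_ref=100))
```

If the reference class has the lowest divergence-to-polymorphism ratio of all classes, as reported for bases 8-30 of introns ≤65 bp, alpha computed for every other class against it is positive, which is the pattern the abstract cautiously reads as a small proportion of adaptive substitutions.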
Affiliation(s)
- John Parsch
- Department of Biology II, University of Munich, Planegg-Martinsried, Germany.

7
Halligan DL, Keightley PD. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res 2006; 16:875-84. [PMID: 16751341 PMCID: PMC1484454 DOI: 10.1101/gr.5022906]
Abstract
Non-coding DNA comprises approximately 80% of the euchromatic portion of the Drosophila melanogaster genome. Non-coding sequences are known to contain functionally important elements controlling gene expression, but the proportion of sites that are selectively constrained is still largely unknown. We have compared the complete D. melanogaster and Drosophila simulans genome sequences to estimate mean selective constraint (the fraction of mutations that are eliminated by selection) in coding and non-coding DNA by standardizing to substitution rates in putatively unconstrained sequences. We show that constraint is positively correlated with intronic and intergenic sequence length and is generally remarkably strong in non-coding DNA, implying that more than half of all point mutations in the Drosophila genome are deleterious. This fraction is also likely to be an underestimate if many substitutions in non-coding DNA are adaptively driven to fixation. We also show that substitutions in long introns and intergenic sequences are clustered, such that there is an excess of substitutions <8 bp apart and a deficit farther apart. These results suggest that there are blocks of constrained nucleotides, presumably involved in gene expression control, that are concentrated in long non-coding sequences. Furthermore, we infer that there is more than three times as much functional non-coding DNA as protein-coding DNA in the Drosophila genome. Most deleterious mutations therefore occur in non-coding DNA, and these may make an important contribution to a wide variety of evolutionary processes.
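Mean selective constraint as described here can be written as C = 1 - K_class / K_neutral, where K is the substitution rate per site in the class of interest and in the putatively unconstrained reference. A minimal sketch, with hypothetical function names and rates, including a site-weighted average over several classes:

```python
def constraint(k_class: float, k_neutral: float) -> float:
    """Fraction of mutations removed by selection, C = 1 - K_class / K_neutral."""
    if k_neutral <= 0:
        raise ValueError("reference rate must be positive")
    return 1.0 - k_class / k_neutral


def weighted_constraint(classes: dict[str, tuple[int, float]], k_neutral: float) -> float:
    """Site-weighted mean constraint; classes maps name -> (n_sites, K_class)."""
    total_sites = sum(n for n, _ in classes.values())
    weighted = sum(n * constraint(k, k_neutral) for n, k in classes.values())
    return weighted / total_sites


if __name__ == "__main__":
    # Hypothetical per-site substitution rates; neutral reference rate 0.10.
    classes = {"coding": (2_000, 0.03), "intron": (3_000, 0.06), "intergenic": (5_000, 0.05)}
    for name, (n, k) in classes.items():
        print(name, round(constraint(k, 0.10), 2))
    print("genome-wide", round(weighted_constraint(classes, 0.10), 3))
```

A site-weighted value above 0.5, as in this made-up example, corresponds to the kind of statement made in the abstract that more than half of all point mutations are deleterious.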
Affiliation(s)
- Daniel L Halligan
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.

8
Bouaynaya N, Schonfeld D. The genomic structure: proof of the role of non-coding DNA. Conf Proc IEEE Eng Med Biol Soc 2006; 2006:4544-4547. [PMID: 17947097 DOI: 10.1109/iembs.2006.259446]
Abstract
We prove that the introns play the role of a decoy in absorbing mutations in the same way hollow uninhabited structures are used by the military to protect important installations. Our approach is based on a probability of error analysis, where errors are mutations which occur in the exon sequences. We derive the optimal exon length distribution, which minimizes the probability of error in the genome. Furthermore, to understand how Nature can generate the optimal distribution, we propose a diffusive random walk model for exon generation throughout evolution. This model results in an alpha-stable exon length distribution, which is asymptotically equivalent to the optimal distribution. Experimental results show that both distributions accurately fit the real data. Given that introns also drive biological evolution by increasing the rate of unequal crossover between genes, we conclude that the role of introns is to maintain a genius balance between stability and adaptability in eukaryotic genomes.

9
Sironi M, Menozzi G, Comi GP, Bresolin N, Cagliani R, Pozzoli U. Fixation of conserved sequences shapes human intron size and influences transposon-insertion dynamics. Trends Genet 2005; 21:484-8. [PMID: 16005101 DOI: 10.1016/j.tig.2005.06.009] [Received: 12/06/2004] [Revised: 04/11/2005] [Accepted: 06/16/2005]
Abstract
The basis for intron expansion in humans is largely unexplored. In this article, we demonstrate that intron expansion has primarily been determined by fixation of multispecies conserved sequences (MCSs) over time. The presence of MCSs has shaped intron features: the insertion of transposable elements (TEs) has been constrained as more MCSs were fixed. Analysis of TE and MCS distribution suggested an unprecedented estimate of information requirements for proper splicing of long introns, with indication of sequence constraints extending up to >3 kb downstream of 5' splice sites.
Affiliation(s)
- Manuela Sironi
- Scientific Institute IRCCS E. Medea, Via Don Luigi Monza 20, 23842 Bosisio Parini (LC), Italy

10
Ometto L, Stephan W, De Lorenzo D. Insertion/deletion and nucleotide polymorphism data reveal constraints in Drosophila melanogaster introns and intergenic regions. Genetics 2005; 169:1521-7. [PMID: 15654088 PMCID: PMC1449560 DOI: 10.1534/genetics.104.037689]
Abstract
Our study of nucleotide sequence and insertion/deletion polymorphism in Drosophila melanogaster noncoding DNA provides evidence for selective pressures in both intergenic regions and introns (of the large size class). Intronic and intergenic sequences show a similar polymorphic deletion bias. Insertions have smaller sizes and higher frequencies than deletions, supporting the hypothesis that insertions are selected to compensate for the loss of DNA caused by deletion bias. Analysis of a simple model of selective constraints suggests that the blocks of functional elements located in intergenic sequences are on average larger than those in introns, while the length distribution of relatively unconstrained sequences interspaced between these blocks is similar in intronic and intergenic regions.
Affiliation(s)
- Lino Ometto
- Section of Evolutionary Biology, Biocenter, University of Munich, Planegg-Martinsried, Germany

11
Nelson CE, Hersh BM, Carroll SB. The regulatory content of intergenic DNA shapes genome architecture. Genome Biol 2004; 5:R25. [PMID: 15059258 PMCID: PMC395784 DOI: 10.1186/gb-2004-5-4-r25] [Received: 12/03/2003] [Revised: 01/09/2004] [Accepted: 02/08/2004]
Abstract
Background: Factors affecting the organization and spacing of functionally unrelated genes in metazoan genomes are not well understood. Because of the vast size of a typical metazoan genome compared to known regulatory and protein-coding regions, functional DNA is generally considered to have a negligible impact on gene spacing and genome organization. In particular, it has been impossible to estimate the global impact, if any, of regulatory elements on genome architecture.
Results: To investigate this, we examined the relationship between regulatory complexity and gene spacing in Caenorhabditis elegans and Drosophila melanogaster. We found that gene density directly reflects local regulatory complexity, such that the amount of noncoding DNA between a gene and its nearest neighbors correlates positively with that gene's regulatory complexity. Genes with complex functions are flanked by significantly more noncoding DNA than genes with simple or housekeeping functions. Genes of low regulatory complexity are associated with approximately the same amount of noncoding DNA in D. melanogaster and C. elegans, while loci of high regulatory complexity are significantly larger in the more complex animal. Complex genes in C. elegans have larger 5' than 3' noncoding intervals, whereas those in D. melanogaster have roughly equivalent 5' and 3' noncoding intervals.
Conclusions: Intergenic distance, and hence genome architecture, is highly nonrandom. Rather, it is shaped by regulatory information contained in noncoding DNA. Our findings suggest that in compact genomes, the species-specific loss of nonfunctional DNA reveals a landscape of regulatory information by leaving a profile of functional DNA in its wake.
Affiliation(s)
- Craig E Nelson
- Howard Hughes Medical Institute, University of Wisconsin-Madison, 1525 Linden Drive, Madison, WI 53703, USA.

12
Abstract
Numerous theories have been proposed to account for the pronounced differences in the quantity of non-coding DNA among eukaryotic genomes, but the current repertoire remains incomplete because the only explicit mechanisms it provides involve DNA gain. It has been proposed more recently that biases in spontaneous insertions and deletions (indels) can lead to genome shrinkage by mutational mechanisms alone. The present article provides the first detailed critical discussion of this approach, and covers three different ideas related to it: (1) the general notion of DNA loss by deletion bias, (2) the "DNA loss hypothesis" which supposes that variation in genome size can be attributed to differences in DNA loss rate, and (3) the "mutational equilibrium model" which attempts to describe the long-term evolution of genome size. The mutational equilibrium model is found to be problematic, and it is noted that DNA loss by small indels is too slow in real time to determine variation in genome size above a relatively low threshold. Some alternative explanations for the observed patterns are provided, and the critique also identifies some potential problems with the current dataset. These include a failure to cite a more detailed (and somewhat contradictory) mammalian dataset, a questionable use of arithmetic means with highly skewed data, and important discrepancies among the particular DNA sequences so far analyzed. Overall, evolutionary reductions in genome size are considered important, but the specific mechanism relating to small deletion bias is far too weak to be accepted as a primary determinant of genome size variation in general.
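The critique of arithmetic means on highly skewed data is easy to illustrate. The sketch below uses a hypothetical set of deletion sizes containing one large event; the numbers are made up and are not data from the article.

```python
import statistics

# Hypothetical deletion sizes in bp: mostly short, with one large outlier.
sizes = [1, 1, 1, 2, 2, 3, 4, 5, 8, 200]

arithmetic = statistics.mean(sizes)
median = statistics.median(sizes)
geometric = statistics.geometric_mean(sizes)

print(f"arithmetic mean = {arithmetic:.1f} bp")  # dominated by the single 200 bp event
print(f"median          = {median:.1f} bp")
print(f"geometric mean  = {geometric:.1f} bp")
```

Summaries of indel size or DNA loss rate based on arithmetic means can therefore be driven by a few large events, which is why medians or log-scale summaries are often preferred for data like these.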
Affiliation(s)
- T Ryan Gregory
- Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA.

13
Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 2004; 5:6. [PMID: 14736341 PMCID: PMC344529 DOI: 10.1186/1471-2105-5-6] [Received: 11/05/2003] [Accepted: 01/21/2004]
Abstract
BACKGROUND: Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.
RESULTS: Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25-3.0 substitutions per site.
CONCLUSION: For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.
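Scoring a tool's output against a simulated "correct" alignment can be done by comparing the sets of aligned residue pairs. The sketch below assumes a common definition, sensitivity as the fraction of true pairs recovered and specificity as the fraction of predicted pairs that are true (that is, recall and precision); the paper's exact scoring, including any restriction to constrained blocks, may differ. Function names and sequences are hypothetical.

```python
Pair = tuple[int, int]


def aligned_pairs(row_a: str, row_b: str) -> set[Pair]:
    """Residue index pairs (i, j) placed in the same column; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: set[Pair] = set()
    i = j = 0  # positions in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def sensitivity_specificity(true_pairs: set[Pair], test_pairs: set[Pair]) -> tuple[float, float]:
    """Fraction of true pairs recovered, and fraction of predicted pairs that are true."""
    correct = len(true_pairs & test_pairs)
    sensitivity = correct / len(true_pairs) if true_pairs else 0.0
    specificity = correct / len(test_pairs) if test_pairs else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    truth = aligned_pairs("ACGT-A", "AC-TTA")  # "correct" simulated alignment
    test = aligned_pairs("ACGTA", "ACTTA")     # tool output without gaps
    print(sensitivity_specificity(truth, test))
```

Working on residue pairs rather than on columns makes the score independent of how gaps are placed in the display, which matters when comparing global and local tools that cover different parts of the sequences.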
Affiliation(s)
- Daniel A Pollard
- Biophysics Graduate Group, University of California, Berkeley, CA 94720, USA
- Casey M Bergman
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Genetics, University of Cambridge, Cambridge, UK CB2 3EH
- Jens Stoye
- Technische Fakultät, Universität Bielefeld, 33594 Bielefeld, Germany
- Susan E Celniker
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Michael B Eisen
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA

14
Abstract
Intron sizes show an asymmetrical distribution in a number of organisms, with a large number of “short” introns clustered around a minimal intron length and a much broader distribution of longer introns. In Drosophila melanogaster, the short intron class is centered around 61 bp. The narrow length distribution suggests that natural selection may play a role in maintaining intron size. A comparison of 15 orthologous introns among species of the D. melanogaster subgroup indicates that, in general, short introns are not under greater DNA sequence or length constraints than long introns. There is a bias toward deletions in all introns (deletion/insertion ratio is 1.66), and the vast majority of indels are of short length (<10 bp). Indels occurring on the internal branches of the phylogenetic tree are significantly longer than those occurring on the terminal branches. These results are consistent with a compensatory model of intron length evolution in which slightly deleterious short deletions are frequently fixed within species by genetic drift, and relatively rare larger insertions that restore intron length are fixed by positive selection. A comparison of paralogous introns shared among duplicated genes suggests that length constraints differ between introns within the same gene. The janusA, janusB, and ocnus genes share two short introns derived from a common ancestor. The first of these introns shows significantly fewer indels than the second intron, although the two introns show a comparable number of substitutions. This indicates that intron-specific selective constraints have been maintained following gene duplication, which preceded the divergence of the D. melanogaster species subgroup.
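Summary statistics of the kind reported here (the deletion/insertion ratio, the share of short indels and the net change in length) can be computed from a list of indel events. A minimal sketch, assuming indels are encoded as signed lengths (positive for insertions, negative for deletions); the encoding, function name and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndelSummary:
    deletion_insertion_ratio: float
    fraction_short: float   # fraction of indels shorter than the threshold
    net_length_change: int  # bp gained (positive) or lost (negative)


def summarize_indels(signed_lengths: list[int], short_threshold: int = 10) -> IndelSummary:
    """Summarise indels given as signed lengths (+ insertion, - deletion)."""
    if any(x == 0 for x in signed_lengths):
        raise ValueError("indel lengths must be non-zero")
    insertions = [x for x in signed_lengths if x > 0]
    deletions = [x for x in signed_lengths if x < 0]
    if not insertions:
        raise ValueError("ratio undefined without insertions")
    short = sum(1 for x in signed_lengths if abs(x) < short_threshold)
    return IndelSummary(
        deletion_insertion_ratio=len(deletions) / len(insertions),
        fraction_short=short / len(signed_lengths),
        net_length_change=sum(signed_lengths),
    )


if __name__ == "__main__":
    # Hypothetical indels mapped to one intron lineage.
    print(summarize_indels([-1, -2, -1, -3, 1, 2, -15, 4]))
```

A ratio above 1 together with a negative net change is the deletion bias the compensatory model refers to; under that model, occasional larger insertions fixed by positive selection would offset the accumulated loss.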
Affiliation(s)
- John Parsch
- Department of Biology II, Section of Evolutionary Biology, University of Munich (LMU), Munich 80333, Germany.