151
Kozlowski P, de Mezer M, Krzyzosiak WJ. Trinucleotide repeats in human genome and exome. Nucleic Acids Res 2010; 38:4027-39. [PMID: 20215431 PMCID: PMC2896521 DOI: 10.1093/nar/gkq127] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Indexed: 11/24/2022]
Abstract
Trinucleotide repeats (TNRs) are of interest in genetics because they are used as markers for tracing genotype–phenotype relations and because they are directly involved in numerous human genetic diseases. In this study, we searched the human genome reference sequence and annotated exons (exome) for the presence of uninterrupted triplet repeat tracts composed of six or more repeated units. A list of 32 448 TNRs and 878 TNR-containing genes was generated and is provided herein. We found that some triplet repeats, specifically CNG, are overrepresented, while CTT, ATC, AAC and AAT are underrepresented in exons. This observation suggests that the occurrence of TNRs in exons is not random, but undergoes positive or negative selective pressure. Additionally, TNR types strongly determine their localization in mRNA sections (ORF, UTRs). Most genes containing exon-overrepresented TNRs are associated with gene ontology-defined functions. Surprisingly, many groups of genes that contain TNR types coding for different homo-amino acid tracts associate with the same transcription-related GO categories. We propose that TNRs have potential to be functional genetic elements and that their variation may be involved in the regulation of many common phenotypes; as such, TNR polymorphisms should be considered a priority in association studies.
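The search criterion described in this abstract (uninterrupted tracts of six or more repeated triplet units) can be illustrated with a short sketch. This is not the authors' pipeline; the function name `find_tnr_tracts`, the regular-expression approach, and the choice to skip mononucleotide runs are assumptions made for illustration only.

```python
import re

def find_tnr_tracts(seq, min_units=6):
    """Return (start, motif, units) for each uninterrupted trinucleotide tract.

    Hypothetical helper: a tract is a 3-nt unit repeated back-to-back at
    least `min_units` times, with no interrupting bases.
    """
    seq = seq.upper()
    # ([ACGT]{3}) captures one unit; \1{n-1,} requires n-1 or more exact copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    tracts = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # assumption: runs such as AAAAAA... are not counted as TNRs
        tracts.append((m.start(), motif, len(m.group(0)) // 3))
    return tracts

# Example: six CAG units flanked by two bases on each side.
print(find_tnr_tracts("TT" + "CAG" * 6 + "TT"))
```

The reported motif depends on the reading phase at which the scan first matches (CAG, AGC and GCA describe the same tract), so a real survey would normalize motifs to a canonical form before counting.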
Affiliation(s)
- Piotr Kozlowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.
152
Falster DS, Nakken S, Bergem-Ohr M, Rødland EA, Breivik J. Unstable DNA repair genes shaped by their own sequence modifying phenotypes. J Mol Evol 2010; 70:266-74. [PMID: 20213140 PMCID: PMC2846273 DOI: 10.1007/s00239-010-9328-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/21/2010] [Accepted: 02/10/2010] [Indexed: 11/27/2022]
Abstract
The question of whether natural selection favors genetic stability or genetic variability is a fundamental problem in evolutionary biology. Bioinformatic analyses demonstrate that selection favors genetic stability by avoiding unstable nucleotide sequences in protein encoding DNA. Yet, such unstable sequences are maintained in several DNA repair genes, thereby promoting breakdown of repair and destabilizing the genome. Several studies have therefore argued that selection favors genetic variability at the expense of stability. Here we propose a new evolutionary mechanism, with supporting bioinformatic evidence, that resolves this paradox. Combining the concepts of gene-dependent mutation biases and meiotic recombination, we argue that unstable sequences in the DNA mismatch repair (MMR) genes are maintained by their own phenotype. In particular, we predict that human MMR maintains an overrepresentation of mononucleotide repeats (monorepeats) within and around the MMR genes. In support of this hypothesis, we report a 31% excess in monorepeats in 250 kb regions surrounding the seven MMR genes compared to all other RefSeq genes (1.75 vs. 1.34%, P = 0.0047), with a particularly high content in PMS2 (2.41%, P = 0.0047) and MSH6 (2.07%, P = 0.043). Based on a mathematical model of monorepeat frequency, we argue that the proposed mechanism may suffice to explain the observed excess of repeats around MMR genes. Our findings thus indicate that unstable sequences in MMR genes are maintained through evolution by the MMR mechanism. The evolutionary paradox of genetically unstable DNA repair genes may thus be explained by an equilibrium in which the phenotype acts back on its own genotype.
Affiliation(s)
- Daniel S. Falster
- Institute of Basic Medical Science, University of Oslo, P.O. Box 1018 Blindern, 0315 Oslo, Norway
- Present Address: Department of Biological Sciences, Macquarie University, Sydney, Australia
- Sigve Nakken
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Rikshospitalet University Hospital, 0027 Oslo, Norway
- Present Address: Bioinformatics Core Facility, Institute of Medical Informatics, Rikshospitalet, 0310 Oslo, Norway
- Marie Bergem-Ohr
- Institute of Basic Medical Science, University of Oslo, P.O. Box 1018 Blindern, 0315 Oslo, Norway
- Einar Andreas Rødland
- Department of Informatics and Center for Cancer Biomedicine, University of Oslo, 0316 Oslo, Norway
- Norwegian Computing Center, 0314 Oslo, Norway
- Jarle Breivik
- Institute of Basic Medical Science, University of Oslo, P.O. Box 1018 Blindern, 0315 Oslo, Norway
153
Amos W. Heterozygosity and mutation rate: evidence for an interaction and its implications: the potential for meiotic gene conversions to influence both mutation rate and distribution. Bioessays 2010; 32:82-90. [PMID: 19967709 DOI: 10.1002/bies.200900108] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Indexed: 01/30/2023]
Abstract
If natural selection chose where new mutations occur it might well favour placing them near existing polymorphisms, thereby avoiding disruption of areas that work while adding novelty to regions where variation is tolerated or even beneficial. Such a system could operate if heterozygous sites are recognised and 'repaired' during the initial stages of crossing over. Such repairs involve an extra round of DNA replication, providing an opportunity for further mutations, thereby raising the local mutation rate. If so, the changes in heterozygosity that occur when populations grow or shrink could feed back to modulate both the rate and the distribution of mutations. Here, I review evidence from isozymes, microsatellites and single nucleotide polymorphisms that this potential is realised in real populations. I then consider the likely implications, focusing particularly on how these processes might affect microsatellites, concluding that heterozygosity does impact on the rate and distribution of mutations.
Affiliation(s)
- William Amos
- Department of Zoology, University of Cambridge, UK.
154
Mamedov IZ, Shagina IA, Kurnikova MA, Novozhilov SN, Shagin DA, Lebedev YB. A new set of markers for human identification based on 32 polymorphic Alu insertions. Eur J Hum Genet 2010; 18:808-14. [PMID: 20179741 DOI: 10.1038/ejhg.2010.22] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
A number of genetic systems for human genetic identification based on short tandem repeats or single nucleotide polymorphisms are widely used for crime detection, kinship studies and in analysis of victims of mass disasters. Here, we have developed a new set of 32 molecular genetic markers for human genetic identification based on polymorphic retroelement insertions. Allele frequencies were determined in a group of 90 unrelated individuals from four genetically distant populations of the Russian Federation. The mean match probability and probability of paternal exclusion, calculated based on population data, were 5.53 × 10⁻¹⁴ and 99.784%, respectively. The developed system is cheap and easy to use as compared to all previously published methods. The application of fluorescence-based methods for allele discrimination allows to use the human genetic identification set in automatic and high-throughput formats.
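For readers unfamiliar with match probabilities, the following sketch shows one textbook way such a figure can be computed for biallelic insertion markers. It assumes Hardy-Weinberg proportions and independent loci; the function names and the example frequencies are hypothetical, and the abstract does not state which estimator the authors used.

```python
def locus_match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic locus.

    `p` is the insertion-allele frequency. Assumes Hardy-Weinberg genotype
    frequencies (p^2, 2pq, q^2); the result is the sum of their squares.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)

def combined_match_probability(freqs):
    """Multiply per-locus match probabilities, assuming independent loci."""
    result = 1.0
    for p in freqs:
        result *= locus_match_probability(p)
    return result

# Illustrative only: 32 markers, each with an assumed allele frequency of 0.5.
print(combined_match_probability([0.5] * 32))
```

Under these assumptions the per-locus value is smallest when both alleles are equally common, which is why a panel of intermediate-frequency markers gains discriminating power quickly as loci are added.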
Affiliation(s)
- Ilgar Z Mamedov
- Laboratory of Comparative and Functional Genomics, Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry RAS, Miklukho-Maklaya, Moscow, Russia.
155
Buschiazzo E, Gemmell NJ. Conservation of human microsatellites across 450 million years of evolution. Genome Biol Evol 2010; 2:153-65. [PMID: 20333231 PMCID: PMC2839350 DOI: 10.1093/gbe/evq007] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Accepted: 02/02/2010] [Indexed: 11/21/2022]
Abstract
The sequencing and comparison of vertebrate genomes have enabled the identification of widely conserved genomic elements. Chief among these are genes and cis-regulatory regions, which are often under selective constraints that promote their retention in related organisms. The conservation of elements that either lack function or whose functions are yet to be ascribed has been relatively little investigated. In particular, microsatellites, a class of highly polymorphic repetitive sequences considered by most to be neutrally evolving junk DNA that is too labile to be maintained in distant species, have not been comprehensively studied in a comparative genomic framework. Here, we used the UCSC alignment of the human genome against those of 11 mammalian and five nonmammalian vertebrates to identify and examine the extent of conservation of human microsatellites in vertebrate genomes. Out of 696,016 microsatellites found in human sequences, 85.39% were conserved in at least one other species, whereas 28.65% and 5.98% were found in at least one and three nonprimate species, respectively. An exponential decline of microsatellite conservation with increasing evolutionary time, a comparable distribution of conserved versus nonconserved microsatellites in the human genome, and a positive correlation between microsatellite conservation and overall sequence conservation, all suggest that most microsatellites are only maintained in genomes by chance, although exceptionally conserved human microsatellites were also found in distant mammals and other vertebrates. Our findings provide the first comprehensive survey of microsatellite conservation across deep evolutionary timescales, in this case 450 Myr of vertebrate evolution, and provide new tools for the identification of functional conserved microsatellites, the development of cross-species microsatellite markers and the study of microsatellite evolution above the species level.
Affiliation(s)
- Emmanuel Buschiazzo
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand.
156
Shah SN, Hile SE, Eckert KA. Defective mismatch repair, microsatellite mutation bias, and variability in clinical cancer phenotypes. Cancer Res 2010; 70:431-5. [PMID: 20068152 DOI: 10.1158/0008-5472.can-09-3049] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.9] [Indexed: 11/16/2022]
Abstract
Microsatellite instability is associated with 10% to 15% of colorectal, endometrial, ovarian, and gastric cancers, and has long been used as a diagnostic tool for hereditary nonpolyposis colorectal carcinoma-related cancers. Tumor-specific length alterations within microsatellites are generally accepted to be a consequence of strand slippage events during DNA replication, which are uncorrected due to a defective postreplication mismatch repair (MMR) system. Mutations arising within microsatellites associated with critical target genes are believed to play a causative role in the evolution of MMR-defective tumors. In this review, we summarize current evidence of mutational biases within microsatellites arising as a consequence of intrinsic DNA sequence effects as well as variation in MMR efficiency. Microsatellite mutational biases are generally not considered during clinical testing; however, we suggest that such biases may be clinically significant as a factor contributing to phenotypic variation among microsatellite instability-positive tumors.
Affiliation(s)
- Sandeep N Shah
- Department of Pathology, Gittlen Cancer Research Foundation, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
157
Abstract
Single nucleotide polymorphisms (SNPs) are widely distributed in the human genome and although most SNPs are the result of independent point-mutations, there are exceptions. When studying distances between SNPs, a periodic pattern in the distance between pairs of identical SNPs has been found to be heavily correlated with periodicity in short tandem repeats (STRs). STRs are short DNA segments, widely distributed in the human genome and mainly found outside known tandem repeats. Because of the biased occurrence of SNPs, special care has to be taken when analyzing SNP-variation in STRs. We present a review of STRs in the human genome and discuss molecular mechanisms related to the biased occurrence of SNPs in STRs, and its implications for genome comparisons and genetic association studies.
Affiliation(s)
- Bo Eskerod Madsen
- AgroTech, Institute for Agri Technology and Food Innovation, Aarhus N, Denmark
158
Heterogeneous distribution of SNPs in the human genome: microsatellites as predictors of nucleotide diversity and divergence. Genomics 2009; 95:151-9. [PMID: 20026267 DOI: 10.1016/j.ygeno.2009.12.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 10/02/2009] [Revised: 12/10/2009] [Accepted: 12/11/2009] [Indexed: 01/22/2023]
Abstract
Understanding the forces that govern the distribution of single nucleotide polymorphisms is vital for many of their applications. Here we conducted a systematic search to quantify how both SNP density and human-chimpanzee divergence vary around different repetitive sequences. We uncovered a highly complicated picture in which these quantities often differ significantly from the genome-wide average in regions extending more than 20 kb, the direction of the deviation varying with repeat number and motif. AT microsatellites in particular are potent predictors of SNP density, long (AT)(n) repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content. Although the causal relationships remain difficult to determine, our results indicate a strong relationship between microsatellites and the DNA that flanks them. Our results help to explain the mixed picture that emerges from other studies and have important implications for the way in which genetic diversity is distributed in our genomes.
159
Pemberton TJ, Sandefur CI, Jakobsson M, Rosenberg NA. Sequence determinants of human microsatellite variability. BMC Genomics 2009; 10:612. [PMID: 20015383 PMCID: PMC2806349 DOI: 10.1186/1471-2164-10-612] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 05/07/2009] [Accepted: 12/16/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. RESULTS Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. CONCLUSIONS These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
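The calibration step described in this abstract (converting PCR fragment lengths to repeat numbers by reference to a sequenced allele) can be sketched as follows. The function names and example values are hypothetical, and the heterozygosity shown is the common expected-heterozygosity definition, which may differ from the estimator the authors used.

```python
from collections import Counter

def infer_repeat_number(fragment_len, ref_fragment_len, ref_repeats, motif_len):
    """Infer repeat count from a PCR fragment length.

    Assumes, as the abstract does, that length differences relative to the
    reference fragment are whole numbers of repeat units.
    """
    diff = fragment_len - ref_fragment_len
    if diff % motif_len != 0:
        raise ValueError("length difference is not a multiple of the motif size")
    return ref_repeats + diff // motif_len

def expected_heterozygosity(alleles):
    """1 minus the sum of squared allele frequencies in a sample of alleles."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Example: tetranucleotide locus whose 200-bp reference fragment has 10 repeats.
lengths = [192, 200, 200, 208]
repeats = [infer_repeat_number(x, 200, 10, 4) for x in lengths]
print(repeats, sum(repeats) / len(repeats), expected_heterozygosity(repeats))
```

Raising an error on non-multiples is a deliberate choice in this sketch: such fragments violate the stated assumption (for example, because of flanking indels) and would otherwise be silently rounded.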
Affiliation(s)
- Trevor J Pemberton
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.
160
Abstract
Sex chromosomes have evolved multiple times in many taxa. The recent explosion in the availability of whole genome sequences from a variety of organisms makes it possible to investigate sex chromosome evolution within and across genomes. Comparative genomic studies have shown that quite distant species may share fundamental properties of sex chromosome evolution, while very similar species can evolve unique sex chromosome systems. Furthermore, within-species genomic analyses can illuminate chromosome-wide sequence and expression polymorphisms. Here, we explore recent advances in the study of vertebrate sex chromosomes achieved using genomic analyses.
Affiliation(s)
- Melissa A Wilson
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
161
Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat Rev Genet 2009; 10:691-703. [PMID: 19763152 DOI: 10.1038/nrg2640] [Citation(s) in RCA: 1168] [Impact Index Per Article: 73.0] [Indexed: 02/07/2023]
Abstract
Their ability to move within genomes gives transposable elements an intrinsic propensity to affect genome evolution. Non-long terminal repeat (LTR) retrotransposons--including LINE-1, Alu and SVA elements--have proliferated over the past 80 million years of primate evolution and now account for approximately one-third of the human genome. In this Review, we focus on this major class of elements and discuss the many ways that they affect the human genome: from generating insertion mutations and genomic instability to altering gene expression and contributing to genetic innovation. Increasingly detailed analyses of human and other primate genomes are revealing the scale and complexity of the past and current contributions of non-LTR retrotransposons to genomic change in the human lineage.
Affiliation(s)
- Richard Cordaux
- CNRS UMR 6556 Ecologie, Evolution, Symbiose, Université de Poitiers, 40 Avenue du Recteur Pineau, Poitiers, France
162
Marriage TN, Hudman S, Mort ME, Orive ME, Shaw RG, Kelly JK. Direct estimation of the mutation rate at dinucleotide microsatellite loci in Arabidopsis thaliana (Brassicaceae). Heredity (Edinb) 2009; 103:310-7. [PMID: 19513093 PMCID: PMC2749907 DOI: 10.1038/hdy.2009.67] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/08/2022]
Abstract
The mutation rate at 54 perfect (uninterrupted) dinucleotide microsatellite loci is estimated by direct genotyping of 96 Arabidopsis thaliana mutation accumulation lines. The estimated rate differs significantly among motif types with the highest rate for AT repeats (2.03 × 10⁻³ per allele per generation), intermediate for CT (3.31 × 10⁻⁴), and lowest for CA (4.96 × 10⁻⁵). The average mutation rate per generation for this sample of loci is 8.87 × 10⁻⁴ (s.e. = 2.57 × 10⁻⁴). There is a strong effect of initial repeat number, particularly for AT repeats, with mutation rate increasing with the length of the microsatellite locus in the progenitor line. Controlling for motif and initial repeat number, chromosome 4 exhibited an elevated mutation rate relative to other chromosomes. The great majority of mutations were gains or losses of a single repeat. Generally, the data are consistent with the stepwise mutation model of microsatellite evolution. Several lines exhibited multiple step changes from the progenitor sequence, but it is unclear whether these are multi-step mutations or multiple single-step mutations. A survey of dinucleotide repeats across the entire Arabidopsis genome indicates that AT repeats are most abundant, followed by CT, and CA.
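The stepwise mutation model mentioned in this abstract can be illustrated with a minimal simulation. This is a sketch under simplifying assumptions that the abstract itself qualifies: a constant per-generation rate `mu` (the study reports that rate rises with repeat number), equal chances of gaining or losing one repeat, and a hypothetical floor of one repeat. All names are illustrative.

```python
import random

def simulate_stepwise(repeats, mu, generations, rng=None, min_repeats=1):
    """Evolve one allele under a simple stepwise mutation model.

    Each generation the allele mutates with probability `mu`; a mutation
    adds or removes exactly one repeat unit (assumed equally likely).
    """
    rng = rng or random.Random()
    for _ in range(generations):
        if rng.random() < mu:
            repeats += rng.choice((-1, 1))
            repeats = max(repeats, min_repeats)  # assumption: tract cannot vanish
    return repeats

# Example: an AT locus with 20 repeats, using the AT rate quoted in the abstract.
rng = random.Random(1)
lines = [simulate_stepwise(20, 2.03e-3, 30, rng=rng) for _ in range(96)]
print(sum(1 for r in lines if r != 20), "of 96 simulated lines changed length")
```

The number of generations (30) is an arbitrary placeholder, since the abstract does not give it. The sketch also shows why multi-step changes are ambiguous: an allele that ends two repeats from its starting length could have arisen from two single-step events.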
Affiliation(s)
- Tara N. Marriage
- Department of Ecology and Evolutionary Biology University of Kansas, Lawrence, KS
- Stephen Hudman
- Department of Ecology and Evolutionary Biology University of Kansas, Lawrence, KS
- Department of Ecology, Evolution, and Behavior, University of Minnesota
- Mark E. Mort
- Department of Ecology and Evolutionary Biology University of Kansas, Lawrence, KS
- Maria E. Orive
- Department of Ecology and Evolutionary Biology University of Kansas, Lawrence, KS
- Ruth G. Shaw
- Department of Ecology, Evolution, and Behavior, University of Minnesota
- John K. Kelly
- Department of Ecology and Evolutionary Biology University of Kansas, Lawrence, KS
163
Galindo CL, McIver LJ, McCormick JF, Skinner MA, Xie Y, Gelhausen RA, Ng K, Kumar NM, Garner HR. Global microsatellite content distinguishes humans, primates, animals, and plants. Mol Biol Evol 2009; 26:2809-19. [PMID: 19717526 DOI: 10.1093/molbev/msp192] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
Microsatellites are highly mutable, repetitive sequences commonly used as genetic markers, but they have never been studied en masse. Using a custom microarray to measure hybridization intensities of every possible repetitive nucleotide motif from 1-mers to 6-mers, we examined 25 genomes. Here, we show that global microsatellite content varies predictably by species, as measured by array hybridization signal intensities, correlating with established taxonomic relationships, and particular motifs are characteristic of one species versus another. For instance, hominid-specific microsatellite motifs were identified despite alignment of the human reference, Celera, and Venter genomic sequences indicating substantial variation (30-50%) among individuals. Differential microsatellite motifs were mainly associated with genes involved in developmental processes, whereas those found in intergenic regions exhibited no discernible pattern. This is the first description of a method for evaluating microsatellite content to classify individual genomes.
Affiliation(s)
- C L Galindo
- McDermott Center for Human Growth and Development of the University of Texas Southwestern Medical Center, Dallas, Texas, USA.
164
Castoe TA, Poole AW, Gu W, Jason de Koning AP, Daza JM, Smith EN, Pollock DD. Rapid identification of thousands of copperhead snake (Agkistrodon contortrix) microsatellite loci from modest amounts of 454 shotgun genome sequence. Mol Ecol Resour 2009; 10:341-7. [PMID: 21565030 DOI: 10.1111/j.1755-0998.2009.02750.x] [Citation(s) in RCA: 167] [Impact Index Per Article: 10.4] [Indexed: 12/23/2022]
Abstract
Optimal integration of next-generation sequencing into mainstream research requires re-evaluation of how problems can be reasonably overcome and what questions can be asked. One potential application is the rapid acquisition of genomic information to identify microsatellite loci for evolutionary, population genetic and chromosome linkage mapping research on non-model and not previously sequenced organisms. Here, we report on results using high-throughput sequencing to obtain a large number of microsatellite loci from the venomous snake Agkistrodon contortrix, the copperhead. We used the 454 Genome Sequencer FLX next-generation sequencing platform to sample randomly ∼27 Mbp (128 773 reads) of the copperhead genome, thus sampling about 2% of the genome of this species. We identified microsatellite loci in 11.3% of all reads obtained, with 14 612 microsatellite loci identified in total, 4564 of which had flanking sequences suitable for polymerase chain reaction primer design. The random sequencing-based approach to identify microsatellites was rapid, cost-effective and identified thousands of useful microsatellite loci in a previously unstudied species.
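The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below derives quantities the abstract does not state directly (mean read length, fraction of loci with usable flanks, implied genome size); they are approximate, because the inputs "∼27 Mbp" and "about 2%" are themselves rounded. The function name and dictionary keys are hypothetical.

```python
def summarize_survey(total_bp=27_000_000, reads=128_773, ssr_read_fraction=0.113,
                     loci=14_612, primer_ready=4_564, genome_fraction=0.02):
    """Derive summary figures from the sequencing numbers in the abstract."""
    return {
        # Average read length implied by total bases and read count.
        "mean_read_length_bp": total_bp / reads,
        # Number of reads that contained at least one microsatellite.
        "reads_with_ssr": ssr_read_fraction * reads,
        # Share of loci with flanks suitable for PCR primer design.
        "primer_ready_fraction": primer_ready / loci,
        # Genome size implied if the sample covers `genome_fraction` of it.
        "implied_genome_size_bp": total_bp / genome_fraction,
    }

for key, value in summarize_survey().items():
    print(f"{key}: {value:,.2f}")
```

On these inputs the mean read length comes out near 210 bp and roughly 31% of the identified loci have flanking sequence suitable for primer design.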
Affiliation(s)
- Todd A Castoe
- Consortium for Comparative Genomics, Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biology, University of Central Florida, 4000 Central Florida Blvd., Orlando, FL 32816, USA
- Department of Biology & Amphibian and Reptile Diversity Research Center, The University of Texas at Arlington, Arlington, TX 76019, USA
165
Understanding what determines the frequency and pattern of human germline mutations. Nat Rev Genet 2009; 10:478-88. [PMID: 19488047 DOI: 10.1038/nrg2529] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 11/08/2022]
Abstract
Surprising findings about human germline mutation have come from applying new technologies to detect rare mutations in germline DNA, from analysing DNA sequence divergence between humans and closely related species, and from investigating human polymorphic variation. In this Review we discuss how these approaches affect our current understanding of the roles of sex, age, mutation hot spots, germline selection and genomic factors in determining human nucleotide substitution mutation patterns and frequencies. To enhance our understanding of mutation and disease, more extensive molecular data on the human germ line with regard to mutation origin, DNA repair, epigenetic status and the effect of newly arisen mutations on gamete development are needed.
166
Kvikstad EM, Chiaromonte F, Makova KD. Ride the wavelet: A multiscale analysis of genomic contexts flanking small insertions and deletions. Genome Res 2009; 19:1153-64. [PMID: 19502380 DOI: 10.1101/gr.088922.108] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
Recent studies have revealed that insertions and deletions (indels) are more different in their formation than previously assumed. What remains enigmatic is how the local DNA sequence context contributes to these differences. To investigate the relative impact of various molecular mechanisms to indel formation, we analyzed sequence contexts of indels in the non protein- or RNA-coding, nonrepetitive (NCNR) portion of the human genome. We considered small (≤30-bp) indels occurring in the human lineage since its divergence from chimpanzee and used wavelet techniques to study, simultaneously for multiple scales, the spatial patterns of short sequence motifs associated with indel mutagenesis. In particular, we focused on motifs associated with DNA polymerase activity, topoisomerase cleavage, double-strand breaks (DSBs), and their repair. We came to the following conclusions. First, many motifs are characterized by unique enrichment profiles in the vicinity of indels vs. indel-free portions of the genome, verifying the importance of sequence context in indel mutagenesis. Second, only limited similarity in motif frequency profiles is evident flanking insertions vs. deletions, confirming differences in their mutagenesis. Third, substantial similarity in frequency profiles exists between pairs of individual motifs flanking insertions (and separately deletions), suggesting "cooperation" among motifs, and thus molecular mechanisms, during indel formation. Fourth, the wavelet analyses demonstrate that all these patterns are highly dependent on scale (the size of an interval considered). Finally, our results depict a model of indel mutagenesis comprising both replication and recombination (via repair of paused replication forks and site-specific recombination).
Affiliation(s)
- Erika M Kvikstad
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA
167
Eckert KA, Hile SE. Every microsatellite is different: Intrinsic DNA features dictate mutagenesis of common microsatellites present in the human genome. Mol Carcinog 2009; 48:379-88. [PMID: 19306292 DOI: 10.1002/mc.20499] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Indexed: 01/06/2023]
Abstract
Microsatellite sequences are ubiquitous in the human genome and are important regulators of genome function. Here, we examine the mutational mechanisms governing the stability of highly abundant mono-, di-, and tetranucleotide microsatellites. Microsatellite mutation rate estimates from pedigree analyses and experimental models range from a low of approximately 10⁻⁶ to a high of approximately 10⁻² mutations per locus per generation. The vast majority of observed mutational variation can be attributed to features intrinsic to the allele itself, including motif size, length, and sequence composition. A greater than linear relationship between motif length and mutagenesis has been observed in several model systems. Motif sequence differences contribute up to 10-fold to the variation observed in human cell mutation rates. The major mechanism of microsatellite mutagenesis is strand slippage during DNA synthesis. DNA polymerases produce errors within microsatellites at a frequency that is 10- to 100-fold higher than the frequency of frameshifts in coding sequences. Motif sequence significantly affects both polymerase error rate and specificity, resulting in strand biases within complementary microsatellites. Importantly, polymerase errors within microsatellites include base substitutions, deletions, and complex mutations, all of which produced interrupted alleles from pure microsatellites. Postreplication mismatch repair efficiency is affected by microsatellite motif size and sequence, also contributing to the observed variation in microsatellite mutagenesis. Inhibition of DNA synthesis within common microsatellites is highly sequence-dependent, and is positively correlated with the production of errors. DNA secondary structure within common microsatellites can account for some DNA polymerase pause sites, and may be an important factor influencing mutational specificity.
Affiliation(s)
- Kristin A Eckert
- Department of Pathology, The Jake Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, 500 University Drive, PA, USA
168
Fast, cost-effective development of species-specific microsatellite markers by genomic sequencing. Biotechniques 2009; 46:185-92. [PMID: 19317661 DOI: 10.2144/000113084] [Citation(s) in RCA: 206] [Impact Index Per Article: 12.9] [Indexed: 11/23/2022]
Abstract
Microsatellites are the genetic markers of choice for many population genetic studies, but must be isolated de novo using recombinant approaches where prior genetic data are lacking. Here we utilized high-throughput genomic sequencing technology to produce millions of base pairs of short fragment reads, which were screened with bioinformatics toolsets to identify primers that amplify polymorphic microsatellite loci. Using this approach we isolated 13 polymorphic microsatellites for the blue duck (Hymenolaimus malacorhynchos), a species for which limited genetic data were available. Our genomic approach eliminates recombinant genetic steps, significantly reducing the time and cost requirements of marker development compared with traditional approaches. While this application of genomic sequencing may seem obvious to many, this study is, to the best of our knowledge, the first attempt to describe the use of genomic sequencing for the development of microsatellite markers in a non-model organism or indeed any organism.
|
169
|
Ancestral origin of the ATTCT repeat expansion in spinocerebellar ataxia type 10 (SCA10). PLoS One 2009; 4:e4553. [PMID: 19234597 PMCID: PMC2639644 DOI: 10.1371/journal.pone.0004553] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 10/01/2008] [Accepted: 01/05/2009] [Indexed: 12/04/2022]
Abstract
Spinocerebellar ataxia type 10 (SCA10) is an autosomal dominant neurodegenerative disease characterized by cerebellar ataxia and seizures. The disease is caused by a large ATTCT repeat expansion in the ATXN10 gene. The first families reported with SCA10 were of Mexican origin, but the disease was soon after described in Brazilian families of mixed Portuguese and Amerindian ancestry. The origin of the SCA10 expansion and a possible founder effect that would account for its geographical distribution have been a source of speculation in recent years. To unravel the mutational origin and spread of the SCA10 expansion, we performed an extensive haplotype study, using closely linked STR markers and intragenic SNPs, in families from Brazil and Mexico. Our results showed (1) a shared disease haplotype for all Brazilian families and one of the Mexican families; (2) closely related haplotypes for the additional Mexican SCA10 families; (3) little or no genetic distance among small normal alleles of different repeat sizes from the same SNP lineage, indicating that they originate by a single-step mechanism; and (4) a shared haplotype for pure and interrupted expanded alleles, pointing to a gene conversion model for their generation. In conclusion, we show evidence for a common ancestral origin of SCA10 in Latin America, which might have arisen in an ancestral Amerindian population and later spread into the mixed populations of Mexico and Brazil.
|
170
|
Evidence for Nonindependent Evolution of Adjacent Microsatellites in the Human Genome. J Mol Evol 2009; 68:160-70. [DOI: 10.1007/s00239-008-9192-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/10/2008] [Revised: 12/02/2008] [Accepted: 12/02/2008] [Indexed: 10/21/2022]
|
171
|
Park C, Makova KD. Coding region structural heterogeneity and turnover of transcription start sites contribute to divergence in expression between duplicate genes. Genome Biol 2009; 10:R10. [PMID: 19175934 PMCID: PMC2687787 DOI: 10.1186/gb-2009-10-1-r10] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 10/11/2008] [Revised: 12/24/2008] [Accepted: 01/28/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene expression divergence is one manifestation of functional differences between duplicate genes. Although rapid accumulation of expression divergence between duplicate gene copies has been observed, the driving mechanisms behind this phenomenon have not been explored in detail. RESULTS: We examine which factors influence expression divergence between human duplicate genes, utilizing the latest genome-wide data sets. We conclude that the turnover of transcription start sites between duplicate genes occurs rapidly after gene duplication and that gene pairs with shared transcription start sites have significantly higher expression similarity than those without shared transcription start sites. Moreover, we find that most (55%) duplicate gene pairs do not retain the same coding sequence structure between the two duplicate copies and this also contributes to divergence in their expression. Furthermore, the proportion of aligned sequences in cis-regulatory regions between the two copies is positively correlated with expression similarity. Surprisingly, we find no effect of copy-specific transposable element insertions on the divergence of duplicate gene expression. CONCLUSIONS: Our results suggest that turnover of transcription start sites, structural heterogeneity of coding sequences, and divergence of cis-regulatory regions between copies play a pivotal role in determining the expression divergence of duplicate genes.
Affiliation(s)
- Chungoo Park
- Center for Comparative Genomics and Bioinformatics, Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA.
|
172
|
Vanpé C, Buschiazzo E, Abdelkrim J, Morrow G, Nicol SC, Gemmell NJ. Development of microsatellite markers for the short-beaked echidna using three different approaches. Aust J Zool 2009. [DOI: 10.1071/zo09033] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
We used three different methods, size-selected genomic library, cross-species amplification of a mammal-wide set of conserved microsatellites and genomic sequencing, to develop a panel of 43 microsatellite loci for the short-beaked echidna (Tachyglossus aculeatus). These loci were screened against 13 individuals from three different regions (Tasmania, Kangaroo Island, Perth region), spanning the breadth of the range of the short-beaked echidna. Nine of the 43 tested loci amplified reliably, generated clear peaks on the electropherogram and were polymorphic, with the number of alleles per locus ranging from two to eight (mean = 3.78) in the individuals tested. Polymorphic information content ranged from 0.16 to 0.78, and expected heterozygosity ranged from 0.19 to 0.84. One of the nine microsatellites showed a heterozygote deficit, suggesting a high probability of null alleles. The genomic sequencing approach using data derived from the Roche FLX platform is likely to provide the most promising method to develop echidna microsatellites. The microsatellite markers developed here will be useful tools to study population genetic structure, gene flow, kinship and parentage in Tachyglossus sp. and potentially also in endangered Zaglossus species.
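The polymorphic information content (PIC) and expected heterozygosity reported in this abstract are standard functions of allele frequencies at a locus. The following is a minimal sketch of how such values can be computed, not the authors' code. It assumes the Botstein et al. (1980) definition of PIC and the uncorrected form of expected heterozygosity (no small-sample correction); the function names and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """Uncorrected expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def polymorphic_information_content(freqs):
    """PIC (Botstein et al. 1980): 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    homozygosity = sum(p * p for p in ps)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical genotypes at one locus: allele sizes in base pairs.
genotypes = [(100, 100), (100, 104), (104, 104), (100, 104)]
freqs = allele_frequencies(genotypes)
he = expected_heterozygosity(freqs)
pic = polymorphic_information_content(freqs)
```

With two equally frequent alleles, as in the toy data, expected heterozygosity is 0.5 and PIC is 0.375. PIC never exceeds expected heterozygosity, because it subtracts an extra cross term.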
|
173
|
Phillips N, Salomon M, Custer A, Ostrow D, Baer CF. Spontaneous mutational and standing genetic (co)variation at dinucleotide microsatellites in Caenorhabditis briggsae and Caenorhabditis elegans. Mol Biol Evol 2008; 26:659-69. [PMID: 19109257 DOI: 10.1093/molbev/msn287] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
Understanding the evolutionary processes responsible for shaping genetic variation within and between species requires separating the effects of mutation and selection. Differences between the patterns of genetic variation observed in nature and when mutations are allowed to accumulate in the relative absence of selection can reveal biases imposed by selection. We characterize the genetic variation at dinucleotide microsatellite repeats in four sets of 250-generation mutation accumulation (MA) lines, two in the species Caenorhabditis briggsae and two in Caenorhabditis elegans, and compare the mutational variation with the standing variation in those species. We also compare the mutational properties of microsatellites with the cumulative effects of mutations on fitness in the same lines. Integrated over the whole genome, we infer that the mutation rate of C. briggsae is about twice that of C. elegans, consistent with the cumulative mutational effects on fitness. The mutational spectrum (ratio of insertions to deletions) differs between repeat types and, in some cases, between species. The per-locus mutation rate is significantly positively correlated with the standing genetic variation at the same locus in both species, providing justification for the common practice of using the standing genetic variance as a surrogate for the mutation rate.
|
174
|
Amos W, Flint J, Xu X. Heterozygosity increases microsatellite mutation rate, linking it to demographic history. BMC Genet 2008; 9:72. [PMID: 19014581 PMCID: PMC2615044 DOI: 10.1186/1471-2156-9-72] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 07/08/2008] [Accepted: 11/14/2008] [Indexed: 01/11/2023]
Abstract
Background: Biochemical experiments in yeast suggest a possible mechanism that would cause heterozygous sites to mutate faster than equivalent homozygous sites. If such a process operates, it could undermine a key assumption at the core of population genetic theory, namely that mutation rate and population size are independent, because population expansion would increase heterozygosity, which in turn would increase mutation rate. Here we test this hypothesis using both direct counting of microsatellite mutations in human pedigrees and an analysis of the relationship between microsatellite length and patterns of demographically induced variation in heterozygosity. Results: We find that microsatellite alleles of any given length are more likely to mutate when their homologue is unusually different in length. Furthermore, microsatellite lengths in human populations do not vary randomly, but instead exhibit highly predictable trends with both distance from Africa, a surrogate measure of genome-wide heterozygosity, and modern population size. This predictability remains even after statistically controlling for non-independence due to shared ancestry among populations. Conclusion: Our results reveal patterns that are unexpected under classical population genetic theory, where no mechanism exists capable of linking allele length to extrinsic variables such as geography or population size. However, the predictability of microsatellite length is consistent with heterozygote instability and suggests that this has an important impact on microsatellite evolution. Whether similar processes affect single nucleotide polymorphisms remains unclear.
Affiliation(s)
- William Amos
- Department of Zoology, Downing Street, Cambridge, CB4 3DB, UK.
|
175
|
Loire E, Praz F, Higuet D, Netter P, Achaz G. Hypermutability of Genes in Homo sapiens Due to the Hosting of Long Mono-SSR. Mol Biol Evol 2008; 26:111-21. [DOI: 10.1093/molbev/msn230] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022]
|
176
|
Amos W, Clarke A. Body temperature predicts maximum microsatellite length in mammals. Biol Lett 2008; 4:399-401. [PMID: 18522923 DOI: 10.1098/rsbl.2008.0209] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
A long-standing mystery in genome evolution is why short tandem repeats vary so much in length and frequency. Here, we test the hypothesis that body temperature influences the rate and nature of slippage-based mutations. Using data from 28 species for which genome sequencing is advanced and 76 species for which marker loci have been published, we show that in mammals, maximum repeat number is inversely correlated with body temperature, with warmer-blooded species having shorter 'long' microsatellites. Our results support a model of microsatellite evolution in which maximum length is limited by a temperature-dependent stability threshold.
Affiliation(s)
- William Amos
- Department of Zoology, University of Cambridge, Cambridge, UK.
|
177
|
Madsen BE, Villesen P, Wiuf C. Short tandem repeats in human exons: a target for disease mutations. BMC Genomics 2008; 9:410. [PMID: 18789129 PMCID: PMC2543027 DOI: 10.1186/1471-2164-9-410] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 03/12/2008] [Accepted: 09/12/2008] [Indexed: 11/30/2022]
Abstract
BACKGROUND: In recent years it has been demonstrated that structural variations, such as indels (insertions and deletions), are common throughout the genome, but the implications of structural variations are still not clearly understood. Long tandem repeats (e.g. microsatellites or simple repeats) are known to be hypermutable (indel-rich), but are rare in exons and only occasionally associated with diseases. Here we focus on short (imperfect) tandem repeats (STRs), which fall below the radar of conventional tandem repeat detection, and investigate whether STRs are targets for disease-related mutations in human exons. In particular, we test whether they share the hypermutability of the longer tandem repeats and whether disease-related genes have a higher STR content than non-disease-related genes. RESULTS: We show that validated human indels are extremely common in STR regions compared to non-STR regions. In contrast to longer tandem repeats, STRs as we define them are present in exons of most known human genes (92%). Of all STR sequences in exons, 99% are shorter than 33 base pairs and 62% are imperfect repeats. We also demonstrate that STRs are significantly overrepresented in disease-related genes in both human and mouse. These results are preserved when we limit the analysis to STRs outside known longer tandem repeats. CONCLUSION: Based on our findings we conclude that STRs represent hypermutable regions in the human genome that are linked to human disease. In addition, STRs constitute an obvious target when screening for rare mutations, because of the relatively small amount of STR sequence in exons (1,973,844 bp) and the limited length of STR regions.
Affiliation(s)
- Bo Eskerod Madsen
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
- Palle Villesen
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
- Carsten Wiuf
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
|
178
|
Bacolla A, Larson JE, Collins JR, Li J, Milosavljevic A, Stenson PD, Cooper DN, Wells RD. Abundance and length of simple repeats in vertebrate genomes are determined by their structural properties. Genome Res 2008; 18:1545-53. [PMID: 18687880 DOI: 10.1101/gr.078303.108] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.9] [Indexed: 11/25/2022]
Abstract
Microsatellites are abundant in vertebrate genomes, but their sequence representation and length distributions vary greatly within each family of repeats (e.g., tetranucleotides). Biophysical studies of 82 synthetic single-stranded oligonucleotides comprising all tetra- and trinucleotide repeats revealed an inverse correlation between the stability of folded-back hairpin and quadruplex structures and the sequence representation for repeats ≥30 bp in length in nine vertebrate genomes. Alternatively, the predicted energies of base-stacking interactions correlated directly with the longest length distributions in vertebrate genomes. Genome-wide analyses indicated that unstable sequences, such as CAG:CTG and CCG:CGG, were over-represented in coding regions and that micro/minisatellites were recruited in genes involved in transcription and signaling pathways, particularly in the nervous system. Microsatellite instability (MSI) is a hallmark of cancer, and length polymorphism within genes can confer susceptibility to inherited disease. Sequences that manifest the highest MSI values also displayed the strongest base-stacking interactions; analyses of 62 tri- and tetranucleotide repeat-containing genes associated with human genetic disease revealed enrichments similar to those noted for micro/minisatellite-containing genes. We conclude that DNA structure and base-stacking determined the number and length distributions of microsatellite repeats in vertebrate genomes over evolutionary time and that micro/minisatellites have been recruited to participate in both gene and protein function.
Affiliation(s)
- Albino Bacolla
- Institute of Biosciences and Technology, Center for Genome Research, Texas A&M University Health Science Center, Houston, Texas 77030, USA.
|
179
|
Merkel A, Gemmell N. Detecting short tandem repeats from genome data: opening the software black box. Brief Bioinform 2008; 9:355-66. [PMID: 18621747 DOI: 10.1093/bib/bbn028] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Indexed: 01/23/2023]
Abstract
Short tandem repeats, specifically microsatellites, are widely used genetic markers, are associated with human genetic diseases, and play an important role in various regulatory mechanisms and in evolution. Despite their importance, much is still unknown about their mutational dynamics. The increasing availability of genome data has led to several in silico studies of microsatellite evolution, which have produced a vast range of algorithms and software for tandem repeat detection. Documentation of these tools is often sparse, or provided in a format that is impenetrable to most biologists without an informatics background. This article introduces the major concepts behind repeat-detecting software that are essential for informed tool selection. We reflect on issues such as parameter settings and program bias, as well as redundancy filtering and efficiency, using examples from the currently available range of programs, to provide an integrated comparison and practical guide to microsatellite-detecting programs.
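To make the parameter-setting and redundancy-filtering issues mentioned in this abstract concrete, here is a minimal sketch of a perfect-repeat scanner. It is not taken from the article or from any of the programs it reviews; it is a plain regular-expression search, and the function name, the thresholds (motif size 1 to 6 bp, at least six repeat units), and the test sequence are all illustrative assumptions. Changing these thresholds changes which loci are reported, which is one source of the program bias the abstract refers to.

```python
import re


def find_perfect_repeats(seq, min_motif=1, max_motif=6, min_units=6):
    """Return sorted (start, end, motif, units) tuples for uninterrupted tandem repeats.

    Coordinates are 0-based and half-open. Only A/C/G/T are matched, so tracts
    containing N or other ambiguity codes are not reported.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # One motif of length k followed by at least (min_units - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Redundancy filter: skip motifs that are repeats of a shorter unit
            # (e.g. "AA" or "CAGCAG"), since those tracts belong to the shorter motif.
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            units = (match.end() - match.start()) // k
            hits.append((match.start(), match.end(), motif, units))
    return sorted(hits)


# Hypothetical test sequence: a (CAG)7 tract and an (A)8 tract with short flanks.
example = "TTA" + "CAG" * 7 + "TTAC" + "A" * 8 + "GC"
repeats = find_perfect_repeats(example)
```

The sketch reports whichever motif phase it meets first when scanning left to right, so a tract preceded by a base that matches the end of the motif can be reported under a rotated motif (for example GCA instead of CAG). It also ignores imperfect repeats entirely. Both are simplifications that a real detection program would have to address.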
Affiliation(s)
- Angelika Merkel
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8041, New Zealand.
|