1
Singh AK, Amar I, Ramadasan H, Kappagantula KS, Chavali S. Proteins with amino acid repeats constitute a rapidly evolvable and human-specific essentialome. Cell Rep 2023; 42:112811. [PMID: 37453061 DOI: 10.1016/j.celrep.2023.112811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 05/30/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023]
Abstract
Protein products of essential genes, indispensable for organismal survival, are highly conserved and carry out fundamental functions. Interestingly, proteins that contain amino acid homorepeats, which tend to evolve rapidly, are enriched in eukaryotic essentialomes. Why are hypermutable homorepeats enriched in conserved and functionally vital essential proteins? We solve this functional versus evolutionary paradox by demonstrating that human essential proteins with homorepeats bring about crosstalk across biological processes through high interactability and have distinct regulatory functions affecting expansive global regulation. Importantly, essential proteins with homorepeats rapidly diverge, with the amino acid substitutions frequently affecting functional sites, likely facilitating rapid adaptability. Strikingly, essential proteins with homorepeats influence human-specific embryonic and brain development, implying that the presence of homorepeats could contribute to the emergence of human-specific processes. Thus, we propose that homorepeat-containing essential proteins affecting species-specific traits can be potential intervention targets across pathologies, including cancers and neurological disorders.
Affiliation(s)
- Anjali K Singh
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Ishita Amar
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Harikrishnan Ramadasan
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Keertana S Kappagantula
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Sreenivas Chavali
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
2
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
3
Aska EM, Dermadi D, Kauppi L. Single-Cell Sequencing of Mouse Thymocytes Reveals Mutational Landscape Shaped by Replication Errors, Mismatch Repair, and H3K36me3. iScience 2020; 23:101452. [PMID: 32858340 PMCID: PMC7474001 DOI: 10.1016/j.isci.2020.101452] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/04/2019] [Revised: 02/27/2020] [Accepted: 08/10/2020] [Indexed: 12/14/2022]
Abstract
DNA mismatch repair (MMR) corrects replication errors and is recruited by the histone mark H3K36me3, enriched in exons of transcriptionally active genes. To dissect in vivo the mutational landscape shaped by these processes, we employed single-cell exome sequencing on T cells of wild-type and MMR-deficient (Mlh1-/-) mice. Within active genes, we uncovered a spatial bias in MMR efficiency: 3' exons, often H3K36me3-enriched, acquire significantly fewer MMR-dependent mutations compared with 5' exons. Huwe1 and Mcm7 genes, both active during lymphocyte development, stood out as mutational hotspots in MMR-deficient cells, demonstrating their intrinsic vulnerability to replication error in this cell type. Both genes are H3K36me3-enriched, which can explain MMR-mediated elimination of replication errors in wild-type cells. Thus, H3K36me3 can boost MMR in transcriptionally active regions, both locally and globally. This offers an attractive concept of thrifty MMR targeting, where critical genes in each cell type enjoy preferential shielding against de novo mutations.
Affiliation(s)
- Elli-Mari Aska
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
  - Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
- Denis Dermadi
  - Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
  - Laboratory of Immunology and Vascular Biology, Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305, USA
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Liisa Kauppi
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
  - Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
4
Zheng Y, Chen J, Zhang X, Xie L, Zhang Y, Sun Y. Sensitivity and polymorphism of Bethesda panel markers in Chinese population. Bull Cancer 2020; 107:1091-1097. [PMID: 32980144 DOI: 10.1016/j.bulcan.2020.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2020] [Revised: 07/01/2020] [Accepted: 08/01/2020] [Indexed: 10/23/2022]
Abstract
PURPOSE: This study analyzes the sensitivity and polymorphism of the Bethesda panel markers (BAT25, BAT26, D2S123, D5S346 and D17S250) for microsatellite instability (MSI) testing in a Chinese population from Jiangsu Province, and their clinical implications.
METHODS: MSI status, marker sensitivity and marker polymorphism were assessed by fragment analysis in 541 colorectal cancer (CRC) patients.
RESULTS: Of the 541 sample pairs, 525 tissue samples and 541 blood samples were successfully amplified. Thirty-four (6.5%) cases were MSI-high (MSI-H), while 33 (6.3%) and 458 (87.2%) were MSI-low (MSI-L) and microsatellite stable (MSS), respectively. In MSI-H cases, BAT26 (85.3%) exhibited the highest instability, followed by BAT25 (82.4%), D2S123 (67.6%), D17S250 (64.7%) and D5S346 (50.0%). The median ages of CRC patients with LS, MSI-H, MSI-L and MSS status were 38-43, 48, 60 and 63, respectively. Mucinous carcinomas accounted for 75.0%, 44.1%, 12.1% and 7.0% of CRC cases in the LS, MSI-H, MSI-L and MSS groups, respectively. For D2S123, D17S250 and D5S346, heterozygosity was 80.8%, 74.1% and 57.7%, and the polymorphic variation range (PVR) was 207 bp to 234 bp, 140 bp to 169 bp and 109 bp to 137 bp, respectively. D2S123 and D5S346 showed a bimodal distribution, whereas D17S250 showed an indistinct trimodal or tetramodal distribution.
CONCLUSION: MSI-H cases showed earlier onset and a higher proportion of mucinous carcinomas. The mononucleotide markers BAT26 and BAT25 exhibited higher sensitivity than the dinucleotide markers D2S123, D17S250 and D5S346 in the Chinese population. The dinucleotide markers were highly polymorphic, with a high percentage of heterozygosity, great variation in repeat length and a non-normal distribution in the Chinese population from Jiangsu Province.
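The proportions reported in this abstract can be re-derived from the stated counts. The sketch below recomputes them against the 525 amplified tissue samples and adds a marker-count classifier. The thresholds in `classify_msi` (two or more unstable markers of five for MSI-H, one for MSI-L, none for MSS) follow the commonly used Bethesda panel convention and are an assumption here, since the abstract does not state the cut-offs the authors applied; the function and variable names are illustrative.

```python
# Counts taken from the abstract; everything else is illustrative.
counts = {"MSI-H": 34, "MSI-L": 33, "MSS": 458}
total = sum(counts.values())  # should match the 525 amplified tissue samples

# Percentage of each MSI class, rounded to one decimal as in the abstract.
shares = {status: round(100 * n / total, 1) for status, n in counts.items()}


def classify_msi(unstable_markers: int, panel_size: int = 5) -> str:
    """Classify a tumor by the number of unstable panel markers.

    Assumed convention (not stated in the abstract): >= 2 unstable
    markers -> MSI-H, exactly 1 -> MSI-L, 0 -> MSS.
    """
    if not 0 <= unstable_markers <= panel_size:
        raise ValueError("unstable_markers must be between 0 and panel_size")
    if unstable_markers >= 2:
        return "MSI-H"
    if unstable_markers == 1:
        return "MSI-L"
    return "MSS"


if __name__ == "__main__":
    print(total, shares)
    print([classify_msi(k) for k in range(6)])
```

The recomputed shares agree with the 6.5%, 6.3% and 87.2% given in the results, which confirms that the denominator is the 525 tissue samples rather than the 541 enrolled patients.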
Affiliation(s)
- Yanying Zheng
  - Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Department of Pathology, No.155 Hanzhong Road, 210029 Nanjing, China
- Jie Chen
  - Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Department of Pathology, No.155 Hanzhong Road, 210029 Nanjing, China
- Xiang Zhang
  - Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Department of Pathology, No.155 Hanzhong Road, 210029 Nanjing, China
- Ling Xie
  - Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Department of Pathology, No.155 Hanzhong Road, 210029 Nanjing, China
- Yifen Zhang
  - Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Department of Pathology, No.155 Hanzhong Road, 210029 Nanjing, China
- Yi Sun
  - Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Province Hospital of Chinese Medicine), Department of Pathology, No.155 Hanzhong Road, 210029 Nanjing, China
5
Shortt JA, Ruggiero RP, Cox C, Wacholder AC, Pollock DD. Finding and extending ancient simple sequence repeat-derived regions in the human genome. Mob DNA 2020; 11:11. [PMID: 32095164 PMCID: PMC7027126 DOI: 10.1186/s13100-020-00206-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Accepted: 02/04/2020] [Indexed: 12/19/2022]
Abstract
Background: Previously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, k-mer-based approach to identify genome regions that are likely derived from SSRs.
Results: The sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar k-mers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome.
Conclusions: Our analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class of Alu (roughly, AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with a mean of about 35 bp even in older Alus. This work demonstrates that the high sensitivity provided by SSR-clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.
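To make concrete what an "unbroken" SSR locus looks like, and why interrupted or decayed repeats escape simple detection, here is a minimal sketch of a perfect-repeat finder built on a regular-expression backreference. This is not the P-clouds or SSR-clouds method described in the abstract; the function name `find_ssrs`, the motif-length limit and the minimum repeat count are all illustrative assumptions.

```python
import re


def find_ssrs(seq: str, max_unit: int = 6, min_repeats: int = 4):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    The lazy quantifier makes the regex try the shortest unit first, and
    the backreference \\1 requires exact copies, so a single interrupting
    substitution ends the match. That limitation is exactly why decayed,
    ancient repeats need a more tolerant approach.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Toy sequence: a dinucleotide tract followed by a poly-A tract.
    print(find_ssrs("GTCACACACACTGAAAAAAC"))
    # The same dinucleotide tract with one interrupting base is missed.
    print(find_ssrs("GTCACAGACACTG"))
```

A detector of this kind only recovers tracts that are still perfect, which is the class of loci the abstract uses as a training set rather than the class it sets out to discover.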
Affiliation(s)
- Jonathan A Shortt
  - Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Robert P Ruggiero
  - Department of Biology, Southeast Missouri State University, Cape Girardeau, MO 63701 USA
- Corey Cox
  - Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Aaron C Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213 USA
- David D Pollock
  - Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045 USA
6
Defects in the GINS complex increase the instability of repetitive sequences via a recombination-dependent mechanism. PLoS Genet 2019; 15:e1008494. [PMID: 31815930 PMCID: PMC6922473 DOI: 10.1371/journal.pgen.1008494] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/02/2019] [Revised: 12/19/2019] [Accepted: 10/25/2019] [Indexed: 12/16/2022]
Abstract
Faithful replication and repair of DNA lesions ensure genome maintenance. During replication in eukaryotic cells, DNA is unwound by the CMG helicase complex, which is composed of three major components: the Cdc45 protein, Mcm2-7, and the GINS complex. The CMG in complex with DNA polymerase epsilon (CMG-E) participates in the establishment and progression of the replisome. Impaired functioning of the CMG-E was shown to induce genomic instability and promote the development of various diseases. Therefore, CMG-E components play important roles as caretakers of the genome. In Saccharomyces cerevisiae, the GINS complex is composed of the Psf1, Psf2, Psf3, and Sld5 essential subunits. The Psf1-1 mutant form fails to interact with Psf3, resulting in impaired replisome assembly and chromosome replication. Here, we show increased instability of repeat tracts (mononucleotide, dinucleotide, trinucleotide and longer) in yeast psf1-1 mutants. To identify the mechanisms underlying this effect, we analyzed repeated sequence instability using derivatives of psf1-1 strains lacking genes involved in translesion synthesis, recombination, or mismatch repair. Among these derivatives, deletion of RAD52, RAD51, MMS2, POL32, or PIF1 significantly decreased DNA repeat instability. These results, together with the observed increased amounts of single-stranded DNA regions and Rfa1 foci, suggest that recombinational mechanisms make important contributions to repeat tract instability in psf1-1 cells. We propose that defective functioning of the CMG-E complex in psf1-1 cells impairs the progression of DNA replication, which increases the contribution of repair mechanisms such as template switching and break-induced replication. These processes require a sequence homology search, which in the case of a repeated DNA tract may result in misalignment leading to its expansion or contraction.
Processes that ensure genome stability are crucial for all organisms to avoid mutations and decrease the risk of diseases. The coordinated activity of mechanisms underlying the maintenance of high-fidelity DNA duplication and repair is critical to deal with the malfunction of replication forks or DNA damage. Repeated sequences in DNA are particularly prone to instability; these sequences undergo expansions or contractions, leading in humans to various neurological, neurodegenerative, and neuromuscular disorders. A mutant form of one of the noncatalytic subunits of the active DNA helicase complex impairs DNA replication. Here, we show that this form also significantly increases the instability of mononucleotide, dinucleotide, trinucleotide and longer repeat tracts. Our results suggest that in cells that harbor a mutated variant of the helicase complex, continuation of DNA replication is facilitated by recombination processes, and this mechanism can be highly mutagenic during repair synthesis through repetitive regions, especially regions that form secondary structures. Our results indicate that proper functioning of the DNA helicase complex is crucial for maintenance of the stability of repeated DNA sequences, especially in the context of recently described disorders in which mutations or deregulation of the human homologs of genes encoding DNA helicase subunits were observed.
7
Christopher J, Thorsen AS, Abujudeh S, Lourenço FC, Kemp R, Potter PK, Morrissey E, Hazelwood L, Winton DJ. Quantifying Microsatellite Mutation Rates from Intestinal Stem Cell Dynamics in Msh2-Deficient Murine Epithelium. Genetics 2019; 212:655-665. [PMID: 31126976 PMCID: PMC6614890 DOI: 10.1534/genetics.119.302268] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/07/2019] [Accepted: 05/14/2019] [Indexed: 12/12/2022]
Abstract
Microsatellite sequences have an enhanced susceptibility to mutation, and can act as sentinels indicating elevated mutation rates and increased risk of cancer. The probability of mutant fixation within the intestinal epithelium is dictated by a combination of stem cell dynamics and mutation rate. Here, we exploit this relationship to infer microsatellite mutation rates. First, a sensitive, multiplexed, and quantitative method for detecting somatic changes in microsatellite length was developed that allowed the parallel detection of mutant [CA]n sequences from hundreds of low-input tissue samples at up to 14 loci. The method was applied to colonic crypts in Mus musculus, and enabled detection of mutant subclones down to 20% of the cellularity of the crypt (∼50 of 250 cells). By quantifying age-related increases in clone frequencies for multiple loci, microsatellite mutation rates in wild-type and Msh2-deficient epithelium were established. An average 388-fold increase in the mutation rate per mitosis was observed in Msh2-deficient epithelium (2.4 × 10⁻²) compared to wild-type epithelium (6.2 × 10⁻⁵).
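The two headline numbers in this abstract can be checked with simple arithmetic. The sketch below recomputes the fold increase from the two rounded per-mitosis rates and the detection limit from the stated cell counts; the variable names are illustrative, and the small gap between the recomputed ratio and the reported 388-fold figure is presumably due to rounding of the published rates or to averaging across loci, which the abstract does not specify.

```python
# Per-mitosis microsatellite mutation rates as reported in the abstract.
msh2_deficient_rate = 2.4e-2
wild_type_rate = 6.2e-5

# Fold increase implied by the two rounded rates.
fold_increase = msh2_deficient_rate / wild_type_rate

# Smallest detectable mutant subclone: about 50 cells in a 250-cell crypt.
detectable_cells = 50
crypt_cells = 250
detection_limit = detectable_cells / crypt_cells

if __name__ == "__main__":
    print(f"fold increase from rounded rates: {fold_increase:.0f}")
    print(f"detection limit: {detection_limit:.0%} of crypt cellularity")
```

The rounded rates give a ratio of roughly 387, which is consistent with the reported average of 388, and the detection limit reproduces the stated 20%.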
Affiliation(s)
- Joseph Christopher
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Ann-Sofie Thorsen
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Sam Abujudeh
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Filipe C Lourenço
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Richard Kemp
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Paul K Potter
  - Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, OX3 0BP, United Kingdom
- Edward Morrissey
  - MRC Weatherall Institute of Molecular Medicine, University of Oxford, OX3 9DS, United Kingdom
- Lee Hazelwood
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Douglas J Winton
  - Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
8
Pirogov SA, Maksimenko OG, Georgiev PG. Transposable Elements in the Evolution of Gene Regulatory Networks. RUSS J GENET+ 2019. [DOI: 10.1134/s1022795419010113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
9
Fungtammasan A, Tomaszkiewicz M, Campos-Sánchez R, Eckert KA, DeGiorgio M, Makova KD. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol 2016; 33:2744-58. [PMID: 27413049 PMCID: PMC5026258 DOI: 10.1093/molbev/msw139] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022]
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.
Affiliation(s)
- Arkarachai Fungtammasan
  - Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University; Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
- Marta Tomaszkiewicz
  - Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Rebeca Campos-Sánchez
  - Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Kristin A Eckert
  - Center for Medical Genomics, Pennsylvania State University; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine
- Michael DeGiorgio
  - Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Institute for CyberScience, Pennsylvania State University
- Kateryna D Makova
  - Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
10
Vaksman Z, Garner HR. Somatic microsatellite variability as a predictive marker for colorectal cancer and liver cancer progression. Oncotarget 2016; 6:5760-71. [PMID: 25691061 PMCID: PMC4467400 DOI: 10.18632/oncotarget.3306] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/11/2014] [Accepted: 01/02/2015] [Indexed: 12/13/2022]
Abstract
Microsatellites (MSTs) are short tandem repeated genetic motifs that comprise ~3% of the genome. MST instability (MSI), defined as acquired/lost primary alleles at a small subset of microsatellite loci (e.g. Bethesda markers), is a clinically relevant marker for colorectal cancer. However, these markers are not applicable to other types of cancers, specifically, for liver cancer which has a high mortality rate. Here we show that somatic MST variability (SMV), defined as the presence of additional, non-primary (aka minor) alleles at MST loci, is a complementary measure of MSI, and a genetic marker for colorectal and liver cancer. Re-analysis of Illumina sequenced exomes from The Cancer Genome Atlas indicates that SMV may distinguish a subpopulation of African American patients with colorectal cancer, which represents ~33% of the population in this study. Further, for liver cancer, a higher rate of SMV may be indicative of an earlier age of onset. The work presented here suggests that classical MSI should be expanded to include SMV, going beyond alterations of the primary alleles at a small number of microsatellite loci. This measure of SMV may represent a potential new diagnostic for a variety of cancers and may provide new information for colorectal cancer patients.
Affiliation(s)
- Zalman Vaksman
  - Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Harold R Garner
  - Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
11
Abe H, Gemmell NJ. Evolutionary Footprints of Short Tandem Repeats in Avian Promoters. Sci Rep 2016; 6:19421. [PMID: 26766026 PMCID: PMC4725869 DOI: 10.1038/srep19421] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/20/2015] [Accepted: 12/11/2015] [Indexed: 01/12/2023]
Abstract
Short tandem repeats (STRs) or microsatellites are well-known sequence elements that may change the spacing between transcription factor binding sites (TFBSs) in promoter regions by expansion or contraction of repetitive units. Some of these mutations have the potential to contribute to phenotypic diversity by altering patterns of gene expression. To explore how repetitive sequence motifs within promoters have evolved in avian lineages under mutation-selection balance, more than 400 evolutionary conserved STRs (ecSTRs) were identified in this study by comparing the 2 kb upstream promoter sequences of chicken against those of other birds (turkey, duck, zebra finch, and flycatcher). The rate of conservation was significantly higher in AG dinucleotide repeats than in AC or AT repeats, with the expansion of AG motifs being noticeably constrained in passerines. Analysis of the relative distance between ecSTRs and TFBSs revealed a significantly higher rate of conserved TFBSs in the vicinity of ecSTRs in both chicken-duck and chicken-passerine comparisons. Our comparative study provides a novel insight into which intrinsic factors have influenced the degree of constraint on repeat expansion/contraction during avian promoter evolution.
Affiliation(s)
- Hideaki Abe
  - Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
- Neil J Gemmell
  - Department of Anatomy, University of Otago, Dunedin 9054, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin 9054, New Zealand
12
Chapuis MP, Plantamp C, Streiff R, Blondin L, Piou C. Microsatellite evolutionary rate and pattern in Schistocerca gregaria inferred from direct observation of germline mutations. Mol Ecol 2015; 24:6107-19. [PMID: 26562076 DOI: 10.1111/mec.13465] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/28/2015] [Revised: 11/05/2015] [Accepted: 11/06/2015] [Indexed: 01/21/2023]
Abstract
Unravelling variation among taxonomic orders regarding the rate of evolution in microsatellites is crucial for evolutionary biology and population genetics research. The mean mutation rate of microsatellites tends to be lower in arthropods than in vertebrates, but data are scarce and mostly concern accumulation of mutations in model species. Based on parent-offspring segregations and a hierarchical Bayesian model, the mean rate of mutation in the orthopteran insect Schistocerca gregaria was estimated at 2.1 × 10⁻⁴ per generation per untranscribed dinucleotide locus. This is close to vertebrate estimates and one order of magnitude higher than estimates from species of other arthropod orders, such as Drosophila melanogaster and Daphnia pulex. We also found evidence of a directional bias towards expansions even for long alleles and exceptionally large ranges of allele sizes. Finally, at transcribed microsatellites, the mean rate of mutation was half the rate found at untranscribed loci and the mutational model deviated from that usually considered, with most mutations involving multistep changes that avoid disrupting the reading frame. Our direct estimates of mutation rate were discussed in the light of peculiar biological and genomic features of S. gregaria, including specificities in mismatch repair and the dependence of its activity on allele length. Shedding new light on the mutational dynamics of grasshopper microsatellites is of critical importance for a number of research fields. As an illustration, we showed how our findings improve microsatellite application in population genetics, by obtaining a more precise estimation of S. gregaria effective population size from a published data set based on the same microsatellites.
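The closing point of this abstract, that a better mutation-rate estimate sharpens the effective population size estimate, can be illustrated with the standard diploid relation θ = 4Nₑμ. The sketch below is not the authors' estimation procedure; the value of `theta` is a hypothetical placeholder, and the lower comparison rate is simply the reported rate divided by ten to mimic the "one order of magnitude" gap mentioned for other arthropods.

```python
def effective_population_size(theta: float, mu: float) -> float:
    """Solve theta = 4 * Ne * mu for Ne (diploid, per-locus mutation rate mu)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4.0 * mu)


# Rate reported in the abstract for untranscribed dinucleotide loci.
mu_direct = 2.1e-4
# Illustrative lower rate, one order of magnitude below the direct estimate.
mu_assumed_low = mu_direct / 10

# Hypothetical diversity estimate; NOT a value from the paper.
theta = 10.0

ne_direct = effective_population_size(theta, mu_direct)
ne_low = effective_population_size(theta, mu_assumed_low)

if __name__ == "__main__":
    print(f"Ne with direct rate:  {ne_direct:,.0f}")
    print(f"Ne with assumed rate: {ne_low:,.0f}")
```

Because Nₑ scales inversely with μ under this relation, borrowing a rate that is ten times too low would inflate the inferred population size tenfold, which is why a species-specific direct estimate matters.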
Affiliation(s)
- M-P Chapuis
  - CIRAD, UMR CBGP, Montpellier, F-34398, France
- C Plantamp
  - Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR 5558, Université Lyon 1, Villeurbanne, 69622, France
- R Streiff
  - INRA, UMR CBGP, Montpellier, F-34398, France; INRA, UMR DGIMI, Montpellier, F-34000, France
- L Blondin
  - CIRAD, UPR B-AMR, Montpellier, F-34398, France
- C Piou
  - CIRAD, UMR CBGP, Montpellier, F-34398, France
13
Fungtammasan A, Ananda G, Hile SE, Su MSW, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications. Genome Res 2015; 25:736-49. [PMID: 25823460 PMCID: PMC4417121 DOI: 10.1101/gr.185892.114] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 10/15/2014] [Accepted: 03/16/2015] [Indexed: 11/24/2022]
Abstract
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution.
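The idea behind flank-based typing is that a read is only informative for an STR if it spans the whole tract, so that both unique flanks are visible and the repeat count between them can be read off directly. The sketch below illustrates that idea with exact string matching. It is not the STR-FM pipeline, which the abstract says relies on read-mapping algorithms and an error correction model; the function name, the flank sequences and the toy reads are all hypothetical.

```python
from collections import Counter
from typing import Optional


def count_repeats_between_flanks(
    read: str, left_flank: str, right_flank: str, unit: str
) -> Optional[int]:
    """Return the repeat count if the read spans the STR, else None."""
    left = read.find(left_flank)
    if left == -1:
        return None
    tract_start = left + len(left_flank)
    tract_end = read.find(right_flank, tract_start)
    if tract_end == -1:
        return None  # read ends inside the repeat: length is unknowable
    tract = read[tract_start:tract_end]
    copies, remainder = divmod(len(tract), len(unit))
    if remainder or tract != unit * copies:
        return None  # interrupted or out-of-frame tract: skip in this sketch
    return copies


# Hypothetical flanks and reads for a CAG repeat.
LEFT, RIGHT, UNIT = "GATTC", "TGGCA", "CAG"
reads = [
    "AA" + LEFT + UNIT * 5 + RIGHT + "TT",  # spans a 5-copy allele
    LEFT + UNIT * 6 + RIGHT,                # spans a 6-copy allele
    LEFT + UNIT * 7,                        # truncated: no right flank
]

calls = [count_repeats_between_flanks(r, LEFT, RIGHT, UNIT) for r in reads]
allele_support = Counter(c for c in calls if c is not None)

if __name__ == "__main__":
    print(calls)
    print(allele_support)
```

The third read shows why required depth grows with STR length, as the abstract notes: the longer the tract relative to the read, the fewer reads contain both flanks and the more sequencing is needed to collect enough spanning reads.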
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Guruprasad Ananda
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA
| | - Suzanne E Hile
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
| | - Marcia Shu-Wei Su
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Chen Sun
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Paul Medvedev
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA; Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Kristin Eckert
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
14
|
Kwong M, Pemberton TJ. Sequence differences at orthologous microsatellites inflate estimates of human-chimpanzee differentiation. BMC Genomics 2014; 15:990. [PMID: 25407736 PMCID: PMC4253012 DOI: 10.1186/1471-2164-15-990] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/02/2014] [Accepted: 10/30/2014] [Indexed: 02/06/2023] Open
Abstract
Background: Microsatellites, contiguous arrays of 2–6 base-pair motifs, have formed the cornerstone of population-genetic studies for over two decades. Their genotype data typically take the form of PCR fragment lengths obtained using locus-specific primer pairs to amplify the genomic region encompassing the microsatellite. Recently, we reported a dataset of 5,795 human and 84 chimpanzee individuals with genotypes at 246 human-derived autosomal microsatellites as a resource to facilitate interspecies comparisons. A major assumption underlying this dataset is that PCR amplicons at orthologous microsatellites are commensurable between species.
Results: We find this assumption to be frequently incorrect owing to discordance in microsatellite organization and variability, as well as nontrivial length imbalances caused by small species-specific indels in microsatellite flanking sequences. Converting PCR fragment lengths into the repeat numbers they represent at 138 microsatellites whose organization and variability were found to be highly similar in both species, we show that interspecies incommensurability among PCR amplicons can inflate FST and DPS estimates by up to 10.6%. Separate investigations of determinants of microsatellite variability in humans and chimpanzees uncover similar patterns, with mean and maximum numbers of repeats, as well as numbers and ranges of distinct alleles, all important factors in predicting heterozygosity. In contrast, across microsatellites, numbers of repeats were significantly smaller in chimpanzees than in humans, while numbers and ranges of distinct alleles were instead larger.
Conclusions: Our findings have fundamental implications for interspecies comparisons using microsatellites and offer new opportunities for more accurate comparisons of patterns of human and chimpanzee genetic variation in numerous areas of application.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-990) contains supplementary material, which is available to authorized users.
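The conversion from PCR fragment length to repeat number described in this abstract can be illustrated with a minimal sketch. It assumes a simple model in which the amplicon consists of fixed flanking sequence plus a perfect repeat tract; the function name, the flank lengths, and the example values are hypothetical and are not taken from the paper.

```python
def repeats_from_fragment(fragment_len, flank_len, motif_len):
    """Convert a PCR fragment length (bp) into a repeat count.

    Assumed model (not from the paper):
    fragment length = flanking sequence + motif_len * number of repeats.
    """
    repeat_bp = fragment_len - flank_len
    if repeat_bp < 0:
        raise ValueError("flank length exceeds fragment length")
    return repeat_bp / motif_len


# Hypothetical tetranucleotide locus: both species yield a 150-bp amplicon,
# but the chimpanzee flank carries a 4-bp species-specific insertion.
human_repeats = repeats_from_fragment(150, flank_len=110, motif_len=4)  # 10.0
chimp_repeats = repeats_from_fragment(150, flank_len=114, motif_len=4)  # 9.0
print(human_repeats, chimp_repeats)
```

Under these assumed values, identical fragment lengths correspond to different repeat numbers, which is the kind of incommensurability between species that the abstract describes.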
Affiliation(s)
| | - Trevor J Pemberton
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, Canada.
|
15
|
Exome-wide somatic microsatellite variation is altered in cells with DNA repair deficiencies. PLoS One 2014; 9:e110263. [PMID: 25402475 PMCID: PMC4234249 DOI: 10.1371/journal.pone.0110263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/09/2014] [Accepted: 09/18/2014] [Indexed: 11/19/2022] Open
Abstract
Microsatellites (MST), tandem repeats of 1–6 nucleotide motifs, are mutational hot-spots with a bias for insertions and deletions (INDELs) rather than single nucleotide polymorphisms (SNPs). The majority of MST instability studies are limited to a small number of loci, the Bethesda markers, which are only informative for a subset of colorectal cancers. In this paper we use non-haplotype alleles present within next-generation sequencing data to evaluate somatic MST variation (SMV) within DNA repair-proficient and DNA repair-defective cell lines. We confirm that alleles present within next-generation data that do not contribute to the haplotype can be reliably quantified and utilized to evaluate the SMV without requiring comparisons of matched samples. We observed that SMV patterns in the DNA repair-proficient cell lines MCF10A, HEK293 and PD20 RV:D2 were consistent among samples. Further, we were able to confirm that changes in SMV patterns in cell lines lacking functional BRCA2, FANCD2 and mismatch repair were consistent with the different pathways perturbed. Using this new exome sequencing analysis approach we show that DNA instability can be identified in a sample, that patterns of instability vary depending on the impaired DNA repair mechanism, and that genes harboring minor alleles are strongly associated with cancer pathways. The MST Minor Allele Caller used for this study is available at https://github.com/zalmanv/MST_minor_allele_caller.
|
16
|
Ananda G, Hile SE, Breski A, Wang Y, Kelkar Y, Makova KD, Eckert KA. Microsatellite interruptions stabilize primate genomes and exist as population-specific single nucleotide polymorphisms within individual human genomes. PLoS Genet 2014; 10:e1004498. [PMID: 25033203 PMCID: PMC4102424 DOI: 10.1371/journal.pgen.1004498] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/22/2013] [Accepted: 05/28/2014] [Indexed: 01/01/2023] Open
Abstract
Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. 
The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression. Microsatellites are short tandem repeat DNA sequences located throughout the human genome that display a high degree of inter-individual variation. This characteristic makes microsatellites an attractive tool for population genetics and forensics research. Some microsatellites affect gene expression, and mutations within such microsatellites can cause disease. Interruption mutations disrupt the perfect repeated array and are frequently associated with altered disease risk, but they have not been thoroughly studied in human genomes. We identified interrupted mono-, di-, tri- and tetranucleotide MSs (iMS) within individual genomes from African, European, Asian and American population groups. We show that many iMSs, including some within disease-associated genes, are unique to a single population group. By measuring the conservation of microsatellites between human and chimpanzee genomes, we demonstrate that interruptions decrease the probability of microsatellite mutations throughout the genome. We demonstrate that iMSs arise in the human genome by single base changes within the DNA, and provide biochemical data suggesting that these stabilizing changes may be created by error-prone DNA polymerases. Our genome-wide study supports the model in which iMSs act to stabilize individual genomes, and suggests that population-specific differences in microsatellite architecture may be an avenue by which genetic ancestry impacts individual disease risk.
Affiliation(s)
- Guruprasad Ananda
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
| | - Suzanne E. Hile
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
| | - Amanda Breski
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
| | - Yanli Wang
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
| | - Yogeshwar Kelkar
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
| | - Kateryna D. Makova
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
- * E-mail: (KDM); (KAE)
| | - Kristin A. Eckert
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
- * E-mail: (KDM); (KAE)
|
17
|
Haasl RJ, Payseur BA. Remarkable selective constraints on exonic dinucleotide repeats. Evolution 2014; 68:2737-44. [PMID: 24899386 DOI: 10.1111/evo.12460] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 01/07/2014] [Accepted: 05/14/2014] [Indexed: 01/07/2023]
Abstract
Long dinucleotide repeats found in exons present a substantial mutational hazard: mutations at these loci occur often and generate frameshifts. Here, we provide clear and compelling evidence that exonic dinucleotides experience strong selective constraint. In humans, only 18 exonic dinucleotides have repeat lengths greater than six, which contrasts sharply with the genome-wide distribution of dinucleotides. We genotyped each of these dinucleotides in 200 humans from eight 1000 Genomes Project populations and found a near-absence of polymorphism. More remarkably, divergence data demonstrate that repeat lengths have been conserved across the primate phylogeny in spite of what is likely considerable mutational pressure. Coalescent simulations show that even a very low mutation rate at these loci fails to explain the anomalous patterns of polymorphism and divergence. Our data support two related selective constraints on the evolution of exonic dinucleotides: a short-term intolerance for any change to repeat length and a long-term prevention of increases to repeat length. In general, our results implicate purifying selection as the force that eliminates new, deleterious mutants at exonic dinucleotides. We briefly discuss the evolution of the longest exonic dinucleotide in the human genome, a 10× CA repeat in fibroblast growth factor receptor-like 1 (FGFRL1), which should possess a considerably greater mutation rate than any other exonic dinucleotide and therefore generate a large number of deleterious variants.
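To make the "repeat lengths greater than six" criterion concrete, the following is a minimal sketch of how perfect dinucleotide repeats with at least seven units could be located in a sequence. The regular expression, the function name, and the test sequence are illustrative assumptions; the abstract does not describe the authors' actual detection procedure.

```python
import re


def find_dinucleotide_repeats(seq, min_units=7):
    """Return (start, motif, n_units) for each perfect dinucleotide repeat.

    The motif is two different bases (so mononucleotide runs such as
    AAAA are excluded) repeated at least `min_units` times in tandem.
    The default of 7 units is an assumed reading of "greater than six".
    """
    # Group 1 is the two-base motif, group 2 its first base; the lookahead
    # forces the second base to differ from the first.
    pattern = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{" + str(min_units - 1) + r",}"
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        hits.append((match.start(), match.group(1), len(match.group(0)) // 2))
    return hits


# Hypothetical sequence containing a 10x CA tract and a short 3x AC tract.
example = "GGT" + "CA" * 10 + "TTG" + "AC" * 3 + "G"
print(find_dinucleotide_repeats(example))  # [(3, 'CA', 10)]
```

Only the 10-unit tract is reported; the 3-unit tract falls below the threshold. Restricting such a scan to exons would additionally require gene annotation, which is outside the scope of this sketch.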
Affiliation(s)
- Ryan J Haasl
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, 53706.
|
18
|
Grandi FC, An W. Non-LTR retrotransposons and microsatellites: Partners in genomic variation. Mob Genet Elements 2013; 3:e25674. [PMID: 24195012 PMCID: PMC3812793 DOI: 10.4161/mge.25674] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 04/22/2013] [Revised: 07/07/2013] [Accepted: 07/09/2013] [Indexed: 01/10/2023] Open
Abstract
The human genome is laden with both non-LTR (long-terminal repeat) retrotransposons and microsatellite repeats. Both types of sequences are able to mutagenize the genomes of human individuals, either actively or passively, and are therefore poised to dynamically alter the human genomic landscape across generations. Non-LTR retrotransposons, such as L1 and Alu, are a major source of new microsatellites, which are born both concurrently with and subsequent to L1 and Alu integration into the genome. Likewise, the mutation dynamics of microsatellite repeats have a direct impact on the fitness of their non-LTR retrotransposon parent owing to microsatellite expansion and contraction. This review explores the interactions and dynamics between non-LTR retrotransposons and microsatellites in the context of genomic variation and evolution.
Affiliation(s)
- Fiorella C Grandi
- School of Molecular Biosciences and Center for Reproductive Biology; Washington State University; Pullman, WA USA
|