1. Wieland J, Buchan S, Sen Gupta S, Mantzouratou A. Genomic instability and the link to infertility: A focus on microsatellites and genomic instability syndromes. Eur J Obstet Gynecol Reprod Biol 2022; 274:229-237. [PMID: 35671666 DOI: 10.1016/j.ejogrb.2022.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 05/25/2022] [Accepted: 06/01/2022] [Indexed: 12/01/2022]
Abstract
Infertility is associated with multiple types of genomic instability and is a genetic feature of genomic instability syndromes. While the mismatch repair machinery contributes to the maintenance of genome integrity, its potential role in infertility is, surprisingly, overlooked. Defects in mismatch repair mechanisms contribute to microsatellite instability and genomic instability syndromes, due to the inability to repair newly replicated DNA. This article reviews the literature to date to elucidate the contribution of microsatellite instability to genomic instability syndromes and infertility. The key findings presented reveal that microsatellite instability is poorly researched in genomic instability syndromes and infertility.
Affiliation(s)
- Jack Wieland: Department of Life and Environmental Sciences, Faculty of Science and Technology, Bournemouth University, Poole BH12 5BB, UK.
- Sarah Buchan: Department of Life and Environmental Sciences, Faculty of Science and Technology, Bournemouth University, Poole BH12 5BB, UK.
- Sioban Sen Gupta: Institute for Women's Health, 86-96 Chenies Mews, University College London, London WC1E 6HX, UK.
- Anna Mantzouratou: Department of Life and Environmental Sciences, Faculty of Science and Technology, Bournemouth University, Poole BH12 5BB, UK.
2. Srivastava S, Avvaru AK, Sowpati DT, Mishra RK. Patterns of microsatellite distribution across eukaryotic genomes. BMC Genomics 2019; 20:153. [PMID: 30795733 PMCID: PMC6387519 DOI: 10.1186/s12864-019-5516-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/18/2018] [Accepted: 02/07/2019] [Indexed: 11/28/2022]
Abstract
Background: Microsatellites, or Simple Sequence Repeats (SSRs), are short tandem repeats of 1–6 nt motifs present in all genomes. Emerging evidence points to their role in cellular processes and gene regulation. Despite the huge resource of genomic information currently available, SSRs have been studied in a limited context and compared across relatively few species.
Results: We have identified ~685 million eukaryotic microsatellites and analyzed their genomic trends across 15 taxonomic subgroups from protists to mammals. The distribution of SSRs reveals taxon-specific variations in their exonic, intronic and intergenic densities. Our analysis reveals the differences among non-related species and novel patterns uniquely demarcating closely related species. We document several repeats common across subgroups as well as rare SSRs that are excluded almost throughout evolution. We further identify species-specific signatures in pathogens like Leishmania as well as in cereal crops, Drosophila, birds and primates. We also find that distinct SSRs preferentially exist as long repeating units in different subgroups; most unicellular organisms show no length preference for any SSR class, while many SSR motifs accumulate as long repeats in complex organisms, especially in mammals.
Conclusions: We present a comprehensive analysis of SSRs across taxa at an unprecedented scale. Our analysis indicates that the SSR composition of organisms with heterogeneous cell types is highly constrained, while simpler organisms such as protists, green algae and fungi show greater diversity in motif abundance, density and GC content. The microsatellite dataset generated in this work provides a large number of candidates for functional analysis and for studying their roles across the evolutionary landscape.
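The definition used in this abstract (tandem repeats of 1–6 nt motifs) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `find_ssrs`, the `min_repeats` threshold of 3, and the regular-expression approach are assumptions chosen only for illustration, and the abstract does not state the length cutoffs the study used.

```python
import re


def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper: min_repeats is an illustrative threshold,
    not a value taken from the cited study.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-nt motif followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("TTGAGAGAGAGACC"))  # [(2, 'GA', 5)]
```

The sketch reports only perfect repeats; interrupted or imperfect repeats would need a different matching strategy.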
Affiliation(s)
- Surabhi Srivastava: CSIR - Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, 500007, India.
- Akshay Kumar Avvaru: CSIR - Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, 500007, India.
- Divya Tej Sowpati: CSIR - Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, 500007, India.
- Rakesh K Mishra: CSIR - Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, 500007, India.
3. Avvaru AK, Saxena S, Sowpati DT, Mishra RK. MSDB: A Comprehensive Database of Simple Sequence Repeats. Genome Biol Evol 2018; 9:1797-1802. [PMID: 28854643 PMCID: PMC5533116 DOI: 10.1093/gbe/evx132] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Accepted: 06/12/2017] [Indexed: 11/13/2022]
Abstract
Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb.
Affiliation(s)
- Saketh Saxena: CSIR - Centre for Cellular and Molecular Biology, Hyderabad, India.
4. Srivastava A, Kumar AS, Mishra RK. Vertebrate GAF/ThPOK: emerging functions in chromatin architecture and transcriptional regulation. Cell Mol Life Sci 2018; 75:623-633. [PMID: 28856379 PMCID: PMC11105447 DOI: 10.1007/s00018-017-2633-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/20/2017] [Revised: 08/09/2017] [Accepted: 08/25/2017] [Indexed: 12/31/2022]
Abstract
GAGA factor of Drosophila melanogaster (DmGAF) is a multifaceted transcription factor with diverse roles in chromatin regulation. Recently, ThPOK/c-Krox was identified as its vertebrate homologue (vGAF), which has a basic domain structure similar to DmGAF and is decorated with a number of post-translationally modified residues. In vertebrate genomes, vGAF associates with purine-rich GAGA sequences and performs diverse chromatin-mediated functions, viz., gene activation, repression and enhancer blocking. Expansion of regulatory chromatin proteins with the acquisition of PTMs appears to be the general trend that facilitated the evolution of complexity in vertebrates. Here, we compare the structural and functional features of vGAF with those of DmGAF and also assess the possible functional redundancy among paralogues of vGAF. We also discuss the underlying mechanisms which aid in the diverse and context-dependent functions of this protein.
Affiliation(s)
- Avinash Srivastava: CSIR-Centre for Cellular and Molecular Biology (CCMB), Uppal Road, Hyderabad, 500007, India.
- Amitha Sampath Kumar: CSIR-Centre for Cellular and Molecular Biology (CCMB), Uppal Road, Hyderabad, 500007, India.
- Rakesh K Mishra: CSIR-Centre for Cellular and Molecular Biology (CCMB), Uppal Road, Hyderabad, 500007, India.
5.
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
6. Krishnan J, Athar F, Rani TS, Mishra RK. Simple sequence repeats showing 'length preference' have regulatory functions in humans. Gene 2017; 628:156-161. [PMID: 28712775 DOI: 10.1016/j.gene.2017.07.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/30/2016] [Revised: 05/18/2017] [Accepted: 07/10/2017] [Indexed: 11/15/2022]
Abstract
Simple sequence repeats (SSRs), simple tandem repeats (STRs) or microsatellites are short tandem repeats of 1-6 nucleotide motifs. They are twice as abundant as the protein coding DNA in the human genome and yet little is known about their functional relevance. Analysis of genomes across various taxa shows that, despite the instability associated with longer stretches of repeats, a few SSRs with specific longer repeat lengths are enriched in the genomes, indicating positive selection. This conserved feature of length dependent enrichment hints at not only sequence but also length dependent functionality for SSRs. In the present study, we selected 23 SSRs of the human genome that show specific repeat length dependent enrichment and analysed their cis-regulatory potential using promoter modulation, boundary and barrier assays. We find that the 23 SSR sequences, which are mostly intergenic and intronic, possess distinct cis-regulatory potential. They modulate minimal promoter activity in transient luciferase assays and are capable of functioning as enhancer-blockers and barrier elements. The results of our functional assays suggest cis-gene regulatory roles for these specific length enriched SSRs and open avenues for further investigations.
Affiliation(s)
- Jaya Krishnan: Stowers Institute for Medical Research, MO, United States; International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India; CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India.
- Fathima Athar: Stowers Institute for Medical Research, MO, United States; International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India; CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India.
- Tirupaati Swaroopa Rani: Stowers Institute for Medical Research, MO, United States; International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India; CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India.
- Rakesh Kumar Mishra: Stowers Institute for Medical Research, MO, United States; International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India; CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India.
7. Characterization of porcine simple sequence repeat variation on a population scale with genome resequencing data. Sci Rep 2017; 7:2376. [PMID: 28539617 PMCID: PMC5443785 DOI: 10.1038/s41598-017-02600-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/03/2017] [Accepted: 04/13/2017] [Indexed: 12/23/2022]
Abstract
Simple sequence repeats (SSRs) are used as polymorphic molecular markers in many species. They contribute very important functional variations in a range of complex traits; however, little is known about the variation of most SSRs in pig populations. Here, using genome resequencing data, we identified ~0.63 million polymorphic SSR loci from more than 100 individuals. Through intensive analysis of this dataset, we found that the SSR motif composition, motif length, total length of alleles and distribution of alleles all contribute to SSR variability. Furthermore, we found that CG-containing SSRs displayed significantly lower polymorphism and higher cross-species conservation. With a rigorous filter procedure, we provided a catalogue of 16,527 high-quality polymorphic SSRs, which displayed reliable results for the analysis of phylogenetic relationships and provided valuable summary statistics for 30 individuals equally selected from eight local Chinese pig breeds, six commercial lean pig breeds and Chinese wild boars. In addition, from the high-quality polymorphic SSR catalogue, we identified four loci with potential loss-of-function alleles. Overall, these analyses provide a valuable catalogue of polymorphic SSRs to the existing pig genetic variation database, and we believe this catalogue could be used for future genome-wide genetic analysis.
8. Nikumbh S, Pfeifer N. Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization. BMC Bioinformatics 2017; 18:218. [PMID: 28420341 PMCID: PMC5395875 DOI: 10.1186/s12859-017-1624-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/18/2016] [Accepted: 04/05/2017] [Indexed: 11/25/2022]
Abstract
Background: Knowing the three-dimensional (3D) structure of the chromatin is important for obtaining a complete picture of the regulatory landscape. Changes in the 3D structure have been implicated in diseases. While there exist approaches that attempt to predict the long-range chromatin interactions, they focus only on interactions between specific genomic regions (the promoters and enhancers), neglecting other possibilities, for instance, the so-called structural interactions involving intervening chromatin.
Results: We present a method that can be trained on 5C data using the genetic sequence of the candidate loci to predict potential genome-wide interaction partners of a particular locus of interest. We have built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histograms (ODH) representation. The method shows good performance with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across cell lines GM12878, K562 and HeLa-S3. In cases where any locus did not have sufficient candidate interaction partners for model training, we employed multitask learning to share knowledge between models of different loci. In this scenario, across the three cell lines, the method attained an average performance increase of 0.09 in the AUC. Performance evaluation of the models trained on 5C data regarding prediction on an independent high-resolution Hi-C dataset (which is a rather hard problem) shows 0.56 AUC, on average. Additionally, we have developed new, intuitive visualization methods that enable interpretation of sequence signals that contributed towards prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization.
Conclusions: We demonstrated how our approach can 1) provide insights into sequence features of locus-specific interaction partners, and 2) also identify their cell-line specificity. That our models deem short tandem repeat sequences as discriminative for prediction of potential interaction partners suggests that they could play a larger role in genome organization. Thus, our approach can (a) be beneficial to broadly understand, at the sequence-level, chromatin interactions and higher-order structures like (meta-) topologically associating domains (TADs); (b) study regions omitted from existing prediction approaches using various information sources (e.g., epigenetic information); and (c) improve methods that predict the 3D structure of the chromatin.
Affiliation(s)
- Sarvesh Nikumbh: Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Building E1.4, Saarbruecken, D-66123, Germany.
- Nico Pfeifer: Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Building E1.4, Saarbruecken, D-66123, Germany. Present address: Department of Computer Science, University of Tübingen, Sand 14, Tübingen, D-72076, Germany.
9. Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights. PLoS One 2016; 11:e0166854. [PMID: 27893794 PMCID: PMC5125637 DOI: 10.1371/journal.pone.0166854] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/28/2016] [Accepted: 11/05/2016] [Indexed: 12/15/2022]
Abstract
Microsatellites or simple sequence repeats (SSR) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acid repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise due to a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions, indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in the disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility and stability and by acting as linker elements between domains. Comparison of SAAR containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats, >15 amino acids, are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Other amino acids such as glutamic acid, proline, serine and alanine repeats are generally conserved among the orthologs with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of functional/structural domains of proteins. The insights gained from our observations can help in effective designing and engineering of proteins with novel features.