1. An empirical analysis of mtSSRs: could microsatellite distribution patterns explain the evolution of mitogenomes in plants? Funct Integr Genomics 2021; 22:35-53. [PMID: 34751851 DOI: 10.1007/s10142-021-00815-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/06/2021] [Revised: 10/18/2021] [Accepted: 10/19/2021] [Indexed: 10/19/2022]
Abstract
Microsatellites (SSRs) are tandem repeat sequences in eukaryote genomes, including plant cytoplasmic genomes. The mitochondrial genome (mtDNA) has been shown to vary in size, number, and distribution of SSRs among different plant groups. Thus, SSRs contribute to genomic diversity in mtDNAs. However, the abundance, distribution, and evolutionary significance of SSRs in mtDNA from a wide range of algae and plants have not been explored. In this study, the mtDNAs of 204 plant and algal species were investigated for the presence of SSRs. The number of SSRs was positively correlated with genome size. Their distribution depends on the plant and algal groups analyzed, although the cluster analysis indicates the conservation of some common motifs in algae and terrestrial plants that reflect the common ancestry of these groups. Many SSRs in coding and non-coding regions can be useful as molecular markers. Moreover, mitochondrial SSRs are highly abundant, representing an important source of natural or induced genetic variation, i.e., for biotechnological approaches that can modulate mtDNA gene regulation. Thus, this comparative study increases the understanding of plant and algal SSR evolution and brings perspectives for further studies.
2. Shukla R, Srivastava RC. The statistical analysis of direct repeats in nucleic acid sequences. J Appl Probab 2016. [DOI: 10.2307/3213744] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Sequence symmetries in DNA and RNA are being discovered at an increasing rate. Conjectures and hypotheses are being proposed for their possible structural and functional role in the nucleic acid. In this paper a probability model is studied which evaluates the probabilities of various repeats occurring by chance alone. Expressions are derived for the mean and variance of the statistics employed. The central limit theorem for dependent trials is used to obtain the asymptotic distributions. An indication is given of how to use the model to search for various gene amplification events in the evolutionary history of the sequences.
3. On Statistical Analysis of Sequence Symmetries in DNA/RNA. Adv Appl Probab 2016. [DOI: 10.1017/s0001867800022321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
A predominance of certain ‘sequence symmetries’ in DNA/RNA has led to various conjectures about the possible structural/functional role these symmetries might play in nucleic acid sequences. De Wachter employed a binomial probability model to compare the observed number of ‘direct repeats’ with those expected in a random sequence. Counting of direct repeats essentially leads to a sequence of m-dependent trials. We develop a stochastic model for studying various types of symmetries. Expressions for means and variances of the statistics employed are derived. The asymptotic distributions are obtained using the central limit theorem for m-dependent random variables. It is proposed that each sequence pattern be examined separately for its chance occurrence as opposed to what de Wachter suggests, i.e., clumping of all patterns together. It is also shown how our model can be used to detect various gene-amplification events, if any, in nucleic acid sequences. Finally, for certain types of patterns, it is indicated how the theory of recurrent events can be used to get a better handle on the analysis of direct repeats.
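The comparison of observed and expected direct repeats described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a uniform i.i.d. sequence over four bases, in which any two start positions carry the same k-mer with probability 4^-k, and it gives only the expected count, not the variances or asymptotic distributions the paper derives. The function names `count_repeat_pairs` and `expected_repeat_pairs` are hypothetical.

```python
from collections import defaultdict
from math import comb


def count_repeat_pairs(seq, k):
    """Count unordered pairs of start positions whose k-mers are identical."""
    occurrences = defaultdict(int)
    for i in range(len(seq) - k + 1):
        occurrences[seq[i:i + k]] += 1
    # A k-mer seen c times contributes c-choose-2 repeat pairs.
    return sum(comb(c, 2) for c in occurrences.values())


def expected_repeat_pairs(n, k, alphabet_size=4):
    """Expected number of matching k-mer pairs in a sequence of length n.

    Assumption (not from the source): bases are independent and uniform,
    so each pair of windows matches with probability alphabet_size ** -k.
    """
    windows = n - k + 1
    if windows < 2:
        return 0.0
    return comb(windows, 2) / alphabet_size ** k


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(count_repeat_pairs(seq, 4), expected_repeat_pairs(len(seq), 4))
```

Counting per k-mer, rather than pooling all patterns, loosely mirrors the abstract's suggestion that each sequence pattern be examined separately; a per-pattern test would need the variance terms that this sketch omits.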
4. George B, Alam CM, Kumar RV, Gnanasekaran P, Chakraborty S. Potential linkage between compound microsatellites and recombination in geminiviruses: Evidence from comparative analysis. Virology 2015; 482:41-50. [DOI: 10.1016/j.virol.2015.03.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/14/2014] [Revised: 02/16/2015] [Accepted: 03/05/2015] [Indexed: 01/10/2023]
5. George B, George B, Awasthi M, Singh RN. Genome wide survey and analysis of microsatellites in Tombusviridae family. Genes Genomics 2015. [DOI: 10.1007/s13258-015-0295-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
6. George B, Bhatt BS, Awasthi M, George B, Singh AK. Comparative analysis of microsatellites in chloroplast genomes of lower and higher plants. Curr Genet 2015; 61:665-77. [PMID: 25999216 DOI: 10.1007/s00294-015-0495-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 01/08/2015] [Revised: 05/05/2015] [Accepted: 05/08/2015] [Indexed: 12/29/2022]
Abstract
Microsatellites, or simple sequence repeats (SSRs), contain repetitive DNA sequence where tandem repeats of one to six base pairs are present a number of times. Chloroplast genome sequences have been shown to possess extensive variations in the length, number and distribution of SSRs. However, a comparative analysis of chloroplast microsatellites is not available. Considering their potential importance in generating genomic diversity, we have systematically analysed the abundance and distribution of simple and compound microsatellites in 164 sequenced chloroplast genomes from a wide range of plants. The key findings of these studies are (1) a large number of mononucleotide repeats as compared to SSR(2-6) (di-, tri-, tetra-, penta-, hexanucleotide repeats) are present in all chloroplast genomes investigated, (2) lower plants such as algae show wide variation in relative abundance, density and distribution of microsatellite repeats as compared to flowering plants, (3) longer SSRs are excluded from coding regions of most chloroplast genomes, (4) GC content has a weak influence on number, relative abundance and relative density of mononucleotide as well as SSR(2-6). However, GC content showed a strong negative correlation with relative density (R² = 0.5, P < 0.05) and relative abundance (R² = 0.6, P < 0.05) of cSSRs. In summary, our comparative studies of chloroplast genomes illustrate the variable distribution of microsatellites and revealed that the chloroplast genome of smaller plants possesses relatively more genomic diversity compared to higher plants.
Affiliation(s)
- Biju George: Blessy Software Solution, Sector 4/441, Malviya Nagar, Jaipur, 302017, Rajasthan, India
- Bhavin S Bhatt: School of Life Sciences, Central University of Gujarat, Gandhinagar, 382030, Gujarat, India
- Mayur Awasthi: Mahatma Gandhi Chitrakoot Gramodaya Vishwavidhyalaya, Satna, 485334, Madhya Pradesh, India
- Binu George: Blessy Software Solution, Sector 4/441, Malviya Nagar, Jaipur, 302017, Rajasthan, India
- Achuit K Singh: School of Life Sciences, Central University of Gujarat, Gandhinagar, 382030, Gujarat, India
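The abstract of entry 6 defines microsatellites as tandem repeats of one to six base pairs and reports their relative abundance and relative density. The following Python sketch shows one way such a scan might look. The minimum copy numbers in `MIN_REPEATS`, the function names, and the per-kilobase definitions (SSR count per kb and SSR-covered bp per kb) are assumptions made for illustration; the abstract does not state the thresholds or formulas the authors used. Only perfect repeats are detected, and compound or imperfect SSRs are not handled.

```python
import re

# Minimum number of tandem copies per motif length (1-6 bp).
# Illustrative cut-offs only; they are assumptions, not values from the abstract.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs; end is exclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "ATAT" is really "AT").
            if any(unit_len % d == 0 and motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len)):
                continue
            copies = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, copies))
    return sorted(hits)


def relative_abundance(hits, genome_len):
    """SSRs per kb of sequence (assumed definition)."""
    return len(hits) / (genome_len / 1000)


def relative_density(hits, genome_len):
    """Base pairs covered by SSRs per kb of sequence (assumed definition)."""
    return sum(end - start for start, end, _, _ in hits) / (genome_len / 1000)


if __name__ == "__main__":
    demo = "GC" + "A" * 12 + "CGTG" + "AT" * 6 + "GGC"
    found = find_ssrs(demo)
    print(found, relative_abundance(found, len(demo)), relative_density(found, len(demo)))
```

Changing the cut-offs changes every downstream statistic, so comparisons across studies are only meaningful when the same thresholds are applied.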
7. George B, Gnanasekaran P, Jain SK, Chakraborty S. Genome wide survey and analysis of small repetitive sequences in caulimoviruses. Infect Genet Evol 2014; 27:15-24. [PMID: 24999243 DOI: 10.1016/j.meegid.2014.06.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/06/2014] [Revised: 06/01/2014] [Accepted: 06/22/2014] [Indexed: 12/19/2022]
Abstract
Microsatellites are known to exhibit ubiquitous presence across all kingdoms of life including viruses. Members of the Caulimoviridae family severely affect growth of vegetable and fruit plants and reduce economic yield in diverse cropping systems worldwide. Here, we analyzed the nature and distribution of both simple and complex microsatellites present in the complete genomes of 44 species of Caulimoviridae. Our results showed that, in all analyzed genomes, genome size and GC content had a weak influence on the number, relative abundance and relative density of microsatellites. For each genome, mono- and dinucleotide repeats were found to be highly predominant and are overrepresented in the genomes of the majority of caulimoviruses. AT/TA and GAA/AAG/AGA were the most abundant di- and trinucleotide repeat motifs, respectively. Repeats larger than trinucleotide were rarely found in these genomes. Comparative study of occurrence, abundance and density of microsatellites among available RNA and DNA viral genomes indicated that simple repeats were least abundant in genomes of caulimoviruses. Polymorphic repeats, even though rare, were observed in the large intergenic region of the genome, indicating strand slippage and/or unequal recombination processes do occur in caulimoviruses. To our knowledge, this is the first analysis of microsatellites occurring in any dsDNA viral genome. Characterization of such variations in repeat sequences would be important in deciphering the origin, mutational processes, and role of repeat sequences in viral genomes.
Affiliation(s)
- Biju George: Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Prabu Gnanasekaran: Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- S K Jain: Department of Biotechnology, Jamia Hamdard University, New Delhi, Delhi 110062, India
- Supriya Chakraborty: Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi 110067, India
8. Loire E, Higuet D, Netter P, Achaz G. Evolution of coding microsatellites in primate genomes. Genome Biol Evol 2013; 5:283-95. [PMID: 23315383 PMCID: PMC3590770 DOI: 10.1093/gbe/evt003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023] Open
Abstract
Microsatellites (SSRs) are highly susceptible to expansions and contractions. When located in a coding sequence, the insertion or the deletion of a single unit for a mono-, di-, tetra-, or penta(nucleotide)-SSR creates a frameshift. As a consequence, one would expect to find only very few of these SSRs in coding sequences because of their strong deleterious potential. Unexpectedly, genomes contain many coding SSRs of all types. Here, we report on a study of their evolution in a phylogenetic context using the genomes of four primates: human, chimpanzee, orangutan, and macaque. In a set of 5,015 orthologous genes unambiguously aligned among the four species, we show that, except for tri- and hexa-SSRs, for which insertions and deletions are frequently observed, SSRs in coding regions evolve mainly by substitutions. We show that the rate of substitution in all types of coding SSRs is typically two times higher than in the rest of coding sequences. Additionally, we observe that although numerous coding SSRs are created and lost by substitutions in the lineages, their numbers remain constant. This last observation suggests that the coding SSRs have reached equilibrium. We hypothesize that this equilibrium involves a combination of mutation, drift, and selection. We thus estimated the fitness cost of mono-SSRs and show that it increases with the number of units. We finally show that the cost of coding mono-SSRs greatly varies from function to function, suggesting that the strength of the selection that acts against them can be correlated to gene functions.
Affiliation(s)
- Etienne Loire: UMR 7138, Systématique, Adaptation, Evolution (UPMC, CNRS, MNHN, IRD), Paris, France
9. Chen M, Tan Z, Zeng G. Microsatellite is an important component of complete Hepatitis C virus genomes. Infect Genet Evol 2011; 11:1646-54. [DOI: 10.1016/j.meegid.2011.06.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 01/11/2011] [Revised: 06/02/2011] [Accepted: 06/16/2011] [Indexed: 12/15/2022]
10. Kelkar YD, Strubczewski N, Hile SE, Chiaromonte F, Eckert KA, Makova KD. What is a microsatellite: a computational and experimental definition based upon repeat mutational behavior at A/T and GT/AC repeats. Genome Biol Evol 2010; 2:620-35. [PMID: 20668018 PMCID: PMC2940325 DOI: 10.1093/gbe/evq046] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Indexed: 12/12/2022] Open
Abstract
Microsatellites are abundant in eukaryotic genomes and have high rates of strand slippage-induced repeat number alterations. They are popular genetic markers, and their mutations are associated with numerous neurological diseases. However, the minimal number of repeats required to constitute a microsatellite has been debated, and a definition of a microsatellite that considers its mutational behavior has been lacking. To define a microsatellite, we investigated slippage dynamics for a range of repeat sizes, utilizing two approaches. Computationally, we assessed length polymorphism at repeat loci in ten ENCODE regions resequenced in four human populations, assuming that the occurrence of polymorphism reflects strand slippage rates. Experimentally, we determined the in vitro DNA polymerase-mediated strand slippage error rates as a function of repeat number. In both approaches, we compared strand slippage rates at tandem repeats with the background slippage rates. We observed two distinct modes of mutational behavior. At small repeat numbers, slippage rates were low and indistinguishable from background measurements. A marked transition in mutability was observed as the repeat array lengthened, such that slippage rates at large repeat numbers were significantly higher than the background rates. For both mononucleotide and dinucleotide microsatellites studied, the transition length corresponded to a similar number of nucleotides (approximately 10). Thus, microsatellite threshold is determined not by the presence/absence of strand slippage at repeats but by an abrupt alteration in slippage rates relative to background. These findings have implications for understanding microsatellite mutagenesis, standardization of genome-wide microsatellite analyses, and predicting polymorphism levels of individual microsatellite loci.
11. Loire E, Praz F, Higuet D, Netter P, Achaz G. Hypermutability of Genes in Homo sapiens Due to the Hosting of Long Mono-SSR. Mol Biol Evol 2008; 26:111-21. [DOI: 10.1093/molbev/msn230] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022] Open
12. Rajendrakumar P, Biswal AK, Balachandran SM, Srinivasarao K, Sundaram RM. Simple sequence repeats in organellar genomes of rice: frequency and distribution in genic and intergenic regions. Bioinformatics 2006; 23:1-4. [PMID: 17077096 DOI: 10.1093/bioinformatics/btl547] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Simple sequence repeats (SSRs) are abundant across genomes. However, the significance of SSRs in organellar genomes of rice has not been completely understood. The availability of organellar genome sequences allows us to understand the organization of SSRs in their genic and intergenic regions. RESULTS: We have analyzed SSRs in mitochondrial and chloroplast genomes of rice. We identified 2528 SSRs in the mitochondrial genome and an average of 870 SSRs in the chloroplast genomes. About 8.7% of the mitochondrial and 27.5% of the chloroplast SSRs were observed in the genic region. Dinucleotides were the most abundant repeats in genic and intergenic regions of the mitochondrial genome while mononucleotides were predominant in the chloroplast genomes. The rps and nad gene clusters of mitochondria had the maximum repeats, while the rpo and ndh gene clusters of chloroplast had the maximum repeats. We identified SSRs in both organellar genomes and validated them in different cultivars and species.
Affiliation(s)
- Passoupathy Rajendrakumar: Biotechnology Laboratory, Crop Improvement Section, Directorate of Rice Research Rajendranagar, Hyderabad-500030, India
13. Lawson MJ, Zhang L. Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biol 2006; 7:R14. [PMID: 16507170 PMCID: PMC1431726 DOI: 10.1186/gb-2006-7-2-r14] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Received: 08/26/2005] [Revised: 10/26/2005] [Accepted: 01/30/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Simple sequence repeats (SSRs) in DNA have been traditionally thought of as functionally unimportant and have been studied mainly as genetic markers. A recent handful of studies have shown, however, that SSRs in different positions of a gene can play important roles in determining protein function, genetic development, and regulation of gene expression. We have performed a detailed comparative study of the distribution of SSRs in the sequenced genomes of Arabidopsis thaliana and rice. RESULTS: SSRs in different genic regions - 5' untranslated region (UTR), 3' UTR, exon, and intron - show distinct patterns of distribution both within and between the two genomes. Especially notable is the much higher density of SSRs in 5' UTRs compared to the other regions and a strong affinity towards trinucleotide repeats in these regions for both rice and Arabidopsis. On a genomic level, mononucleotide repeats are the most prevalent type of SSRs in Arabidopsis and trinucleotide repeats are the most prevalent type in rice. Both plants have the same most common mononucleotide (A/T) and dinucleotide (AT and AG) repeats, but have little in common for the other types of repeats. CONCLUSION: Our work provides insight into the evolution and distribution of SSRs in the two sequenced model plant genomes of monocots and dicots. Our analyses reveal that the distributions of SSRs appear highly non-random and vary a great deal in different regions of the genes in the genomes.
Affiliation(s)
- Mark J Lawson: Department of Computer Science, Virginia Tech, 655 McBryde, Blacksburg, VA 24060, USA
- Liqing Zhang: Department of Computer Science, Virginia Tech, 655 McBryde, Blacksburg, VA 24060, USA
14.
Abstract
It is shown that DNA sequences can be decomposed into smaller units much the same as texts can be decomposed into syllables, words, or groups of words. Those smaller units (modules) are extracted from DNA sequences according to statistical criteria. Tests with sequences of known modular structure (two novels and a FORTRAN source code) were performed. The degree to which DNA sequences can be decomposed into modules (modularity) turns out to be a very sensitive measure to distinguish DNA sequences from random sequences.
Affiliation(s)
- A O Schmitt: Institut für Physik der Humboldt-Universität zu Berlin, Germany
15. Hazout S, Guillot A, Meyer J. Extraction of melodies in behavioural sequences. Behav Processes 1989; 20:61-73. [DOI: 10.1016/0376-6357(89)90012-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/02/1989] [Indexed: 10/27/2022]
16. Pevzner PA, Borodovsky MYu, Mironov AA. Linguistics of nucleotide sequences. I: The significance of deviations from mean statistical characteristics and prediction of the frequencies of occurrence of words. J Biomol Struct Dyn 1989; 6:1013-26. [PMID: 2531596 DOI: 10.1080/07391102.1989.10506528] [Citation(s) in RCA: 75] [Impact Index Per Article: 2.1] [Indexed: 01/01/2023]
Abstract
Mathematical models of the generation of genetic texts appeared simultaneously with the first sequencing of DNA. They are used to establish functional and evolutionary relations between genetic texts, to predict the number and distribution of specific sites in a sequence and to identify "meaningful" words. The present paper deals with two problems: 1) The significance of deviations from the mean statistical characteristics in a genetic text. Anyone who has addressed himself to the statistical analysis of sequenced DNA is familiar with the question: what deviations from the expected frequencies of occurrence of particular words testify to the "biological" significance of those words? We propose a formula for the variance of the number of a word's occurrences in the text, with allowance for word overlaps, making it possible to assess the significance of the deviations from the expected statistical characteristics. 2) A new method for predicting the frequencies of occurrence of particular words in a genetic text using the statistical characteristics of "spaced" L-grams. The method can be used for predicting the number of restriction sites in human DNA and in planning experiments on the physical mapping and sequencing of the human genome.
Affiliation(s)
- P A Pevzner: Institute for Genetics of Microorganisms, Moscow, USSR
17. Brendel V, Beckmann JS, Trifonov EN. Linguistics of nucleotide sequences: morphology and comparison of vocabularies. J Biomol Struct Dyn 1986; 4:11-21. [PMID: 3078230 DOI: 10.1080/07391102.1986.10507643] [Citation(s) in RCA: 120] [Impact Index Per Article: 3.2] [Indexed: 01/04/2023]
Abstract
The concept of "words" in continuous languages devoid of blanks is introduced and an operational definition of words given. With this novel concept nucleotide sequences become objects for linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 (tri- to pentamers). Different genomes have distinct vocabularies. Comparison of these vocabularies can serve as a basis for revealing functional and evolutionary relatedness of sequences.
Affiliation(s)
- V Brendel: Department of Polymer Research, Weizmann Institute of Science Rehovot, Israel
18.
Abstract
Sequence symmetries in DNA and RNA are being discovered at an increasing rate. Conjectures and hypotheses are being proposed for their possible structural and functional role in the nucleic acid. In this paper a probability model is studied which evaluates the probabilities of various repeats occurring by chance alone. Expressions are derived for the mean and variance of the statistics employed. The central limit theorem for dependent trials is used to obtain the asymptotic distributions. An indication is given of how to use the model to search for various gene amplification events in the evolutionary history of the sequences.
19. Selby MJ, Barta A, Baxter JD, Bell GI, Eberhardt NL. Analysis of a major human chorionic somatomammotropin gene. Evidence for two functional promoter elements. J Biol Chem 1984. [DOI: 10.1016/s0021-9258(18)90667-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022] Open
20. Dayhoff JE. Distinguished words in data sequences: analysis and applications to neural coding and other fields. Bull Math Biol 1984; 46:529-43. [PMID: 6509230 DOI: 10.1007/bf02459501] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.2] [Indexed: 01/20/2023]